Greedy Algorithm for Multiway-Cut in Trees
1 The Multiway Cut Problem
The Multiway Cut (MC) problem is a generalization of the known Min-Cut problem. The input is a triple
where
is an undirected graph with a set
of
vertices and a set
of
edges,
is the set of terminals, and
is a function that assign positive weight to each edge in the graph. A valid cut
is a set of edges such that any two vertices
are disconnected over the sub-graph
. The weight of a given cut
is the sum of wights of its edges, that is,
. A solution for the MC problem is a valid cut of minimum weight.
Let
. Note that when
the problem is reduced to the regular Min-Cut problem. For general graphs, any
makes the problem NP-Hard. However, there are efficient algorithms for Multiway-Cut in trees. A tree is a connected acyclic graph. It is easy to see that if our graph is a tree, any two vertices in it must be connected by exactly one path. Moreover,
. We will use these properties later when we prove that the algorithm gives an optimal solution.
2 Efficient Algorithm for MC in Trees
Let
be the input to the algorithm, and
is a tree. To simplify the analysis later on, we will work slightly differently; instead of trying to find a small-weight set
to discard from the graph, we will try to find a large-weight set
to keep in the graph. Note that obviously
and
. Therefore, finding
with maximum weight is equivalent to finding
with minimum weight.
The idea is to start with a trivial (and probably not optimal) solution
, that is, we don’t keep any edge in the graph such that all the terminals are separated, and we will improve
iteratively.
We start off with some definitions:
A connected component (CC) of a vertex
over a graph
is the set of all vertices that can be reached from
in
. Note that if
then there are at most
CCs in
(each vertex has its own CC, which is the case when the graph has no edges).
A vertex
inside some CC is a master node if it is a terminal vertex. We then say that all the other vertices in that CC are dominated by
.
Now it is time to describe the algorithm:
- Start by sorting the edges of in descending order according to their weights. Let be the ordered sequence such that .
- Initialize a trivial solution . Maintain a set of all the CCs in the resulting graph (each vertex is its own CC).
-
For each
do the following:
- Consider the endpoints of the edge and let be the connected components of respectively. If at least one of them do not has a master node, update and let be the same as but with merged into a new single connected component. Otherwise and .
- Return the set .
3 The Solution is Correct and Optimal
It is easy to see that
is a valid solution; we started with all the terminals separated in their own CC, and in each iteration we never connect two CCs that have different masters (i.e. terminals). Therefore, in the returned set, all the connected components has at most one master, meaning that the terminals are separated.
The fact that
has maximum weight follows by the two following claims:
- Any optimal solution has . Moreover, .
- For any iteration of the algorithm, there is an (possibly different) optimal solution such that .
So we conclude that there exists some optimal
such that
, thus
has maximum weight.
We move to state and prove those claims formally.
Let
be the set of all maximum-wight solutions for Multiway-Cut for the instance
and
is a tree. Then any
have
. Moreover,
.
Since we started from a graph that is a tree, the number of CCs in the final sub-graph is exactly
, for any
. Let
be some optimal solution. It left to show that the sub-graph
has exactly
connected components:
Assume that there are less than , then we must have two terminals which live in the same CC - invalidating the solution . If we assume that there are more than CCs, then we must have a connected component without a master, so we can safely connect it to some other CC with some edge that is not in (and such edge exists since is connected). That way, we get a new valid solution with greater weight than the weight of which is a contradiction.
Similarly, we prove that has CCs: If it has less than , there must be two terminals in the same CCs, and that contradicts the correctness of . If it has more than CCs after iterations, there must be a CC without a master, so we may connect it with some edge, let it be , to other CC without invalidating the solution. But that wasn’t picked when the algorithm considered it - meaning that it could cause a violation in the sub-graph of the th iteration, and since that sub-graph is contained in the sub-graph of the last iteration then we can’t add to our solution - a contradiction.
Assume that there are less than , then we must have two terminals which live in the same CC - invalidating the solution . If we assume that there are more than CCs, then we must have a connected component without a master, so we can safely connect it to some other CC with some edge that is not in (and such edge exists since is connected). That way, we get a new valid solution with greater weight than the weight of which is a contradiction.
Similarly, we prove that has CCs: If it has less than , there must be two terminals in the same CCs, and that contradicts the correctness of . If it has more than CCs after iterations, there must be a CC without a master, so we may connect it with some edge, let it be , to other CC without invalidating the solution. But that wasn’t picked when the algorithm considered it - meaning that it could cause a violation in the sub-graph of the th iteration, and since that sub-graph is contained in the sub-graph of the last iteration then we can’t add to our solution - a contradiction.
Note that since any optimal solution have exactly
CCs, we have that each CC must contains one terminal. So in the final sub-graph defined by the solution, any
is dominated by exactly one master mode
.
Let
be the set of all optimal solutions for the instance
and
in a tree. For any
let
be the set of optimal solutions such that
contained in them. Then, for any
,
.
For
we have
thus
which is not empty. Assume toward contradiction that the claim is false, and let
be the smallest such that
, so we know that
is not empty, and hence
. In other words, during the
th iteration, the algorithm added the edge
to
, causing
to be empty.
Fix some . Denote and the subgraphs of . From the update rule of the algorithm, at least one of is not dominated by a master over , w.l.o.g assume it is . Let be the master that dominates over , and consider the path connecting , note that any edge along that path belongs to . Since the path does not exists in and in particular in , there is at least one edge that is part of the path and not in , let be such one with minimum weight.
Observe that it is impossible that because it means that the algorithm already rejected on the th iteration and since it was a violation for and then it is also a violation for therefore does not belongs to any optimal solution in , in contradiction that .
So suppose . Note that in the sub-graph induced by , must be dominated my some terminal . Because is a sub-graph of a tree, if then we can close a cycle by adding to , thus . Consider the sub-graph . The edge connects now the terminals and , and that is the only violation in . Since is a sub-graph of a tree there is only one path connecting and and it must contains the path , so removing the edge will break the connection between leading to a valid solution . By the assumption that the weight of can only increase (in fact, the optimality of forces ) and we get an optimal solution that contains , therefore , contradiction.
Fix some . Denote and the subgraphs of . From the update rule of the algorithm, at least one of is not dominated by a master over , w.l.o.g assume it is . Let be the master that dominates over , and consider the path connecting , note that any edge along that path belongs to . Since the path does not exists in and in particular in , there is at least one edge that is part of the path and not in , let be such one with minimum weight.
Observe that it is impossible that because it means that the algorithm already rejected on the th iteration and since it was a violation for and then it is also a violation for therefore does not belongs to any optimal solution in , in contradiction that .
So suppose . Note that in the sub-graph induced by , must be dominated my some terminal . Because is a sub-graph of a tree, if then we can close a cycle by adding to , thus . Consider the sub-graph . The edge connects now the terminals and , and that is the only violation in . Since is a sub-graph of a tree there is only one path connecting and and it must contains the path , so removing the edge will break the connection between leading to a valid solution . By the assumption that the weight of can only increase (in fact, the optimality of forces ) and we get an optimal solution that contains , therefore , contradiction.
4 Running Time and Similar Algorithms
Note that
, so step 1 of the algorithm takes
. Step 2 is order of
, times the cost of the operations inside the loop. There are data structures such as Union-Find that allow us to track the CCs of the current sub-graph in amortize small running time that is smaller than
. Therefore, the total running time is bounded by
.
Note that the algorithm presented in section 2, is similar to Kruskal’s algorithm for finding the minimum spanning tree. In both cases, the greedy approach leads to an optimal solution.