I decided this semester to take an algorithms course, in an attempt to finally learn some discrete math. As of late, we have been talking about graph algorithms. The problem is that, having never thought much about graphs, I don’t have a great deal of intuition for them. So I want to devote a post to try to help myself understand this material better.
The problem we are currently concerned with is the following. Let be an undirected graph, such that each edge has a “weight” or “cost” associated with it. The problem is to find a minimal spanning tree. That is, a subtree containing all the vertices and such that is minimal. The naive approach of listing every possible tree and calculating the weights for each of them is horribly inefficient here, as the number of trees on a complete graph of size grows super-exponentially.
To start with, I will recall the definition of a tree. All graphs will be undirected.
Definition 1 A tree is a connected graph containing no cycles.
Intuitively, one has the standard picture of a node branching out into several nodes, each of which branch into several other nodes, and so on. In fact, with the above definition one can show in fact the precise analog of the preceding statement: the “geometric realization” of a tree (when considered as a simplicial complex) is contractible. The above definition is precise, and furthermore is equivalent to the following. An easy consequence of the definition, that we will use, is that a connected subgraph of a tree is tree.
Given a graph , we are interested in the following problem: Is there a subgraph which is a tree and which contains every vertex of ? Such a subgraph is called a spanning tree. Physically, there are reasonable instances where one might wish to find one, such as in building a pipeline system. The claim is that a spanning tree always exists. To see this, we shall use the following characterization of trees:
Proposition 2 A graph is a tree if and only if it is connected and removing any edge would make it disconnected.
So trees are precisely the minimal connected graphs.
Proof: Let be a tree, and let be an edge in from to . The claim is that the graph , which is just the pair , is disconnected. For if there were a path between and that did not pass through , then we could consider and the reversal of to get a cycle. This proves one direction.
For the other, suppose is a minimal connected graph. Then if had a cycle , where , then we can remove the edge without destroying any connectedness—indeed, all the are still connected to each other by the rest of the cycle. From this we can see that spanning trees exist:
Proposition 3 Let be a connected graph. Then has a spanning tree.
Proof: Indeed, we can take , and choose a subgraph of that is minimal with respect to being connected (i.e. such that removing any edge would disconnect it). We have seen that this is a tree, and it is clearly spanning.
Example 1 In fact, we now see an algorithm for determining a spanning tree in a graph. Namely, we construct a sequence of graphs . The algorithm starts with . Given , we try removing each of the edges, one at a time, and test whether the resulting smaller graph is connected. If none of these graphs (for an edge) is connected, then we stop at , which we have seen is a tree. If we get something that is connected, we let to be with this edge removed. It is obvious that this algorithm is correct from the above discussions. But it is not necessarily efficient. To determine whether a graph is connected, we could for instance perform a breadth-first search, whose time is linear in the size of the graph. This graph has at most steps, each of which may involve breadth-first searches, each of which takes time. So the algorithm runs in cubic time.
We have the following additional simple characterization of trees.
Proposition 4 A connected graph is a tree if and only if
Proof: The idea is that adding an edge to a graph can connect two connected components. So adding an edge can cut down the number of components by at most one. Thus if has one less edge than vertex, if we remove one of its edges, then it will be disconnected as there will be too few edges to make the graph connected. So if we have , then is a minimal connected graph.
Conversely, suppose is a tree. Then if we remove an edge from , we find that the new graph is disconnected—so it is a disjoint union of trees. It must be the union of two trees or the edge could not join them into one connected graph . If we induct on the size of the tree, we can compute the vertex and edge counts of by those of the two subcomponents. From this the result becomes clear.
2. Minimum spanning trees
There is a more serious problem. In practice, the graphs for which we want a spanning tree may be weighted, with each edge assigned to a cost, and we will want a spanning tree that minimizes the cost.
Definition 5 A weight on a graph is a function . A spanning tree is said to be minimal if the sum
is minimized, over spanning trees.
If we take the identity weight on our graph, then any spanning tree is a minimum spanning tree. Indeed, this is immediate because any two spanning trees have the same cardinality (namely, ).
We would now like an algorithm to obtain a minimum spanning tree. The algorithms we shall use will be greedy; that is, they shall add edges starting with the smallest weight (possibly subject to some condition). To approach the idea of building a minimum spanning tree from nothing, we shall need some induction. Namely, we will start with a subgraph that is known to be contained in a minimal spanning tree, and we will want to grow that to a bigger subgraph that is still contained in a minimal spanning tree. We will keep doing this until all the vertices of have been incorporated, at which point we will necessarily have a minimal spanning tree.
To make things convenient, we fix some notation. Let be a graph.
Definition 6 A subgraph is said to cut into sets if (disjoint union) and no edge of is between and .
In other words, if we add all the vertices of to to get a new graph , then and are just unions of disjoint connected components of .
Proposition 7 (The cut property) Let be a graph, and let be a subgraph of which is contained in a minimal spanning tree. Suppose cuts into a disjoint union . Then if is an edge from to (in ) of minimal weight, then is contained in a minimal spanning tree.
Of course, there is a guarantee that .
Proof: Let us suppose that is contained in the minimum spanning tree . Then contains all the vertices of , and it is connected, so there is an edge from to . This edge does not belong to by hypothesis.
The claim is that is still a minimum spanning tree, which clearly contains . To see this, we note that necessarily has a cycle. This cycle has an edge going from to , so it must contain another edge going from to .
Now is connected (since removing on edge in a cycle does not change connectedness), and it is thus a tree because
The claim is that it is a minimal spanning tree. But the weight of can be at most the weight of , since has at least the same weight as . It follows that is a minimum spanning tree as well.
3. Kruskal’s algorithm
Based on the above “cut property,” we can define an efficient way of finding minimum spanning trees. As described, we will grow them, step by step. Let us now describe an algorithm due to Kruskal.
We will start the tree with a graph that is not a tree—namely, the empty graph on the vertices of . Successively we will add edges to this graph to join connected components, making a sequence of graphs . To do this, we will need an auxiliary data structure .
In detail, here is the algorithm. We start by placing each of the vertices of into its own set. So we have a collection of sets. At each stage , we will have a collection of sets that contains the connected components of the graph ; since is the empty graph, it is clear that is the set of connected components of .
We now construct the inductively. We start by generating a sorted list of the edges , sorted by weight in ascending order. We list the edges . We now add edge to the graph , making a new graph .
We then consider the components in that contain the vertices of and join them to make the new collection of sets; clearly is the collection of connected components of . To decide whether to add to the graph , we look through the collection and determine whether joins two connected components of . If it does, we add ; if not, we don’t.
Inductively, the construction is as follows. At stage , let us suppose inductively that has been constructed, and that the vertices of are partitioned into a collection of sets that represents the connected components of . Then we split into two steps:
- If connects two connected components of (as we can check by a look at ), then we let , and we merge the two connected components in corresponding to the endpoints of .
- If the endpoints of lie in the same connected component of the partial graph then we just take .
The algorithm stops when edges have been added.
Theorem 8 Given a connected weighted graph , Kruskal’s algorithm always produces a minimal spanning tree.
Proof: This will be a straightforward argument from the cut property. Let us inductively suppose that is contained in a minimal spanning tree; then we shall show that is too.
As above, there were two cases to the above algorithm: in the second, we just had , and there was nothing to prove. We need to show that is contained in a minimal spanning tree in the first case. To see this, let us apply the cut property to . By assumption, there are two connected components of that the edge joins; thus we can partition such that no edge of connects and such that does connect .
We now need to show that, among the edges connecting , is the one with minimal weight. But if did not have minimal weight among edges connecting , there would have been an edge that also connected vertices in (which cannot be in ). However, would then have been added to the graph at stage for joining two connected components of , which is a contradiction.