I decided this semester to take an algorithms course, in an attempt to finally learn some discrete math. As of late, we have been talking about graph algorithms. The problem is that, having never thought much about graphs, I don’t have a great deal of intuition for them. So I want to devote a post to try to help myself understand this material better.
The problem we are currently concerned with is the following. Let be an undirected graph, such that each edge
has a “weight” or “cost”
associated with it. The problem is to find a minimal spanning tree. That is, a subtree
containing all the vertices and such that
is minimal. The naive approach of listing every possible tree and calculating the weights for each of them is horribly inefficient here, as the number of trees on a complete graph of size
grows super-exponentially.
1. Trees
To start with, I will recall the definition of a tree. All graphs will be undirected.
Definition 1 A tree is a connected graph containing no cycles.
Intuitively, one has the standard picture of a node branching out into several nodes, each of which branch into several other nodes, and so on. In fact, with the above definition one can show in fact the precise analog of the preceding statement: the “geometric realization” of a tree (when considered as a simplicial complex) is contractible. The above definition is precise, and furthermore is equivalent to the following. An easy consequence of the definition, that we will use, is that a connected subgraph of a tree is tree.
Given a graph , we are interested in the following problem: Is there a subgraph
which is a tree and which contains every vertex of
? Such a subgraph is called a spanning tree. Physically, there are reasonable instances where one might wish to find one, such as in building a pipeline system. The claim is that a spanning tree always exists. To see this, we shall use the following characterization of trees:
Proposition 2 A graph is a tree if and only if it is connected and removing any edge would make it disconnected.
So trees are precisely the minimal connected graphs.
Proof: Let be a tree, and let
be an edge in
from
to
. The claim is that the graph
, which is just the pair
, is disconnected. For if there were a path
between
and
that did not pass through
, then we could consider
and the reversal of
to get a cycle. This proves one direction.
For the other, suppose is a minimal connected graph. Then if
had a cycle
, where
, then we can remove the edge
without destroying any connectedness—indeed, all the
are still connected to each other by the rest of the cycle. From this we can see that spanning trees exist:
Proposition 3 Let
be a connected graph. Then
has a spanning tree.
Proof: Indeed, we can take , and choose a subgraph of
that is minimal with respect to being connected (i.e. such that removing any edge would disconnect it). We have seen that this is a tree, and it is clearly spanning.
Example 1 In fact, we now see an algorithm for determining a spanning tree in a graph. Namely, we construct a sequence of graphs
. The algorithm starts with
. Given
, we try removing each of the edges, one at a time, and test whether the resulting smaller graph is connected. If none of these
graphs (for
an edge) is connected, then we stop at
, which we have seen is a tree. If we get something that is connected, we let
to be
with this edge removed. It is obvious that this algorithm is correct from the above discussions. But it is not necessarily efficient. To determine whether a graph is connected, we could for instance perform a breadth-first search, whose time is linear in the size of the graph. This graph has at most
steps, each of which may involve
breadth-first searches, each of which takes
time. So the algorithm runs in cubic time.
We have the following additional simple characterization of trees.
Proposition 4 A connected graph
is a tree if and only if
Proof: The idea is that adding an edge to a graph can connect two connected components. So adding an edge can cut down the number of components by at most one. Thus if has one less edge than vertex, if we remove one of its edges, then it will be disconnected as there will be too few edges to make the graph connected. So if we have
, then
is a minimal connected graph.
Conversely, suppose is a tree. Then if we remove an edge from
, we find that the new graph
is disconnected—so it is a disjoint union of trees. It must be the union of two trees or the edge
could not join them into one connected graph
. If we induct on the size of the tree, we can compute the vertex and edge counts of
by those of the two subcomponents. From this the result becomes clear.
2. Minimum spanning trees
There is a more serious problem. In practice, the graphs for which we want a spanning tree may be weighted, with each edge assigned to a cost, and we will want a spanning tree that minimizes the cost.
Definition 5 A weight on a graph
is a function
. A spanning tree
is said to be minimal if the sum
is minimized, over spanning trees.
If we take the identity weight on our graph, then any spanning tree is a minimum spanning tree. Indeed, this is immediate because any two spanning trees have the same cardinality (namely,
).
We would now like an algorithm to obtain a minimum spanning tree. The algorithms we shall use will be greedy; that is, they shall add edges starting with the smallest weight (possibly subject to some condition). To approach the idea of building a minimum spanning tree from nothing, we shall need some induction. Namely, we will start with a subgraph that is known to be contained in a minimal spanning tree, and we will want to grow that to a bigger subgraph that is still contained in a minimal spanning tree. We will keep doing this until all the vertices of have been incorporated, at which point we will necessarily have a minimal spanning tree.
To make things convenient, we fix some notation. Let be a graph.
Definition 6 A subgraph
is said to cut
into sets
if
(disjoint union) and no edge of
is between
and
.
In other words, if we add all the vertices of to
to get a new graph
, then
and
are just unions of disjoint connected components of
.
Proposition 7 (The cut property) Let
be a graph, and let
be a subgraph of
which is contained in a minimal spanning tree. Suppose
cuts
into a disjoint union
. Then if
is an edge from
to
(in
) of minimal weight, then
is contained in a minimal spanning tree.
Of course, there is a guarantee that .
Proof: Let us suppose that is contained in the minimum spanning tree
. Then
contains all the vertices of
, and it is connected, so there is an edge
from
to
. This edge does not belong to
by hypothesis.
The claim is that is still a minimum spanning tree, which clearly contains
. To see this, we note that
necessarily has a cycle. This cycle has an edge
going from
to
, so it must contain another edge
going from
to
.
Now is connected (since removing on edge in a cycle does not change connectedness), and it is thus a tree because
The claim is that it is a minimal spanning tree. But the weight of can be at most the weight of
, since
has at least the same weight as
. It follows that
is a minimum spanning tree as well.
3. Kruskal’s algorithm
Based on the above “cut property,” we can define an efficient way of finding minimum spanning trees. As described, we will grow them, step by step. Let us now describe an algorithm due to Kruskal.
We will start the tree with a graph that is not a tree—namely, the empty graph on the vertices of
. Successively we will add edges to this graph to join connected components, making a sequence of graphs
. To do this, we will need an auxiliary data structure
.
In detail, here is the algorithm. We start by placing each of the vertices of into its own set. So we have a collection
of sets. At each stage
, we will have a collection
of sets that contains the connected components of the graph
; since
is the empty graph, it is clear that
is the set of connected components of
.
We now construct the inductively. We start by generating a sorted list of the edges
, sorted by weight in ascending order. We list the edges
. We now add edge
to the graph
, making a new graph
.
We then consider the components in that contain the vertices of
and join them to make the new collection
of sets; clearly
is the collection of connected components of
. To decide whether to add
to the graph
, we look through the collection
and determine whether
joins two connected components of
. If it does, we add
; if not, we don’t.
Inductively, the construction is as follows. At stage , let us suppose inductively that
has been constructed, and that the vertices of
are partitioned into a collection of sets
that represents the connected components of
. Then we split into two steps:
- If
connects two connected components of
(as we can check by a look at
), then we let
, and we merge the two connected components in
corresponding to the endpoints of
.
- If the endpoints of
lie in the same connected component of the partial graph
then we just take
.
The algorithm stops when edges have been added.
Theorem 8 Given a connected weighted graph
, Kruskal’s algorithm always produces a minimal spanning tree.
Proof: This will be a straightforward argument from the cut property. Let us inductively suppose that is contained in a minimal spanning tree; then we shall show that
is too.
As above, there were two cases to the above algorithm: in the second, we just had , and there was nothing to prove. We need to show that
is contained in a minimal spanning tree in the first case. To see this, let us apply the cut property to
. By assumption, there are two connected components of
that the edge
joins; thus we can partition
such that no edge of
connects
and such that
does connect
.
We now need to show that, among the edges connecting ,
is the one with minimal weight. But if
did not have minimal weight among edges connecting
, there would have been an edge
that also connected vertices in
(which cannot be in
). However,
would then have been added to the graph at stage
for joining two connected components of
, which is a contradiction.
February 12, 2011 at 9:56 am
Since you mentioned the topological point of view, the topological point of spanning trees is that one can always contract a graph along a spanning tree. The result is a wedge of circles, which shows rather directly that the fundamental group of a graph is free. The existence of spanning trees is also a quick way to deduce Euler’s formula: see http://www.ics.uci.edu/~eppstein/junkyard/euler/interdig.html .
February 12, 2011 at 10:47 am
Interesting; that’s a nice argument.