Suppose you run a cable company and you are planning where to place cables so that you reach every house, but you'd like to minimize the amount of cable that you use.
Definition. A spanning tree is a subset of edges of a graph that form a tree and that connect all the vertices in the graph.
Definition. Given an edge-weighted graph $G$, the minimum spanning tree problem is to find a spanning tree of $G$ whose total weight is less than or equal to that of any other spanning tree. We write $T_1 \le T_2$ when the total weight of the edges in tree $T_1$ is less than or equal to the total weight of the edges in tree $T_2$.
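To make the ordering on spanning trees concrete, here is a toy example (the graph and weights below are made up for illustration; they are not the lecture's example graph):

```python
# Weights for a small 4-vertex graph.
weight = {('a','b'): 1, ('b','c'): 2, ('c','d'): 3, ('a','d'): 4, ('a','c'): 5}

def total_weight(tree):
    # The weight of a spanning tree is the sum of its edge weights.
    return sum(weight[e] for e in tree)

T1 = [('a','b'), ('b','c'), ('c','d')]   # total weight 6
T2 = [('a','b'), ('b','c'), ('a','d')]   # total weight 7
print(total_weight(T1) <= total_weight(T2))  # True, so T1 <= T2
```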
The following are two different minimum spanning trees for the same graph.
There are two popular algorithms for the MST problem: Kruskal's and Prim's. The two algorithms have a lot in common.
But how do we find a safe edge?
We'll need some definitions to lead up to the answer.
Definitions. Suppose $G = (V,E)$ is a graph with weighted edges.
A cut $(S, V - S)$ is a partition of the vertices into two groups, $S$ and $V - S$.
An edge $(u,v)$ crosses the cut if one endpoint is in $S$ and the other is in $V - S$.
A cut respects a set of edges $A$ if no edge of $A$ crosses the cut.
A light edge of a cut is a crossing edge of minimum weight.
Given a set of edges $A$ that is a subset of some MST, an edge $(u,v)$ is a safe edge if $A \cup \{(u,v)\}$ is also a subset of some MST.
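These definitions can be sketched directly in code (the edge representation here, pairs of vertex names with a weight dictionary, is just for illustration and not the lecture's Graph class):

```python
def crosses(edge, S):
    # An edge crosses the cut (S, V - S) if exactly one endpoint is in S.
    (u, v) = edge
    return (u in S) != (v in S)

def light_edge(edges, weight, S):
    # The light edge of a cut is a crossing edge of minimum weight.
    crossing = [e for e in edges if crosses(e, S)]
    return min(crossing, key=lambda e: weight[e])

weight = {('A','B'): 1, ('A','E'): 3, ('A','D'): 4, ('B','E'): 2}
print(light_edge(list(weight.keys()), weight, {'A'}))  # ('A', 'B')
```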
Theorem 23.1 (Light Edges are Safe Edges). Let $A$ be a subset of some MST of $G=(V,E)$. Let $(S,V-S)$ be a cut of $G$ that respects $A$, and let $(u,v)$ be a light edge with respect to the cut. Then $(u,v)$ is a safe edge.
Proof. Let $T$ be a MST of $G$ that includes $A$.
Case $(u,v)$ in $T$: Then trivially, $A \cup \{(u,v)\} \subseteq T$.
Case $(u,v)$ not in $T$:
We're going to construct another MST $T'$ such that $A \cup \{(u,v)\} \subseteq T'$.
Because $T$ is spanning, both $u$ and $v$ are in $T$, so there is already a path from $u$ to $v$ in $T$. Thus, adding $(u,v)$ completes a cycle. A cycle that crosses the cut once must cross it a second time, as shown in the following diagram. Let that other crossing edge be $(x,y)$.
u - - - x
|       |
-----------------   (the cut)
|       |
v - - - y
We form the new MST $T'$ by removing $(x,y)$ and adding $(u,v)$:
$T' = (T - \{(x,y)\}) \cup \{(u,v)\}$
Now we need to show that $T'$ is an MST. First, $T'$ is a spanning tree: removing $(x,y)$ splits $T$ into two pieces, and adding $(u,v)$, which lies on the same cycle, reconnects them. Second, $T' \le T$ because $(u,v)$ is a light edge, so its weight is less than or equal to that of $(x,y)$. So $T'$ is an MST.
It remains to show that $(u,v)$ is a safe edge, that is,
$A \cup \{(u,v)\} \subseteq (T - \{(x,y)\}) \cup \{(u,v)\}$
We had $A \subseteq T$, so it suffices to show that $(x,y) \notin A$, and we have that because the cut respects $A$ and the edge $(x,y)$ crosses the cut.
QED
We can think of MST algorithms as maintaining a forest of trees, where all the tree edges are in the set $A$.
Initially, each vertex is in its own tree because $A=\{\}$.
At each step, we merge two trees into a single tree by identifying the lightest edge connecting them (such an edge is safe).
Main idea of Kruskal's algorithm: process all the edges from lightest to heaviest.
Demo of Kruskal's algorithm on the example graph:
Edges sorted by weight:
A-1-B, B-2-E, C-2-F, A-3-E, D-4-E, A-4-D, D-4-B, C-4-E, E-7-F
Process A-1-B, union {A} and {B}:
A--B  C
D  E  F
edges: B-2-E, C-2-F, A-3-E, D-4-E, A-4-D, D-4-B, C-4-E, E-7-F
Process B-2-E, union {A,B} and {E}
A--B  C
   |
D  E  F
edges: C-2-F, A-3-E, D-4-E, A-4-D, D-4-B, C-4-E, E-7-F
Process C-2-F, union {C} and {F}
A--B  C
   |  |
D  E  F
edges: A-3-E, D-4-E, A-4-D, D-4-B, C-4-E, E-7-F
Process A-3-E, do nothing (A and E are already in the same tree)
edges: D-4-E, A-4-D, D-4-B, C-4-E, E-7-F
Process D-4-E, union {A,B,E} and {D}
A--B  C
   |  |
D--E  F
edges: A-4-D, D-4-B, C-4-E, E-7-F
Process A-4-D, do nothing
edges: D-4-B, C-4-E, E-7-F
Process D-4-B, do nothing
edges: C-4-E, E-7-F
Process C-4-E, union {A,B,D,E} and {C,F}, which completes the spanning tree:
A--B  C
   | /|
D--E  F
To implement Kruskal's algorithm, we sort the edges by increasing weight to make it easy to process lighter edges before heavier edges, and we use the DisjointSets data structure to keep track of each tree as a separate set. Recall the DisjointSets operations: make_set, find, and union.
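The code below imports DisjointSets from another notebook. In case that notebook is not at hand, here is a minimal stand-in with the same interface (the interface is inferred from the calls in kruskal_mst, so treat it as an assumption):

```python
class DisjointSets:
    def __init__(self, n):
        # Each element starts as its own parent.
        self.parent = list(range(n))

    def make_set(self, v):
        self.parent[v] = v

    def find(self, v):
        # Find the representative of v's set, with path halving.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, u, v):
        # Link the two representatives (no union-by-rank, for brevity).
        self.parent[self.find(u)] = self.find(v)

ds = DisjointSets(3)
ds.union(0, 1)
print(ds.find(0) == ds.find(1), ds.find(0) == ds.find(2))  # True False
```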
from ipynb.fs.full.DisjointSets import *
from ipynb.fs.full.Graph import *

def kruskal_mst(g, weight):
    A = []
    ds = DisjointSets(g.num_vertices())
    for v in g.vertices():
        ds.make_set(v)
    # Process the edges from lightest to heaviest.
    E = sorted(g.edges(), key=lambda e: weight[e])
    for e in E:
        # A safe edge connects two different trees.
        if ds.find(e.source) != ds.find(e.target):
            A.append(e)
            ds.union(e.source, e.target)
    return A
The following shows the result of applying Kruskal's algorithm to the example graph.
id = {'A':0,'B':1,'C':2,'D':3,'E':4,'F':5}
name = ['A','B','C','D','E','F']
named_weight = {('A','B'):1, ('A','D'):4, ('A','E'):3,
                ('B','D'):4, ('B','E'):2,
                ('C','E'):4, ('C','F'):2,
                ('D','E'):4,
                ('E','F'):7}
weight = {UEdge(id[e[0]], id[e[1]]): w for (e, w) in named_weight.items()}
edges = weight.keys()
G = UndirectedAdjList(6, edges)
MST = kruskal_mst(G, weight)
print("MST: ", end="")
print([(name[e.source], name[e.target]) for e in MST], sep=", ")
total = 0
for e in MST:
    total += weight[e]
print("weight of MST: " + str(total))
MST: [('A', 'B'), ('B', 'E'), ('C', 'F'), ('A', 'D'), ('C', 'E')]
weight of MST: 13
The dominating cost is the sorting, so the overall time complexity is $O(m \log m)$, which can instead be stated as $O(m \log n)$ because $m < n^2$ and therefore $\log m < 2 \log n$.
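A quick numeric sanity check of that bound (assuming a simple graph, so $m \le n(n-1)/2 < n^2$):

```python
import math

# For m < n*n we have log m < 2 log n, so O(m log m) and O(m log n)
# describe the same asymptotic bound.
for n in [10, 100, 1000]:
    m = n * (n - 1) // 2            # maximum edges in a simple graph
    assert math.log(m) < 2 * math.log(n)
print("log m < 2 log n holds for n in [10, 100, 1000]")
```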
Apply Kruskal's algorithm to the following graph.
Solution:
sorted edges:
A-1-E, G-1-H, B-1-D, D-1-H
F-2-G, A-2-B,
A-3-C,
E-4-F, C-4-H, D-4-G
C-5-D,
B-8-F
An MST of weight 11 uses the edges A-1-E, G-1-H, B-1-D, D-1-H, F-2-G, A-2-B, and A-3-C.
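To double-check the claimed total, here is a self-contained run of Kruskal's algorithm on the exercise graph, with a throwaway union-find in place of the DisjointSets notebook:

```python
# Edges of the exercise graph as (u, v, weight) triples.
edges = [('A','E',1), ('G','H',1), ('B','D',1), ('D','H',1),
         ('F','G',2), ('A','B',2), ('A','C',3),
         ('E','F',4), ('C','H',4), ('D','G',4),
         ('C','D',5), ('B','F',8)]

parent = {v: v for v in 'ABCDEFGH'}
def find(v):
    while parent[v] != v:
        v = parent[v]
    return v

total = 0
for (u, v, w) in sorted(edges, key=lambda e: e[2]):
    if find(u) != find(v):          # skip edges that would form a cycle
        parent[find(u)] = find(v)
        total += w
print(total)  # 11
```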
Main idea of Prim's algorithm: grow a single tree, at each step choosing the lightest edge that goes from the tree to a vertex not in the tree.
Why does this work? Consider the cut that separates the growing tree from the rest of the vertices in the graph. By Theorem 23.1, the lightest edge across this cut is a safe edge.
Consider again the following example graph:
We randomly choose to start at vertex A.
Of the three out-edges from A, A-1-B is the lightest.
A--B  C
D  E  F
The edges that leave the current tree:
A-4-D, A-3-E, B-4-D, B-2-E
The lightest is B-2-E.
A--B  C
   |
D  E  F
The edges that leave the current tree:
A-4-D, B-4-D, E-4-C, E-4-D, E-7-F
One of the lightest is A-4-D.
A--B  C
|  |
D  E  F
The edges that leave the current tree:
E-4-C, E-7-F
The lightest is E-4-C.
A--B  C
|  | /
D  E  F
The edges that leave the current tree:
E-7-F, C-2-F
The lightest is C-2-F, which completes the minimum spanning tree:
Similar to Dijkstra's Shortest Paths, we need to choose the "best" among the neighbors of the current tree, so let's use a PriorityQueue again.
from ipynb.fs.full.PriorityQueue import PriorityQueue

def create_priority_queue(num_vertices, distance):
    def compare_distance(u, v):
        return distance[u] > distance[v]
    position = [None for i in range(0, num_vertices)]
    def get_pos(v):
        return position[v]
    def update_pos(v, pos):
        position[v] = pos
    return PriorityQueue(compare_distance, update_pos, get_pos)
Unlike Dijkstra's, we don't care about the distance from the source to the neighbors, just the weight of the last edge to the neighbor. So we set the distance to be the weight of the lightest edge to the neighbor.
def relax(e, distance, queue, parent, weight):
    # Record a lighter connecting edge into e.target, if one was found.
    if distance[e.target] > weight[e]:
        distance[e.target] = weight[e]
        parent[e.target] = e.source
        queue.increase_key(e.target)
Prim's algorithm maintains a queue of the neighbors of the current tree. It repeatedly pops the vertex $u$ whose connecting edge is lightest, adds the edge from $u$'s parent to $u$ to the MST, and then processes the out-edges of $u$, pushing newly discovered neighbors onto the queue and relaxing their distances.
import sys

def prim_minimum_spanning_tree(g, source, weight):
    parent = [i for i in range(0, g.num_vertices())]
    distance = [sys.maxsize for i in range(0, g.num_vertices())]
    visited = [False for i in range(0, g.num_vertices())]  # pushed onto the queue
    in_tree = [False for i in range(0, g.num_vertices())]  # popped (finalized)
    queue = create_priority_queue(g.num_vertices(), distance)
    distance[source] = 0
    queue.push(source)
    visited[source] = True
    MST = []
    while not queue.empty():
        u = queue.pop()
        in_tree[u] = True
        if u != source:
            MST.append(UEdge(parent[u], u))
        for e in g.out_edges(u):
            if not visited[e.target]:
                queue.push(e.target)
                visited[e.target] = True
            if not in_tree[e.target]:
                # Relax even if e.target was already pushed; a lighter
                # connecting edge may have just been discovered.
                relax(e, distance, queue, parent, weight)
    return MST
The following shows the output of Prim's algorithm on the example graph.
MST = prim_minimum_spanning_tree(G, 0, weight)
print("MST: ", end="")
print([(name[e.source],name[e.target]) for e in MST], sep=", ")
print("weight of MST: " + str(sum([weight[e] for e in MST])))
MST: [('A', 'B'), ('B', 'E'), ('A', 'D'), ('E', 'C'), ('C', 'F')]
weight of MST: 13
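As a sanity check on the output above, here is a self-contained version of Prim's algorithm on the same example graph, using Python's built-in heapq with lazy deletion instead of the increase_key operation (so stale entries may sit in the heap and are skipped when popped):

```python
import heapq

weight = {('A','B'):1, ('A','D'):4, ('A','E'):3, ('B','D'):4, ('B','E'):2,
          ('C','E'):4, ('C','F'):2, ('D','E'):4, ('E','F'):7}

# Build an adjacency list of (weight, neighbor) pairs.
adj = {v: [] for v in 'ABCDEF'}
for (u, v), w in weight.items():
    adj[u].append((w, v))
    adj[v].append((w, u))

visited = {'A'}                      # start growing the tree at A
heap = list(adj['A'])
heapq.heapify(heap)
total = 0
while heap:
    w, v = heapq.heappop(heap)
    if v not in visited:             # lazy deletion: skip stale entries
        visited.add(v)
        total += w
        for edge in adj[v]:
            heapq.heappush(heap, edge)
print(total)  # 13
```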