Graph Algorithms
Graph Algorithms
Spanning tree of a graph(G) is a sub graph(S) which is a tree, containing all the vertices of graph
G. A graph may have several spanning tree. The spanning tree, whose sum of cost of all the
edges are minimum is called minimum spanning tree.
GenericMST(G,W)
1. S = Ø
2. while S does not form a spanning tree
3. find an edge (u,v) that is safe for S
4. S = S U (u,v)
5. return S
Safe edge: The edge, which is light weight and by adding it in partial MST ,it does not form a
cycle, is called safe edge.
1. S = Ø
2. for each vertex u in V[G]
3. MAKESET(u)
4. sort all the edges in E in non-decreasing order by weights
5. for each edge (u,v) in E // taken in order
6. if FINDSET(u) ≠ FINDSET(v)
7. S = S U (u,v)
8. UNION(u,v)
Problem: Using Kruskal’s algorithm, Find the cost of MST in following graph.
b 8 c 7 d 9
4
11 2 14
a i 4 e
7 6
8
1 10
h g 2 f
Solution
Initial Configuration: Create a forest i,e one set for each node
b c d
{a} {b} {c} {d} {e} {f} {g} {h} {i} a i e
h g f
Sorted E ={ (g,h), (c,i), (f,g), (a,b), (c,f), (g,i),(c,d), (h,i), (a,h), (b,c), (d,e), (e,f), (b,h), (d,f) }
Edge Disjoint set forest Partial MST
selected
(g,h) {a} {b} {c} {d} {e} {f} {g,h} {i}
b c d
a i e
h g f
(d,e) {a, b, c, d, f, g, h, i, e}
b c d
a i e
h g f
(e,f) Not Safe
(b,h) Not Safe
(d,f) Not Safe
Now the obtained tree is a minimum cost of spanning tree. The edges in spanning tree set S are
{ (g,h), (c,i), (f,g), (a,b), (c,f), (c,d), (a,h),(d,e) }. To find the cost of MST, add the weight of all the
edges in the spanning tree.
Cost = 1 + 2 +2 + 4 + 4 + 7 + 8 + 9 = 37.
Analysis of Kruskal’s Algorithm
Each makeset() takes θ(1) time. There are |V| number of call s to makeset() in the loop in line
number 2.So altogether it takes θ(V) times
Line number 4 sorts the set E edges , takes θ(ElogE) times.
In loop from line number 5 to 8, there are 2E number of findset() operation and |V|-1 union
operations. Using efficient implementation of disjoint data structure, both findset and union
operation altogether takes O(E α(V)) , where α is the very slowly growing function.Upper bound
of α(V)) can be expressed by O(logV).
So total running time of kruskal’s algorithm is O(V) + O(ElogE) + O(ElogV) = O(ElogE). As logE is
O(logV), the running time can also be expressed as O(ElogV)
Prim’s Algorithm for MST
In this algorithm, edges in set S always form a single tree. The tree starts from any arbitrary
root vertex and grows by adding a lightweight edge at each step, until the tree spans all the
vertices.
It uses two data structures key and p.
//The graph G=(U,V) is a connected graph with weighted matrix W and a root vertex r. The
//algorithm finds a spanning tree that has minimum cost.
1. for each vertex u ∈ V[G]
2. key[u] = ∞
3. p[u]= NIL
4. key[r] = 0
5. Q = V[G] //Insert all vertices in Priority Queue Q
6. while Q
7. u=EXTRACT_MIN(Q)//Delete a vertex with smallest key from Q
8. for each vertex v Adj[u]
9. if v Q and W(u,v) < key[v]
10. key[v] = W[u,v]
11. p[v] = u
Problem: Using Kruskal’s algorithm, Find the cost of MST in following graph.
b 8 c 7 d 9
4
11 2 14
a i 4 e
7 6
8
1 10
h g 2 f
Solution:
b c d
Step-6: Extract g , Update h
4 8 8 7 7
a b c d e f g h i a 4 i
9 e
Key 0 4 8 7 10 4 2 1 2 11 2 14
0 2 4 10
p N a b c f c f g c 7 6
8
Queue: { d,e, h} 1 2 10
h 7 2 4
g f
Step-7: Extract h , Update nothing b c d
4 8 8 7 7
a b c d e f g h i a 4 i
9 e
Key 0 4 8 7 10 4 2 1 2 11 2 14
0 4 10
p N a b c f c f g c 7 2 6
8
Queue: { d, e} 1 2 10
h 7 2 4
g f
4 b 8 c 7 d 9 Cost =4 + 8 + 7 + 4 +2 + 9 + 2 + 1
2
a i 4 e = 37
1 2 f
h g
Analysis:
Line number 1 to 4 takes O(V) time to set the key and P array.
To insert |V| vertices in a min-priority Queue Q in line number 5 takes O(V) times.
Edge Relaxation: Relaxing an edge (u,v) is testing whether we can improve the shortest path
from vertex ‘u’ to vertex ‘v’ found so far and updating the distance vector and predecessor
vector. s
8
20 32
Relax v
u 8
20 28
Relax(u,v,W)
// Relax the distance from vertex u to v, W is a weighted matrix.
1. if d[u] + w(u,v) < d[v]
2. d[v] = d[u] + w(u,v)
3. p[v] = u
It runs in O(1) time , but it depends on implementation
Algorithm: DIJKSTRA(G,W,s)
//Solves single source shortest path problem on a graph G with Weighted matrix W and a
//source vertex s
1. for each vertex v V [G]
2. d[v] = ∞
3. p[v] = Nil
4. d[s] = 0
5. Q = V[G] // Insert all vertices in Min-Priority Queue Q
6. while Q ≠Ø
7. u= EXTRACT_MIN(Q)//Delete a vertex with smallest d value from Q
8. for each vertex v Adj[u]
9. RELAX(u,v,W)
10.return
Problem: Apply Dijkstra’s algorithm in following graph , assuming source vertex ‘s’
1
10 A B
9
3
2 6 4
S
5 C D
2
Solution:
QUEUE : A,B,C,D
QUEUE : Nil
Analysis:
The loop in line number 1 takes O(V) times.
Inside the while loop, Extract_Min(Q) is called |V|times. Each Extract_Min takes O(logV) times.
So a total of O(VlogV) times is required for all extract_Min operations.
The relax() operation internally calls to the decrease_key() operation to decrease the ‘d’ value
of the vertices. Each decrease_key operation on a Min-Priority Queue takes O(logV) times. As
there are O(E) calls to relax(), it takes a total of O(ElogV) time altogether. Therefore, the
running time of the algorithm is –
O(V) + O(VlogV) + O(ElogV) = O(ElogV)
Bellman-Ford
This algorithm solves single source shortest path problem in which there is one or
more negative weight edge.
Give a weighted directed graph G=(V,E) , a source ‘s’ and a weighted vector W,
the algorithm returns a Boolean value indicating whether or not there is a -ve
weight cycle reachable from source. If there is a –ve weight cycle exists, the
algorithm return 0 indicating ‘No solution exist’. Else, the algorithm produces the
shortest path.
Algorithm: BELLMAN-FORD(G,W,s)
//Solves single source shortest path problem on a graph G with Weighted matrix
//W and a source vertex s
1. for each vertex v V [G]
2. d[v] = ∞
3. p[v] = Nil
4. d[s] = 0
5. for i =1 to |V[G]| -1
6. for each edge(u,v) E[G]
7. RELAX(u,v,W)
8. for each edge(u,v) E[G]
9. if d[v] > d[u] + W(u,v)
10. return False
11. return True
Analysis : Loop in line 1-3takes O(V) times. For loop in line-5 has a inner loop that
runs O(E) times for all the vertices in V., which contributes O(VE) times. Loop in
line-8 takes O(E) times. S the total running time = O(V) + O(VE) +O(E) = O(VE).
Illustration of Bellman-Fold algorithm on following graph.
A 1 B
10 8 9
9
2 3 6 4
S 0
5 5 2 7
C D
Huffman Code
Huffman code is a technique used for data compression i,e the reduction in size of
data or file.
Huffman’s greedy algorithm used a table of occurrences of characters to build an
optimal way of representing each character as binary strings. Use of Huffman
code always creates ambiguity free data compression as the generated code is a
prefix code.
Prefix code: the code in which no codeword is a prefix of some other codeword is
called prefix code. It is highly required in order to avoid any ambiguity in
differentiating the characters.
Two types of Huffman code
Fixed length code: It uses a fixed number of bits to represent all the
characters in the file. It is less efficient.
Variable length code: It is an optimal approach, where frequently occurred
characters are assigned shorter length codeword and infrequent characters
are assigned longer codeword. These will reduce the size of the data file to
a great extent.
Decoding schemes:
The code words are obtained from leaves. The codeword for a character is
interpreted by combining the symbols from the root to that character, where ‘0’
indicates the left edge and ‘1’ indicates the right edge.
Problem: Following table represents the characters and their frequency of
occurrences. Construct the Huffman codes. Also, find the compression ratio.
Character a b c d e f
Frequency 25 10 15 5 42 3
(in thousands)
Solution:
Size of the original file = number of character × bit length of each character
= 1,00,000 × 8 = 8,00,000
Construction of Huffman code using Fixed length code
0 100 1
55 0 45
0 1
0 35 1 20 1
0 0 145
a:25 b:10 c:15 d:5 e:42 f:3
After collecting the symbols the codeword for each character are given below.
Character Original code Fixed length Huffman code
a 01100001 000
b 01100010 001
c 01100011 010
d 01100100 011
e 01100101 100
f 01100110 101
Length of each codeword is 3. So size of entire file = 1,00,000 × 3 = 3,00,000
The compression ratio is 8,00,000/3,00,000 = 2.67
58
33
18
8
e:42 0 58
1
a:25 33
0 1
c:15 0 18
1
8 b:10
0 1
f:3 d:5
After collecting the symbols the codeword for each character are given below.
Huffman(C)
1. n = |C|
3. for i=1 to n - 1
4. create a node Z
5. left[Z] = x =Extract_Min(Q)
6. right[Z] = y =Extract_Min(Q)
8. Insert(Q,z)
9. return Extract_Min(Q)
Analysis:
Each Extract_Min() ins inside the loop takes O(logn) times. As the loop runs n-1
times. So time spent in the loop is O(nlogn) time.
As a fractional knapsack problem, the fraction of items can be taken, i,e 0≤xi≤1.
n = 3, M = 20,
Item X1 X2 X3
Weight 18 15 10
Value 25 24 15
Solution:
Arrange the items in most valuable to least valuable order. (i.e. value/weight)
Item X2 X3 X1
Weight(W) 15 10 18
Value(V) 24 15 25
V/W 1.6 1.5 1.4
Knapsack_Greedy(n,M,W,V)
1. Arrange the items in ascending order by Vi/Wi.
2. For i = 1 to n
3. X[i] = 0
4. total = 0
5. for i= 1 to n
6. if W[i] <=M
7. X[i] = 1
8. M = M – W[i]
9. total = total + V[i]
10. else,
11. X[i] = M/W[i]
12. total = total + V[i]×M/W[i]
13. M = 0
14. return X , total