Greedy Algorithm and Exchange Argument

The document discusses the Greedy Algorithm, which constructs solutions through iterative myopic decisions, and highlights its features, including ease of analysis but difficulty in establishing correctness. It explains the exchange trick for proving the correctness of greedy algorithms and provides examples such as job scheduling and the fractional knapsack problem. Additionally, it outlines a flow for developing greedy algorithms and addresses the limitations of certain greedy approaches.

Greedy Algorithm

Construct a solution iteratively, via a sequence of myopic decisions, and hope that everything works out in the end.

Prepared by Pawan K. Mishra


Template
Greedy Algorithm(A, n)
    Candidates = rank(A)                       // sort, or otherwise rank the candidates
    solution = ø                               // initialize solution
    for i = 1 to n
        c = findbest(Candidates)               // some greedy choice; check feasibility before adding
        solution = solution ∪ {c}
        Candidates = Candidates \ {c}          // remove the selected element
        Candidates = revaluate(Candidates)     // update the candidate set (optional)
    return solution



Features and Bugs of the Greedy paradigm
• Easy to come up with one or more greedy algorithms

• Easy to analyse the running time

• Hard to establish correctness

• Warning: most greedy algorithms you come up with are not correct.
Exchange trick, to prove correctness (works often, but not always)
Let A be the greedy algorithm that we are trying to prove correct, and A(I) the output of A on some input I.
Let O be an optimal solution on input I that is not equal to A(I).

The goal of the exchange argument is to show how to modify O to create a new solution O’ with the following properties:
• 1. O’ is at least as good a solution as O (or equivalently, O’ is also optimal), and
• 2. O’ is “more like” A(I) than O is.

THIS IS THE CREATIVE PART: it is different for each algorithm/problem.


Two ways to proceed
First way: contradiction
Theorem: The algorithm A solves the problem.
Proof by contradiction: Suppose algorithm A does not solve the problem.
• Hence, there must be some input I on which A does not produce an optimal solution. Let the output produced by A be A(I).
Fact: Let O be the optimal solution that is most like A(I).
• If we can show how to modify O to create a new solution O’ with the following properties:
1. O’ is at least as good a solution as O (and hence O’ is also optimal), and
2. O’ is more like A(I) than O is,

then we have a contradiction to the choice of O. Thus, the theorem holds.


Two ways to proceed
Second way: constructive
Theorem: The algorithm A solves the problem.

Let I be an arbitrary instance, and let O be an arbitrary optimal solution for I. Assume that we can show how to modify O to create a new solution O’ with the following properties:
1. O’ is at least as good a solution as O (and hence O’ is also optimal), and
2. O’ is more like A(I) than O is.

Then consider the sequence O’, O’’, O’’’, O’’’’, ….

Each element of this sequence is optimal, and more like A(I) than the preceding element. Hence, ultimately this sequence must terminate with A(I).
Hence, A(I) is optimal.


P1: Job Scheduling

Time in the system for a job: its waiting time.

All jobs arrive at time ZERO.
IP: a set of n jobs {j1, j2, …, jn} with processing times P(j1), P(j2), …, P(jn), and a single resource.
OP: a schedule of the jobs on the one resource that minimizes the total waiting time in the system.


Example
Job 1: 5 units, Job 2: 10 units, Job 3: 4 units
Order → waiting time
[1,2,3]: 0 + (5) + (5+10) = 20
[3,1,2]: 0 + (4) + (4+5) = 13 … this order is optimal
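The waiting-time arithmetic above can be checked with a short sketch; the helper name `total_waiting_time` is ours, not from the slides:

```python
def total_waiting_time(order, times):
    """Sum of waiting times: each job waits for all jobs scheduled before it."""
    total = 0
    elapsed = 0
    for job in order:
        total += elapsed          # this job waited `elapsed` units
        elapsed += times[job]     # then it occupies the resource
    return total

times = {1: 5, 2: 10, 3: 4}
print(total_waiting_time([1, 2, 3], times))  # 20
print(total_waiting_time([3, 1, 2], times))  # 13
```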


Developing Intuition
Consider some arbitrary order of jobs, with processing times t1, t2, …, tn in that order:
Finish time of job 1 = t1
Finish time of job 2 = t1 + t2
Finish time of job 3 = t1 + t2 + t3
.
.
.
Finish time of job n = t1 + t2 + … + tn
Total finishing time = n·t1 + (n−1)·t2 + … + tn

Can you guess the greedy choice? Shortest Job First.
Theorem: The only schedule that minimizes the total waiting time in the system is one that schedules jobs in nondecreasing order of processing time.

Use the exchange trick, with contradiction.

Assume the statement is not true.
Fact: OPT is an optimal solution.

Note that in OPT there exist two consecutive jobs X and Y such that X is served before Y and P(X) > P(Y). Otherwise OPT would already be the shortest-job-first schedule.
Now suppose we interchange X and Y in OPT, leaving everything else the same. What happens?

Order in OPT:   …………… XY ……………   (No need to worry about the other jobs; think why.)
New order OPT’: …………… YX ……………

In OPT, to do job Y, we must wait until job X has been served.
In OPT’, to do job X, we must wait until job Y has been served.

Total wait time(OPT’) = Total wait time(OPT) − P(X) + P(Y). Now, we know that P(X) > P(Y).
Thus, Total wait time(OPT’) < Total wait time(OPT): a CONTRADICTION to the optimality of OPT.
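The shortest-job-first rule can be sketched in a few lines; representing jobs as a list of processing times is our assumption:

```python
def sjf_schedule(times):
    """Shortest Job First: order jobs by nondecreasing processing time."""
    return sorted(range(len(times)), key=lambda j: times[j])

def total_waiting_time(order, times):
    """Total time jobs spend waiting before being served."""
    total, elapsed = 0, 0
    for j in order:
        total += elapsed
        elapsed += times[j]
    return total

times = [5, 10, 4]            # jobs 0, 1, 2 from the earlier example
order = sjf_schedule(times)   # [2, 0, 1]
print(total_waiting_time(order, times))  # 13, the optimum from the example
```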
Job Scheduling with m resources (minimize waiting time)

GENERALIZE THE PREVIOUS IDEA TO SOLVE THIS PROBLEM.


Flow to solve a problem
Given a problem P:

Try to come up with a greedy algorithm, then
a) try to construct a counterexample;
b) if you can construct one, go back and design a different greedy algorithm;
c) if you cannot construct a counterexample, ask your friend;
d) if your friend also fails to find one, ask me;
e) if we cannot construct one, we will post it on math.stackexchange or mathoverflow.net and wait a few days;
f) if no one replies, then try to prove that your greedy algorithm works;
g) if we can prove it, then alright;
h) if we are unable to prove it, it is time to tune our greedy choice based on the difficulty we face during the proof
(TRY algorithm, TUNE, attempt proof, and loop, possibly for eternity);
i) I will discuss this step in the last class.
Minimum vertex cover on a tree
The greedy algorithm discussed in class for minimum vertex cover on a tree was wrong, so I am correcting it.
Steps:
0. S = ∅.
1. Find a leaf node n; take the neighbor of the leaf node n and add it to the solution set S.
2. Repeat Step 1.

• Try to write the correctness proof of this algorithm.

• The algorithm discussed in class will not give you an optimal result on a tree.

• But you can prove something beautiful about the algorithm discussed in class: it gives a result that is a log n factor away from the optimal. That is, compared with an optimal result, the algorithm's result is no more than (log n × optimal result) in the worst case. (The statement holds for general graphs.)
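The leaf rule above can be sketched as follows. The adjacency-set representation and the bookkeeping that removes covered edges (so new leaves appear) are our assumptions, added to make the loop well-defined; the slides state only the leaf-neighbor choice:

```python
def tree_vertex_cover(adj):
    """Greedy vertex cover on a tree: repeatedly take the neighbor of a leaf.

    adj: dict mapping node -> set of neighbors (mutated as edges get covered).
    """
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]   # degree-1 nodes are leaves
    while leaves:
        leaf = leaves.pop()
        if len(adj[leaf]) != 1:          # its edge may already be covered
            continue
        parent = next(iter(adj[leaf]))
        cover.add(parent)                # take the leaf's neighbor, never the leaf
        for u in list(adj[parent]):      # remove all edges covered by parent
            adj[u].discard(parent)
            if len(adj[u]) == 1:
                leaves.append(u)
        adj[parent].clear()
    return cover

# Path a-b-c-d: the minimum vertex cover has size 2
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(tree_vertex_cover(adj))
```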
Fractional knapsack
Intuition and the correctness



Fractional Knapsack Problem
I/P: n items, where item i has profit pi and size si; a bag with capacity B.
O/P: maximize the profit without violating the capacity constraint.
We need to fit items in the bag, and here we may use fractions of items.
Find x1 for item i1, x2 for item i2, …, xn for item in s.t.
x1·s1 + x2·s2 + … + xn·sn ≤ B, with 0 ≤ xi ≤ 1 for each i.
GOAL: maximize x1·p1 + x2·p2 + … + xn·pn


Let’s find a better greedy choice….
Suppose that the bag is full.
Can we remove some fraction of an item k while adding some fraction of an item l
(the bag must remain full after changing the fractions of the two items)
such that the profit increases?

Let y1 be the fraction for item i1, y2 for item i2, …, yn for item in, with

y1·s1 + y2·s2 + … + yn·sn = B   (by assumption: the bag is full)

Change yk to (yk − e’) and yl to (yl + e), keeping all other fractions the same:

y1·s1 + y2·s2 + … + (yk − e’)·sk + … + (yl + e)·sl + … + yn·sn = B
y1·s1 + y2·s2 + … + yn·sn + (−e’·sk + e·sl) = B
B + (−e’·sk + e·sl) = B
e·sl = e’·sk  ⟹  sl/sk = e’/e

New profit = y1·p1 + y2·p2 + … + (yk − e’)·pk + … + (yl + e)·pl + … + yn·pn
           = old profit − e’·pk + e·pl
           = old profit + e·pl − e’·pk

New profit > old profit ⟺ e·pl − e’·pk > 0
                        ⟺ pl/pk > e’/e = sl/sk
                        ⟺ pl/sl > pk/sk
Algorithm
For each item i, compute scorei = profiti / sizei.

Order the items by decreasing score.

Pick them in this order until the bag is filled.

Fractions of items chosen by our method: 1, 1, 1, 1, 1, α, 0, 0, 0, 0, 0

Only the last picked item may be fractional, with α ∈ (0, 1).
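The algorithm above translates almost line-for-line into code; representing items as (profit, size) pairs is our assumed input format:

```python
def fractional_knapsack(items, capacity):
    """Greedy fractional knapsack: take items by decreasing profit/size score.

    items: list of (profit, size) pairs; returns (total profit, fractions).
    """
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    fractions = [0.0] * len(items)
    profit, remaining = 0.0, capacity
    for i in order:
        p, s = items[i]
        take = min(1.0, remaining / s)   # whole item if it fits, else a fraction
        fractions[i] = take
        profit += take * p
        remaining -= take * s
        if remaining == 0:
            break
    return profit, fractions

items = [(60, 10), (100, 20), (120, 30)]   # scores 6, 5, 4
print(fractional_knapsack(items, 50))      # profit 240.0; fractions 1, 1, 2/3
```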
Proof of correctness
Exchange argument, using contradiction.
Suppose for contradiction that the solution proposed by the greedy algorithm, X1, X2, …, Xn, is not an optimal solution.
Fact: Take an optimal solution with fractions Z1, Z2, …, Zn which differs at some position from the greedy solution.
Let i be the first index at which Xi ≠ Zi. By the design of our algorithm, it must be that Xi > Zi.
By the optimality of OPT, there must exist an item j > i such that Zj > Xj.
Consider a new solution O’ = {Z1’, Z2’, …, Zn’} where Zk’ = Zk for all k ≠ i, j.
Suppose we increase Zi’ = Zi + e1 and decrease Zj’ = Zj − e2, with e1, e2 > 0 chosen so that e1·si = e2·sj (the bag stays equally full).
Does O’ have better profit than O?

new profit by O’ = Z1’·p1 + Z2’·p2 + … + (Zi + e1)·pi + … + (Zj − e2)·pj + … + Zn’·pn
                = old profit by O + e1·pi − e2·pj
                = old profit by O + e1·pi − (e1·si/sj)·pj
                = old profit by O + si·e1·(pi/si − pj/sj)

Because of our algorithm’s ordering, pi/si > pj/sj for this pair, so the last term is > 0.

new profit by O’ > old profit by O. Hence the contradiction to the optimality of O.
P9. Huffman code

Kleinberg and Tardos: Algorithm Design


Example
S = {a, b, c, d, e} with f(a) = 0.32, f(b) = 0.25, f(c) = 0.20, f(d) = 0.18, f(e) = 0.05

CBE: constant bit encoding, 3 bits per letter.
ABL(CBE) = 0.32·3 + 0.25·3 + 0.20·3 + 0.18·3 + 0.05·3 = 3

λ1: prefix code: a = 11, b = 01, c = 001, d = 10, e = 000
ABL(λ1) = 0.32·2 + 0.25·2 + 0.20·3 + 0.18·2 + 0.05·3 = 2.25

λ2: prefix code: a = 11, b = 10, c = 01, d = 001, e = 000
ABL(λ2) = 0.32·2 + 0.25·2 + 0.20·2 + 0.18·3 + 0.05·3 = 2.23
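The ABL values above can be checked numerically with a small sketch (the helper name `abl` and the particular CBE codewords are our choices):

```python
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}

def abl(code):
    """Average bits per letter: sum of f(x) * |codeword of x|."""
    return sum(freq[x] * len(code[x]) for x in freq)

cbe = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
lam1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
lam2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

print(round(abl(cbe), 2))   # 3.0
print(round(abl(lam1), 2))  # 2.25
print(round(abl(lam2), 2))  # 2.23
```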


Given a set S = {x1, x2, …, xn} of n characters, where character xi has frequency f(xi), and f(x1) + f(x2) + … + f(xn) = 1:

Find a prefix code λ of S such that

ABL(λ) = |λ(x1)|·f(x1) + |λ(x2)|·f(x2) + … + |λ(xn)|·f(xn)

is minimized, where |λ(x)| denotes the length of the codeword assigned to x.
Representing a prefix code as a binary tree….
Take a binary tree where the number of leaves equals the number of characters, and label each leaf node with a character of S. Such a labelled tree naturally describes a prefix code of S.


Another way to define the same problem

A prefix code is equivalent to a binary tree with n leaves, where n is the number of characters.

The goal is to minimize h(x1)·f(x1) + h(x2)·f(x2) + … + h(xn)·f(xn), where h(x) is the depth of the leaf labelled x in the binary tree.


INTUITION to proceed further


Observation 1. The encoding of S constructed from such a binary tree T is a prefix code.


Observation 2. Similarly, given a prefix code we can construct the corresponding binary tree recursively.


Observation 3. The binary tree corresponding to the optimal prefix code is full (each node that is not a leaf has two children).

Proof sketch: Let binary tree T denote an optimal prefix code, and suppose it contains a node u with exactly one child v. EXCHANGE u and v (replace u by its child v) to get T’. Every codeword below v becomes one bit shorter, so T’ is a prefix tree with strictly smaller ABL, a contradiction to the optimality of T.


Observation 4. Suppose that u and v are two leaf nodes of an optimal prefix tree T* such that depth(u) < depth(v). Further suppose that, in the labelling of T* corresponding to an optimal prefix code, leaf node u is labelled with character y of S and leaf node v with character z of S. Then f(y) ≥ f(z).


Observation 5. At the highest depth of the tree, there will be at least two leaf nodes.


Observation 6. In an optimal prefix tree, at the highest depth there will be at least two leaf nodes, and their frequencies are low compared to any leaf node at a smaller depth.

(Figure omitted: tree T* with two sibling leaves at the highest depth dv.)

We know that in an optimal prefix binary tree, the depth of the lowest-frequency character is the highest (previous result). And this result gives us that at the maximum depth there are two nodes; the two lowest-frequency characters can be placed there as siblings.
Greedy Algorithm:
1. Consider all pairs <frequency, symbol>.

2. Choose the two lowest frequencies, and make them siblings, with the new root having the combined frequency.

3. Iterate.


Greedy Algorithm Example:

Alphabet: A, B, C, D, E, F

Frequency table (frequencies may also be given as raw counts):

A: 10, B: 20, C: 30, D: 40, E: 50, F: 60

Total file length: 210 characters


Algorithm run (each step merges the two lowest-frequency subtrees):

Start:      A 10, B 20, C 30, D 40, E 50, F 60
Merge A+B:  X 30 (children A 10, B 20), C 30, D 40, E 50, F 60
Merge X+C:  Y 60 (children X 30, C 30), D 40, E 50, F 60
Merge D+E:  Z 90 (children D 40, E 50), Y 60, F 60
Merge Y+F:  W 120 (children Y 60, F 60), Z 90
Merge Z+W:  V 210 (children Z 90, W 120); done.


Final tree, with 0/1 labels on the edges:
V 210 has children Z 90 (bit 0) and W 120 (bit 1);
Z 90 has children D 40 (0) and E 50 (1);
W 120 has children Y 60 (0) and F 60 (1);
Y 60 has children X 30 (0) and C 30 (1);
X 30 has children A 10 (0) and B 20 (1).


The Huffman encoding:
A: 1000
B: 1001
C: 101
D: 00
E: 01
F: 11

File size: 10×4 + 20×4 + 30×3 + 40×2 + 50×2 + 60×2
         = 40 + 80 + 90 + 80 + 100 + 120 = 510 bits
Note the savings:
The Huffman code requires 510 bits for the file.

A fixed-length code needs 3 bits for 6 characters, and the file has 210 characters.

Total: 630 bits for the file.
Note also:
For a uniform character distribution, the Huffman encoding will be equal to the fixed-length encoding.

Why?

Assignment.
Formally, the algorithm:
Initialize n trees of a single node each.

Keep the roots of all subtrees in a priority queue.

Iterate until only one tree is left:
    Merge the two smallest-frequency subtrees into a single subtree with two children, and insert it into the priority queue.
Algorithm time:
Each priority queue operation (e.g. with a heap): O(log n).

Each iteration leaves one less subtree; initially there are n subtrees.

Total: O(n log n) time.

Prepared by Pawan K. Mishra
Prepared by Pawan K. Mishra
