Greedy Algorithms
Part - 4 : Huffman Coding
Huffman Codes
• Widely used technique for data compression
• Assume the data to be a sequence of characters
• Looking for an effective way of storing the data
• Binary character code
– Uniquely represents a character by a binary string
2
Two Ways to Compress data
• Fixed-Length Codewords
• Variable-Length Codewords
3
Example
E.g.:
Suppose you have to compress a data file containing
100,000 characters
The data file only has 6 characters (a, b, c, d, e, f)
a b c d e f
Frequency (thousands) 45 13 12 16 9 5
4
Fixed-Length Codes
E.g.: Data file containing 100,000 characters
a b c d e f
Frequency (thousands) 45 13 12 16 9 5
• 3 bits needed
• a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
• Requires: 100,000 X 3 = 300,000 bits
5
Huffman Codes
• Idea:
– Variable Length Codes
– Use the frequencies of occurrence of characters to
build a optimal way of representing each character
6
Constructing a Huffman Code
• Idea:
– Arranging the characters in Ascending order of their
frequencies
f: 5 e: 9 c: 12 b: 13 d: 16 a: 45
– Add Two minimum frequencies and place them to the
right position
7
Example
f: 5 e: 9 c: 12 b: 13 d: 16 a: 45 c: 12 b: 13 14 d: 16 a: 45
0 1
f: 5 e: 9
14 d: 16 25 a: 45 25 30 a: 45
0 1 0 1 0 1 0 1
f: 5 e: 9 c: 12 b: 13 c: 12 b: 13 14 d: 16
0 1
f: 5 e: 9
a: 45 55 0 100 1
0 1
a: 45 55
25 30 0 1
0 1 0 1
25 30
c: 12 b: 13 14 d: 16 0 1 0 1
0 1 c: 12 b: 13 d: 16
14
f: 5 e: 9 0 1
f: 5 e: 9 8
Variable-Length Codes
E.g.: Data file containing 100,000 characters
a b c d e f
Frequency (thousands) 45 13 12 16 9 5
Variable length
0 101 100 111 1101 1100
Codewords
• Assign short codewords to frequent characters and
long codewords to infrequent characters
• ((45 * 1) + (13 * 3) + (12 * 3) + (16 * 3) + (9 * 4) + (5 * 4)) X 1,000
= 224,000 bits 9
Comparison Ration
a b c d e f
Frequency (thousands) 45 13 12 16 9 5
Fixed Length
000 001 010 011 100 101
Codewords
Variable length
0 101 100 111 1101 1100
Codewords
300000 – 224000
X 100
300000
= 25.33%
Saves 20% to 90% cost. 10
Building a Huffman Code
Alg.: HUFFMAN(C) Running time: O(nlgn)
1. n C
2. QC O(n)
3. for i 1 to n – 1
4. do allocate a new node z
5. left[z] x EXTRACT-MIN(Q)
O(nlgn)
6. right[z] y EXTRACT-MIN(Q)
7. f[z] f[x] + f[y]
8. INSERT (Q, z)
9. return EXTRACT-MIN(Q)
11
Encoding and Decoding
0 100 1 Character Codeword
a: 45 55
0 1 a 0
25 30 b 101
0 1 0 1
c: 12 b: 13 14 d: 16 c 100
0 1
d 111
f: 5 e: 9
e 1101
f 1100
Encode : abcde
Decode: 101100011111011100
12
Dynamic Programming
Introduction
Dynamic Programming
• An algorithm design technique for optimization problems
(similar to divide and conquer)
• Dynamic Programming solves every subproblems just once
and store the answer in a table; next time when the result is
needed , it simply use the data from the table.
• Examines the previously solved subproblems and combines
their solutions to give the best solution for the whole problem.
14
Dynamic Programming
Part – 1: Longest increasing subsequence(LIS)
Longest increasing subsequence(LIS)
•
The Longest Increasing Subsequence (LIS) problem
is to find the length of the longest subsequence of a
given sequence such that all elements of the
subsequence are sorted in increasing order.
• For example,
{10, 22, 9, 33, 21, 50, 41, 60, 80}
LIS is {10, 22, 33, 50, 60, 80}
Length of LIS = 6
16
Increasing subsequences
• e.g. You have given the following array
9 2 5 3 7 11 8 10 13 6
Now Find The increasing subsequences.
17
Increasing subsequences
• e.g. 9 2 5 3 7 11 8 10 13 6
2 3 7
5 7 10 13 are increasing subsequences.
2 5 7 8 10 13
2 3 7 8 10 13
We want to find a longest one.
So LIS = 2 3 7 8 10 13 OR 2 5 7 8 10 13
And Length of LIS = 6
9 7 11 are not increasing subsequences.
5 3 11 13
18
More Example
• Input : arr[] = {3, 10, 2, 1, 20}
• Output : Length of LIS = 3
• The longest increasing subsequence is 3, 10, 20
• Input : arr[] = {3, 2}
• Output : Length of LIS = 1
• The longest increasing subsequences are {3} and {2}
• Input : arr[] = {50, 3, 10, 7, 40, 80}
• Output : Length of LIS = 4
• The longest increasing subsequence is {3, 7, 40, 80}
19
A naive approach for LIS
• Let LIS[i] be the length of a longest increasing
subsequence ending at position i.
LIS[i] = 1 + max j = 0..i-1{L[j] | aj < ai}
(use a dummy a0 = minimum, and L[0]=0)
i = 1; j = 0 to i-1
Index 0 1 2 3 4 5 6 7 8 9
Input 9 2 5 3 7 11 8 10 13 6
Length
Prev
20
A naive approach for LIS
Index 0 1 2 3 4 5 6 7 8 9
Input 9 2 5 3 7 11 8 10 13 6
Length 1 1 2 2 3 4 4 5 6 3
Prev -1 -1 1 1 2 4 4 6 7 2
The longest increasing subsequence 2, 5, 7, 8, 10, 13
Length of LIS = 6
21
Psudocode or LIS
LIS( int arr[], int n )
• for (i = 0; i < n; i++ )
• lis[i] = 1;
•
• /* Compute optimized LIS values in bottom up manner */
• for (i = 1; i < n; i++ )
• for (j = 0; j < i; j++ )
• if ( arr[i] > arr[j] && lis[i] < lis[j] + 1)
• lis[i] = lis[j] + 1;
•
• /* Pick maximum of all LIS values */
• for (i = 0; i < n; i++ )
• if (max < lis[i])
• max = lis[i];
•
• return max;
22