Huffman Tree and Coding
Thursday, December 1, 2022 10:58 PM
Huffman Codes
• (i) Data can be encoded efficiently using Huffman Codes.
• (ii) It is a widely used and beneficial technique for compressing data.
• (iii) Huffman's greedy algorithm uses a table of the frequencies of occurrences of each character
to build up an optimal way of representing each character as a binary string.
Suppose we have 105 characters in a data file. Normal Storage: 8 bits per character (ASCII) - 8 x 105
bits in a file. But we want to compress the file and save it compactly. Suppose only six characters
appear in the file:
How can we represent the data in a Compact way?
(i) Fixed length Code: Each letter represented by an equal number of bits. With a fixed length code, at
least 3 bits per character:
For example:
a 000
b 001
c 010
d 011
e 100
f 101
For a file with 10 characters, we need 3 x 105 bits.
5
(ii) A variable-length code: It can do considerably better than a fixed-length code, by giving many
characters short code words and infrequent character long codewords.
For example:
a 0
b 101
c 100
d 111
e 1101
f 1100
Number of bits = (45 x 1 + 13 x 3 + 12 x 3 + 16 x 3 + 9 x 4 + 5 x 4) x 1000 = 224000 = 2.24 x 105bits
Thus, 224,000 bits to represent the file, a saving of approximately 25%.This is an optimal character
code for this file.
Prefix Codes:
• The prefixes of an encoding of one character must not be equal to complete encoding of another
character, e.g., 1100 and 11001 are not valid codes because 1100 is a prefix of some other code
word is called prefix codes.
• Prefix codes are desirable because they clarify encoding and decoding. Encoding is always simple
for any binary character code; we concatenate the code words describing each character of the
file. Decoding is also quite comfortable with a prefix code. Since no codeword is a prefix of any
other, the codeword that starts with an encoded data is unambiguous.
Greedy Algorithm for constructing a Huffman Code:
Huffman invented a greedy algorithm that creates an optimal prefix code called a Huffman Code.
Greedy Technique Page 1
• The algorithm builds the tree T analogous to the optimal code in a bottom-up manner. It starts
with a set of |C| leaves (C is the number of characters) and performs |C| - 1 'merging' operations to
create the final tree. In the Huffman algorithm 'n' denotes the quantity of a set of characters, z
indicates the parent node, and x & y are the left & right child of z respectively.
Huffman (C)
1. n=|C|
2. Q ← C
3. for i=1 to n-1
4. do
5. z= allocate-Node ()
6. x= left[z]=Extract-Min(Q)
7. y= right[z] =Extract-Min(Q)
8. f [z]=f[x]+f[y]
9. Insert (Q, z)
10. return Extract-Min (Q)
There are mainly two major parts in Huffman Coding
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is Huffman
Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is
used as a priority queue. The value of frequency field is used to compare two nodes in min heap.
Initially, the least frequent character is at root)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make
the first extracted node as its left child and the other extracted node as its right child. Add this
node to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node
and the tree is complete.
Greedy Technique Page 2