T
Stage 1 (First occurrence of t)
      r
     / \
    0   t(1)
Order: 0,t(1)
* r represents the root
* 0 represents the null node
* t(1) denotes the occurrence of t with a frequency of 1
TE
Stage 2 (First occurrence of e)
        r
       / \
      1   t(1)
     / \
    0   e(1)
Order: 0,e(1),1,t(1)
TEN
Stage 3 (First occurrence of n)
        r
       / \
      2   t(1)
     / \
    1   e(1)
   / \
  0   n(1)
Order: 0,n(1),1,e(1),2,t(1)
Misfit: node 2 comes before t(1), so the weights are not in nondecreasing order.
Reorder: TEN
        r
       / \
    t(1)  2
         / \
        1   e(1)
       / \
      0   n(1)
Order: 0,n(1),1,e(1),t(1),2
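The "Misfit" label is not defined on the slides. A minimal sketch, assuming it means that the weights in the Order list are not in nondecreasing order (the usual sibling property of adaptive Huffman trees): the helper names `node_weights` and `is_misfit` are hypothetical and only illustrate the check.

```python
import re

def node_weights(order):
    """Extract the weights from an Order string such as '0,n(1),1,e(1),2,t(1)'.

    Bare numbers are internal (or null) nodes; 'x(k)' is a leaf with weight k.
    """
    weights = []
    for item in order.split(","):
        item = item.strip()
        leaf = re.fullmatch(r"[A-Za-z]\((\d+)\)", item)
        weights.append(int(leaf.group(1)) if leaf else int(item))
    return weights

def is_misfit(order):
    """Assumed meaning of 'Misfit': some weight is larger than the one after it."""
    w = node_weights(order)
    return any(a > b for a, b in zip(w, w[1:]))

# Stage 3 before and after the reorder
print(is_misfit("0,n(1),1,e(1),2,t(1)"))   # True
print(is_misfit("0,n(1),1,e(1),t(1),2"))   # False
```

Under this reading, a reorder swaps nodes until the check no longer reports a misfit.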
TENN
Stage 4 (Repetition of n)
        r
       / \
    t(1)  3
         / \
        2   e(1)
       / \
      0   n(2)
Order: 0,n(2),2,e(1),t(1),3
Misfit: n(2) comes before e(1) and t(1), so the weights are not in nondecreasing order.
Reorder: TENN
        r
       / \
    n(2)  2
         / \
        1   e(1)
       / \
      0   t(1)
Order: 0,t(1),1,e(1),n(2),2
t(1) and n(2) are swapped.
TENNE
Stage 5 (Repetition of e)
        r
       / \
    n(2)  3
         / \
        1   e(2)
       / \
      0   t(1)
Order: 0,t(1),1,e(2),n(2),3
TENNES
Stage 6 (First occurrence of s)
        r
       / \
    n(2)  4
         / \
        2   e(2)
       / \
      1   t(1)
     / \
    0   s(1)
Order: 0,s(1),1,t(1),2,e(2),n(2),4
TENNESS
Stage 7 (Repetition of s)
        r
       / \
    n(2)  5
         / \
        3   e(2)
       / \
      2   t(1)
     / \
    0   s(2)
Order: 0,s(2),2,t(1),3,e(2),n(2),5
Misfit: s(2) comes before t(1), so the weights are not in nondecreasing order.
Reorder: TENNESS
        r
       / \
    n(2)  5
         / \
        3   e(2)
       / \
      1   s(2)
     / \
    0   t(1)
Order: 0,t(1),1,s(2),3,e(2),n(2),5
s(2) and t(1) are swapped.
TENNESSE
Stage 8 (Second repetition of e)
        r
       / \
    n(2)  6
         / \
        3   e(3)
       / \
      1   s(2)
     / \
    0   t(1)
Order: 0,t(1),1,s(2),3,e(3),n(2),6
Misfit: e(3) comes before n(2), so the weights are not in nondecreasing order.
Reorder: TENNESSE
        r
       / \
    e(3)  5
         / \
        3   n(2)
       / \
      1   s(2)
     / \
    0   t(1)
Order: 0,t(1),1,s(2),3,n(2),e(3),5
n(2) and e(3) are swapped.
TENNESSEE
Stage 9 (Third repetition of e)
        r
      0/ \1
   e(4)   5
        0/ \1
        3   n(2)
      0/ \1
      1   s(2)
    0/ \1
    0   t(1)
Order: 0,t(1),1,s(2),3,n(2),e(4),5
ENCODING
The letters can be encoded as follows:
e : 0
n : 11
s : 101
t : 1001
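The codes can be read off the Stage 9 tree by following the 0 (left) and 1 (right) edge labels from the root to each leaf. The sketch below assumes a hypothetical representation of that tree as nested `(left, right)` tuples, with `None` standing for the null node; it is an illustration of the traversal, not part of the original slides.

```python
# Stage 9 tree transcribed as nested (left, right) tuples.
# Leaves are letters; None stands for the null node "0".
STAGE_9_TREE = ("e", (((None, "t"), "s"), "n"))

def read_codes(node, prefix=""):
    """Walk the tree, appending '0' for a left edge and '1' for a right edge."""
    if node is None:                 # null node: no letter to encode
        return {}
    if isinstance(node, str):        # leaf: the path so far is its code
        return {node: prefix}
    left, right = node
    table = read_codes(left, prefix + "0")
    table.update(read_codes(right, prefix + "1"))
    return table

codes = read_codes(STAGE_9_TREE)
for letter in "enst":
    print(letter, ":", codes[letter])
```

The letters nearest the root (here e) receive the shortest codes, which is why the reorder steps keep moving heavier nodes upward.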
Average Code Length
$$\text{Average code length} = \frac{\sum_{i=1}^{n} \text{length}_i \cdot \text{frequency}_i}{\sum_{i=1}^{n} \text{frequency}_i} = \frac{1(4) + 2(2) + 3(2) + 4(1)}{4 + 2 + 2 + 1} = \frac{18}{9} = 2$$
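The same figure can be checked numerically. This is a minimal sketch that takes the code table from the ENCODING slide as given; the function name `average_code_length` is hypothetical.

```python
from collections import Counter

def average_code_length(text, code):
    """Sum of (code length * frequency) over all letters, divided by the text length."""
    freq = Counter(text)
    total_bits = sum(len(code[ch]) * count for ch, count in freq.items())
    return total_bits / len(text)

# Code table from the dynamic Huffman tree (ENCODING slide)
dynamic_code = {"E": "0", "N": "11", "S": "101", "T": "1001"}
print(average_code_length("TENNESSEE", dynamic_code))   # 2.0
```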
ENTROPY
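$$\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2 p_i, \qquad p_e = \tfrac{4}{9} \approx 0.44,\; p_n = p_s = \tfrac{2}{9} \approx 0.22,\; p_t = \tfrac{1}{9} \approx 0.11$$
$$= -\left(0.44 \log_2 0.44 + 0.22 \log_2 0.22 + 0.22 \log_2 0.22 + 0.11 \log_2 0.11\right)$$
$$= -\frac{0.44 \log 0.44 + 2\,(0.22 \log 0.22) + 0.11 \log 0.11}{\log 2} \approx 1.8366 \quad \text{(evaluated with the exact fractions)}$$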
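A short numerical check of the entropy, using the exact letter probabilities of TENNESSEE rather than the two-decimal roundings. The function name `entropy_bits` is hypothetical; `math.log2` and `collections.Counter` are from the Python standard library.

```python
import math
from collections import Counter

def entropy_bits(text):
    """Shannon entropy of the letter distribution of text, in bits per symbol."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

h = entropy_bits("TENNESSEE")
print(round(h, 4))
```

The entropy is a lower bound on the average code length of any symbol-by-symbol prefix code, so both the dynamic (2) and ordinary (1.89) averages lie above it.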
Ordinary Huffman Coding
TENNESSEE
          9
        0/ \1
        5   e(4)
      0/ \1
   s(2)   3
        0/ \1
     t(1)   n(2)
ENCODING
E : 1
S : 00
T : 010
N : 011
Average code length = (1*4 + 2*2 + 3*2 + 3*1) / 9 = 17 / 9 = 1.89
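For comparison, the ordinary (static) construction can be sketched with a priority queue: repeatedly merge the two lightest subtrees until one tree remains. This is a generic illustration using Python's `heapq`, not necessarily the procedure used to draw the slide; the function name `huffman_code_lengths` is hypothetical, and only code lengths are tracked because tie-breaking can change which letter gets which bit pattern without changing the total length.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code_lengths(text):
    """Return {letter: code length} for a static Huffman code built from text."""
    freq = Counter(text)
    tiebreak = count()   # keeps heap entries comparable when weights are equal
    # Each heap entry: (weight, tiebreak, {letter: depth within this subtree})
    heap = [(w, next(tiebreak), {ch: 0}) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # lightest subtree
        w2, _, d2 = heapq.heappop(heap)   # second lightest subtree
        # Merging pushes every leaf of both subtrees one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

text = "TENNESSEE"
lengths = huffman_code_lengths(text)
total_bits = sum(lengths[ch] for ch in text)
print(total_bits, round(total_bits / len(text), 2))   # 17 1.89
```

Note that a text with a single distinct letter would get code length 0 in this sketch; a real encoder would need to handle that case separately.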
SUMMARY
In this exercise, the average code length of ordinary Huffman coding (1.89) is better than that of the dynamic version (2). In practice, however, dynamic coding can perform better overall. With static coding, the tree has to be constructed at the transmitter and sent to the receiver. The tree also has to change whenever the frequency distribution of the letters changes, as it does between plain English text, a technical paper, a piece of code, and so on. In dynamic coding the receiver constructs the same tree as the transmitter, so the tree need not be sent. Taking this overhead into account, dynamic coding is better. Its average code length also improves as the transmitted text gets longer.