Unit 2 - Source Coding-4
Table 6. One method of instantaneous code
  s1  0
  s2  10
  s3  110
  s4  1110
  s5  1111

Table 7. The second method of instantaneous code
  s1  00
  s2  01
  s3  10
  s4  110
  s5  111
Kraft inequality
• A necessary and sufficient condition for the existence of an instantaneous code with word lengths $l_1, l_2, \ldots, l_q$ is that
$$\sum_{i=1}^{q} r^{-l_i} \le 1$$
where r is the number of different symbols in the code alphabet. For the binary case, the $l_i$ must satisfy
$$\sum_{i=1}^{q} 2^{-l_i} \le 1$$
• Now we calculate $\sum_{i=1}^{4} 2^{-l_i}$ for each of these codes.
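• Before turning to codes A through E, here is a minimal Python sketch (not part of the original notes) that evaluates the Kraft sum for the word lengths of the codes in Tables 6 and 7; both sums are exactly 1, so both sets of lengths are acceptable for an instantaneous code.

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Return the Kraft sum: sum of r**(-l_i) over the given word lengths."""
    return sum(Fraction(1, r ** l) for l in lengths)

# word lengths read off from Table 6 and Table 7
table6 = [1, 2, 3, 4, 4]   # codewords 0, 10, 110, 1110, 1111
table7 = [2, 2, 2, 3, 3]   # codewords 00, 01, 10, 110, 111
for name, lengths in (("Table 6", table6), ("Table 7", table7)):
    s = kraft_sum(lengths)
    print(name, s, s <= 1)  # both sums equal 1, so the Kraft inequality holds
```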
Kraft inequality
• For code A we see that
$$\sum_{i=1}^{4} 2^{-l_i} = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = 1$$
• The word lengths of the code are therefore acceptable as the word lengths of an instantaneous code.
• The Kraft inequality does not tell us that code A itself is an instantaneous code. For this particular example, it tells us only that some instantaneous code exists with four words, each of length 2.
• For code B,
$$\sum_{i=1}^{4} 2^{-l_i} = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = \frac{7}{8} \le 1$$
• Again we see that the word lengths of this code are acceptable as the word lengths of an instantaneous code.
Kraft inequality
• For code C,
$$\sum_{i=1}^{4} 2^{-l_i} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1$$
• By inspection, we find that code C is an instantaneous code.
• Code D also satisfies the Kraft inequality, but by inspection we find that the fourth code word is a prefix of the third code word. Thus it is not an instantaneous code.
• Finally, for code E,
$$\sum_{i=1}^{4} 2^{-l_i} = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-2} = 1\frac{1}{8} > 1$$
• For this code, no further investigation is required: the word lengths violate the Kraft inequality, so no instantaneous code with these lengths exists.
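• The same check can be applied to codes A through E. Their codeword tables are not reproduced in these notes, so the word lengths in the sketch below are inferred from the Kraft sums worked out above, and code D is assumed to share the lengths of code C; this is an illustration, not the original tables.

```python
from fractions import Fraction

# Word lengths inferred from the Kraft sums above (codeword tables not shown here).
lengths = {
    "A": [2, 2, 2, 2],   # sum = 1
    "B": [1, 3, 3, 3],   # sum = 7/8
    "C": [1, 2, 3, 3],   # sum = 1
    "D": [1, 2, 3, 3],   # assumed same lengths as C; Kraft holds, but D is not prefix-free
    "E": [1, 2, 3, 2],   # sum = 9/8 > 1
}
for name, ls in lengths.items():
    s = sum(Fraction(1, 2 ** l) for l in ls)
    verdict = "acceptable" if s <= 1 else "not acceptable"
    print(f"code {name}: Kraft sum = {s} -> lengths {verdict}")
```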
Kraft inequality
• Table 9 gives a binary code for the ten decimal digits in which eight of the code words share a common length l. With the word lengths of Table 9, the Kraft inequality requires
$$\sum_{i=0}^{9} 2^{-l_i} \le 1, \qquad 8 \cdot 2^{-l} \le 1 - \frac{1}{2} - \frac{1}{4} = \frac{1}{4}, \qquad l \ge 5$$
• We cannot find an instantaneous code if l < 5. The Kraft inequality only tells us that l = 5 is achievable, but it does not describe how we can construct this code.
Kraft inequality
• We are going to prove that the inequality is sufficient for the existence of an instantaneous code. The inequality is
$$\sum_{i=1}^{q} \frac{1}{r^{l_i}} \le 1 \qquad (1)$$
• Assume we are given word lengths $l_1, l_2, \ldots, l_q$ satisfying the above equation, and we are required to construct an instantaneous code with these word lengths. The required word lengths may or may not be distinct.
• Let us define $n_1$ as the number of words in our code of length 1, $n_2$ as the number of words of length 2, etc. If the largest of the $l_i$ is equal to $l$, we have
$$\sum_{i=1}^{l} n_i = q \qquad (2)$$
Kraft inequality
• We may use the $n_i$ to rewrite Eq. (1). The summation of Eq. (1) contains $n_1$ terms of the form $r^{-1}$, $n_2$ terms of the form $r^{-2}$, etc. It may then be written as
$$\sum_{i=1}^{l} n_i r^{-i} \le 1 \qquad (3)$$
• Multiplying by $r^l$,
$$\sum_{i=1}^{l} n_i r^{l-i} \le r^l \qquad (4)$$
• From Eq. (4) we obtain the chain of inequalities
$$n_l \le r^l - n_1 r^{l-1} - n_2 r^{l-2} - \cdots - n_{l-1} r \qquad (5a)$$
$$\cdots$$
$$n_3 \le r^3 - n_1 r^2 - n_2 r \qquad (5c)$$
$$n_2 \le r^2 - n_1 r \qquad (5d)$$
$$n_1 \le r \qquad (5e)$$
• We are required to form $n_1$ words of length 1. There are $r$ possible such words that we may form, using an r-symbol code alphabet.
• Since $n_1 \le r$, we can select these $n_1$ code words arbitrarily. We are then left with $r - n_1$ permissible prefixes of length 1, namely just those prefixes which are not used as code words.
Kraft inequality
• By adding one symbol at the end of each of these permissible prefixes, we may form as many as
$$(r - n_1)\,r = r^2 - n_1 r$$
words of length 2. Eq. (5d) assures us that we shall need no more than this number of words of length 2. As before, we choose our $n_2$ words from among our $r^2 - n_1 r$ choices; we are then left with
$$r^2 - n_1 r - n_2$$
unused prefixes of length 2, from which we may form
$$(r^2 - n_1 r - n_2)\,r = r^3 - n_1 r^2 - n_2 r$$
words of length 3. In the same manner we choose $n_3$ words from among our $r^3 - n_1 r^2 - n_2 r$ choices. We are left with
$$r^3 - n_1 r^2 - n_2 r - n_3$$
unused prefixes of length 3, and so on.
Kraft inequality
• We proceed in the same fashion until we have formed all the words of our code. Eqs. (5) assure us at each stage that we have enough prefixes left. Thus Eq. (1) is sufficient for the construction of an instantaneous code with word lengths $l_1, l_2, \ldots, l_q$.
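• The proof above is constructive, and it can be turned directly into a short program. The sketch below (the helper name `build_instantaneous_code` is mine, not from the notes) forms the words of length 1 first, then extends the unused prefixes one symbol at a time, exactly as in the argument; applied to the word lengths of Table 6 it reproduces that code.

```python
from fractions import Fraction

def build_instantaneous_code(lengths, r=2):
    """Construct an r-ary instantaneous (prefix) code with the given word
    lengths by repeatedly extending unused prefixes, as in the proof."""
    assert sum(Fraction(1, r ** l) for l in lengths) <= 1, "Kraft inequality violated"
    symbols = [str(d) for d in range(r)]
    code = {}
    prefixes = [""]                                  # permissible prefixes so far
    for length in range(1, max(lengths) + 1):
        # extend every unused prefix by one code symbol
        candidates = [p + s for p in prefixes for s in symbols]
        need = sum(1 for l in lengths if l == length)  # n_length words required
        words, prefixes = candidates[:need], candidates[need:]
        for w in words:
            code[len(code) + 1] = w                  # label words s1, s2, ...
    return code

# word lengths of Table 6 (1, 2, 3, 4, 4):
print(build_instantaneous_code([1, 2, 3, 4, 4]))
# {1: '0', 2: '10', 3: '110', 4: '1110', 5: '1111'}
```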
• Consider a zero-memory source S, with symbols $s_1, s_2, \ldots, s_q$ and symbol probabilities $P_1, P_2, \ldots, P_q$, respectively. Let a block code encode these symbols into a code alphabet of r symbols, and let the length of the word corresponding to $s_i$ be $l_i$. Then the entropy of this zero-memory source is
$$H(S) = -\sum_{i=1}^{q} P_i \log P_i \qquad (2)$$
• Let $Q_1, Q_2, \ldots, Q_q$ be any q numbers such that $Q_i \ge 0$ for all i and $\sum_{i=1}^{q} Q_i = 1$. By the inequality we know that
$$\sum_{i=1}^{q} P_i \log \frac{1}{P_i} \le \sum_{i=1}^{q} P_i \log \frac{1}{Q_i} \qquad (3)$$
Source Coding
• We now choose
$$Q_i = \frac{r^{-l_i}}{\sum_{j=1}^{q} r^{-l_j}}$$
and obtain
$$H(S) \le \sum_{i=1}^{q} P_i \log r^{l_i} + \sum_{i=1}^{q} P_i \log \sum_{j=1}^{q} r^{-l_j} = (\log r)\sum_{i=1}^{q} P_i l_i + \log \sum_{j=1}^{q} r^{-l_j} \qquad (6)$$
$$= L \log r + \log \sum_{j=1}^{q} r^{-l_j}$$
Source Coding
• If we add the requirement that our code be instantaneous, the Kraft inequality tells us that the argument of the second logarithm on the right of the above equation must be less than or equal to 1. The logarithm is therefore less than or equal to zero, and
$$H(S) \le L \log r \qquad (7a)$$
or
$$L \ge \frac{H(S)}{\log r} \qquad (7b)$$
• $H(S)$ in Equation (7) is measured in bits. Recall that L is the average number of r-ary symbols we use to encode S. If we measure the entropy in r-ary units also, then Equation (7b) may be written as
$$L \ge H_r(S) \qquad (7c)$$
• For an instantaneous code and a zero-memory source, L must be greater than or equal to $H_r(S)$. This equation constitutes a milestone in our study of information theory. With this equation, we have started to present the justification for our information measure.
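• As a quick numerical check of Equation (7), the sketch below compares $H(S)$ with $L \log r$ for a binary (r = 2, so $\log r$ = 1 bit) instantaneous code whose word lengths equal $\log_2(1/P_i)$. The probabilities are hypothetical, chosen only for illustration; they are not from the notes.

```python
import math

def entropy_bits(probs):
    """Entropy H(S) in bits for a zero-memory source."""
    return -sum(p * math.log2(p) for p in probs)

# hypothetical source probabilities and matching instantaneous-code lengths
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

H = entropy_bits(probs)                          # H(S) in bits
L = sum(p * l for p, l in zip(probs, lengths))   # average length in binits
print(H, L, H <= L)   # 1.75 1.75 True -> the bound holds, here with equality
```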
Source Coding
• The equality in (7c) can hold only if
$$\sum_{j=1}^{q} r^{-l_j} = 1 \qquad (8)$$
• Then, retracing our steps to Equation (4), we see that a necessary and sufficient condition for the equality is
$$P_i = Q_i = \frac{r^{-l_i}}{\sum_{j=1}^{q} r^{-l_j}} = r^{-l_i} \quad \text{for all } i \qquad (9a)$$
or
$$\log_r \frac{1}{P_i} = l_i \quad \text{for all } i \qquad (9b)$$
Source Coding
• For an instantaneous code and a zero-memory source, L must be greater than or equal to $H_r(S)$. Furthermore, L can achieve this lower bound if and only if we can choose the word lengths $l_i$ equal to $\log_r(1/P_i)$ for all i. For the equality, therefore, $\log_r(1/P_i)$ must be an integer for each i.
• We see that for equality the symbol probabilities $P_i$ must all be of the form $1/r^{\alpha_i}$, where $\alpha_i$ is an integer. In addition, we note that if these conditions are met, we have derived the word lengths of a compact code. We simply choose $l_i$ equal to $\alpha_i$.
Example 1. We calculate the entropy and average length of a zero-memory source whose source alphabet and code alphabet are given in Table 10.
Table 11
The entropy of this source is
$$H(S) = \sum_{i=1}^{q} P_i \log \frac{1}{P_i} = 2 \text{ bits/symbol}$$
It is impossible to encode the symbols from this source into a uniquely decodable binary code with average length L less than 2 binits per symbol. Each symbol of the source has a probability of $1/4 = 1/2^2$, and so by Equation (9b), a compact code must have four words of length 2.
Source Coding
The symbols of the source are encoded with an equal length of 2 binary digits, as given in Table 12.
• We define another zero-memory source with the symbol probabilities given in Table 13.
The entropy of this source is
$$H(S) = \sum_{i=1}^{q} P_i \log \frac{1}{P_i} = 1\frac{3}{4} \text{ bits/symbol}$$
All symbol probabilities of this source are of the form $1/r^{\alpha_i} = 1/2^{\alpha_i}$, with $\alpha_i$ an integer, so a compact code is obtained by choosing $l_i = \alpha_i$.
• More generally, if the word lengths are chosen according to Equation (2), then $P_i \ge r^{-l_i}$, and summing over all i gives
$$\sum_{i=1}^{q} r^{-l_i} \le \sum_{i=1}^{q} P_i = 1 \qquad (4)$$
• Equation (2), therefore, defines an acceptable set of $l_i$ for an instantaneous code. If we multiply Equation (2) by $P_i$ and sum over all i, we find
$$\sum_{i=1}^{q} P_i \log_r \frac{1}{P_i} \;\le\; \sum_{i=1}^{q} P_i l_i \;<\; \sum_{i=1}^{q} P_i \log_r \frac{1}{P_i} + \sum_{i=1}^{q} P_i \qquad (5)$$
$$H_r(S) \le L < H_r(S) + 1$$
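• Equation (2) can be applied mechanically: pick each $l_i$ as the smallest integer no smaller than $\log_r(1/P_i)$. The sketch below uses made-up probabilities (not from the notes) to confirm that the resulting average length satisfies $H_r(S) \le L < H_r(S) + 1$.

```python
import math

def shannon_lengths(probs, r=2):
    """Word lengths with log_r(1/P_i) <= l_i < log_r(1/P_i) + 1 (Equation 2)."""
    return [math.ceil(math.log(1 / p, r)) for p in probs]

def entropy(probs, r=2):
    """Entropy H_r(S) in r-ary units."""
    return sum(p * math.log(1 / p, r) for p in probs)

probs = [0.4, 0.3, 0.2, 0.1]        # hypothetical probabilities for illustration
ls = shannon_lengths(probs)          # -> [2, 2, 3, 4]
L = sum(p * l for p, l in zip(probs, ls))
Hr = entropy(probs)
print(ls, L, Hr, Hr <= L < Hr + 1)   # the bound of Equation (5) holds
```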
Source Coding
• Coding without extension.
• We use Equation (2) to construct a binary instantaneous code for the zero-memory source defined in Table 16. Assume that we wish to encode this source directly, without going to the second or other higher extensions. What is the smallest average length we can achieve without extension?
• We first calculate the word lengths using Equation (2), as shown in the third column of Table 16. Equation (2) is
$$\log_r \frac{1}{P_i} \le l_i < \log_r \frac{1}{P_i} + 1$$
• We have listed $l_i$ in the fourth column. Code A, shown in column 5, is then an acceptable instantaneous code with these word lengths. The average length of code A is
$$L_A = \sum_{i=1}^{q} P_i l_i = \frac{2}{3}(1) + \frac{2}{9}(3) + \frac{1}{9}(4) = 1.78 \text{ binits/source symbol}$$
• The entropy of the source is
$$H(S) = \sum_{i=1}^{3} P_i \log_2 \frac{1}{P_i} = 1.22 \text{ bits/source symbol}$$
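• A small sketch of this example follows. Table 16 itself is not reproduced here, so the source probabilities 2/3, 2/9, 1/9 are inferred from the average-length computation above; with those values the sketch gives the same lengths, $L_A \approx 1.78$ and $H(S) \approx 1.22$.

```python
import math

# probabilities inferred from the L_A computation above (Table 16 not reproduced)
probs = [2/3, 2/9, 1/9]
lengths = [math.ceil(math.log2(1 / p)) for p in probs]   # Equation (2) -> [1, 3, 4]
L = sum(p * l for p, l in zip(probs, lengths))           # 16/9 ≈ 1.78 binits/source symbol
H = -sum(p * math.log2(p) for p in probs)                # ≈ 1.22 bits/source symbol
print(lengths, round(L, 2), round(H, 2))
```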
Source Coding
Example 3. In Figure 1, we start with a source of six symbols and successively reduce this source to a source of two symbols.
• Example 4. We illustrate the synthesis of a binary compact code for the source of Figure 1 in Figure 2.
• If we complement the j-th digit of every word of the code, we obtain another compact code.
• If we complement the first and last digits of the code obtained in Figure 2, we have a new compact code. This method produces a new compact code.
• Example 5. We construct a different compact code for the source of Example 4 in Figure 3.
• In Figures 2 and 3, we obtain two compact codes. In the first compact code, the word lengths are 1, 2, 3, 4, 5, 5. In the second compact code, the word lengths are 1, 2, 4, 4, 4, 4. The average lengths of these two compact codes are identical:
$$L = \sum_{i=1}^{q} P_i l_i = 0.4(1) + 0.3(2) + 0.1(3) + 0.1(4) + 0.06(5) + 0.04(5) = 2.2 \text{ binits/source symbol}$$
$$L = \sum_{i=1}^{q} P_i l_i = 0.4(1) + 0.3(2) + 0.1(4) + 0.1(4) + 0.06(4) + 0.04(4) = 2.2 \text{ binits/source symbol}$$
• We cannot construct an instantaneous code for this source with a smaller average length.
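• A compact code such as the ones obtained in Figures 2 and 3 can be synthesized with the usual Huffman procedure. The sketch below (helper name `huffman_lengths` is mine) computes binary Huffman word lengths for the probabilities 0.4, 0.3, 0.1, 0.1, 0.06, 0.04 used above; which of the two length sets it produces depends on tie-breaking, but the average length is 2.2 binits either way.

```python
import heapq

def huffman_lengths(probs):
    """Word lengths of a binary Huffman (compact) code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]   # (subtree probability, symbols in subtree)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every merge adds one digit to each symbol in the subtree
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]
ls = huffman_lengths(probs)
# lengths may be [1, 2, 3, 4, 5, 5] or [1, 2, 4, 4, 4, 4] depending on tie-breaking;
# the average length is 2.2 binits in both cases
print(ls, sum(p * l for p, l in zip(probs, ls)))
```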
Source Coding
• The average length of this code is $L_2 = 27/16$ binits. The entropy of the second extension is $H(S^2) = 2H(S)$. So the efficiency is
$$\eta_2 = \frac{H(S^2)}{L_2} = \frac{2(0.811)}{27/16} = 0.961$$
Source Coding
• Coding the third and fourth extensions of S, we find efficiencies
$$\eta_3 = 0.985 \quad \text{and} \quad \eta_4 = 0.991$$
As we encode higher and higher extensions of the original source S, the efficiency must approach 1.
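• This growth of efficiency with extension order can be reproduced numerically. The original source S is not restated on these slides, so the sketch below assumes S = {3/4, 1/4}, which is consistent with H(S) = 0.811 bits and $L_2 = 27/16$; it Huffman-codes the n-th extension of S for n = 1 to 4 and prints the efficiency, which climbs toward 1.

```python
import heapq
import math
from itertools import product

def huffman_avg_length(probs):
    """Average codeword length of a binary Huffman (compact) code."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return sum(p * l for p, l in zip(probs, lengths))

# assumed original source S = {3/4, 1/4} (consistent with H(S) = 0.811 and L_2 = 27/16)
p = [0.75, 0.25]
H = -sum(q * math.log2(q) for q in p)          # ≈ 0.811 bits per source symbol
for n in (1, 2, 3, 4):
    ext = [math.prod(block) for block in product(p, repeat=n)]  # n-th extension of S
    Ln = huffman_avg_length(ext)               # binits per block of n source symbols
    print(n, round(n * H / Ln, 3))             # efficiency eta_n, approaching 1
```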
Source Coding
Example 9. We examine the coding efficiency for r-ary code alphabets. We consider a zero-memory source S with 13 symbols and symbol probabilities as shown in Table 17. In the same table, we list the compact (Huffman) codes for code alphabets comprising 2 to 13 symbols.
• Example: The given task is to construct Shannon codes for the given set of symbols using the Shannon-Fano lossless compression technique.
Notes. Figures and contents of Shannon-Fano coding are taken from the website:
https://www.geeksforgeeks.org/shannon-fano-algorithm-for-data-compression/
Source Coding
1. Symbols are arranged in two groups in such a way that the sums of the probabilities of the symbols in each group are almost equal. We divide them as {B, D} and {A, C, E} and assign them the values 0 and 1, respectively.
2. Now in the {B, D} group, P(D) = 0.30 and P(B) = 0.28, so P(D) is almost equal to P(B). So we divide D and B into two groups and assign 0 to D and 1 to B, respectively.
Source Coding
3. In the {A, C, E} group,
P(A) = 0.22, P(C) + P(E) = 0.20
so the group is divided into {A} and {C, E}, and they are assigned the values 0 and 1, respectively.
4. In the {C, E} group,
P(C) = 0.15 and P(E) = 0.05
so this group is divided into {C} and {E}, and we assign 0 to {C} and 1 to {E}, respectively.
Source Coding
Symbol        A     B     C     D     E
Probability   0.22  0.28  0.15  0.30  0.05
Shannon Code  10    01    110   00    111
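• The splitting procedure above can be written as a short recursive routine. This is a sketch of the Shannon-Fano idea (the helper name `shannon_fano` is mine, not from the source); run on the five symbols of the table, sorted by decreasing probability, it reproduces the code words shown.

```python
def shannon_fano(symbols):
    """Shannon-Fano coding: split the (probability-sorted) symbols into two
    groups of nearly equal total probability, assign 0/1, and recurse."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # choose the split point that makes the two group probabilities closest
    best, best_diff, run = 1, float("inf"), 0.0
    for k in range(1, len(symbols)):
        run += symbols[k - 1][1]
        diff = abs(2 * run - total)
        if diff < best_diff:
            best, best_diff = k, diff
    left, right = symbols[:best], symbols[best:]
    code = {s: "0" + w for s, w in shannon_fano(left).items()}
    code.update({s: "1" + w for s, w in shannon_fano(right).items()})
    return code

# symbols sorted by decreasing probability, as in the worked example
syms = [("D", 0.30), ("B", 0.28), ("A", 0.22), ("C", 0.15), ("E", 0.05)]
print(shannon_fano(syms))
# {'D': '00', 'B': '01', 'A': '10', 'C': '110', 'E': '111'}
```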
Source Coding
• A code can be designed by using another code tree, i.e., the code tree given by Fano in 1961. For example, consider the binary code and its corresponding tree.
• The question of how many different codes exist for a source with q symbols can be phrased in terms of code trees.
• For q = 2, there is only one possible tree corresponding to the word lengths.
• For q = 3, there is again only one possible tree corresponding to the word lengths.
Figures 1 to 4. Code trees.
Source Coding
Figure 5
Class Test
Answer the following questions