Source Coding
3.1 Introduction
An important problem in digital communications and computer storage is the efficient transmission and storage of information. Furthermore, transmission and storage of digital data require the information to be represented in a digital or binary form. Thus there is a need to perform source coding: the process of encoding the information source message to a binary code word, and of lossless and faithful decoding from the binary code word back to the original information source message. The goal of source coding for digital systems is two-fold:
1. Uniquely map any arbitrary source message to a binary code and back again
2. Efficiently map a source message to the most compact binary code and back
again
This is shown in Figure 3.1 for a source transmitting arbitrary text through a binary
(digital) communication channel. The source encoder efficiently maps the text to
a binary representation and the source decoder performs the reverse operation. It
should be noted that the channel is assumed noiseless. In the presence of noise,
channel codes are also needed and these are introduced in Chapter 5.
FIGURE 3.1
Noiseless communication system.
We first present the notation and terminology for the general source coding problem
and establish the fundamental results and properties of codes for their practical use
106 Fundamentals of Information Theory and Coding Design
and implementation. We then show how entropy is the benchmark used to determine
the efficiency of a particular coding strategy and then proceed to detail the algorithms
of several important binary coding strategies.
Consider a mapping from the source alphabet, S = {s1, s2, ..., sq}, of size q, to the code alphabet, X = {x1, x2, ..., xr}, of size r, where each source symbol is assigned a code word, a finite sequence of code symbols.
EXAMPLE 3.1
Consider the source alphabet, S = {s1, s2, s3, s4}, and binary code alphabet, X = {0, 1}.
The following are three possible binary source codes.

Source   Code A   Code B   Code C
s1       0        0        00
s2       11       11       01
s3       00       00       10
s4       11       010      11
We note that:
Code A is singular since both s2 and s4 map to the same code word, 11, and hence the code is not uniquely decodable.
Code B is non-singular but not uniquely decodable since, for example, the sequence 000 can be decoded as either s1 s3 or s3 s1.
Code C is a non-singular block code of length 2 and is both uniquely decodable and instantaneous.
A code that is uniquely decodable but not instantaneous requires the past and current
code symbols to be buffered at the receiver in order to uniquely decode the source
message. An instantaneous code allows the receiver to decode the current code word
to the correct source message immediately upon receiving the last code symbol of
that code word (i.e., the decoding is performed "on the fly").
EXAMPLE 3.2
Consider the following three binary codes for the source S = {s1, s2, s3, s4}:

Source   Code A   Code B   Code C
s1       0        0        0
s2       10       01       01
s3       110      011      011
s4       1110     0111     111
Consider first a string encoded with code A. It is apparent that the "0" code symbol acts as a code word terminator or separator. Hence the moment a "0" is received it represents the last code symbol of the current code word, and the receiver can immediately decode the word to the correct source symbol; thus code A is instantaneous.
Now consider an encoded string from code B. Here too the "0" acts as a code word separator but, unlike code A, the "0" now represents the first code symbol of the next code word. This means the current code word cannot be decoded until the first symbol of the next code word, the symbol "0," is read. For example, assume we have received the string 01. We cannot decode this to the symbol s2 until we see the "0" of the next code word, since if the next character we read is in fact a "1" then the current code word is 011 (that is, s3), not 01. Although we can verify that the code is uniquely decodable (since the "0" acts as a separator), the code is not instantaneous.
Finally, consider an encoded string from code C beginning with the symbol "0". Surprisingly we cannot yet decode this sequence! In fact until we receive a "0" or an EOF (end-of-file or code string termination) the sequence cannot be decoded, since we do not know if the first code word is 0, 01 or 011. Furthermore, once we are in a position to decode the sequence, the decoding process is itself not at all as straightforward as was the case for code A and code B. Nevertheless this code is uniquely decodable since the "0" still acts as a separator.
From Example 3.2 one can see that instantaneous codes are so named because decoding is a very fast and easy process (i.e., instantaneous!). However, the practicality of instantaneous codes also lies in the prefix condition, which allows such codes to be efficiently analysed and designed.
The prefix condition states that no code word may be a prefix of any other code word. Since the prefix condition is both necessary and sufficient for a code to be instantaneous, instantaneous codes are also referred to as prefix codes.
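The prefix condition is easy to test mechanically. The following sketch (an illustrative helper of our own, not from the text) checks it against the three codes of Example 3.2:

```python
def is_prefix_free(code_words):
    """Return True if no code word is a prefix of any other (the prefix condition)."""
    for i, u in enumerate(code_words):
        for j, v in enumerate(code_words):
            if i != j and v.startswith(u):
                return False
    return True

code_a = ["0", "10", "110", "1110"]   # comma code of Example 3.2
code_b = ["0", "01", "011", "0111"]
code_c = ["0", "01", "011", "111"]
print([is_prefix_free(c) for c in (code_a, code_b, code_c)])  # [True, False, False]
```

Only code A passes, consistent with it being the only instantaneous code of the three.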
EXAMPLE 3.3
Consider the codes from Example 3.2. We can now immediately state whether the
codes are instantaneous or not:
Code A is instantaneous since no code word is a prefix of any other code word (code
A obeys the prefix condition). Since it is instantaneous it is also uniquely
decodable.
Code B is not instantaneous since 0 is a prefix of 01, 01 is a prefix of 011, etc. However it is uniquely decodable since the "0" acts as a separator.
Figure 3.2 graphically depicts the universe of all codes and how the different classes
of codes we have described are subsets of one another. That is, the class of all
instantaneous codes is a subset of the class of all uniquely decodable codes which in
turn is a sub-class of all non-singular codes.
FIGURE 3.2
Classes of codes.
The prefix condition not only makes it easy to determine whether a given code is instantaneous or not, but it can also be used to systematically design an instantaneous code with specified lengths for the individual code words. The problem can be stated as follows. For a source with q symbols it is required to design the individual code words with specified code word lengths of l1, l2, ..., lq such that the code is instantaneous (i.e., it satisfies the prefix condition). To design the code, the code word lengths are sorted in order of increasing length. The code words are derived in sequence such that at each step the current code word does not contain any of the other code words as a prefix. A systematic way to do this is to enumerate or count through the code alphabet.
EXAMPLE 3.4
Consider the design of an instantaneous binary code with code word lengths of 1, 2, 3 and 3. For the length-1 code word we start with 0. For the length-2 code word we count to 1 and append a 0 to form 10, which does not contain 0 as a prefix. For the next two code words of length 3, we count to 11, form 110, and then start counting from the right-most symbol to produce the complete code: 0, 10, 110, 111.
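The counting procedure can be sketched in code. This is an illustrative implementation (the function name and the integer-counting representation are our own); it returns None when the requested lengths admit no instantaneous code:

```python
def instantaneous_code(lengths, r=2):
    """Construct an instantaneous (prefix) r-ary code with the given code word
    lengths by counting through the code alphabet (digits 0..r-1).
    Returns the list of code words, or None if no such code exists."""
    lengths = sorted(lengths)
    words, value = [], 0
    for i, l in enumerate(lengths):
        if i > 0:
            # step to the next free word and pad with 0s to the new length
            value = (value + 1) * r ** (l - lengths[i - 1])
        if value >= r ** l:               # ran out of permissible code words
            return None
        w, v = "", value
        for _ in range(l):                # write value as an l-digit base-r string
            w = str(v % r) + w
            v //= r
        words.append(w)
    return words

print(instantaneous_code([1, 2, 3, 3]))           # ['0', '10', '110', '111']
print(instantaneous_code([1, 1, 2, 2, 3, 3], 3))  # ['0', '1', '20', '21', '220', '221']
```

The first call reproduces the code of Example 3.4.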
Since an instantaneous code has the property that the source symbol or message can be immediately decoded upon reception of the last code character in the current code word, the decoding process can be fully described by a decoding tree or state machine which can be easily implemented in logic.
EXAMPLE 3.5
Figure 3.3 is the decoding tree that corresponds to the binary code:

Source   Code
s1       00
s2       01
s3       10
s4       110
s5       111

FIGURE 3.3
Decoding tree for an instantaneous code.
The receiver simply jumps to the next node of the tree in response to the current code character, and when a leaf node of the tree is reached the receiver produces the corresponding source symbol, si, and immediately returns to the root of the tree. Thus there is no need for the receiver to buffer any of the received code characters in order to uniquely decode the sequence.
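A decoding tree is straightforward to realise in software. The sketch below (our own illustration; the nested-dict representation and names are assumptions, not the book's) walks the tree for the code of Example 3.5, emitting a symbol at each leaf and returning to the root:

```python
# Decoding tree for the code s1->00, s2->01, s3->10, s4->110, s5->111.
# Internal nodes map '0'/'1' to children; leaves are the source symbols.
tree = {"0": {"0": "s1", "1": "s2"},
        "1": {"0": "s3",
              "1": {"0": "s4", "1": "s5"}}}

def decode(bits, tree):
    """Walk the decoding tree over the bit string; emit a symbol per leaf."""
    symbols, node = [], tree
    for b in bits:
        node = node[b]
        if isinstance(node, str):     # leaf reached: decode immediately
            symbols.append(node)
            node = tree               # return to the root of the tree
    return symbols

print(decode("0010110111", tree))     # ['s1', 's3', 's4', 's5']
```

No buffering is needed: the only state kept between code characters is the current tree node.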
Property 3 Decoding based on a decoding tree is fast and requires no memory stor-
age.
Property 4 Instantaneous codes are uniquely decodable codes and where the length
of a code word is the main consideration in the design and selection of codes
there is no advantage in ever considering the general class of uniquely decod-
able codes which are not instantaneous.
The instantaneous codes we have been discussing are variable-length codes since the code words can be of any length. Variable-length codes should be contrasted with block codes of length n, which restrict all code words to be of the same length n. As we will see, variable-length codes provide greater flexibility for sources to be encoded more efficiently. However, a serious drawback to the use of variable-length codes is their sensitivity to bit or code symbol errors in the code sequence. If a single bit error causes the decoder to interpret a shorter or longer code word than is the case, then this will create a synchronisation error between the first bit of the code word generated by the encoder and the root of the decoding tree. Subsequent code words will be incorrectly decoded, including the possibility of insertion errors, until synchronisation is re-established (which may take a very long time). With block codes a single bit error will only affect the current block code and the effect will not propagate to subsequent code words. In fact block codes only suffer this problem initially, when the transmission is established and the decoder has to "lock" onto the first bit of the code, or where there are unmanaged timing discrepancies or clock skew between the transmitter and receiver.
EXAMPLE 3.6

Consider the following variable-length and block instantaneous binary codes for the source S = {s1, s2, s3, s4}.

Source   Variable-length code   Block code
s1       0                      00
s2       10                     01
s3       110                    10
s4       111                    11

Assume a 6-symbol message is encoded with the variable-length code and transmitted through an information channel which introduces a bit error in the 2nd bit. The single bit error in the encoded sequence produces both a substitution error (the 2nd symbol is substituted) and an insertion error in the decoded message sequence, and 7 rather than 6 source symbols are produced.

Now assume there is a bit error in the 6th bit. The single bit error now causes two isolated substitution errors (the 3rd and 5th symbols are substituted). Now consider the corresponding block code sequence: a single bit error corrupts exactly one block, so only one source symbol is substituted and the error does not propagate.
From Example 3.6 it is apparent that variable-length codes are very sensitive to bit errors, with errors propagating to subsequent code words and symbols being inserted as well as substituted. If the main goal of the source coder is simply to map the source symbols to the code alphabet then a block coding scheme should be used. For example, the ASCII code [1] is a 7-bit (or 8-bit in the case of extended ASCII) block code mapping letters of the English alphabet and keyboard characters to a binary block code of length 7 or 8. Furthermore, a channel coder (see Chapter 5) with an appropriate selection of an error-correcting or error-detecting code (see Chapters 6 to 9) is mandatory to ensure the final code sequence is close to error-free when using variable-length codes.
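The sensitivity described above is easy to reproduce. The sketch below (our own illustration, with an arbitrarily chosen 6-symbol message) decodes a variable-length stream before and after a single bit flip in the 2nd bit:

```python
def vl_decode(bits, code):
    """Greedy decoding of an instantaneous (prefix) code."""
    inverse = {w: s for s, w in code.items()}
    symbols, word = [], ""
    for b in bits:
        word += b
        if word in inverse:          # last symbol of a code word reached
            symbols.append(inverse[word])
            word = ""
    return symbols

code = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
msg = ["s1", "s2", "s3", "s4", "s1", "s2"]
bits = "".join(code[s] for s in msg)                 # '010110111010'
corrupt = bits[:1] + ("1" if bits[1] == "0" else "0") + bits[2:]

print(vl_decode(bits, code))     # the intended 6 symbols
print(vl_decode(corrupt, code))  # 7 symbols: a substitution plus an insertion
```

A single flipped bit desynchronises the decoder, substituting the 2nd symbol and inserting a spurious one before synchronisation is regained.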
The loss of synchronisation arising from code symbol errors with variable-length
codes is a specific example of the more general problem of word synchronisation
between the source and receiver which is discussed by Golomb et al. [5].
If the individual code word lengths are specified there is no guarantee that an in-
stantaneous code can be designed with those lengths (see Example 3.4). The Kraft
Inequality theorem [8] provides a limitation on the code word lengths for the design
of instantaneous codes. Although not of real use in practice (the coding strategies we
will later discuss will guarantee codes that are instantaneous) the Kraft Inequality is
a precursor to the more important McMillan’s Theorem [10] which states that where
code word lengths are the only consideration, an instantaneous code will always be
as good as any uniquely decodable code which is not necessarily instantaneous.
The Kraft Inequality states that an instantaneous code with code word lengths l1, l2, ..., lq over an r-ary code alphabet exists only if the lengths satisfy:

    K = sum_{i=1}^{q} r^(-l_i) <= 1    (3.1)

Conversely, given a set of code word lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
The proof of the Kraft Inequality is interesting in that it is based on a formal de-
scription of how instantaneous codes are constructed (see Section 3.2.1) given the
code word lengths, where the Kraft Inequality needs to be satisfied for the code to be
successfully constructed.
PROOF Let s_i be the ith source message or symbol and c_i the corresponding code word of length l_i. The proof requires the code word lengths to be arranged in ascending order, that is, we assume that l_1 <= l_2 <= ... <= l_q. The number of possible code words of length l_i is r^(l_i). To ensure the code is instantaneous we need to consider the number of permissible code words of each length such that the prefix condition is satisfied.

Consider the shortest code word, c_1. The number of permissible code words for c_1 is simply r^(l_1). Next consider code word c_2. The number of possible code words of length l_2 with c_1 as a prefix is given by the expression r^(l_2 - l_1) (since with c_1 as the prefix the first l_1 symbols are fixed and one can choose the remaining l_2 - l_1 symbols arbitrarily). To ensure the prefix condition is satisfied, the number of permissible code words for c_2 is the number of possible code words, r^(l_2), less those code words which have c_1 as a prefix, r^(l_2 - l_1). That is, the number of permissible code words for c_2 is r^(l_2) - r^(l_2 - l_1). Similarly for c_3 the number of permissible code words is the number of possible code words, r^(l_3), less those code words which have c_2 as a prefix, r^(l_3 - l_2), and less those code words which have c_1 as a prefix, r^(l_3 - l_1); that is, r^(l_3) - r^(l_3 - l_2) - r^(l_3 - l_1). For code word c_n the expression for the number of permissible code words is:

    r^(l_n) - r^(l_n - l_(n-1)) - ... - r^(l_n - l_1)    (3.2)

To be able to construct the code we want to ensure that there is at least one permissible code word for all source messages, n = 1, 2, ..., q. That is, we require the following inequalities to be simultaneously satisfied:

    r^(l_1) >= 1
    r^(l_2) - r^(l_2 - l_1) >= 1
    ...
    r^(l_q) - r^(l_q - l_(q-1)) - ... - r^(l_q - l_1) >= 1    (3.3)

Dividing the nth inequality by r^(l_n) expresses each condition in the equivalent form:

    r^(-l_1) + r^(-l_2) + ... + r^(-l_n) <= 1    (3.4)

We note that if the last inequality (n = q) holds then all the preceding inequalities will also hold. Thus the following inequality expression must hold to ensure we can design an instantaneous code:

    sum_{i=1}^{q} r^(-l_i) <= 1    (3.5)
which is the Kraft Inequality.
The Kraft Inequality of Equation 3.1 only indicates whether an instantaneous code can be designed from the given code word lengths. It does not provide any indication of what the actual code is, nor whether a code we have designed which satisfies Equation 3.1 is instantaneous, but it does tell us that an instantaneous code with the given code word lengths can be found. Only by checking the prefix condition of the given code can we determine whether the code is instantaneous. However, if a code we have designed does not satisfy Equation 3.1 then we know that the code is not instantaneous and that it will not satisfy the prefix condition. Furthermore, we will not be able to find any code with the given code word lengths that is instantaneous. A less apparent property of the Kraft Inequality is that the minimum code word lengths for the given alphabet size and number of code words that can be used for designing an instantaneous code are obtained by making the sum K = sum_{i=1}^{q} r^(-l_i) as close as possible to 1. Obviously, shorter overall code word lengths intuitively yield more efficient codes. This is examined in the next section.
EXAMPLE 3.7

Consider the following binary (r = 2) codes:

Source   Code A   Code B   Code C
s1       0        0        0
s2       100      100      10
s3       110      110      110
s4       111      11       11
Code A satisfies the prefix condition and is hence instantaneous. Calculating the Kraft Inequality yields:

    K = 2^(-1) + 2^(-3) + 2^(-3) + 2^(-3) = 7/8 <= 1

as expected. Code B does not satisfy the prefix condition, since the code word 11 for s4 is a prefix of 110, the code word for s3; hence code B is not instantaneous. Calculating the Kraft Inequality yields:

    K = 2^(-1) + 2^(-3) + 2^(-3) + 2^(-2) = 1 <= 1

which implies that an instantaneous code is possible with the given code word lengths. Thus a different code can be derived with the same code word lengths as code B which does satisfy the prefix condition. One example of an instantaneous code with the same code word lengths is {0, 100, 101, 11}. Code C does not satisfy the prefix condition since 11 is a prefix of 110; hence the code is not instantaneous. Calculating the Kraft Inequality yields:

    K = 2^(-1) + 2^(-2) + 2^(-3) + 2^(-2) = 9/8 > 1

so no instantaneous code exists with these code word lengths.
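The Kraft sums above can be checked mechanically. This small sketch (our own helper, not from the text) uses exact fractions to avoid rounding:

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """K = sum of r^(-l) over the code word lengths; K <= 1 is necessary
    for an instantaneous code with these lengths to exist."""
    return sum(Fraction(1, r ** l) for l in lengths)

print(kraft_sum([1, 3, 3, 3]))  # code A lengths: 7/8
print(kraft_sum([1, 3, 3, 2]))  # code B lengths: 1
print(kraft_sum([1, 2, 3, 2]))  # code C lengths: 9/8 > 1
```

Only code C's lengths violate the inequality, so no instantaneous code with those lengths can exist.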
As we will discuss in the next section, the use of shorter code word lengths creates
more efficient codes. Since the class of uniquely decodable codes is larger than the
class of instantaneous codes, one would expect greater efficiencies to be achieved
considering the class of all uniquely decodable codes rather than the more restric-
tive class of instantaneous codes. However, instantaneous codes are preferred over
uniquely decodable codes given that instantaneous codes are easier to analyse, sys-
tematic to design and can be decoded using a decoding tree (state machine) structure.
McMillan’s Theorem assures us that we do not lose out if we only consider the class
of instantaneous codes.
PROOF Consider a uniquely decodable r-ary code with code word lengths l_1, l_2, ..., l_q, and let K = sum_{i=1}^{q} r^(-l_i) be its Kraft sum. Consider the nth power of K:

    K^n = ( r^(-l_1) + r^(-l_2) + ... + r^(-l_q) )^n    (3.7)

When written out, the quantity K^n will consist of the q^n terms arising from the nth extension of the code, each of the form:

    r^(-l_i1) r^(-l_i2) ... r^(-l_in) = r^(-m)    (3.8)

where we define m = l_i1 + l_i2 + ... + l_in. Then l_ij is the length of the jth code word in the ith sequence of the nth extension of the code, and m is the length of that sequence of n code words. Let l be the length of the longest of the code words. The minimum code word length is, of course, a length of 1. Then m can assume any value from n to nl. Let N_m be the number of terms of the form r^(-m). Then:

    K^n = sum_{m=n}^{nl} N_m r^(-m)    (3.9)

Thus N_m represents the number of code word sequences in the nth extension of the code with a length of m. If the code is uniquely decodable then the nth extension of the code must be non-singular. That is, N_m must be no greater than r^m, the number of distinct r-ary sequences of length m. Thus for any value of m we must have:

    N_m <= r^m    (3.10)

or:

    K^n <= sum_{m=n}^{nl} r^m r^(-m) = nl - n + 1 <= nl,  that is,  K <= (nl)^(1/n)    (3.11)

For n -> infinity we have that (nl)^(1/n) -> 1. Since the above inequality has to hold for all values of n, this will be true only if:

    K <= 1    (3.12)
EXAMPLE 3.8

It is required to design a uniquely decodable ternary (r = 3) code with code word lengths 1, 1, 2, 2, 3, 3. Since a uniquely decodable code satisfies the Kraft Inequality by McMillan's Theorem, we check whether this is the case. Calculating the Kraft Inequality:

    K = 3^(-1) + 3^(-1) + 3^(-2) + 3^(-2) + 3^(-3) + 3^(-3) = 26/27 <= 1

shows that a uniquely decodable code with these lengths can be found.
BUT the same Kraft Inequality is satisfied for instantaneous codes, and since instantaneous codes can be systematically constructed following the procedure described in Section 3.2.1, the following instantaneous code, which is uniquely decodable, is designed: {0, 1, 20, 21, 220, 221}.
When considering a collection of possible r-ary codes for the same source, a choice needs to be made between the different codes by comparing their performance based on a criterion of interest. For storage and communication purposes the main criterion of interest is the average length of a code, where codes with smaller average length are preferred. To calculate the average length the source symbol (or message) probabilities are needed. For a source with symbol probabilities p_1, p_2, ..., p_q and code word lengths l_1, l_2, ..., l_q, the average length is:

    L = p_1 l_1 + p_2 l_2 + ... + p_q l_q    (3.13)

Consider all possible r-ary codes for the same source, S. The number of source symbols, q, and source probabilities, p_i, are constant, but the code word lengths l_i vary with each code. The best code or compact code will be the one with the smallest average length, L.
EXAMPLE 3.9

Consider the following two binary codes for the same source. Which code is better?

Source   P(s_i)   Code A   Code B
s1       0.5      00       1
s2       0.1      01       000
s3       0.2      10       001
s4       0.2      11       01

The average length of code A is obviously L_A = 2 bits per symbol. The average length of code B is L_B = (0.5)(1) + (0.1)(3) + (0.2)(3) + (0.2)(2) = 1.8 bits per symbol.
Code B is better than code A since L_B < L_A. But is there another code, call it code C, which has L_C < L_B? Or is code B the compact code for this source? And how small can the average length get?
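Average lengths are simple expectations, and can be checked in a couple of lines (the helper name is ours, not the book's):

```python
def average_length(probs, lengths):
    """L = sum p_i * l_i: the expected code word length in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.1, 0.2, 0.2]
print(round(average_length(probs, [2, 2, 2, 2]), 3))  # code A: 2.0
print(round(average_length(probs, [1, 3, 3, 2]), 3))  # code B: 1.8
```

This reproduces L_A = 2 and L_B = 1.8 from Example 3.9.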
The fundamental problem when coding information sources and the goal of source
coding for data compression is to find compact codes. This obviously requires the
individual code word lengths to be made as small as possible, as long as we still
end up with a uniquely decodable (i.e., instantaneous) code. Intuitively the average
length will be reduced when the shorter length code words are assigned to the most
probable symbols and the longer code words to the least probable symbols. This
concept is shown by Example 3.9 where code B assigns the shortest code word (of
length 1) to the most probable symbol (with probability 0.5). Formally, the problem
of searching for compact codes is a problem in constrained optimisation. We state
the problem as follows.
From Chapter 1 the information content of a source is given by its entropy. When
using logarithms to base 2 the entropy is measured in units of “(information) bits per
source symbol.” Similarly when using binary source codes the average length is also
measured in units of “(code) bits per source symbol.” Since the entropy provides a
measure of the intrinsic information content of a source it is perhaps not surprising
that, for there to be no losses in the coding process, the average length must be at least
the value of the entropy (no loss of information in the coding representation) and
usually more to compensate for the inefficiencies arising from the coding process.
The following theorem establishes this lower bound on the average length of any
possible coding of the source based on the entropy of the source.
THEOREM 3.3
Every instantaneous r-ary code of the source, S = {s1, s2, ..., sq}, will have an average length, L, which is at least the entropy, H_r(S), of the source, that is:

    L >= H_r(S)    (3.16)

PROOF Consider the difference between the entropy and the average length and simplify the expression as follows:

    H_r(S) - L = sum_i p_i log_r(1/p_i) - sum_i p_i l_i = sum_i p_i log_r( r^(-l_i) / p_i )    (3.17)

Using the inequality log_r x <= (x - 1) log_r e gives:

    H_r(S) - L <= log_r e * sum_i ( r^(-l_i) - p_i ) = log_r e * ( sum_i r^(-l_i) - 1 )    (3.18)

Since the code is instantaneous the code word lengths will obey the Kraft Inequality so that sum_i r^(-l_i) <= 1, and hence:

    H_r(S) - L <= 0
The definition of entropy in Chapter 1 now makes practical sense. The entropy is a measure of the intrinsic amount of average information (in r-ary units) and the average length of the code must be at least equal to the entropy of the source to ensure no loss in coding (lossless coding). Thus the smallest average length, L, for any code we design will be L = H_r(S). However, there is no guarantee that a compact code with L = H_r(S) for a particular source and code alphabet can be found. Indeed, unless p_i = r^(-l_i) for integer lengths l_i we will always have that L > H_r(S), but the closer L is to H_r(S) the better or more efficient the code.
DEFINITION 3.9 Code Efficiency The efficiency of the code is given by:

    eta = H_r(S) / L    (3.19)
EXAMPLE 3.10

The entropy for the source in Example 3.9 is:

    H(S) = -0.5 log2 0.5 - 0.1 log2 0.1 - 0.2 log2 0.2 - 0.2 log2 0.2 ≈ 1.76 bits

so the efficiency of code A is 1.76/2.0 ≈ 88% and the efficiency of code B is 1.76/1.8 ≈ 98%.

Ideally compact codes should exhibit 100% efficiency. This requires that L = H_r(S), and from the condition for equality, p_i = r^(-l_i), this implies l_i = log_r(1/p_i) for i = 1, 2, ..., q. The problem lies in the fact that the l_i have to be integer values. If this is the case then we have a special source.
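The entropy and efficiency figures above can be checked with a few lines (helper names are ours):

```python
from math import log2

def entropy(probs):
    """H(S) = -sum p_i log2 p_i, in bits per source symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

def efficiency(probs, lengths):
    """eta = H(S) / L for a binary code with the given word lengths."""
    L = sum(p * l for p, l in zip(probs, lengths))
    return entropy(probs) / L

probs = [0.5, 0.1, 0.2, 0.2]            # the source of Examples 3.9 and 3.10
print(round(efficiency(probs, [2, 2, 2, 2]), 3))  # code A
print(round(efficiency(probs, [1, 3, 3, 2]), 3))  # code B
```

Code B's lengths track log2(1/p_i) more closely, which is why its efficiency is higher.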
EXAMPLE 3.11

Consider the following 4-symbol source:

Source   P(s_i)
s1       0.125
s2       0.25
s3       0.5
s4       0.125

We note that the symbol probabilities are of the form 2^(-l_i) with l_1 = 3, l_2 = 2, l_3 = 1 and l_4 = 3. Thus a 100% efficient compact binary code can be designed with code word lengths of 3, 2, 1, 3 with L = H(S) = 1.75 bits. For example:

Source   Code A
s1       110
s2       10
s3       0
s4       111
Now consider the following 9-symbol source:

Source   P(s_i)
s1       1/9
s2       1/9
s3       1/3
s4       1/27
s5       1/27
s6       1/9
s7       1/9
s8       1/27
s9       1/9

We note that the symbol probabilities are of the form 3^(-l_i) with l_i = log3(1/p_i) an integer for i = 1, 2, ..., 9. Thus a 100% efficient ternary code can be designed with code word lengths of 2, 2, 1, 3, 3, 2, 2, 3, 2 with L = H_3(S) = 16/9 ternary units.
Equation 3.16 states that the average length of any instantaneous code is bounded
below by the entropy of the source. In this section we show that the average length, L, of a compact code is also bounded above by the entropy plus 1 unit of information
(1 bit in the case of a binary code). The lower bound was important for establishing
how the efficiency of a source code is measured. Of practical importance is the exis-
tence of both a lower and upper bound which, as we will see, leads to the important
Shannon’s Noiseless Coding Theorem [11]. The theorem states that the coding effi-
ciency (which will always be less than 100% for compact codes that are not special)
can be improved by coding the extensions of the source (message blocks generated
by the source) rather than just the source symbols themselves.
THEOREM 3.4
Let L be the average length of a compact r-ary code for the source S. Then:

    H_r(S) <= L < H_r(S) + 1    (3.20)
The proof of the theorem as presented makes mention of a possible coding strategy
that one may adopt in an attempt to systematically design compact codes. Although
such a strategy, in fact, designs codes which are not compact it establishes the theo-
rem which can then be extended to the case of compact codes.
PROOF Consider the coding scheme which assigns the code word lengths:

    l_i = ceil( log_r(1/p_i) )    (3.21)

that is, log_r(1/p_i) <= l_i < log_r(1/p_i) + 1. We can justify that this is a reasonable coding scheme by noting from Theorem 3.3 that a 100% efficient code is possible if we select l_i = log_r(1/p_i), and where this does not equal an integer we "round up" to provide integer length assignments, yielding the coding scheme just proposed. Multiplying Equation 3.21 by p_i and summing over i gives:

    sum_i p_i log_r(1/p_i) <= sum_i p_i l_i < sum_i p_i log_r(1/p_i) + 1    (3.22)

which yields, using the definitions for entropy and average length:

    H_r(S) <= L' < H_r(S) + 1    (3.23)

where L' is the average length using this sub-optimal coding scheme. Consider a compact code for the same source with average length L. Then by definition L <= L', and from Theorem 3.3 we have H_r(S) <= L. Thus we must also have:

    H_r(S) <= L < H_r(S) + 1    (3.24)
NOTE One coding scheme resulting in code word lengths given by Equation 3.21
is the Shannon code [9]. Due to the sub-optimal nature of this coding scheme it will
not be elaborated further upon.
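The length assignment of Equation 3.21 is nevertheless easy to sketch (the helper name is ours, not the book's):

```python
from math import ceil, log

def shannon_lengths(probs, r=2):
    """Code word lengths l_i = ceil(log_r(1/p_i)); these always satisfy
    the Kraft Inequality, so an instantaneous code with them exists."""
    return [ceil(log(1.0 / p, r)) for p in probs]

print(shannon_lengths([0.5, 0.1, 0.2, 0.2]))  # [1, 4, 3, 3]
```

For the source of Example 3.9 this gives L' = 2.2 bits per symbol, worse than the compact code's 1.8, illustrating why the scheme is sub-optimal.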
Equation 3.20 indicates the average length of a compact code will be no more than 1 unit away from the average length of a 100% efficient code. However, we now show that by taking extensions and using Equation 3.20 we can improve the efficiency of the code. Applying Theorem 3.4 to the nth extension of the source, S^n, and noting that H_r(S^n) = n H_r(S), gives:

    H_r(S) <= L_n / n < H_r(S) + 1/n    (3.27)

NOTE The term L_n/n is the average length of the code words per source symbol when coding the nth extension, which should not be confused with L, the average length of the code words per symbol when coding the source S directly. However, L_n/n can be used as the average length of a coding scheme for the source S (based on coding the nth extension of the source).

From Equation 3.27, the average length per symbol of a compact code for S based on coding the nth extension of S is no more than 1/n units away from the average length of a 100% efficient code, and this overhead decreases with larger extensions. Intuitively we expect that coding the nth extension of the source will provide more efficient codes, approaching 100% efficiency as n -> infinity. We formally state these results in the following theorem.
Now let L_n be the average length of a compact code for the nth extension of S; then:

    H_r(S) <= L_n / n < H_r(S) + 1/n

and thus:

    lim (n -> infinity) L_n / n = H_r(S)

that is, in the limit the code becomes 100% efficient.

NOTE Since we can't get something for nothing, there is a price to be paid in improving coding efficiency by taking extensions. The price is the increased cost of encoding and decoding a code with q^n code words, which represents an exponential increase in complexity.
EXAMPLE 3.12

Taking extensions to improve efficiency is apparent when the source and code alphabets are the same. Consider the case of a binary coding scheme for a binary source. Without extensions we would have the trivial result:

Source   P(s_i)   Compact Code
s1       0.8      0
s2       0.2      1

with L = 1 bit per symbol and H(S) = -0.8 log2 0.8 - 0.2 log2 0.2 ≈ 0.72 bits, so the code is only 72% efficient. Now consider the second extension:

Source   P       Compact Code
s1 s1    0.64    0
s1 s2    0.16    10
s2 s1    0.16    110
s2 s2    0.04    111

The compact code for the second extension is as shown in the above table. The average length of this code is L_2 = (0.64)(1) + (0.16)(2) + (0.16)(3) + (0.04)(3) = 1.56 bits per symbol pair and L_2/2 = 0.78 bits per symbol, yielding a compact code for the second extension which is 0.72/0.78 ≈ 93% efficient, a significant improvement over 72% efficient.
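The second-extension figures can be reproduced in a few lines (an illustrative sketch; the code lengths 1, 2, 3, 3 are those of the compact code tabulated above):

```python
from itertools import product
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p = [0.8, 0.2]                                   # the source of Example 3.12
ext = [a * b for a, b in product(p, p)]          # pair probabilities
lengths = [1, 2, 3, 3]                           # a compact code for the pairs
L2 = sum(pi * li for pi, li in zip(ext, lengths))

print(round(L2 / 2, 2))                          # bits per source symbol
print(round(entropy(p) / (L2 / 2), 3))           # efficiency of the scheme
```

Higher extensions shrink the per-symbol overhead further, at the cost of exponentially many code words.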
The statement of Shannon's Noiseless Coding Theorem was proven for zero-memory sources. We now show that a similar statement applies to the general case of Markov sources. Consider an mth order Markov source, S, and the corresponding entropy, H_r(S), based on the mth order Markov model of the source. Define H_r(S̄) as the entropy of the equivalent zero-memory model of the source (where S̄ is the adjoint source). Since Equation 3.20 applies to zero-memory sources it also applies to the adjoint source, hence:

    H_r(S̄) <= L < H_r(S̄) + 1    (3.28)

and also:

    H_r((S^n)‾) <= L_n < H_r((S^n)‾) + 1    (3.29)

where (S^n)‾ is the adjoint of the nth extension of the source, S^n (not to be confused with (S̄)^n, which is the nth extension of the adjoint source, S̄).

We know from our results from Chapter 1 that the per-symbol entropy H_r((S^n)‾)/n approaches the Markov entropy H_r(S) for increasing n, since ever longer blocks capture the memory of the source. Substituting this into Equation 3.29 and dividing by n gives:

    H_r(S) <= L_n / n < H_r(S) + 1/n    (3.31)

and hence:

    lim (n -> infinity) L_n / n = H_r(S)    (3.32)

That is, the lower bound on L_n/n is the entropy of the mth order Markov model, H_r(S).
Consider a sequence of symbols from an unknown source that we are coding. The entropy of the source depends on the model we use for the source. From Chapter 1 we had that, in general, the entropy of an mth order Markov model is no greater than that of an (m-1)th order model. Thus higher order models generally yield lower values for the entropy. And the lower the entropy is, then from Equation 3.32, the smaller the average length we expect as we take extensions to the source. This realisation does not affect how we design the compact codes for extensions to the source, but affects how we calculate the corresponding nth extension symbol probabilities (which are based on the underlying model of the source) and the resulting code efficiency.
RESULT 3.1
When coding the th extension of a source, the probabilities should be derived based
on the “best” or “true” model of the source, and the efficiency of the code should
be based on the same model of the source.
EXAMPLE 3.13

Consider coding the following sequence of symbols from an unknown binary source:

01010010110101010101

If we assume a zero-memory model of the source then we can see that P(0) = P(1) = 0.5, and thus H(S̄) = 1 bit. This implies that the original binary source is 100% efficient.

However, the "true" model is the first order Markov model of Figure 3.4, with P(1|0) = P(0|1) = 0.9 and P(0|0) = P(1|1) = 0.1.

FIGURE 3.4
True model of source.

The "true" entropy is calculated to be H(S) = 0.469 bits and the original binary source is, in fact, only 47% efficient! To improve the efficiency we find a binary code for the second extension. Noting that P(0) = P(1) = 0.5, this gives the second extension symbol probabilities:
Source   P       Compact Code
s1 s1    0.05    110
s1 s2    0.45    0
s2 s1    0.45    10
s2 s2    0.05    111

The compact code is shown in the above table. The average length of this code is L_2 = (0.05)(3) + (0.45)(1) + (0.45)(2) + (0.05)(3) = 1.65 bits per symbol pair, or L_2/2 = 0.825 bits per symbol, which improves the coding efficiency to 0.469/0.825 ≈ 57%.
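These figures can be reproduced in a few lines (helper names are ours; the transition probabilities are those of Figure 3.4):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p)."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# First order Markov model of Example 3.13: P(1|0) = P(0|1) = 0.9 and
# stationary P(0) = P(1) = 0.5, so H(S) = 0.5*H(0.1) + 0.5*H(0.1) = H(0.1).
H_markov = binary_entropy(0.1)

# Second extension compact code lengths 3, 1, 2, 3 for probs 0.05, 0.45, 0.45, 0.05:
L2 = 0.05 * 3 + 0.45 * 1 + 0.45 * 2 + 0.05 * 3
print(round(H_markov, 3))              # the "true" entropy in bits per symbol
print(round(H_markov / (L2 / 2), 3))   # efficiency after taking the extension
```

Note that the efficiency is judged against the Markov entropy, per Result 3.1, not against the zero-memory entropy of 1 bit.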
Consider the communication system shown in Figure 3.5 for the case of a noiseless channel. A q-ary source message with source entropy H(S) is encoded to an r-ary code message with code entropy H(X), which is transmitted through the noiseless channel to the decoder and on to the receiver.

FIGURE 3.5
Noiseless communication system.
The source entropy, H(S), is interpreted as the average number of bits transmitted per source symbol. Each source symbol is encoded to a code word of L code symbols on average. Thus the ratio H(S)/L represents the average number of bits transmitted per code symbol. By definition this gives the entropy of the encoded sequence, or code entropy, H(X). That is:

    H(X) = H(S) / L    (3.33)
Since H(S) <= L from Theorem 3.3 (using a binary code) and H(X) = H(S)/L, we then have:

    H(X) <= 1    (3.34)

with equality when the source code is 100% efficient (L = H(S)), in which case the binary code symbols are equally likely. This result has important implications when we consider the inclusion of a channel code with the source code. As we will see in Chapter 5, one of the key assumptions is that the (binary) inputs to the channel coder are equally likely, and with efficient source coding this will definitely be the case.
EXAMPLE 3.14
Consider the binary source sequence:
0001100100
0 10 110 10 0
EXAMPLE 3.15

Consider the design of a binary code using the Fano coding scheme for an 8-symbol source with the following ordered probability assignments:
The regrouping and code symbol assignment in each step of the Fano coding scheme are shown in Figure 3.6. The dashed lines represent the splitting of each set of symbols into two equiprobable groups. For each division a 0 is appended to each symbol in one group and a 1 is appended to each symbol in the other group. The number in front of each dashed line indicates the level of grouping. Thus the level 1 grouping shown in Figure 3.6(a) splits the original source into two groups, with one group of probability 1/2 containing s1 being assigned a 0 and the other group of probability 1/2 containing the remaining symbols being assigned a 1. Each of these groups is split into the level 2 groups as shown in Figure 3.6(b). This continues for each level of grouping until the level 5 grouping of Figure 3.6(e) is reached, when there are no more groups left to split. Figure 3.6(e) is the binary Fano code for this source.

FIGURE 3.6
Binary Fano coding.

The average length of the Fano code equals the entropy of the source, and since L = H(S) the Fano code provides the compact code (with efficiency of 100%). This result can also be verified by noting that the source probabilities are such that this is a special source for binary codes.
In Example 3.15 the symbols are successively divided into equiprobable groups of
1/2, 1/4, 1/8, 1/16 and 1/32. In the majority of cases exactly equal groupings are not possible, so the closest-to-equal groupings are used. Since there may be many such "quasi-equiprobable" groupings, different Fano codes will result, and not all of them may yield compact codes.
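The splitting procedure can be sketched in a few lines of Python (the function name and symbol labels are illustrative; ties between near-equal splits are broken by taking the first best split, so other valid Fano codes exist):

```python
def fano(symbols):
    """Binary Fano coding sketch: symbols is a list of (symbol, probability)
    pairs sorted by decreasing probability.  Recursively split the list into
    two groups with total probabilities as close to equal as possible and
    append a 0 to one group's code words and a 1 to the other's."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total = sum(p for _, p in symbols)
    best, split = float("inf"), 1
    for i in range(1, len(symbols)):        # find the most equiprobable split
        left = sum(p for _, p in symbols[:i])
        if abs(2 * left - total) < best:
            best, split = abs(2 * left - total), i
    codes = {s: "0" + c for s, c in fano(symbols[:split]).items()}
    codes.update({s: "1" + c for s, c in fano(symbols[split:]).items()})
    return codes

# A hypothetical dyadic source: splits are exactly equiprobable at every
# level, so the resulting code is compact (100% efficient).
codes = fano([("s1", 0.5), ("s2", 0.25), ("s3", 0.125), ("s4", 0.125)])
```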
EXAMPLE 3.16
Consider the design of a binary code using the Fano coding scheme for the 6-symbol source with the following probability assignment (see Figure 3.7):
4/9, 1/9, 1/9, 1/9, 1/9, 1/9
There are two possible Fano codes depending upon how we perform the level 1 grouping. One grouping creates a group of probability 4/9 containing s1 and another group of probability 5/9 containing the remaining symbols, as shown by Figure 3.7(a). The alternative grouping creates a group of probability 5/9 containing s1 and s2 and another group of probability 4/9 containing the remaining symbols, as shown by Figure 3.7(b). Both groupings are equally valid. The average lengths of the two Fano codes
        Code (a)   Code (b)
4/9     0          00
1/9     1000       01
1/9     1001       100
1/9     101        101
1/9     110        110
1/9     111        111
FIGURE 3.7
Different binary Fano codes.
shown in Figure 3.7 are 21/9 ≈ 2.33 bits and 22/9 ≈ 2.44 bits respectively, with code (a) being the compact code. Thus a sub-optimal code may result depending on how the symbols are grouped.
Example 3.16 demonstrates that Fano codes are not necessarily compact and differ-
ent Fano codes may yield different average lengths. Thus Fano codes are of limited
use since such codes cannot be guaranteed to be compact.
RESULT 3.3
Huffman codes are compact codes. That is, the Huffman algorithm produces a code
with an average length, , which is the smallest possible to achieve for the given
number of source symbols, code alphabet and source statistics.
We now proceed to outline the basic principles of the Huffman algorithm and then
detail the steps and provide examples for specific cases.
The Huffman algorithm operates by first successively reducing a source with q symbols to a source with r symbols, where r is the size of the code alphabet.
It should be noted that we will only be able to reduce a source to exactly r symbols if the original source has r + n(r − 1) symbols, where n is a non-negative integer. For a binary code (r = 2) this will hold for any number of source symbols. For non-binary codes, if n = (q − r)/(r − 1) is not an integer value then "dummy" symbols with zero probability are appended to create a source with r + ⌈n⌉(r − 1) symbols, where ⌈n⌉ is the smallest integer greater than or equal to n.
The trivial r-ary compact code for the reduced source with r symbols is then used to design the compact code for the preceding reduced source, as described by the following result.
RESULT 3.4
Assume that we have a compact code for the reduced source S_{i+1}. Designate the last r symbols of S_i as the symbols which were combined to form the combined symbol of S_{i+1}. We assign to each symbol of S_i, except the last r symbols, the code word used by the corresponding symbol of S_{i+1}. The code words for the last r symbols of S_i are formed by appending 0, 1, ..., r − 1 to the code word of the combined symbol of S_{i+1} to form r new code words.
The Huffman algorithm then operates by back-tracking through the sequence of reduced sources, designing the compact code for each source, until the compact code for the original source is designed.
For the design of binary Huffman codes the Huffman coding algorithm is as follows:
The operation of the binary Huffman coding algorithm is best shown by the following
example.
EXAMPLE 3.17
Consider a 5-symbol source with the probability assignments 0.4, 0.2, 0.2, 0.1, 0.1 shown in Figure 3.8. Each reduced source is formed by the combination of the last two symbols of the preceding source. Starting with the trivial compact code 0, 1 for the final reduced source and working back to the original source, a compact code is designed for each reduced source following the procedure described by Result 3.4 and shown in Figure 3.8. In each step the code word for the last two symbols is produced by taking
the code word of the symbol pointed to by the arrow-head and appending a 0 and 1
to form two new code words. The Huffman coding algorithm of Figure 3.8 can be
depicted graphically as the binary Huffman coding tree of Figure 3.9. The Huffman
coding tree is of similar form to the decoding tree for instantaneous codes shown in
Figure 3.3 and can thus be used for decoding Huffman codes. The Huffman code
itself is the bit sequence generated by the path from the root to the corresponding leaf
node.
        S          S1         S2        S3
s1      0.4  1     0.4  1     0.4  1    0.6  0
s2      0.2  01    0.2  01    0.4  00   0.4  1
s3      0.2  000   0.2  000   0.2  01
s4      0.1  0010  0.2  001
s5      0.1  0011
FIGURE 3.8
Binary Huffman coding table.
The average length of the code is 2.2 bits/symbol and the efficiency of the Huffman code is 96.5%. Since the Huffman code is a compact code, any other instantaneous code we may design will have an average length no smaller than 2.2 bits/symbol.
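The reduction-and-backtrack procedure is equivalent to repeatedly merging the two least probable (groups of) symbols, which can be sketched with a priority queue. A Python sketch (symbol names are illustrative, and tie-breaking is arbitrary, so the individual code word lengths may differ from Figure 3.8 while the average length is the same):

```python
import heapq

def huffman(probs):
    """Binary Huffman coding sketch: repeatedly merge the two least
    probable groups, prepending 0 to one group's code words and 1 to
    the other's.  probs maps symbol -> probability."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in probs}
    tiebreak = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        for s in group0:                 # group0 becomes the 0-branch
            codes[s] = "0" + codes[s]
        for s in group1:                 # group1 becomes the 1-branch
            codes[s] = "1" + codes[s]
        tiebreak += 1                    # unique tie-break key for the heap
        heapq.heappush(heap, (p0 + p1, tiebreak, group0 + group1))
    return codes

codes = huffman({"s1": 0.4, "s2": 0.2, "s3": 0.2, "s4": 0.1, "s5": 0.1})
```

For the source of Example 3.17 the resulting code is prefix-free with average length 2.2 bits/symbol, matching Figure 3.8.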
From Figure 3.8 in Example 3.17, different reduced sources are possible by inserting the combined symbol in a different position when re-ordering.
root
1.0
1
0
0.6 0.4
1
0
0.4 0.2
1
0
0.2 0.2
1
0
0.1 0.1
FIGURE 3.9
Binary Huffman coding tree.
This may or may not result in a compact code with different individual code word
lengths. Either way the compact code will have the same average length. However
the average length variance:
σ² = Σ_j P(s_j)(l_j − L)²    (3.35)
may be different.
EXAMPLE 3.18
The average length variance for the Huffman code of Example 3.17 is 1.36. Consider a different Huffman code tree for the same source shown by Figures 3.10 and 3.11.
        S          S1         S2        S3
s1      0.4  00    0.4  00    0.4  1    0.6  0
s2      0.2  10    0.2  01    0.4  00   0.4  1
s3      0.2  11    0.2  10    0.2  01
s4      0.1  010   0.2  11
s5      0.1  011
FIGURE 3.10
Different binary Huffman code table.
Although this different Huffman code for the same source possesses different individual code word lengths, the average length is still 2.2 bits/symbol; however, the average length variance is now 0.16.
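The two variances can be checked directly from the code word lengths read off Figures 3.8 and 3.10. A small Python sketch of the average length and its variance:

```python
probs  = [0.4, 0.2, 0.2, 0.1, 0.1]
lens_a = [1, 2, 3, 4, 4]   # code word lengths of the code in Figure 3.8
lens_b = [2, 2, 2, 3, 3]   # code word lengths of the code in Figure 3.10

def avg_len(p, lens):
    """Average code word length L = sum_j p_j * l_j."""
    return sum(pi * li for pi, li in zip(p, lens))

def len_var(p, lens):
    """Average length variance: sum_j p_j * (l_j - L)^2."""
    L = avg_len(p, lens)
    return sum(pi * (li - L) ** 2 for pi, li in zip(p, lens))
```

Both codes have average length 2.2 bits/symbol, but the variances are 1.36 and 0.16 respectively, so the second code gives a more constant bit rate.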
root
1.0
1
0
0.6 0.4
1 1
0 0
1
0
0.1 0.1
FIGURE 3.11
Different binary Huffman code tree.
The Huffman code from Example 3.18 has a smaller average length variance than
the code produced in Example 3.17. Codes with smaller average length variance are
preferable since they produce a more constant code bit rate. The following result
can be used to ensure the Huffman coding tree produces a compact code with the
smallest average length variance.
RESULT 3.5
If the combined symbol is placed at the highest available position for that probability assignment when the reduced source is re-ordered, then the resulting compact code will possess the smallest average length variance.
For the design of general -ary Huffman codes the Huffman coding algorithm is as
follows:
1. Calculate n = (q − r)/(r − 1). If n is a non-integer value then append "dummy" symbols with zero probability to the source until there are r + ⌈n⌉(r − 1) symbols.
2. Re-order the source symbols in decreasing order of symbol probability.
3. Successively reduce the source S to S1, then S2 and so on, by combining the last r symbols of S_i into a combined symbol and re-ordering the new set of symbol probabilities for S_{i+1} in decreasing order. For each source keep track of the position of the combined symbol. Terminate the source reduction when a source with exactly r symbols is produced.
4. Assign the trivial compact r-ary code 0, 1, ..., r − 1 to the final reduced source.
5. Backtrack to the original source, assigning a compact code for each reduced source by the method described in Result 3.4. The compact code assigned to the original source S, minus the code words assigned to any "dummy" symbols, is the r-ary Huffman code.
The operation of the r-ary Huffman coding algorithm is demonstrated in the following example.
EXAMPLE 3.19
We want to design a compact quaternary code for a source with originally 11
symbols and the following probability assignments (re-ordered in decreasing
order of probability for convenience):
The average length of the code is 1.78 quaternary units per symbol and the Huffman code has an efficiency of 93%.
Ë Ë½ ˾ Ë¿
0.16 2 0.16 2 0.25 1 0.45 0
0.14 3 0.14 3 0.16 2 0.25 1
0.13 00 0.13 00 0.14 3 0.16 2
0.12 01 0.12 01 0.13 00 0.14 3
0.10 02 0.10 02 0.12 01
0.10 03 0.10 03 0.10 02
0.06 11 0.08 10 0.10 03
0.06 12 0.06 11
0.05 13 0.06 12
0.04 100 0.05 13
0.04 101
0.00 102
0.00 103
FIGURE 3.12
4-ary Huffman code table.
root
1.0
0 1 2 3
0 1 2 3 0 1 2 3
0 1 2 3
FIGURE 3.13
4-ary Huffman code tree.
The source statistics are assumed stationary. If the statistics change, an adaptive scheme is required which re-estimates the probabilities and recalculates the Huffman code table.
Encoding and decoding are performed on a per-block basis; the code is not produced until a full block of symbols is received. For large block lengths this may require the last segment to be padded with dummy data.
One solution to using Huffman coding on increasingly larger extensions of the source is to directly code the source message to a code sequence using arithmetic coding, which we will describe here. Rather than deriving and transmitting a code table of size q^n, segmenting the incoming source message into consecutive blocks of length n, and encoding each block by consulting the table, arithmetic coding directly processes the incoming source message and produces the code "on-the-fly." Thus there is no need to keep a code table, making arithmetic coding potentially much more computationally efficient than Huffman coding. The basic concept of arithmetic coding can be traced back to the 1960's; however, it wasn't until the late 1970's and mid 1980's that the method started receiving much more attention, with the paper by Witten et al. [12] often regarded as one of the seminal papers in the area.
Arithmetic coding, like Huffman coding, does require reliable source statistics to be available. Consider the N-length source message x_1 x_2 ... x_N, where each x_i is one of the q source symbols and x_i = s_j indicates that the ith character in the message is the source symbol s_j. Arithmetic coding assumes that the probabilities,
of the symbols are available and that each symbol s_j is assigned the interval [Pc(s_{j−1}), Pc(s_j)), where Pc(s_j) = P(s_1) + ... + P(s_j) is the cumulative probability (with Pc(s_0) = 0). That is, the length of each interval is P(s_j) and the end of the interval is given by Pc(s_j). The interval corresponding to the first letter of the message, x_1 = s_j, is then selected. Next we consider the second letter of the message, x_2. The individual symbols, s_j, are now assigned subintervals of the selected interval [low, high), where:
[low + Pc(s_{j−1})(high − low), low + Pc(s_j)(high − low))    (3.36)
FIGURE 3.14
Interval subdividing operation of arithmetic coding.
Since the interval for the message x_1 x_2 ... x_N occurs with probability P(x_1 x_2 ... x_N), approximately −log₂ P(x_1 x_2 ... x_N) bits are needed to identify it, and where this becomes an equality for all message sequences, 100% efficient coding is possible. A simple demonstration of arithmetic coding is given in Example 3.20.
EXAMPLE 3.20
Consider the message s2 s2 s1 originating from the 3-symbol source with the following individual and cumulative probabilities:
Symbol   P(s_j)   Pc(s_j)
s1       0.2      0.2
s2       0.5      0.7
s3       0.3      1.0
We further assume that the source is zero-memory, thus P(x_1 x_2 x_3) = P(x_1)P(x_2)P(x_3). As shown by Figure 3.15, initially the probability line is divided into three consecutive, adjoint intervals [0, 0.2), [0.2, 0.7) and [0.7, 1.0) corresponding to s1, s2 and s3 with length ratios 0.2 : 0.5 : 0.3, respectively. The interval lengths correspond to the symbol probabilities. When the first letter of the message, s2, is received the interval [0.2, 0.7) is selected. The interval [0.2, 0.7) is further divided into three consecutive, adjoint subintervals with the same length ratios 0.2 : 0.5 : 0.3, that is [0.2, 0.3), [0.3, 0.55) and [0.55, 0.7), with subinterval lengths P(x_1 = s2)P(x_2 = s_j).
When the second letter of the message, s2, is received the subinterval [0.3, 0.55) is selected. The interval [0.3, 0.55) is then further subdivided into the three subintervals [0.3, 0.35), [0.35, 0.475) and [0.475, 0.55), and when
the third and last letter of the message, s1, is received the corresponding and final interval [0.3, 0.35) of length P(s2)P(s2)P(s1) = 0.5 × 0.5 × 0.2 = 0.05 is selected.
The interval selection process can be summarised by the following table:
Next Letter   Interval
s2            [0.2, 0.7)
s2            [0.3, 0.55)
s1            [0.3, 0.35)
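The interval selection process can be sketched directly in Python (a zero-memory source is assumed, as in the example; the function name is illustrative):

```python
def arith_intervals(message, probs):
    """Successively subdivide [0, 1) according to cumulative probabilities
    and return the final interval [low, high) for the message.  probs maps
    symbol -> probability, in a fixed order."""
    low, high = 0.0, 1.0
    for letter in message:
        width = high - low
        cum = 0.0
        for s, p in probs.items():
            if s == letter:
                # select the subinterval corresponding to this symbol
                high = low + (cum + p) * width
                low = low + cum * width
                break
            cum += p

    return low, high

low, high = arith_intervals(["s2", "s2", "s1"],
                            {"s1": 0.2, "s2": 0.5, "s3": 0.3})
# final interval [0.3, 0.35), as in Example 3.20
```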
The above description of arithmetic coding is still incomplete. For example, how does one select the number that falls within the final interval so that it can be transmitted in the least number of bits? Let [low, high) denote the final interval. One scheme described in [6] which works quite well in the majority of cases is to carry out the binary expansion of low and high until they differ. Since low < high, at the first place they differ there will be a 0 in the expansion for low and a 1 in the expansion for high. That is:
low  = 0.b_1 b_2 ... b_{k−1} 0 ...
high = 0.b_1 b_2 ... b_{k−1} 1 ...
The number 0.b_1 b_2 ... b_{k−1} 1 falls within the interval and requires the least number of bits of any number within the same interval; so it is selected and transmitted as the k-bit code sequence b_1 b_2 ... b_{k−1} 1.
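This bit-selection scheme can be sketched as follows (Python; the function name is illustrative):

```python
def pick_code(low, high, max_bits=64):
    """Expand low and high in binary until they first differ; return the
    expansion of high truncated at that first differing bit (a 1), which
    lies in [low, high) and uses the fewest bits of any number in it."""
    bits = []
    for _ in range(max_bits):
        low, high = 2 * low, 2 * high        # shift out the next binary digit
        bl, bh = int(low), int(high)
        low, high = low - bl, high - bh
        if bl != bh:                          # first differing place: 0 vs 1
            bits.append("1")
            break
        bits.append(str(bl))
    return "".join(bits)
```

For the final interval [0.3, 0.35) of Example 3.20 this yields the 4-bit code 0101 (0.0101 in binary is 0.3125, which lies in the interval).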
We have described how the message is encoded and transmitted but have not yet discussed how the received code is decoded back to the original message. Figure 3.16 lists an algorithm for encoding a message using arithmetic coding to a binary integer code, and Figure 3.17 lists an algorithm for decoding the received binary integer back to the original message.
NOTE In both the encoding and decoding algorithms mention is made of the EOF or end-of-file symbol. Consider Example 3.20. The code sequence 0101 corresponds to the binary fraction 0.0101 and decimal fraction 0.3125. The number 0.3125 not only falls within the final interval [0.3, 0.35) for the message s2 s2 s1 but also for the longer
FIGURE 3.15
Arithmetic coding for Example 3.20. The probability line [0, 1.0) is divided at 0.2 and 0.7; the selected interval [0.2, 0.7) is subdivided at 0.3 and 0.55 (subinterval lengths 0.2(0.5) = 0.1, 0.5(0.5) = 0.25 and 0.3(0.5) = 0.15); the selected interval [0.3, 0.55) is subdivided at 0.35 and 0.475 (subinterval lengths 0.2(0.25) = 0.05, 0.5(0.25) = 0.125 and 0.3(0.25) = 0.075), giving the final interval [0.3, 0.35) for the message s2 s2 s1.
Definitions
1. Let s_1, s_2, ..., s_q denote the distinct source symbols or letters of the alphabet.
2. Let x_1 x_2 ... x_N denote the N-length message that is to be encoded, where x_i = s_j indicates that the ith letter in the message is the symbol s_j.
3. Assume the probabilities P(s_j) and cumulative probabilities Pc(s_j) for j = 1, 2, ..., q are available or can be estimated from the underlying source model.
Initialisation
low = 0.0, high = 1.0, i = 1
Iteration
While i ≤ N do
1. get next letter of message, x_i = s_j
2. range = high − low
3. high = low + range · Pc(s_j)
4. low = low + range · Pc(s_{j−1})
5. i = i + 1
done
Termination
Let [low, high) be the final interval. Then transmit the code as the binary integer b_1 b_2 ... b_{k−1} 1 obtained by expanding low and high in binary until they first differ.
FIGURE 3.16
Encoding algorithm for arithmetic codes.
Definitions
1. Let s_1, s_2, ..., s_q denote the distinct source symbols or letters of the alphabet.
2. Let F denote the binary integer that is to be decoded to the original message x_1 x_2 ... x_N, where x_i = s_j indicates that the ith letter in the message is the symbol s_j.
3. Assume the probabilities P(s_j) and cumulative probabilities Pc(s_j) for j = 1, 2, ..., q are available or can be estimated from the underlying source model.
Initialisation
get F and store as the binary fraction value = 0.F; let i = 1
Iteration
Repeat
1. find the symbol s_j such that Pc(s_{j−1}) ≤ value < Pc(s_j)
2. output x_i = s_j
3. value = (value − Pc(s_{j−1})) / P(s_j)
4. i = i + 1
until the EOF symbol is decoded
FIGURE 3.17
Decoding algorithm for arithmetic codes.
EXAMPLE 3.21
Consider a zero-memory source with the following source alphabet and probabilities:
Symbol   P(s_j)   Pc(s_j)   Interval
s1       6/15     6/15      [0.0, 6/15)
s2       2/15     8/15      [6/15, 8/15)
s3       2/15     10/15     [8/15, 10/15)
s4       4/15     14/15     [10/15, 14/15)
s5       1/15     15/15     [14/15, 1.0)
and hence the integer code that uniquely identifies the interval that is transmitted is 16 bits long.
EXAMPLE 3.22
and the decoded message is recovered, where the EOF symbol is used to terminate the decoding process.
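The decoding iteration of Figure 3.17 can be sketched as follows (Python; a zero-memory source and a known message length are assumed, using the values from Example 3.20):

```python
def arith_decode(value, probs, msg_len):
    """Repeatedly find the symbol whose interval contains the coded
    value, output it, and renormalise the value back to [0, 1)."""
    message = []
    for _ in range(msg_len):
        cum = 0.0
        for s, p in probs.items():
            if cum <= value < cum + p:
                message.append(s)
                value = (value - cum) / p   # renormalise to [0, 1)
                break
            cum += p
    return message

# 0101 -> 0.0101 (binary) = 0.3125, which decodes back to s2 s2 s1:
msg = arith_decode(0.3125, {"s1": 0.2, "s2": 0.5, "s3": 0.3}, 3)
```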
The encoding and decoding algorithms depicted in Figures 3.16 and 3.17 iteratively
reduce the length of the interval with each successive letter from the message. Thus
with longer messages, smaller intervals are produced. If the interval range becomes too small, then underflow is a very real problem. In Example 3.21 the interval range is already down to 1.011e-04 after only the 8th letter in the message, and in the binary representation of the final interval the first (most significant) 16 bits of low and high are identical. With modern 64-bit CPUs, low and high will become indistinguishable when their first 64 bits are identical. The practical implementation of the encoding and decoding algorithms requires an added scaling operation. The scaling operation rescales the current interval based on the pseudo-algorithm depicted in Figure 3.18.
The scaling operation not only solves the underflow problem but permits transmis-
sion of the coded value before the encoding operation is complete. Thus a long
message block can be encoded without having to wait until the end of the message
before transmitting the coded value. Similarly, the decoder can begin decoding the
Rescale
While low and high have the same leading binary digit b (i.e., both are less than 1/2 or both are at least 1/2) do: output the bit b and rescale to:
low = 2 · low − b
high = 2 · high − b
FIGURE 3.18
Interval rescaling operation for arithmetic coding.
most significant bits of the coded value (which are transmitted first by the encoder
when rescaling the intervals) before all the bits have been received. Examples 3.23
and 3.24 illustrate the use of the scaling algorithm of Figure 3.18 for the encoding
and decoding of arithmetic codes.
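The rescaling operation can be sketched in Python; it emits the leading bits on which low and high already agree and doubles the interval, consistent with the rescale rows of the tables in Examples 3.23 and 3.24:

```python
def rescale(low, high):
    """While low and high share the same leading binary digit b, output b
    and expand the interval: low = 2*low - b, high = 2*high - b."""
    out = []
    while low < high and (low < 0.5) == (high < 0.5):   # same leading bit
        b = 0 if high < 0.5 else 1
        out.append(str(b))
        low, high = 2 * low - b, 2 * high - b
    return low, high, "".join(out)

# cf. the first rescale row of Example 3.23:
low, high, bits = rescale(0.4, 0.453333)
```

Here bits == "011" and the interval expands to approximately [0.2, 0.626664), matching the table; the emitted bits can be transmitted immediately.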
EXAMPLE 3.23
The encoding from Example 3.21 is now repeated with the additional rescaling operation of Figure 3.18 to prevent underflow from occurring. The encoding with scaling is shown in the following table, where the subscripts 10 and 2 indicate that the low and high values are expressed in base 10 and base 2, respectively:
Next letter   Range   Symbol interval   [low, high)_10   [low, high)_2   Output
Init - - [0.0, 1.0) [0.0, 1.0)
1.0 [6/15, 8/15) [.4, .533333) [.01100110, .10001000)
.133333 [0, 6/15) [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.00110..., .10100...) 011
.426667 [8/15, 10/15) [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) 011
.455111 [10/15, 14/15) [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) 1
.242726 [8/15, 10/15) [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) 1001
.517815 [0, 6/15) [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) 0
.414252 [6/15, 8/15) [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) 101
.441869 [14/15, 1) [.490339, .519797) [.01111101, .10000101)
(terminate) 1
When the last or EOF letter of the message is read, a final 1 is transmitted. Concatenating the intermediate bits that were output yields the final coded bit sequence.
EXAMPLE 3.24
The decoding from Example 3.22 is repeated with the additional rescaling operation
of Figure 3.18. The decoding processing with scaling is shown in the following table:
Value   [low, high)_10   [low, high)_2   Range   Scaled value
- - [0.0, 1.0) [0.0, 1.0) 1.0 .434250
.434250 [.400000,.533333) [.01100110, .10001000) .133333 .434250
.256874 [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.001100..., .10100...) .426667 .473999
.642185 [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) .455111 .791992
.816389 [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) .242726 .583984
.561459 [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) .517815 .343750
.210944 [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) .414252 .687500
.527360 [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) .441870 .500000
.955197
NOTE The scaling operation not only expands the interval but repositions it around 0.5. That is, the rescaled interval has low < 0.5 and high ≥ 0.5. Thus, rescaling is needed whenever low and high are both less than or both greater than 0.5. One pathological condition that may arise and defeat the scaling operation is when low is just below 0.5 and high is just above 0.5.
Shannon’s noiseless coding theorem states that if one designs a compact binary code,
using Huffman coding, for the th extension of a source then the average length of
the code, , is no greater than ½ . A similar theorem outlined in [6] states
Source Coding 157
that for the arithmetic coding scheme just described the average length is also no
greater than ½
when encoding a message of length . Thus Huffman coding
and arithmetic coding exhibit similar performance in theory. However the arith-
metic code encoding and decoding algorithms with scaling can be implemented for
messages of any length . Huffman coding for the th extension of a source, how-
ever, becomes computationally prohibitive with increasing since the computational
complexity is of the order .
DEFINITION 3.12 kth-order Modelling
In kth-order modelling the joint probabilities P(x_1 x_2 ... x_{k+1}) of all possible (k+1)-length message sequences are given. Furthermore, since
P(x_{k+1} | x_1 x_2 ... x_k) = P(x_1 x_2 ... x_{k+1}) / P(x_1 x_2 ... x_k)    (3.39)
the stationary state and conditional probabilities for the kth order Markov model are also provided.
We need to consider what effect the model order has on the Huffman and arithmetic coding schemes.
With higher-order Huffman coding we consider the case of deriving the Huffman code for the nth extension of the source using a kth order model of the source.
In practice the k-gram model probabilities P(x_1 x_2 ... x_{k+1}) are directly estimated from the frequency counts:
P(x_1 x_2 ... x_{k+1}) = n(x_1 x_2 ... x_{k+1}) / Σ n(x_1 x_2 ... x_{k+1})    (3.40)
where n(x_1 x_2 ... x_{k+1}) is the number of times the (k+1)-length message x_1 x_2 ... x_{k+1} has been seen and Σ n(x_1 x_2 ... x_{k+1}) is the total count of all (k+1)-length messages that have been seen so far.
When building the Huffman coding tree only the relative order of the nth extension probabilities is important. Thus a fixed scaling of the probabilities can be applied without affecting the Huffman coding algorithm, and practical implementations use the model counts n(x_1 x_2 ... x_{k+1}) directly instead of the model probabilities P(x_1 x_2 ... x_{k+1}).
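The count collection can be sketched with a sliding window (Python; the example message string is illustrative):

```python
from collections import defaultdict

def ngram_counts(message, n):
    """Frequency counts of every n-length block in the message seen so
    far; the ratio count/total estimates the n-gram model probability."""
    counts = defaultdict(int)
    for i in range(len(message) - n + 1):
        counts[message[i:i + n]] += 1   # slide an n-wide window over the message
    total = sum(counts.values())
    return counts, total

# trigram counts, as needed for a 2nd order model of a binary source:
counts, total = ngram_counts("0001100100", 3)
```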
EXAMPLE 3.25
After observing a typical binary message the following 2nd order model counts are produced:
n(000) = 47, n(001) = 12, n(010) = 7, n(011) = 8, n(100) = 12, n(101) = 3, n(110) = 7, n(111) = 4
The binary Huffman code for the 3rd extension of the source is to be designed based on the above 2nd order model. The binary Huffman code tree is shown in Figure 3.19, where the counts are directly used for construction of the tree.
The Huffman code is given in the following table:
3rd extension   Huffman code
000 1
001 010
010 0010
011 0000
100 011
101 00011
110 0011
111 00010
If n > k + 1 then the nth extension probabilities are obtained from the kth order model by the chain rule: P(x_1 x_2 ... x_n) = P(x_1 ... x_{k+1}) P(x_{k+2} | x_2 ... x_{k+1}) ... P(x_n | x_{n−k} ... x_{n−1}).
root
100
1
0
53 47
1
000
0
29 24
1 1
0 0
15 14 12 12
1 1 001 100
0 0
8 7 7 7
4 3
111 101
FIGURE 3.19
3rd order Huffman coding tree.
The kth order probabilities are derived from the respective frequency counts of the data seen thus far using Equation 3.40. Unlike Huffman coding, arithmetic coding requires the absolute probabilities, so counts of both the individual (k+1)-length messages and the total number of (k+1)-length messages seen are needed.
EXAMPLE 3.26
Consider performing arithmetic coding on the message 01011 based on the 2nd order model counts from Example 3.25. From the counts the 3rd extension, 2nd extension and single binary probabilities can be calculated as shown in Figure 3.20.
P(000) = 0.47   P(00) = 0.59   P(0) = 0.74
P(001) = 0.12
P(010) = 0.07   P(01) = 0.15
P(011) = 0.08
P(100) = 0.12   P(10) = 0.15   P(1) = 0.26
P(101) = 0.03
P(110) = 0.07   P(11) = 0.11
P(111) = 0.04
FIGURE 3.20
Joint probabilities up to 3rd order.
Applying the encoding algorithm of Figure 3.16, where the interval for each letter x_i is determined by the conditional probability P(x_i | x_{i−2} x_{i−1}) = P(x_{i−2} x_{i−1} x_i) / P(x_{i−2} x_{i−1}) derived from the model counts, yields:
Letter/context   Range   Symbol interval   [low, high)
0/-              1.0     [0.0, 0.74)       [0.0, 0.74)
1/0              0.74    [0.797, 1.0)      [0.590, 0.740)
0/01             0.15    [0.0, 0.467)      [0.590, 0.660)
1/10             0.070   [0.800, 1.0)      [0.646, 0.660)
1/01             0.014   [0.467, 1.0)      [0.6525, 0.6600)
FIGURE 3.21
Subdivision of probability line.
Source Coding 163
3.10 Exercises
1. For the following binary codes, determine:
(a) Whether the code is uniquely decodable. If not, exhibit two source mes-
sages with the same code.
(b) Whether the code is instantaneous. If not, can you design an instanta-
neous code with the same lengths of code words?
2. Consider a block code based on an r-symbol code alphabet and designed for
a source with q possible messages. Derive the expression for the lower bound
on the block code word length.
3. We are going to devise a code for the decimal digits 0-9 using a binary code.
Analysis shows that we should use the shortest codes for 0 and 1. If we code:
digit 0 to code 00
digit 1 to code 11
find the minimum code word length for the remaining digits 2-9 assuming
we want them to be the same code word length.
Code   Lengths
A      2 2 2 4 4 4
B      1 1 2 3 3
C      1 1 2 2 2 2
(c) If neither a binary nor ternary code can be formed find the smallest num-
ber of code symbols that will allow a code to be formed. Give an example
of such a code.
*5. Find all possible combinations of the individual code word lengths l_1, l_2, ... when coding the messages which result in uniquely decodable binary codes with code words not more than 3 bits long (i.e., l_i ≤ 3).
6. A 6-symbol source has the following statistics and suggested binary and ternary codes:
Symbol   P     Code A   Code B
s1       0.3   0        00
s2       0.2   10       01
s3       0.1   1110     02
s4       0.1   1111     10
s5       0.2   1100     11
s6       0.1   1101     12
(a) What is H(S)?
(b) Derive a compact binary code using Huffman coding. What is the aver-
age length of your code?
(c) Can you find another compact code with different individual code word
lengths? If so, what is the average length of this code?
(d) What is the efficiency of the code?
(e) What coding strategy can be used to improve the efficiency?
Symbol   s1    s2    s3    s4    s5
P        0.1   0.2   0.2   0.4   0.1
Design a compact binary code for the source with minimum variance between
code word lengths. What is the efficiency of your code? Now design the
compact binary code with maximum variance and compare.
9. Derive the compact ternary code using Huffman coding for the following
source and compute the efficiency:
Symbol   s1     s2    s3     s4    s5     s6     s7     s8
P        0.07   0.4   0.05   0.2   0.08   0.05   0.12   0.03
Symbol   s1    s2
P        0.1   0.9
(a) Find H(S).
(b) What is the compact (indeed trivial!) binary code for this source? Compare the average length with H(S). What is the efficiency?
(c) Repeat (b) for S^n (the nth extension of S) for increasing n. Find L_n/n and compare this with H(S). What is happening?
Symbol   P
00       0.75
01       0.1
10       0.1
11       0.05
(a) Design the source encoder by devising a compact code for the source.
(b) What is the efficiency of the code you designed in (a)?
(c) What is the entropy of the source encoder output (the code entropy), H_c? Compare your answer with the source entropy, H(S). Will this always be the case?
*12. Suppose a long binary message contains half as many 1’s as 0’s. Find a binary
Huffman coding strategy which uses:
What is the efficiency of the code you devised for the above cases?
16. You are required to design a ternary source encoder that is at least 97% efficient. Observations of the ternary source over a certain time interval reveal that symbol s1 occurred 15 times, s2 occurred 6 times and s3 occurred 9 times. Design the coding scheme. What is the efficiency of your code?
17. A binary message has been modeled as a 1st order Markov source with the transition probabilities P(0|0) = 0.7, P(1|0) = 0.3, P(1|1) = 0.9 and P(0|1) = 0.1.
(a) Derive a binary compact code for the 2nd extension of the source. What
is the efficiency of your code?
(b) Repeat (a) for the 3rd extension of the source.
18. A binary source emits an equal number of 0’s and 1’s but emits the same
symbol as the previous symbol twice as often as emitting a different symbol
than the previous symbol. Derive a binary encoding which is at least 95%
efficient.
19. The output of an unknown binary information source transmitting at 100 bps
(bits per second) produces a different output (from the previous output) three
times as often as producing the same output. Devise a binary source coding
strategy for each of the following cases, so that the channel bit rate is:
*20. Analysis of a binary information source reveals that the source is three times
more likely to emit the same bit if the last two bits were the same; otherwise it
is equally likely to produce a 0 or a 1.
Symbol   s1    s2
P        0.1   0.9
NOTE Do not use any scaling. To convert from base 10 to base 2 (and vice
versa) you will need a number base converter that can deal with non-integer
values. One such tool available from the Internet is
http://www.math.com/students/converters/source/base.htm.
22. Perform arithmetic decoding of your coded output values from Qu. 21(a),(b),(c)
and confirm that the original 3 letter messages are produced.
23. Repeat Qu. 21 but this time use scaling.
24. Repeat Qu. 22 but this time use scaling.
*25. Huffman codes are guaranteed to be instantaneous codes. Can the same state-
ment be made about arithmetic codes?
26. Consider Example 3.26:
(a) Perform arithmetic coding of the sequence 011 and compare with the
corresponding Huffman code from Example 3.25.
(b) Repeat (a) for the sequence 000.
(a) Perform arithmetic decoding of the coded binary integer value 10101
assuming the original message was 3 bits long.
(b) Repeat (a) but for the coded binary integer value of 1.
*28. After observing a typical sample of a binary source the following 2nd order model counts are produced:
(a) Derive the binary Huffman code table for the 3rd extension of the source
such that there is minimum variance in the individual code word lengths.
What is the average length of your code?
(b) Perform arithmetic coding on the following 3 letter messages and com-
pare with the corresponding Huffman code:
i. 110
ii. 111
iii. 100
3.11 References
[1] ASCII, retrieved January 25, 2002 from
http://www.webopedia.com/TERM/A/ASCII.html