
Chapter 3

Source Coding

3.1 Introduction
An important problem in digital communications and computer storage is the effi-
cient transmission and storage of information. Furthermore, the transmission and storage
of digital data require the information to be represented in a digital or binary form.
Thus there is a need to perform source coding, the process of encoding the informa-
tion source message to a binary code word, and lossless and faithful decoding from
the binary code word to the original information source message. The goal of source
coding for digital systems is two-fold:

1. Uniquely map any arbitrary source message to a binary code and back again
2. Efficiently map a source message to the most compact binary code and back
again

This is shown in Figure 3.1 for a source transmitting arbitrary text through a binary
(digital) communication channel. The source encoder efficiently maps the text to
a binary representation and the source decoder performs the reverse operation. It
should be noted that the channel is assumed noiseless. In the presence of noise,
channel codes are also needed and these are introduced in Chapter 5.

[Figure 3.1: Source → Source Encoder → binary channel → Source Decoder → Receiver; for example, the text "abccde" is encoded to the binary sequence "010010", transmitted over the binary (digital) channel, and decoded back to "abccde".]

FIGURE 3.1
Noiseless communication system.

We first present the notation and terminology for the general source coding problem
and establish the fundamental results and properties of codes for their practical use


and implementation. We then show how entropy is the benchmark used to determine
the efficiency of a particular coding strategy and then proceed to detail the algorithms
of several important binary coding strategies.
Consider a mapping from the source alphabet, S, of size q to the code alphabet, X,
of size r, and define the source coding process as follows:

DEFINITION 3.1 Source Coding Let the information source be described
by the source alphabet of size q, S = {s_1, s_2, ..., s_q}, and define
X = {x_1, x_2, ..., x_r} as the code alphabet of size r. A source message of length n
is an n-length string of symbols from the source alphabet, that is, s_{i1} s_{i2} ... s_{in}.
A code word, X_i, is a finite-length string of, say, l_i symbols from the code alphabet,
that is, X_i = x_{i1} x_{i2} x_{i3} ... x_{i l_i}. Encoding is the mapping from a source symbol,
s_i, to the code word, X_i, and decoding is the reverse process of mapping a
code word, X_i, to the source symbol, s_i. A code table or simply source code
completely describes the source code by listing the code word encodings of all the
source symbols, {s_i → X_i : i = 1, 2, ..., q}.

DEFINITION 3.2 nth Extension of a Code The nth extension of a code
maps the source messages of length n, σ_i, which are the symbols of S^n, the
nth extension of S, to the corresponding sequence of code words, X_i =
X_{i1} X_{i2} ... X_{in}, from the code table for source S.
A source code is identified as:

• a non-singular code if all the code words are distinct.

• a block code of length n if the code words are all of fixed length n.

All practical codes must be uniquely decodable.

DEFINITION 3.3 Unique Decodability A code is said to be uniquely decodable
if, and only if, the nth extension of the code is non-singular for every finite value
of n. Informally, a code is said to be uniquely decodable if there are no instances
of non-unique (i.e., ambiguous) source decodings for any and all possible code
messages.

NOTE The following observations can be made:

1. A block code of length n which is non-singular is uniquely decodable.

2. It is a non-trivial exercise to prove whether a code is uniquely decodable in general.

3. A code is proven to be NOT uniquely decodable if there is at least one instance of a non-unique decoding.

The code alphabet of most interest is the binary code, X = {0, 1}, of size r = 2,
since this represents the basic unit of storage and transmission in computers and
digital communication systems. The ternary code, X = {0, 1, 2}, of size r = 3,
is also important in digital communication systems utilising a tri-level form of line
coding. In general we can speak of r-ary codes to refer to code alphabets of arbitrary
size r.

EXAMPLE 3.1
Consider the source alphabet, S = {s_1, s_2, s_3, s_4}, and binary code, X = {0, 1}.
The following are three possible binary source codes.

Source   Code A   Code B   Code C
s_1      0        0        00
s_2      11       11       01
s_3      00       00       10
s_4      11       010      11

We note that:

• Code A is NOT non-singular since X_2 = X_4 = 11.

• Code B is non-singular but it is NOT uniquely decodable since the code sequence 00 can be decoded as either s_3 or s_1 s_1, that is X_3 = X_1 X_1 where X_1 X_1 is the code word for σ = s_1 s_1 from the second extension of the code.

• Code C is a non-singular block code of length 2 and is thus uniquely decodable.
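Whether a given code is uniquely decodable can also be tested mechanically with the Sardinas-Patterson procedure, which repeatedly forms "dangling suffixes" and declares the code not uniquely decodable if a suffix ever coincides with a code word. This test is not part of the chapter, so the following Python sketch is supplementary rather than taken from the text:

```python
def dangling_suffixes(a_set, b_set):
    """Suffixes w such that some word in a_set equals a word of b_set followed by w."""
    return {a[len(b):] for a in a_set for b in b_set
            if len(a) > len(b) and a.startswith(b)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: True if the code is uniquely decodable."""
    c = set(codewords)
    if len(c) != len(codewords):          # not even non-singular
        return False
    suffixes = dangling_suffixes(c, c)
    seen = set()
    while suffixes and not (suffixes & c):
        key = frozenset(suffixes)
        if key in seen:                   # suffix sets repeat without hitting a code word
            return True
        seen.add(key)
        suffixes = dangling_suffixes(c, suffixes) | dangling_suffixes(suffixes, c)
    return not (suffixes & c)

# Codes A, B and C of Example 3.1
print(is_uniquely_decodable(['0', '11', '00', '11']))    # False (not non-singular)
print(is_uniquely_decodable(['0', '11', '00', '010']))   # False (00 decodes two ways)
print(is_uniquely_decodable(['00', '01', '10', '11']))   # True  (block code)
```

Applied to the codes of Example 3.1 it reports codes A and B as not uniquely decodable and code C as uniquely decodable.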

3.2 Instantaneous Codes

Although usable codes have to be at least uniquely decodable, there is a subclass of


uniquely decodable codes, called instantaneous codes, that exhibit extremely useful
properties for the design and analysis of such codes and the practical implementation
of the encoding and decoding processes.

DEFINITION 3.4 Instantaneous Codes A uniquely decodable code is said to be


instantaneous if it is possible to decode each message in an encoded string without
reference to succeeding code symbols.

A code that is uniquely decodable but not instantaneous requires the past and current
code symbols to be buffered at the receiver in order to uniquely decode the source
message. An instantaneous code allows the receiver to decode the current code word
to the correct source message immediately upon receiving the last code symbol of
that code word (e.g., the decoding is performed “on the fly”).

EXAMPLE 3.2
Consider the following three binary codes for the source S = {s_1, s_2, s_3, s_4}:

Source   Code A   Code B   Code C
s_1      0        0        0
s_2      10       01       01
s_3      110      011      011
s_4      1110     0111     111

Codes A, B and C are uniquely decodable (since no instance of a non-unique decoding
can be found) but only code A is instantaneous. Why? Consider the following
encoded string from code A:

110 | 0 | 10 | 1110 | 0

where the vertical bars mark the boundaries between the encoded source
symbols. It is apparent that the "0" code symbol acts as a code word terminator or
separator. Hence the moment a "0" is received it represents the last code symbol
of the current code word and the receiver can immediately decode the word to the
correct source symbol; thus code A is instantaneous.
Now consider the following encoded string from code B:

0 | 01 | 011 | 0111 | 0

Here too the "0" acts as a code word separator but, unlike code A, the "0" now
represents the first code symbol of the next code word. This means the current code
word cannot be decoded until the first symbol of the next code word, the symbol "0,"
is read. For example assume we have received the string 01. We cannot decode this
to the symbol s_2 until we see the "0" of the next code word since if the next character
we read is in fact a "1" then the current code word is 011 (s_3) or 0111 (s_4), not 01 (s_2).
Although we can verify that the code is uniquely decodable (since the "0" acts as a
separator) the code is not instantaneous.

Now consider the following encoded string from code C:

0111111 ...

Surprisingly we cannot yet decode this sequence! In fact until we receive a "0"
or an EOF (end-of-file or code string termination) the sequence cannot be decoded
since we do not know if the first code word is 0, 01 or 011. Furthermore once we
are in a position to decode the sequence the decoding process is itself not at all as
straightforward as was the case for code A and code B. Nevertheless this code is
uniquely decodable since the "0" still acts as a separator.

From Example 3.2 one can see that instantaneous codes are so named because decoding
is a very fast and easy process (i.e., instantaneous!). However the practicality
of instantaneous codes also lies in the prefix condition which allows such codes to be
efficiently analysed and designed.

DEFINITION 3.5 Prefix of a Code Let X_i = x_{i1} x_{i2} ... x_{in} be a code word
of length n. A sub-string of code characters from X_i, x_{i1} x_{i2} ... x_{ij}, where
j ≤ n, is called a prefix of the code word X_i.

DEFINITION 3.6 Prefix Condition A necessary and sufficient condition for a


code to be instantaneous is that no complete code word of the code be a prefix of
some other code word.

Since the prefix condition is both necessary and sufficient the instantaneous codes
are also referred to as prefix codes since such codes obey the above prefix condition.

NOTE Since an instantaneous code is uniquely decodable the prefix condition is


a sufficient condition for uniquely decodable codes. However it is not a necessary
condition since a code can be uniquely decodable without being instantaneous.

EXAMPLE 3.3
Consider the codes from Example 3.2. We can now immediately state whether the
codes are instantaneous or not:

Code A is instantaneous since no code word is a prefix of any other code word (code
A obeys the prefix condition). Since it is instantaneous it is also uniquely
decodable.
Code B is not instantaneous since X_1 = 0 is a prefix of X_2 = 01, X_2 = 01 is a prefix
of X_3 = 011, etc. However it is uniquely decodable since the "0" acts as a
separator.

Code C is not instantaneous since X_1 = 0 is a prefix of X_2 = 01, etc. However it is
uniquely decodable since the "0" can be used as a separator.

Figure 3.2 graphically depicts the universe of all codes and how the different classes
of codes we have described are subsets of one another. That is, the class of all
instantaneous codes is a subset of the class of all uniquely decodable codes which in
turn is a sub-class of all non-singular codes.

[Figure 3.2 depicts nested classes of codes: Instantaneous (Prefix) Codes ⊂ Uniquely Decodable Codes ⊂ Non-singular Codes ⊂ All Codes.]

FIGURE 3.2
Classes of codes.

3.2.1 Construction of Instantaneous Codes

The prefix condition not only makes it easy to determine whether a given code is
instantaneous or not but it can also be used to systematically design an instantaneous
code with specified lengths for the individual code words. The problem can be stated
as follows. For a source with q symbols it is required to design the q individual
code words with specified code word lengths of l_1, l_2, ..., l_q such that the code is
instantaneous (i.e., it satisfies the prefix condition). To design the code the code
word lengths are sorted in order of increasing length. The code words are derived
in sequence such that at each step the current code word does not contain any of the
other code words as a prefix. A systematic way to do this is to enumerate or count
through the code alphabet.

EXAMPLE 3.4

Problem 1: Design an instantaneous binary code with lengths of 3, 2, 3, 2, 2.

Solution 1: The lengths are re-ordered in increasing order as 2, 2, 2, 3, 3. For the first
three code words of length 2 a count from 00 to 10 is used:

00, 01, 10

For the next two code words of length 3, we count to 11, form 110, and then start
counting from the right most symbol to produce the complete code:

00, 01, 10, 110, 111

Problem 2: Design an instantaneous ternary code with lengths 2, 3, 1, 1, 2.

Solution 2: The code word lengths are re-ordered as 1, 1, 2, 2, 3 and noting that a
ternary code has symbols {0, 1, 2} we systematically design the code as follows:

0, 1, 20, 21, 220

Problem 3: Design an instantaneous binary code with lengths 2, 3, 2, 2, 2.

Solution 3: The code word lengths are re-ordered as 2, 2, 2, 2, 3 and we immediately
see that an instantaneous code cannot be designed with these lengths since the four
code words of length 2 use up all of 00, 01, 10 and 11, leaving no code word of
length 3 that does not have one of them as a prefix.

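The counting construction of Example 3.4 can be automated. The following Python sketch (an illustrative implementation, not taken from the text) sorts the requested lengths, rejects length sets that violate the Kraft Inequality of Section 3.3, and assigns code words by counting through the r-ary alphabet:

```python
from fractions import Fraction

def to_base_r(value, length, r):
    """Write `value` as a base-r string padded to `length` digits."""
    digits = []
    for _ in range(length):
        digits.append(str(value % r))
        value //= r
    return ''.join(reversed(digits))

def construct_instantaneous_code(lengths, r=2):
    """Systematic construction of Section 3.2.1: returns code words (in the order
    the lengths were given) or None if the Kraft Inequality is violated."""
    if sum(Fraction(1, r ** l) for l in lengths) > 1:
        return None
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value *= r ** (lengths[i] - prev_len)   # extend the counter to the new length
        codes[i] = to_base_r(value, lengths[i], r)
        value += 1
        prev_len = lengths[i]
    return codes

print(construct_instantaneous_code([3, 2, 3, 2, 2]))        # Problem 1
print(construct_instantaneous_code([2, 3, 1, 1, 2], r=3))   # Problem 2
print(construct_instantaneous_code([2, 3, 2, 2, 2]))        # Problem 3 -> None
```

For the three problems above it returns the same code words as derived by hand for Problems 1 and 2, and None for Problem 3.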

3.2.2 Decoding Instantaneous Codes

Since an instantaneous code has the property that the source symbol or message can
be immediately decoded upon reception of the last code character in the current code
word the decoding process can be fully described by a decoding tree or state machine
which can be easily implemented in logic.

EXAMPLE 3.5

Figure 3.3 is the decoding tree that corresponds to the binary code:
Source   Code
s_1      00
s_2      01
s_3      10
s_4      110
s_5      111


[Figure 3.3 shows the decoding tree: from the Start (root) node, reading 0 then 0 gives s_1, 0 then 1 gives s_2, 1 then 0 gives s_3, 1 1 0 gives s_4 and 1 1 1 gives s_5.]

FIGURE 3.3
Decoding tree for an instantaneous code.

The receiver simply jumps to the next node of the tree in response to the current
code character and when the leaf node of the tree is reached the receiver produces
the corresponding source symbol, s_i, and immediately returns to the root of the tree.
Thus there is no need for the receiver to buffer any of the received code characters in
order to uniquely decode the sequence.
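A decoding tree of this kind is easily realised in software as a nested dictionary (a trie). The sketch below is one possible implementation, written for illustration using the code of Example 3.5; it is not code from the text:

```python
def build_decoding_tree(code_table):
    """Build a decoding trie for an instantaneous code.
    code_table maps source symbols to code words, e.g. {'s1': '00', ...}."""
    root = {}
    for symbol, word in code_table.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['symbol'] = symbol            # a leaf stores the decoded symbol
    return root

def decode(bits, root):
    """Decode a received code sequence symbol by symbol ("on the fly")."""
    decoded, node = [], root
    for ch in bits:
        node = node[ch]
        if 'symbol' in node:               # last code symbol of the current code word
            decoded.append(node['symbol'])
            node = root                    # return to the root of the tree
    return decoded

# The instantaneous code of Example 3.5
tree = build_decoding_tree({'s1': '00', 's2': '01', 's3': '10', 's4': '110', 's5': '111'})
print(decode('0011011100', tree))          # ['s1', 's4', 's5', 's1']
```

No buffering of past code characters is needed: each received symbol simply moves the decoder one node further down the tree.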

3.2.3 Properties of Instantaneous Codes

Property 1 Easy to prove whether a code is instantaneous by inspection of whether


the code satisfies the prefix condition.

Property 2 The prefix code permits a systematic design of instantaneous codes


based on the specified code word lengths.

Property 3 Decoding based on a decoding tree is fast and requires no memory stor-
age.

Property 4 Instantaneous codes are uniquely decodable codes and, where the length
of a code word is the main consideration in the design and selection of codes,
there is no advantage in ever considering the general class of uniquely decodable
codes which are not instantaneous.

Property 4 arises because of McMillan’s Theorem which will be discussed in the


next section.

3.2.4 Sensitivity to Bit Errors

The instantaneous codes we have been discussing are variable-length codes since the
code words can be of any length. Variable-length codes should be contrasted with
block codes of length n that restrict code words to be of the same length n. As we
will see variable-length codes provide greater flexibility for sources to be encoded
more efficiently. However a serious drawback to the use of variable-length codes is
their sensitivity to bit or code symbol errors in the code sequence. If a single bit
error causes the decoder to interpret a shorter or longer code word than is the case
then this will create a synchronisation error between the first bit of the code word
generated by the encoder and the root of the decoding tree. Subsequent code words
will be incorrectly decoded, including the possibility of insertion errors, until the
synchronisation is re-established (which may take a very long time). With block
codes a single bit error will only affect the current block code word and the effect will not
propagate to subsequent code words. In fact block codes only suffer a synchronisation
problem initially, when the transmission is established and the decoder has to "lock" onto the
first bit of the code, or where there are unmanaged timing discrepancies or clock
skew between the transmitter and receiver.

EXAMPLE 3.6
Consider the following variable-length and block instantaneous binary codes for the
source S = {s_1, s_2, s_3, s_4}.

Source   Variable-length code   Block code
s_1      0                      00
s_2      10                     01
s_3      110                    10
s_4      111                    11

Consider the following source sequence:

s_{1,1} s_{3,2} s_{2,3} s_{4,4} s_{2,5} s_{1,6}

where s_{i,t} indicates that symbol s_i is being transmitted at time t. The corresponding
variable-length code sequence is:

011010111100

Assume the code sequence is now transmitted through an information channel which
introduces a bit error in the 2nd bit. The source decoder sees:

001010111100

and generates the decoded message sequence:

s_1 s_1 s_2 s_2 s_4 s_2 s_1

The single bit error in the encoded sequence produces both a substitution error (2nd
symbol s_3 is substituted by s_1) and an insertion error (an s_2 is inserted after it and subsequent
source symbols appear one time slot late) in the decoded message sequence and 7 rather
than 6 source symbols are produced.
Now assume there is a bit error in the 6th bit. The source decoder sees:

011011111100

and generates the decoded message sequence:

s_1 s_3 s_4 s_4 s_1 s_1

The single bit error now causes two isolated substitution errors (3rd symbol s_2 substituted
by s_4 and 5th symbol s_2 substituted by s_1). Now consider the corresponding
block code sequence:

s_{1,1} s_{3,2} s_{2,3} s_{4,4} s_{2,5} s_{1,6} → 001001110100

For errors in both the 2nd and 6th bits, a bit error in the 2nd bit causes a single
substitution error (1st symbol s_1 substituted by s_2) and a bit error in the 6th bit also
causes a single substitution error (3rd symbol s_2 substituted by s_1). In fact any bit
errors will only affect the current code word and will not have any effect on subsequent
code words, and all errors will be substitution errors.
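The error propagation in Example 3.6 is easy to reproduce with a few lines of Python. The sketch below (illustrative only) greedily decodes a prefix code and shows that flipping the 2nd bit of the variable-length code sequence yields 7 decoded symbols instead of 6:

```python
def prefix_decode(bits, code):
    """Greedy decoder for an instantaneous (prefix) code given as {symbol: code word}."""
    inverse = {w: s for s, w in code.items()}
    decoded, word = [], ''
    for b in bits:
        word += b
        if word in inverse:
            decoded.append(inverse[word])
            word = ''
    return decoded

vl_code = {'s1': '0', 's2': '10', 's3': '110', 's4': '111'}
message = ['s1', 's3', 's2', 's4', 's2', 's1']
sequence = ''.join(vl_code[s] for s in message)                      # 011010111100
corrupted = sequence[0] + ('0' if sequence[1] == '1' else '1') + sequence[2:]  # flip 2nd bit
print(prefix_decode(sequence, vl_code))    # the original 6 symbols
print(prefix_decode(corrupted, vl_code))   # 7 symbols: substitution + insertion errors
```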

From Example 3.6 it is apparent that variable-length codes are very sensitive to bit
errors, with errors propagating to subsequent code words and symbols being inserted
as well as being substituted. If the main goal of the source coder is to map the
source symbols to the code alphabet then a block coding scheme should be used. For
example, the ASCII code [1] is a 7-bit (or 8-bit in the case of extended ASCII) block
code mapping letters of the English alphabet and keyboard characters to a binary
block code of length 7 or 8. Furthermore, a channel coder (see Chapter 5) with an
appropriate selection of an error-correcting or error-detecting code (see Chapters 6
to 9) is mandatory to ensure the final code sequence is close to error-free and must
always be present when using variable-length codes.

The loss of synchronisation arising from code symbol errors with variable-length
codes is a specific example of the more general problem of word synchronisation
between the source and receiver which is discussed by Golomb et al. [5].

3.3 The Kraft Inequality and McMillan’s Theorem

3.3.1 The Kraft Inequality

If the individual code word lengths are specified there is no guarantee that an in-
stantaneous code can be designed with those lengths (see Example 3.4). The Kraft
Inequality theorem [8] provides a limitation on the code word lengths for the design
of instantaneous codes. Although not of real use in practice (the coding strategies we
will later discuss will guarantee codes that are instantaneous) the Kraft Inequality is
a precursor to the more important McMillan’s Theorem [10] which states that where
code word lengths are the only consideration, an instantaneous code will always be
as good as any uniquely decodable code which is not necessarily instantaneous.

THEOREM 3.1 Kraft Inequality


A necessary and sufficient condition for the existence of an instantaneous code with
alphabet size r and q code words with individual code word lengths of l_1, l_2, ..., l_q
is that the following inequality be satisfied:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.1)

Conversely, given a set of code word lengths that satisfy this inequality, then there
exists an instantaneous code with these word lengths.

The proof of the Kraft Inequality is interesting in that it is based on a formal de-
scription of how instantaneous codes are constructed (see Section 3.2.1) given the
code word lengths, where the Kraft Inequality needs to be satisfied for the code to be
successfully constructed.

PROOF Let s_i be the ith source message or symbol and X_i the corresponding
code word of length l_i. The proof requires the code word lengths to be arranged in
ascending order, that is, we assume that the code word lengths are arranged such that
l_1 ≤ l_2 ≤ ... ≤ l_q. The number of possible code words for X_i is r^{l_i}. To ensure
the code is instantaneous we need to consider the number of permissible code words
for X_i such that the prefix condition is satisfied.
Consider the shortest code word, X_1. Then the number of permissible code words
for X_1 is simply r^{l_1}. Next consider code word X_2. The number of possible
X_2 with X_1 as a prefix is given by the expression r^{l_2 - l_1} (since with X_1 as the
prefix the first l_1 symbols are fixed and one can choose the remaining l_2 - l_1 symbols
arbitrarily). To ensure the prefix condition is satisfied the number of permissible code
words for X_2 is the number of possible code words, r^{l_2}, less those code words
which have X_1 as a prefix, r^{l_2 - l_1}. That is, the number of permissible code words
for X_2 is r^{l_2} - r^{l_2 - l_1}. Similarly for X_3 the number of permissible code words
is the number of possible code words, r^{l_3}, less those code words which have X_2
as a prefix, r^{l_3 - l_2}, and less those code words which have X_1 as a prefix, r^{l_3 - l_1},
that is r^{l_3} - r^{l_3 - l_2} - r^{l_3 - l_1}. For code word X_q, the expression for the number of
permissible code words is:

r^{l_q} - r^{l_q - l_{q-1}} - ... - r^{l_q - l_1}    (3.2)

To be able to construct the code we want to ensure that there is at least one permissible
code word for all source messages, i = 1, 2, ..., q. That is we require the following
inequalities to be simultaneously satisfied:

r^{l_1} \geq 1
r^{l_2} - r^{l_2 - l_1} \geq 1
...
r^{l_q} - r^{l_q - l_{q-1}} - ... - r^{l_q - l_1} \geq 1    (3.3)

By multiplying the ith equation by r^{-l_i} and rearranging we get:

r^{-l_1} \leq 1
r^{-l_1} + r^{-l_2} \leq 1
...
r^{-l_1} + r^{-l_2} + ... + r^{-l_q} \leq 1    (3.4)

We note that if the last inequality holds then all the preceding inequalities will also
hold. Thus the following inequality expression must hold to ensure we can design an
instantaneous code:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.5)

which is the Kraft Inequality.

The Kraft Inequality of Equation 3.1 only indicates whether an instantaneous code
can be designed from the given code word lengths. It does not provide any indica-
tion of what the actual code is, nor whether a code we have designed which satisfies
Equation 3.1 is instantaneous, but it does tell us that an instantaneous code with the
given code word lengths can be found. Only by checking the prefix condition of the
given code can we determine whether the code is instantaneous. However if a code
we have designed does not satisfy Equation 3.1 then we know that the code is not
instantaneous and that it will not satisfy the prefix condition. Furthermore, we will
not be able to find a code with the given code word lengths that is instantaneous. A
less apparent property of the Kraft Inequality is that the minimum code word lengths
for the given alphabet size and number of code words that can be used for designing
an instantaneous code are those that make \sum_{i=1}^{q} r^{-l_i} as close as possible to 1. Obviously
shorter overall code word lengths intuitively yield more efficient codes. This
is examined in the next section.

EXAMPLE 3.7
Consider the following binary (r = 2) codes:

Source   Code A   Code B   Code C
s_1      0        0        0
s_2      100      100      10
s_3      110      110      110
s_4      111      11       11

Code A satisfies the prefix condition and is hence instantaneous. Calculating the
Kraft Inequality yields:

K = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = 7/8 \leq 1

which is as expected.

Code B does not satisfy the prefix condition since 11 is a prefix of 110; hence the code
is not instantaneous. Calculating the Kraft Inequality yields:

K = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-2} = 1 \leq 1

which implies that an instantaneous code is possible with the given code word lengths.
Thus a different code can be derived with the same code word lengths as code B which
does satisfy the prefix condition. One example of an instantaneous code with the same
code word lengths is:

s_1 → 0, s_2 → 100, s_3 → 101, s_4 → 11

Code C does not satisfy the prefix condition since 11 is a prefix of 110; hence the code
is not instantaneous. Calculating the Kraft Inequality yields:

K = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-2} = 9/8 > 1

which implies that an instantaneous code cannot be designed with these code word
lengths, and hence we shouldn't even try.
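The three Kraft sums of Example 3.7 can be checked numerically. A small illustrative sketch, using exact fractions to avoid rounding:

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft Inequality (3.1) for the given code word lengths."""
    return sum(Fraction(1, r ** l) for l in lengths)

# Code word lengths of codes A, B and C in Example 3.7
for name, lengths in [('A', [1, 3, 3, 3]), ('B', [1, 3, 3, 2]), ('C', [1, 2, 3, 2])]:
    total = kraft_sum(lengths)
    verdict = 'instantaneous code possible' if total <= 1 else 'no instantaneous code'
    print(f'Code {name}: K = {total} -> {verdict}')
```

It prints 7/8, 1 and 9/8, confirming that only code C's lengths make an instantaneous code impossible.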

3.3.2 McMillan’s Theorem

As we will discuss in the next section, the use of shorter code word lengths creates
more efficient codes. Since the class of uniquely decodable codes is larger than the
class of instantaneous codes, one would expect greater efficiencies to be achieved
considering the class of all uniquely decodable codes rather than the more restric-
tive class of instantaneous codes. However, instantaneous codes are preferred over
uniquely decodable codes given that instantaneous codes are easier to analyse, sys-
tematic to design and can be decoded using a decoding tree (state machine) structure.
McMillan’s Theorem assures us that we do not lose out if we only consider the class
of instantaneous codes.

THEOREM 3.2 McMillan’s Theorem


The code word lengths of any uniquely decodable code must satisfy the Kraft
Inequality:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.6)

Conversely, given a set of code word lengths that satisfy this inequality, then there
exists a uniquely decodable code (Definition 3.3) with these code word lengths.

The proof of McMillan’s Theorem is presented as it is instructive to see the way it


uses the formal definition of unique decodability to prove that the inequality must be
satisfied. The proof presented here is based on that from [2].

PROOF Assume a uniquely decodable code and consider the quantity:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_n=1}^{q} r^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_n})}    (3.7)

When written out, the quantity will consist of the q^n terms arising from the nth
extension of the code, each of the form:

r^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_n})} = r^{-k}    (3.8)

where we define k = l_{i_1} + l_{i_2} + \cdots + l_{i_n}.
Then l_{i_j} is the length of the code word for the i_j th symbol and k is the length of
the corresponding sequence of code words in the nth extension of the code. Let
l = max{l_1, l_2, ..., l_q} be the maximum code word length over the q code
words. The minimum code word length is, of course, a length of 1. Then k can
assume any value from n to nl. Let N_k be the number of terms of the form r^{-k}.
Then:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{k=n}^{nl} N_k r^{-k}    (3.9)

Thus N_k represents the number of code word sequences in the nth extension of the
code with a length of k. If the code is uniquely decodable then the nth extension of
the code must be non-singular. That is, N_k must be no greater than r^k, the number
of distinct sequences of length k. Thus for any value of n we must have:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{k=n}^{nl} N_k r^{-k} \leq \sum_{k=n}^{nl} r^k r^{-k} = nl - n + 1 \leq nl    (3.10)

or:

\sum_{i=1}^{q} r^{-l_i} \leq (nl)^{1/n}    (3.11)

For n → ∞ we have that (nl)^{1/n} → 1.
Since the above inequality has to hold for all values of n, then this will be true only if:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.12)


The implication of McMillan’s Theorem is that for every non-instantaneous uniquely


decodable code that we derive, an instantaneous code with the same code word
lengths will always be found since both codes satisfy the same Kraft Inequality.
Thus we can restrict ourselves to the class of instantaneous codes since we will not
gain any efficiencies based on code word lengths by considering the larger class of
uniquely decodable codes.

EXAMPLE 3.8
It is required to design a uniquely decodable ternary (r = 3) code with code word
lengths 1, 1, 2, 2, 3, 3. Since a uniquely decodable code satisfies the Kraft Inequality
by McMillan's Theorem we check whether this is the case. Calculating the Kraft
Inequality:

\sum_{i=1}^{6} 3^{-l_i} = 3^{-1} + 3^{-1} + 3^{-2} + 3^{-2} + 3^{-3} + 3^{-3} = 26/27 \leq 1

shows that a uniquely decodable code with these lengths can be found.
BUT the same Kraft Inequality is satisfied for instantaneous codes and since instantaneous
codes can be systematically constructed following the procedure described
in Section 3.2.1 the following instantaneous code, which is uniquely decodable, is
designed:

0, 1, 20, 21, 220, 221

3.4 Average Length and Compact Codes

3.4.1 Average Length

When considering a collection of possible r-ary codes for the same source a choice
needs to be made between the different codes by comparing their performance based
on a criterion of interest. For storage and communication purposes the main criterion
of interest is the average length of a code, where codes with smaller average
length are preferred. To calculate the average length the source symbol (or message)
probabilities are needed.

DEFINITION 3.7 Average Length Define {p_1, p_2, ..., p_q} as the individual
source symbol probabilities for a source with q possible symbols. Define
{l_1, l_2, ..., l_q} as the lengths of the corresponding code words for a given source coding.
Then the average length of the code, L, is given by:

L = \sum_{i=1}^{q} p_i l_i    (3.13)

Consider all possible r-ary codes for the same source, S. The number of source
symbols, q, and source probabilities, p_1, p_2, ..., p_q, are constant, but the code
word lengths l_1, l_2, ..., l_q vary with each code. The best code or compact
code will be the one with the smallest average length.

DEFINITION 3.8 Compact Codes Consider a uniquely decodable code that
maps the symbols of a source S to code words from an r-ary code alphabet. The
code will be a compact code if its average length is less than or equal to the average
length of all other uniquely decodable codes for the same source and code alphabet.

EXAMPLE 3.9
Consider the following two binary codes for the same source. Which code is better?
Source   P(s_i)   Code A   Code B
s_1      0.5      00       1
s_2      0.1      01       000
s_3      0.2      10       001
s_4      0.2      11       01

The average length of code A is obviously L_A = 2 bits per symbol. The average
length of code B is L_B = 1 × 0.5 + 3 × 0.1 + 3 × 0.2 + 2 × 0.2 = 1.8 bits per symbol.

Code B is better than code A since L_B < L_A. But is there another code, call it code
C, which has L_C < L_B? Or is code B the compact code for this source? And how
small can the average length get?

The fundamental problem when coding information sources and the goal of source
coding for data compression is to find compact codes. This obviously requires the
individual code word lengths to be made as small as possible, as long as we still
end up with a uniquely decodable (i.e., instantaneous) code. Intuitively the average
length will be reduced when the shorter length code words are assigned to the most
probable symbols and the longer code words to the least probable symbols. This
concept is shown by Example 3.9 where code B assigns the shortest code word (of
length 1) to the most probable symbol (with probability 0.5). Formally, the problem
of searching for compact codes is a problem in constrained optimisation. We state
the problem as follows.

REMARK 3.1 Given {p_1, p_2, ..., p_q} for a source with q symbols the compact
r-ary code is given by the set of integer-valued code word lengths {l_1, l_2, ..., l_q}
that minimise:

L = \sum_{i=1}^{q} p_i l_i    (3.14)

such that the Kraft Inequality constraint is satisfied:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.15)

3.4.2 Lower Bound on Average Length

From Chapter 1 the information content of a source is given by its entropy. When
using logarithms to base 2 the entropy is measured in units of “(information) bits per
source symbol.” Similarly when using binary source codes the average length is also
measured in units of “(code) bits per source symbol.” Since the entropy provides a
measure of the intrinsic information content of a source it is perhaps not surprising
that, for there to be no losses in the coding process, the average length must be at least
the value of the entropy (no loss of information in the coding representation) and
usually more to compensate for the inefficiencies arising from the coding process.
The following theorem establishes this lower bound on the average length of any
possible coding of the source based on the entropy of the source.

THEOREM 3.3
Every instantaneous r-ary code of the source, S = {s_1, s_2, ..., s_q}, will have an
average length, L, which is at least the entropy, H_r(S), of the source, that is:

L \geq H_r(S)    (3.16)

with equality when p_i = r^{-l_i} for i = 1, 2, ..., q, where p_i is the probability of source
symbol s_i, H_r(S) = H(S)/\log_2 r is the entropy of the source S using logarithms to the
base r and H(S) is the entropy of the source S using logarithms to the base 2.

PROOF Consider the difference between the entropy H_r(S) and the average
length L and simplify the expression as follows:

H_r(S) - L = \sum_{i=1}^{q} p_i \log_r (1/p_i) - \sum_{i=1}^{q} p_i l_i
           = \sum_{i=1}^{q} p_i ( \log_r (1/p_i) - \log_r r^{l_i} )
           = \sum_{i=1}^{q} p_i \log_r ( r^{-l_i} / p_i )    (3.17)

Using the inequality \log_r x \leq (x - 1) \log_r e (with equality when x = 1), where
x = r^{-l_i}/p_i, gives:

H_r(S) - L \leq \log_r e \sum_{i=1}^{q} p_i ( r^{-l_i}/p_i - 1 )
           = \log_r e ( \sum_{i=1}^{q} r^{-l_i} - \sum_{i=1}^{q} p_i )
           = \log_r e ( \sum_{i=1}^{q} r^{-l_i} - 1 )    (3.18)

Since the code is instantaneous the code word lengths will obey the Kraft Inequality
so that \sum_{i=1}^{q} r^{-l_i} \leq 1 and hence:

H_r(S) - L \leq 0, that is, L \geq H_r(S)

with equality when r^{-l_i}/p_i = 1, or p_i = r^{-l_i}, for i = 1, 2, ..., q.

The definition of entropy in Chapter 1 now makes practical sense. The entropy is
a measure of the intrinsic amount of average information (in r-ary units) and the
average length of the code must be at least equal to the entropy of the source to ensure
no loss in coding (lossless coding). Thus the smallest average length, L, for any code
we design will be L = H_r(S). However there is no guarantee that such a code
for a particular source and code alphabet can be found. Indeed, unless p_i = r^{-l_i} for
i = 1, 2, ..., q we will always have that L > H_r(S), but the closer L is to H_r(S)
then the better or more efficient the code.

DEFINITION 3.9 Code Efficiency The efficiency of the code is given by:

η = H_r(S) / L    (3.19)

where, for example, a code with η = 0.9 is said to be 90% efficient.

EXAMPLE 3.10
The entropy for the source in Example 3.9 is:

H(S) = 0.5 \log_2 (1/0.5) + 0.1 \log_2 (1/0.1) + 0.2 \log_2 (1/0.2) + 0.2 \log_2 (1/0.2) = 1.76

Thus for this source we must have L \geq 1.76 no matter what instantaneous binary
code we design. From Example 3.9 we have:

L_A = 2.0 and code A has an efficiency of η_A = 1.76/2.0 = 88%.

L_B = 1.8 and code B has an efficiency of η_B = 1.76/1.8 = 98%.

Since code B is already at 98% efficiency we can probably state with some confidence
that code B is a compact code for source S. And even if it wasn't, at best a compact
code would only provide us with no more than 2% improvement in coding efficiency.
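The entropy, average length and efficiency calculations of Examples 3.9 and 3.10 are easily scripted. A minimal illustrative sketch:

```python
import math

def entropy(probs, r=2):
    """H_r(S): source entropy in base-r units (Chapter 1)."""
    return sum(p * math.log(1 / p, r) for p in probs if p > 0)

def average_length(probs, lengths):
    """L = sum_i p_i l_i (Equation 3.13)."""
    return sum(p * l for p, l in zip(probs, lengths))

# Source of Example 3.9 with the code word lengths of codes A and B
probs = [0.5, 0.1, 0.2, 0.2]
for name, lengths in [('A', [2, 2, 2, 2]), ('B', [1, 3, 3, 2])]:
    L = average_length(probs, lengths)
    print(f'Code {name}: L = {L:.2f} bits/symbol, efficiency = {entropy(probs) / L:.1%}')
```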

Ideally compact codes should exhibit 100% efficiency. This requires that
L = H_r(S), and from the condition for equality, p_i = r^{-l_i}, it implies l_i = \log_r(1/p_i)
for i = 1, 2, ..., q. The problem lies in the fact that the l_i have to be integer values.
If this is the case then we have a special source.

DEFINITION 3.10 Special Source A source with symbol probabilities p_i,
i = 1, 2, ..., q, such that the l_i = \log_r(1/p_i) are integers is a special source
for r-ary codes since an instantaneous code with code word lengths l_i = \log_r(1/p_i) for
a code alphabet of size r can be designed which is 100% efficient with L = H_r(S).

EXAMPLE 3.11
Consider the following 4-symbol source:

Source A   P(s_i)
s_1        0.125
s_2        0.25
s_3        0.5
s_4        0.125

We note that the symbol probabilities are of the form 2^{-l_i} with l_1 = 3, l_2 = 2,
l_3 = 1, and l_4 = 3. Thus a 100% efficient compact binary code can be designed with
code word lengths of 3, 2, 1, 3 with L = H(S) = 1.75. For example:

Source A   Code A
s_1        110
s_2        10
s_3        0
s_4        111

Now consider the following 9-symbol source:

Source B   P(s_i)
s_1        1/9
s_2        1/9
s_3        1/3
s_4        1/27
s_5        1/27
s_6        1/9
s_7        1/9
s_8        1/27
s_9        1/9

We note that the symbol probabilities are of the form 3^{-l_i} with l_1 = l_2 = 2, l_3 = 1,
l_4 = l_5 = 3, l_6 = l_7 = 2, l_8 = 3, l_9 = 2. Thus a 100% efficient ternary code can be designed
with code word lengths of 2, 2, 1, 3, 3, 2, 2, 3, 2 with L = H_3(S) = 16/9 ≈ 1.78.

3.5 Shannon’s Noiseless Coding Theorem


3.5.1 Shannon’s Theorem for Zero-Memory Sources

Equation 3.16 states that the average length of any instantaneous code is bounded
below by the entropy of the source. In this section we show that the average length,
L, of a compact code is also bounded above by the entropy plus 1 unit of information
(1 bit in the case of a binary code). The lower bound was important for establishing
how the efficiency of a source code is measured. Of practical importance is the exis-
tence of both a lower and upper bound which, as we will see, leads to the important

Shannon’s Noiseless Coding Theorem [11]. The theorem states that the coding effi-
ciency (which will always be less than 100% for compact codes that are not special)
can be improved by coding the extensions of the source (message blocks generated
by the source) rather than just the source symbols themselves.

THEOREM 3.4
Let L be the average length of a compact r-ary code for the source S. Then:

H_r(S) \leq L < H_r(S) + 1    (3.20)

The proof of the theorem as presented makes mention of a possible coding strategy
that one may adopt in an attempt to systematically design compact codes. Although
such a strategy, in fact, designs codes which are not compact it establishes the theo-
rem which can then be extended to the case of compact codes.

PROOF Let P = {p_1, p_2, ..., p_q} and consider the sub-optimal coding scheme
where the individual code word lengths are chosen according to:

\log_r (1/p_i) \leq l_i < \log_r (1/p_i) + 1    (3.21)

We can justify that this is a reasonable coding scheme by noting from Theorem 3.3
that a 100% efficient code with L = H_r(S) is possible if we select l_i = \log_r(1/p_i),
and where \log_r(1/p_i) does not equal an integer we "round up" to provide integer length
assignments, yielding the coding scheme just proposed. Multiplying Equation 3.21
by p_i and summing over i gives:

\sum_{i=1}^{q} p_i \log_r (1/p_i) \leq \sum_{i=1}^{q} p_i l_i < \sum_{i=1}^{q} p_i \log_r (1/p_i) + \sum_{i=1}^{q} p_i    (3.22)

which yields, using the definitions for entropy and average length:

H_r(S) \leq L' < H_r(S) + 1    (3.23)

where L' is the average length using this sub-optimal coding scheme. Consider
a compact code for the same source S with average length L. Then by definition
L \leq L' and from Theorem 3.3 we have L \geq H_r(S). Thus we must also have:

H_r(S) \leq L < H_r(S) + 1    (3.24)

NOTE One coding scheme resulting in code word lengths given by Equation 3.21
is the Shannon code [9]. Due to the sub-optimal nature of this coding scheme it will
not be elaborated further upon.

Equation 3.20 indicates the average length of a compact code will be no more than 1
unit away from the average length of a 100% efficient code. However we now show
that by taking extensions and applying Equation 3.20 we can improve the efficiency of the
code.

Let S^n = {σ_1, σ_2, ..., σ_{q^n}} be the nth extension of the source S = {s_1, s_2, ..., s_q},
where σ_i = s_{i1} s_{i2} ... s_{in}. Consider the compact code for S^n with average length
L_n. Then from Theorem 3.4 we have that:

H_r(S^n) \leq L_n < H_r(S^n) + 1    (3.25)

The H_r(S^n) can be considered the joint entropy of n independent sources since:

H_r(S^n) = H_r(S_1, S_2, ..., S_n)  with  S_1 = S_2 = ... = S_n = S    (3.26)

and from Section 1.18 we have the result that H_r(S^n) = n H_r(S),
which when substituted into Equation 3.25 and dividing by n yields:

H_r(S) \leq L_n / n < H_r(S) + 1/n    (3.27)

NOTE The term L_n/n is the average length of the code words per symbol s_i when
coding the nth extension which should not be confused with L_n which is the average
length of the code words per symbol σ_i when coding the source S^n. However L_n/n can
be used as the average length of a coding scheme for the source S (based on coding
the nth extension of the source).

From Equation 3.27 the average length of a compact code for S based on coding
the nth extension of S is no more than 1/n units away from the average length of a
100% efficient code and this overhead decreases with larger extensions. Intuitively
we expect that coding the nth extension of the source will provide more efficient
codes, approaching 100% efficiency as n → ∞. We formally state these results in
the following theorem.

THEOREM 3.5 Shannon’s Noiseless Coding Theorem


Let L be the average length of a compact code for the source S, then:

H_r(S) \leq L < H_r(S) + 1

Now let L_n be the average length of a compact code for the nth extension of S, then:

H_r(S) \leq L_n / n < H_r(S) + 1/n

and thus:

\lim_{n \to \infty} L_n / n = H_r(S)

that is, in the limit the code becomes 100% efficient.

NOTE Since we can't get something for nothing there is a price to be paid in
improving coding efficiency by taking extensions. The price is the increased cost of
encoding and decoding a code with q^n code words, which represents an exponential
increase in complexity.

EXAMPLE 3.12
Taking extensions to improve efficiency is apparent when the source and code alphabets
are the same. Consider the case of a binary coding scheme for a binary source.
Without extensions we would have the trivial result:

Source   P(s_i)   Compact Code
s_1      0.8      0
s_2      0.2      1

The efficiency of this compact code (which is no code at all!) is η = H(S)/L where
H(S) = 0.72 bits per symbol and L = 1, yielding a compact code which is only
72% efficient. Consider taking the second extension of the code where S^2 = {σ_1, σ_2, σ_3, σ_4}
and P(σ_k) = P(s_i) P(s_j) for σ_k = s_i s_j with i = 1, 2 and j = 1, 2:

Source    P(σ_k)   Compact Code
s_1 s_1   0.64     0
s_1 s_2   0.16     10
s_2 s_1   0.16     110
s_2 s_2   0.04     111

The compact code for the second extension is as shown in the above table. The
average length of this code is L_2 = 0.64 × 1 + 0.16 × 2 + 0.16 × 3 + 0.04 × 3 = 1.56
bits per symbol pair and L_2/2 = 0.78 bits per symbol, yielding a compact code for the
second extension which is η = H(S)/(L_2/2) = 0.72/0.78 ≈ 93% efficient, a significant improvement
over 72% efficient.
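The improvement with increasing extension order can be computed directly. The sketch below is an illustration, not part of the original example: it relies on the fact that the Huffman procedure of Section 3.7 yields a compact code, together with the standard observation that the average length of a binary Huffman code equals the sum of the probabilities created by its merges.

```python
import heapq
import itertools
import math

def compact_average_length(probs):
    """Average length of a binary compact (Huffman) code, obtained as the sum
    of the probabilities created by merging the two least probable nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Binary source of Example 3.12: P(s1) = 0.8, P(s2) = 0.2
p = [0.8, 0.2]
H = entropy(p)
for n in (1, 2, 3):
    extension = [math.prod(c) for c in itertools.product(p, repeat=n)]
    Ln = compact_average_length(extension)
    print(f'n = {n}: L_n/n = {Ln / n:.3f}, efficiency = {H / (Ln / n):.1%}')
```

For the source of Example 3.12 it reports efficiencies of about 72%, 93% and 99% for n = 1, 2, 3, approaching 100% as Theorem 3.5 predicts.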

3.5.2 Shannon’s Theorem for Markov Sources

The statement of Shannon’s Noiseless Coding Theorem was proven for zero-memory
sources. We now show that a similar statement applies to the general case of Markov
sources. Consider an mth order Markov source, S, and the corresponding entropy,
H_r(S), based on the mth order Markov model of the source. Define H_r(\bar{S}) as the
entropy of the equivalent zero-memory model of the source (where \bar{S} is the adjoint
source). Since Equation 3.20 applies to zero-memory sources it also applies to the
adjoint source, hence:

H_r(\bar{S}) \leq L < H_r(\bar{S}) + 1    (3.28)

and also:

H_r(\bar{S^n}) \leq L_n < H_r(\bar{S^n}) + 1    (3.29)

where \bar{S^n} is the adjoint of the nth extension of the source, S^n (not to be confused
with \bar{S}^n which is the nth extension of the adjoint source, \bar{S}).
We know from our results from Chapter 1 that H(\bar{S^n}) → H(S^n) as n → ∞, or equivalently
that H_r(\bar{S^n}) → H_r(S^n) (for n → ∞). We also have that H_r(S^n) = n H_r(S).
Substituting these into Equation 3.29 gives:

n H_r(S) \leq L_n < n H_r(S) + 1    (3.30)

and

H_r(S) \leq L_n / n < H_r(S) + 1/n    (3.31)

and hence we can see that

\lim_{n \to \infty} L_n / n = H_r(S)    (3.32)

That is, the lower bound on L_n/n is the entropy of the mth order Markov model, H_r(S).
Consider a sequence of symbols from an unknown source that we are coding. The
entropy of the source depends on the model we use for the source. From Chapter 1 we
had that, in general, H(S_m) \leq H(S_{m-1}) where S_m is used to indicate a Markov
source of order m. Thus higher order models generally yield lower values for the
entropy. And the lower the entropy is, then from Equation 3.32, the smaller the

average length we expect as we take extensions to the source. This realisation does
not affect how we design the compact codes for extensions to the source, but affects
how we calculate the corresponding th extension symbol probabilities (which are
based on the underlying model of the source) and the resulting code efficiency.

RESULT 3.1
When coding the th extension of a source, the probabilities should be derived based
on the “best” or “true” model of the source, and the efficiency of the code should
be based on the same model of the source.

EXAMPLE 3.13

Consider the following typical sequence from an unknown binary source:

01010010110101010101

If we assume a zero-memory model of the source then we can see that P(0) = 0.5,
P(1) = 0.5 and thus H(S) = 1 bit. This implies that the original binary source is 100%
efficient.

However the “true” model is the first order Markov model from Figure 3.4.

[Figure 3.4: two-state first order Markov model with transition probabilities P(1|0) = P(0|1) = 0.9 and P(0|0) = P(1|1) = 0.1.]

FIGURE 3.4
True model of source.

The "true" entropy is calculated to be H(S) = 0.47 bits and the original binary source
is, in fact, only 47% efficient! To improve the efficiency we find a binary code for
the second extension. Noting that P(s_i s_j) = P(s_i) P(s_j | s_i) this gives the second
extension symbol probabilities:

Source    P(σ_i)   Compact Code
s_1 s_1   0.05     110
s_1 s_2   0.45     0
s_2 s_1   0.45     10
s_2 s_2   0.05     111

The compact code is shown in the above table. The average length of this code is
L_2 = 1.65 bits per symbol pair or L_2/2 = 0.825 bits per symbol which improves the
coding efficiency to η = 0.47/0.825 ≈ 57%.
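The figures in Example 3.13 follow directly from the first order model. A short illustrative sketch computes the Markov entropy and the efficiency achieved by the second-extension code:

```python
import math

def markov_entropy(transition, stationary):
    """Entropy of a first order Markov source:
    H(S) = sum_i P(i) sum_j P(j|i) log2(1/P(j|i))."""
    return sum(stationary[i] *
               sum(p * math.log2(1 / p) for p in transition[i] if p > 0)
               for i in range(len(stationary)))

# The model of Figure 3.4: P(0|0) = P(1|1) = 0.1, P(1|0) = P(0|1) = 0.9
transition = [[0.1, 0.9], [0.9, 0.1]]
stationary = [0.5, 0.5]
H = markov_entropy(transition, stationary)

# Second extension probabilities P(s_i s_j) = P(s_i) P(s_j | s_i)
probs = [stationary[i] * transition[i][j] for i in range(2) for j in range(2)]
lengths = [3, 1, 2, 3]                     # code words 110, 0, 10, 111 of Example 3.13
L2 = sum(p * l for p, l in zip(probs, lengths))
print(f'H(S) = {H:.2f} bits, L2/2 = {L2 / 2:.3f}, efficiency = {H / (L2 / 2):.0%}')
```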

3.5.3 Code Efficiency and Channel Capacity

Consider the communication system shown in Figure 3.5 for the case of a noiseless
channel.

[Figure 3.5: Source (source entropy H(S)) emitting the q-ary source message → Source Encoder → r-input noiseless channel carrying the r-ary code message (code entropy H(X)) → Source Decoder → Receiver.]

FIGURE 3.5
Noiseless communication system.

The source entropy H(S) is interpreted as the average number of bits transmitted per
source symbol. Each source symbol is encoded to a code word of L code symbols on
average. Thus the ratio H(S)/L represents the average number of bits transmitted per
code symbol. By definition this gives the entropy of the encoded sequence or code
entropy, H(X). That is:

H(X) = H(S) / L    (3.33)

Since H_r(S) = H(S)/\log_2 r and η = H_r(S)/L then we have:

H(X) = η \log_2 r    (3.34)

Thus if the code is 100% efficient then H(X) = \log_2 r which represents the case
of maximum entropy for a source with r symbols. Consider the r-input noiseless
channel. From Chapter 2 we established that the mutual information of a noiseless
channel is I(X;Y) = H(X) = H(Y), where X and Y are the channel input and output,
respectively, and hence for a noiseless channel C = max I(X;Y) = max H(X) = \log_2 r.
RESULT 3.2
We make the following observations:
(a) A 100% efficient source coding implies maximum code entropy and hence
equiprobable code symbols.
(b) A 100% efficient source coding implies maximum or “best” use of the channel.

This result has important implications when we consider the inclusion of a channel
code with the source code. As we will see in Chapter 5 one of the key assumptions
is that the (binary) inputs to the channel coder are equally likely, and with efficient
source coding this will definitely be the case.

EXAMPLE 3.14
Consider the binary source sequence:

0001100100

Assuming a zero-memory model of the source gives P(0) = 0.7, P(1) = 0.3 and
H(S) = 0.88. We find the compact binary code of the second extension of the source
is as shown in the following table:

Source   P(σ_i)   Compact Code
00       0.49     0
01       0.21     10
10       0.21     110
11       0.09     111

The average length of the code is L_2 = 1.81 and L_2/2 = 0.905. The code
entropy is H(X) = H(S)/(L_2/2) = 0.97. Consider the corresponding code sequence given the
above compact code and initial source sequence:

0 10 110 10 0

We assume a zero-memory model and this gives P(0) = 5/9, P(1) = 4/9 and H(X) =
0.99. Although contrived, this still shows that the code sequence exhibits a higher
entropy than the source.

3.6 Fano Coding


We now describe a coding scheme for the design of instantaneous codes called the
Fano coding scheme, or Shannon-Fano code in reference to the fact that the scheme
was independently published by Shannon [11] and Fano [4], that can yield close to
optimal codes in most cases. Although mainly of historical interest because of its
lack of optimality, Fano coding can be considered the precursor to optimal Huff-
man coding discussed in the next section as it is based on the same principle but
implemented less rigidly.
Fano coding is predicated on the idea that equiprobable symbols should lead to code
words of equal length. We describe Fano's method for the design of r-ary instantaneous
codes. Consider the source S with q symbols, {s_1, s_2, ..., s_q}, and associated
symbol probabilities {p_1, p_2, ..., p_q}. Let the symbols be numbered
so that p_1 ≥ p_2 ≥ ... ≥ p_q. In Fano's method the q symbols are divided
or split into r equiprobable or close to equiprobable groups of symbols. Each
symbol in the jth group is then assigned the code symbol x_j and this is done for all
j = 1, 2, ..., r groups. Each of the r groups is then further split into r sub-groups
(or into less than r sub-groups of one symbol if there are less than r symbols in the
group) with each symbol in the kth sub-group having the code symbol x_k appended
to it for all k = 1, 2, ..., r sub-groups. The process is repeated until the groups can be
split no further.
For the important case of binary codes, Fano coding splits each group into two
equiprobable sub-groups appending a 0 to one group and a 1 to the other group.
Groups are successively split until no more groups can be split (i.e., each group
has only one symbol in it).
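A binary Fano coder can be written in a few lines: order the symbols by decreasing probability and recursively split each group into two parts with totals as close to equal as possible, appending a 0 to one part and a 1 to the other. The Python sketch below is an illustration only (ties are broken by taking the first best split, so it reproduces just one of the possible Fano codes); the source of Example 3.15 below is used as a check:

```python
def fano_code(probs):
    """Binary Fano coding (sketch). `probs` maps symbols to probabilities; returns
    {symbol: code word}. Each group is split into two parts whose total
    probabilities are as close to equal as possible."""
    code = {s: '' for s in probs}

    def split(symbols):
        if len(symbols) <= 1:
            return
        total = sum(probs[s] for s in symbols)
        run, best_k, best_diff = 0.0, 1, float('inf')
        for k in range(1, len(symbols)):
            run += probs[symbols[k - 1]]
            diff = abs(2 * run - total)            # |first part - second part|
            if diff < best_diff:
                best_diff, best_k = diff, k
        first, second = symbols[:best_k], symbols[best_k:]
        for s in first:
            code[s] += '0'
        for s in second:
            code[s] += '1'
        split(first)
        split(second)

    split(sorted(probs, key=probs.get, reverse=True))
    return code

# The 8-symbol source of Example 3.15
probs = {'s1': 1/2, 's2': 1/8, 's3': 1/8, 's4': 1/16,
         's5': 1/16, 's6': 1/16, 's7': 1/32, 's8': 1/32}
print(fano_code(probs))
```

For that source it returns the code of Figure 3.6(e): 0, 100, 101, 1100, 1101, 1110, 11110, 11111.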

EXAMPLE 3.15
Consider the design of a binary code using the Fano coding scheme for an 8-symbol
source with the following ordered probability assignments:

P(s_1) = 1/2, P(s_2) = 1/8, P(s_3) = 1/8, P(s_4) = 1/16, P(s_5) = 1/16, P(s_6) = 1/16, P(s_7) = 1/32, P(s_8) = 1/32

The regrouping and code symbol assignment in each step of the Fano coding scheme
are shown in Figure 3.6. The dashed lines represent the splitting of each set of symbols
into two equiprobable groups. For each division a 0 is appended to each symbol in
one group and a 1 is appended to each symbol in the other group. The number in
front of each dashed line indicates the level of grouping. Thus the level 1 grouping
shown in Figure 3.6(a) splits the original source into two groups with one group of
probability 1/2 containing {s_1} being assigned a 0 and the other group of probability
1/2 containing {s_2, s_3, s_4, s_5, s_6, s_7, s_8} being assigned a 1. Each of these groups is
split into the level 2 groups as shown in Figure 3.6(b). This continues for each level of
grouping until the level 5 grouping of Figure 3.6(e), when there are no more groups left
to split, is reached. Figure 3.6(e) is the binary Fano code for this source.

[Figure 3.6: the successive groupings (a)-(e); the final code of panel (e) is: 1/2 → 0, 1/8 → 100, 1/8 → 101, 1/16 → 1100, 1/16 → 1101, 1/16 → 1110, 1/32 → 11110, 1/32 → 11111.]

FIGURE 3.6
Binary Fano coding.

The average length of the Fano code is L = 2.3125 bits and since L = H(S) the Fano code
provides the compact code (with efficiency of 100%). This result can also be verified
by noting that the source probabilities are such that this is a special source for binary
codes.

In Example 3.15 the symbols are successively divided into equiprobable groups of
1/2, 1/4, 1/8, 1/16 and 1/32. In the majority of cases equal groupings are not possible,
so closest to equal groupings are used and since there may be many such “quasi”
equiprobable groups different Fano codes will result, and not all may yield compact
codes.

EXAMPLE 3.16

Consider the design of a binary code using the Fano coding scheme for the 6-symbol
source with the following probability assignment:

P(s_1) = 4/9, P(s_2) = P(s_3) = P(s_4) = P(s_5) = P(s_6) = 1/9

There are two possible Fano codes depending upon how we perform the level 1
grouping. One grouping creates a group of probability 4/9 containing {s_1} and the
other group of probability 5/9 containing {s_2, s_3, s_4, s_5, s_6} as shown by Figure
3.7(a). The alternative grouping creates a group of probability 5/9 containing {s_1, s_2}
and the other group of probability 4/9 containing {s_3, s_4, s_5, s_6} as shown by Figure
3.7(b). Both groupings are equally valid. The average lengths of the two Fano codes

[Figure 3.7: (a) the first grouping yields the code 0, 1000, 1001, 101, 110, 111; (b) the alternative grouping yields the code 00, 01, 100, 101, 110, 111.]

FIGURE 3.7
Different binary Fano codes.

shown in Figure 3.7 are L_a = 21/9 ≈ 2.33 and L_b = 22/9 ≈ 2.44, so the code of
Figure 3.7(a) is the compact code. Thus a sub-optimal code may result depending on how the symbols
are grouped.

Example 3.16 demonstrates that Fano codes are not necessarily compact and differ-
ent Fano codes may yield different average lengths. Thus Fano codes are of limited
use since such codes cannot be guaranteed to be compact.

3.7 Huffman Coding

3.7.1 Huffman Codes

We now describe an important class of instantaneous codes known as Huffman codes


which are attributed to the pioneering work of Huffman [7]. Huffman codes arise
when using the Huffman algorithm for the design of instantaneous codes given the
source symbols, s_i, and corresponding source probabilities, P(s_i). The Huffman
algorithm attempts to assign each symbol a code word of length proportional to the
amount of information conveyed by that symbol. Huffman codes are important be-
cause of the following result.

RESULT 3.3
Huffman codes are compact codes. That is, the Huffman algorithm produces a code
with an average length, L, which is the smallest possible to achieve for the given
number of source symbols, code alphabet and source statistics.

We now proceed to outline the basic principles of the Huffman algorithm and then
detail the steps and provide examples for specific cases.
The Huffman algorithm operates by first successively reducing a source with q symbols
to a source with r symbols, where r is the size of the code alphabet.

DEFINITION 3.11 Reduced Source Consider the source S with q symbols,
{s_1, s_2, ..., s_q}, and associated symbol probabilities {p_1, p_2, ..., p_q}.
Let the symbols be renumbered so that p_1 ≥ p_2 ≥ ... ≥ p_q. By combining
the last r symbols of S, {s_{q-r+1}, ..., s_q}, into one symbol, s'_{q-r+1},
with probability p'_{q-r+1} = p_{q-r+1} + ... + p_q, we obtain a new source, termed a
reduced source of S, containing only q - r + 1 symbols, {s_1, s_2, ..., s_{q-r}, s'_{q-r+1}}.
Call this reduced source S_1. Successive reduced sources S_2, S_3, ... can be formed
by a similar process of renumbering and combining until we are left with a source
with only r symbols.

It should be noted that we will only be able to reduce a source to exactly r symbols if
the original source has r + n(r - 1) symbols where n is a non-negative integer.
For a binary (r = 2) code this will hold for any value of q ≥ 2. For non-binary codes
if n = (q - r)/(r - 1) is not an integer value then "dummy" symbols with zero probability
are appended to create a source with r + ⌈n⌉(r - 1) symbols, where ⌈n⌉ is the
smallest integer greater than or equal to n.
The trivial r-ary compact code for the reduced source with r symbols is then used
to design the compact code for the preceding reduced source as described by the
following result.

RESULT 3.4
Assume that we have a compact code for the reduced source S_j. Designate the last r
symbols of the preceding source S_{j-1} as the symbols which were combined
to form the combined symbol of S_j. We assign to each symbol of S_{j-1},
except the last r symbols, the code word used by the corresponding symbol of S_j.
The code words for the last r symbols of S_{j-1} are formed by appending 0, 1, ..., (r - 1)
to the code word of the combined symbol of S_j to form r new code words.

The Huffman algorithm then operates by back-tracking through the sequence of reduced
sources, S_k, ..., S_2, S_1, S, designing the compact code for each source, until the
compact code for the original source S is designed.

3.7.2 Binary Huffman Coding Algorithm

For the design of binary Huffman codes the Huffman coding algorithm is as follows:

1. Re-order the source symbols in decreasing order of symbol probability.

2. Successively reduce the source S to S_1, then S_2 and so on, by combining the
last two symbols of S_j into a combined symbol and re-ordering the new set
of symbol probabilities for S_{j+1} in decreasing order. For each source keep
track of the position of the combined symbol. Terminate the source
reduction when a two symbol source is produced. For a source with q symbols
the reduced source with two symbols will be S_{q-2}.

3. Assign a compact code for the final reduced source. For a two symbol source
the trivial code is {0, 1}.

4. Backtrack to the original source S assigning a compact code for the jth reduced
source by the method described in Result 3.4. The compact code assigned
to S is the binary Huffman code.

The operation of the binary Huffman coding algorithm is best shown by the following
example.

EXAMPLE 3.17
Consider a 5 symbol source with the following probability assignments:

P(s_1) = 0.2, P(s_2) = 0.4, P(s_3) = 0.1, P(s_4) = 0.1, P(s_5) = 0.2

Re-ordering in decreasing order of symbol probability produces {s_2, s_1, s_5, s_3, s_4}.
The re-ordered source is then reduced to the source S_3 with only two symbols as
shown in Figure 3.8, where the arrow-heads point to the combined symbol created in
S_j by the combination of the last two symbols from S_{j-1}. Starting with the trivial
compact code of {0, 1} for S_3 and working back to S a compact code is designed for
each reduced source following the procedure described by Result 3.4 and shown
in Figure 3.8. In each S_j the code word for the last two symbols is produced by taking
the code word of the symbol pointed to by the arrow-head and appending a 0 and 1
to form two new code words. The Huffman coding algorithm of Figure 3.8 can be
depicted graphically as the binary Huffman coding tree of Figure 3.9. The Huffman
coding tree is of similar form to the decoding tree for instantaneous codes shown in
Figure 3.3 and can thus be used for decoding Huffman codes. The Huffman code
itself is the bit sequence generated by the path from the root to the corresponding leaf
node.

      S           S_1         S_2        S_3
s_2   0.4  1      0.4  1      0.4  1     0.6  0
s_1   0.2  01     0.2  01     0.4  00    0.4  1
s_5   0.2  000    0.2  000    0.2  01
s_3   0.1  0010   0.2  001
s_4   0.1  0011

FIGURE 3.8
Binary Huffman coding table.

The binary Huffman code is:

Symbol   P(s_i)   Huffman Code
s_1      0.2      01
s_2      0.4      1
s_3      0.1      0010
s_4      0.1      0011
s_5      0.2      000

The average length of the code is L = 2.2 bits/symbol and the efficiency of the
Huffman code is η = H(S)/L = 2.12/2.2 = 96.5%. Since the Huffman code is a compact
code then any other instantaneous code we may design will have an average length,
L', such that L' ≥ L = 2.2 bits/symbol.

From Figure 3.8 in Example 3.17 different reduced sources S_1 and S_2 are possible
by inserting the combined symbol in a different position when re-ordering.

[Figure 3.9: binary Huffman coding tree built from the root (probability 1.0) by the merges of Figure 3.8; the branch labels from the root to each leaf give the code words 1 (s_2), 01 (s_1), 000 (s_5), 0010 (s_3) and 0011 (s_4).]

FIGURE 3.9
Binary Huffman coding tree.

This may or may not result in a compact code with different individual code word
lengths. Either way the compact code will have the same average length. However
the average length variance:

σ_L² = Σ_{i=1}^{q} P(s_i) (l_i − L)²      (3.35)

may be different.

EXAMPLE 3.18
The average length variance for the Huffman code of Example 3.17 is σ_L² = 1.36. Consider a different Huffman code tree for the same source shown by Figures 3.10 and 3.11.

         S                 S_1              S_2            S_3
s_2   0.4   00       0.4   00       0.4   1        0.6   0
s_1   0.2   10       0.2   01       0.4   00       0.4   1
s_5   0.2   11       0.2   10       0.2   01
s_3   0.1   010      0.2   11
s_4   0.1   011

FIGURE 3.10
Different binary Huffman code table.

The binary Huffman code is now:


Symbol   P(s_i)   Huffman Code
s_1      0.2      10
s_2      0.4      00
s_3      0.1      010
s_4      0.1      011
s_5      0.2      11

Although this different Huffman code for the same source possesses different individual code word lengths the average length is still L = 2.2 bits/symbol; however, the average length variance is now σ_L² = 0.16.

[Figure 3.11 shows the alternative binary Huffman coding tree: the root (1.0) branches into two internal nodes of probability 0.6 and 0.4, below which sit the leaves s_2 (0.4, code 00), s_1 (0.2, code 10), s_5 (0.2, code 11), s_3 (0.1, code 010) and s_4 (0.1, code 011).]
FIGURE 3.11
Different binary Huffman code tree.

The Huffman code from Example 3.18 has a smaller average length variance than
the code produced in Example 3.17. Codes with smaller average length variance are
preferable since they produce a more constant code bit rate. The following result
can be used to ensure the Huffman coding tree produces a compact code with the
smallest average length variance.
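As a quick numerical check (our own illustration in Python), the average length and the variance of Equation (3.35) for the two codes of Examples 3.17 and 3.18 can be computed directly:

probs = {"s1": 0.2, "s2": 0.4, "s3": 0.1, "s4": 0.1, "s5": 0.2}
code_a = {"s1": "01", "s2": "1", "s3": "0010", "s4": "0011", "s5": "000"}   # Example 3.17
code_b = {"s1": "10", "s2": "00", "s3": "010", "s4": "011", "s5": "11"}     # Example 3.18

def length_stats(code):
    avg = sum(probs[s] * len(w) for s, w in code.items())          # average length L
    var = sum(probs[s] * (len(w) - avg) ** 2 for s, w in code.items())  # variance of (3.35)
    return avg, var

print(length_stats(code_a))   # approximately (2.2, 1.36)
print(length_stats(code_b))   # approximately (2.2, 0.16)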

RESULT 3.5
If the combined symbol, σ_{j+1}, is placed at the highest available position for that probability assignment when the reduced source is re-ordered then the resulting compact code will possess the smallest average length variance.

3.7.3 Software Implementation of Binary Huffman Coding

The binary Huffman coding algorithm can be implemented in software as a sequence of merge and sort operations on a binary tree. The Huffman coding algorithm is a greedy algorithm that builds the Huffman decoding tree (e.g., Figures 3.9 and 3.11) by initially assigning each symbol as the root node of a single-node tree. The decoding tree is then constructed by successively merging the last two nodes, labeling the edges of the left-right child pairs of the merge operation with a 0 and 1, respectively, and sorting the remaining nodes until only one root node is left. The code word for s_i is then the sequence of labels on the edges connecting the root to the leaf node of s_i. The details of the algorithm including a proof of the optimality of Huffman codes can be found in [3].
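A minimal sketch of this merge-and-sort construction in Python, using a heap as the sorted pool of nodes, is given below. This is our own illustration of the idea rather than the reference implementation of [3]; the symbol names and the tie-breaking counter are incidental choices.

import heapq
from itertools import count

def huffman_code(probs):
    # Build the Huffman decoding tree by repeatedly merging the two least
    # probable nodes; return a {symbol: code word} dictionary.
    tick = count()                                # tie-breaker so trees are never compared
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)         # two least probable nodes
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tick), (left, right)))
    _, _, root = heap[0]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):               # internal node: label edges 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"            # leaf: path from root is the code word
    walk(root, "")
    return code

print(huffman_code({"s1": 0.2, "s2": 0.4, "s3": 0.1, "s4": 0.1, "s5": 0.2}))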

3.7.4 r-ary Huffman Codes

For the design of general r-ary Huffman codes the Huffman coding algorithm is as follows:

1. Calculate α = (q − r)/(r − 1). If α is a non-integer value then append "dummy" symbols to the source with zero probability until there are q′ = r + ⌈α⌉(r − 1) symbols.

2. Re-order the source symbols in decreasing order of symbol probability.

3. Successively reduce the source S to S_1, then S_2 and so on, by combining the last r symbols of S_j into a combined symbol and re-ordering the new set of symbol probabilities for S_{j+1} in decreasing order. For each source keep track of the position of the combined symbol, σ_{j+1}. Terminate the source reduction when a source with exactly r symbols is produced. For a source with q′ symbols the reduced source with r symbols will be S_{⌈α⌉}.

4. Assign a compact r-ary code for the final reduced source. For a source with r symbols the trivial code is {0, 1, ..., r − 1}.

5. Backtrack to the original source S, assigning a compact code for each reduced source in turn by the method described in Result 3.4. The compact code assigned to S, minus the code words assigned to any "dummy" symbols, is the r-ary Huffman code.
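Step 1 simply pads the source so that repeatedly removing r − 1 symbols per reduction finishes with exactly r symbols. A small helper function (our own, hypothetical) that returns the required number of dummy symbols is:

def num_dummy_symbols(q, r):
    # Smallest number of zero-probability symbols to append so that
    # q' = r + alpha*(r-1) for some integer alpha >= 0.
    if q <= r:
        return r - q
    return (r - 1 - (q - r) % (r - 1)) % (r - 1)

print(num_dummy_symbols(11, 4))   # 2, as in Example 3.19 which follows (q' = 13)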

The operation of the r-ary Huffman coding algorithm is demonstrated in the following example.

EXAMPLE 3.19
We want to design a compact quaternary (r = 4) code for a source with originally 11 (q = 11) symbols and the following probability assignments (re-ordered in decreasing order of probability for convenience):

P(s_1) = 0.16, P(s_2) = 0.14, P(s_3) = 0.13, P(s_4) = 0.12, P(s_5) = 0.10, P(s_6) = 0.10,
P(s_7) = 0.06, P(s_8) = 0.06, P(s_9) = 0.05, P(s_10) = 0.04, P(s_11) = 0.04

Now α = (q − r)/(r − 1) = 7/3 and since α is not an integer value we need to append "dummy" symbols so that we have a source with q′ = r + ⌈α⌉(r − 1) = 4 + 3(3) = 13 symbols. Thus we append the symbols s_12, s_13 with P(s_12) = P(s_13) = 0.
The source is then reduced to the source S_3 with only r = 4 symbols as shown in Figure 3.12, where the arrow-heads point to the combined symbol created in S_{j+1} by the combination of the last r = 4 symbols from S_j. Starting with the trivial compact code of {0, 1, 2, 3} for S_3 and working back, a compact code is designed for each reduced source following the procedure described by Result 3.4. The code word for the last r symbols is produced by taking the code word of the symbol pointed to by the arrow-head and appending 0, 1, 2, 3 to form r new code words.
The r-ary Huffman code is (ignoring the "dummy" symbols s_12 and s_13):
Symbol   P(s_i)   Huffman Code
s_1      0.16     2
s_2      0.14     3
s_3      0.13     00
s_4      0.12     01
s_5      0.10     02
s_6      0.10     03
s_7      0.06     11
s_8      0.06     12
s_9      0.05     13
s_10     0.04     100
s_11     0.04     101

The average length of the code is L = 1.78 quaternary units per symbol and the Huffman code has an efficiency of η = H_4(S)/L = 1.654/1.78 ≈ 93%.

S               S_1             S_2             S_3
0.16 2 0.16 2 0.25 1 0.45 0
0.14 3 0.14 3 0.16 2 0.25 1
0.13 00 0.13 00 0.14 3 0.16 2
0.12 01 0.12 01 0.13 00 0.14 3
0.10 02 0.10 02 0.12 01
0.10 03 0.10 03 0.10 02
0.06 11 0.08 10 0.10 03
0.06 12 0.06 11
0.05 13 0.06 12
0.04 100 0.05 13
0.04 101
0.00 102
0.00 103

FIGURE 3.12
r-ary Huffman code table.

[Figure 3.13 shows the quaternary Huffman coding tree: the root (1.0) has four branches labelled 0, 1, 2, 3 to nodes of probability 0.45, 0.25, 0.16 and 0.14; the 0.45 and 0.25 nodes branch four ways again, and one further four-way branch reaches the two least probable symbols (0.04, 0.04) and the two dummy symbols (0.00, 0.00).]

FIGURE 3.13
r-ary Huffman code tree.

3.8 Arithmetic Coding

The Huffman coding method described in Section 3.7 is guaranteed to be optimal in the sense that it will generate a compact code for the given source alphabet and associated probabilities. However a compact code may be anything but optimal if it is not very efficient. Only for special sources, where l_i = log_r(1/P(s_i)) is an integer for i = 1, 2, ..., q, will the compact code also be 100% efficient. Inefficiencies are introduced when l_i is a non-integer and it is required to "round-up" to the nearest integer value. The solution to this, as a consequence of Shannon's Noiseless Coding Theorem (see Section 3.5), is to consider the nth extension of the source for n large enough and build the Huffman code for all possible blocks of length n. Although this does yield optimal codes, the implementation of this approach can easily become unwieldy or unduly restrictive. Problems include:

• The size of the Huffman code table is q^n, representing an exponential increase in memory and computational requirements.

• The code table needs to be transmitted to the receiver.

• The source statistics are assumed stationary. If there are changes an adaptive scheme is required which re-estimates the probabilities and recalculates the Huffman code table.

• Encoding and decoding is performed on a per block basis; the code is not produced until a block of n symbols is received. For large n this may require the last segment to be padded with dummy data.

One solution to using Huffman coding on increasingly larger extensions of the source is to directly code the source message to a code sequence using arithmetic coding which we will describe here. Rather than deriving and transmitting a code table of size q^n, segmenting the incoming source message into consecutive blocks of length n, and encoding each block by consulting the table, arithmetic coding directly processes the incoming source message and produces the code "on-the-fly." Thus there is no need to keep a code table, making arithmetic coding potentially much more computationally efficient than Huffman coding. The basic concept of arithmetic coding can be traced back to the 1960's; however, it wasn't until the late 1970's and mid 1980's that the method started receiving much more attention, with the paper by Witten et al. [12] often regarded as one of the seminal papers in the area.
Arithmetic coding, like Huffman coding, does require reliable source statistics to be available. Consider the N-length source message x_1 x_2 ... x_N where x_k = s_i (for some i = 1, 2, ..., q) indicates that the kth character in the message is the source symbol s_i. Arithmetic coding assumes that the probabilities, P(x_k | x_1 x_2 ... x_{k-1}), for k = 1, 2, ..., N, can be calculated. The origin of these probabilities depends on the underlying source model that is being used (e.g., zero-memory source or mth order Markov model) and how such models are arrived at and probabilities estimated is not discussed here. Rather we assume that the required probabilities are available and concentrate on the encoding and decoding process.
The goal of arithmetic coding is to assign a unique interval along the unit number line or "probability line" [0, 1) of length equal to the probability of the given source message, P(x_1 x_2 ... x_N), with its position on the number line given by the cumulative probability of the given source message. However there is no need for direct calculation of either the message probability or its cumulative probability. The basic operation of arithmetic coding is to produce this unique interval by starting with the interval [0, 1) and iteratively subdividing it by P(x_k | x_1 x_2 ... x_{k-1}) for k = 1, 2, ..., N.
The interval subdivision operation of arithmetic coding proceeds as follows. Consider the first letter of the message, x_1. The individual symbols, s_i, are each assigned the interval [low_i^(1), high_i^(1)) where low_i^(1) = Σ_{j=1}^{i-1} P(s_j) and high_i^(1) = low_i^(1) + P(s_i) = Σ_{j=1}^{i} P(s_j). That is, the length of each interval is high_i^(1) − low_i^(1) = P(s_i) and the end of the interval is given by the cumulative probability Σ_{j=1}^{i} P(s_j). The interval corresponding to the symbol x_1 = s_i, that is [low^(1), high^(1)) = [low_i^(1), high_i^(1)), is then selected. Next we consider the second letter of the message, x_2. The individual symbols, s_i, are now assigned the interval [low_i^(2), high_i^(2)) where:

low_i^(2) = low^(1) + (high^(1) − low^(1)) Σ_{j=1}^{i-1} P(s_j | x_1)      (3.36)

high_i^(2) = low^(1) + (high^(1) − low^(1)) Σ_{j=1}^{i} P(s_j | x_1)      (3.37)

That is, the length of each interval is high_i^(2) − low_i^(2) = (high^(1) − low^(1)) P(s_i | x_1) = P(x_1) P(s_i | x_1). The interval corresponding to the symbol x_2 = s_i, that is [low^(2), high^(2)) = [low_i^(2), high_i^(2)), is then selected. The length of the interval corresponding to the message seen so far, x_1 x_2, is high^(2) − low^(2) = P(x_1) P(x_2 | x_1) = P(x_1 x_2). This interval subdivision operation is depicted graphically in Figure 3.14.
We continue in this way until the last letter of the message, x_N, is processed and the final interval of length P(x_1 x_2 ... x_N) is produced. Any number that falls within that interval can be chosen and transmitted to identify the interval (and hence the original message) to the receiver. Arithmetic codes are typically binary codes. Thus the binary representation of the number is considered. Optimal coding is assured by selecting a number that requires the minimum number of bits to be transmitted to the receiver (by only transmitting the significant bits and ignoring
[Figure 3.14 depicts the interval subdividing operation: the probability line [0, 1) is first divided into intervals of length P(s_i) for the first letter x_1; the interval selected for x_1 is then subdivided in proportion to P(s_i | x_1) for the second letter x_2, and so on.]

FIGURE 3.14
Interval subdividing operation of arithmetic coding.

the trailing 0's). Since the interval occurs with probability P(x_1 x_2 ... x_N) then approximately −log_2 P(x_1 x_2 ... x_N) bits are needed and where this becomes equality for all message sequences a 100% efficient coding is possible. A simple demonstration of arithmetic coding is given in Example 3.20.

EXAMPLE 3.20
Consider the message x_1 x_2 x_3 = s_2 s_2 s_1 originating from the 3-symbol source with the following individual and cumulative probabilities:

Symbol   P(s_i)   Cumulative
s_1      0.2      0.2
s_2      0.5      0.7
s_3      0.3      1.0

We further assume that the source is zero-memory, thus P(x_k | x_1 x_2 ... x_{k-1}) = P(x_k). As shown by Figure 3.15 initially the probability line [0, 1) is divided into three consecutive, adjoint intervals of [0, 0.2), [0.2, 0.7) and [0.7, 1.0) corresponding to s_1, s_2 and s_3 and the length ratios 0.2 : 0.5 : 0.3, respectively. The length ratios and interval lengths both correspond to P(s_i). When the first letter of the message, s_2, is received the interval [0.2, 0.7) is selected. The interval [0.2, 0.7) is further divided into three consecutive, adjoint subintervals of length ratios 0.2 : 0.5 : 0.3, that is [0.2, 0.3), [0.3, 0.55) and [0.55, 0.7). The length ratios correspond to P(s_i | x_1 = s_2) = P(s_i) and the subinterval lengths are equal to P(x_1 = s_2) P(s_i | x_1 = s_2) = P(s_2) P(s_i).
When the second letter of the message, s_2, is received the subinterval [0.3, 0.55) is selected. The interval [0.3, 0.55) is then further subdivided into the three subintervals [0.3, 0.35), [0.35, 0.475) and [0.475, 0.55) with length ratios 0.2 : 0.5 : 0.3, and when the third and last letter of the message, s_1, is received the corresponding and final interval [0.3, 0.35) of length P(s_2) P(s_2) P(s_1) = 0.5 × 0.5 × 0.2 = 0.05 is selected.
The interval selection process can be summarised by the following table:

Next Letter   Interval
s_2           [0.2, 0.7)
s_2           [0.3, 0.55)
s_1           [0.3, 0.35)

In binary the final interval is [0.01001100..., 0.01011001...). We need to select a number from within the interval that can be represented with the least number of significant bits. The number 0.0101 falls within the interval and only requires the 4 bits 0101 to be transmitted. We note that we need to transmit −log_2 P(x_1 = s_2, x_2 = s_2, x_3 = s_1) = −log_2 0.05 = 4.32 bits of information and we have been able to do this with 4 bits.
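The subdivision in this example can be reproduced with a few lines of Python (our own sketch for this zero-memory source; floating-point rounding means the printed bounds are only approximately 0.3 and 0.35):

# cumulative intervals for s1, s2, s3 with P = 0.2, 0.5, 0.3
cum = {"s1": (0.0, 0.2), "s2": (0.2, 0.7), "s3": (0.7, 1.0)}

low, high = 0.0, 1.0
for letter in ["s2", "s2", "s1"]:
    rng = high - low
    c_low, c_high = cum[letter]
    low, high = low + rng * c_low, low + rng * c_high
    print(letter, (low, high))
# prints [0.2, 0.7), then [0.3, 0.55), then the final interval [0.3, 0.35)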

The above description of arithmetic coding is still incomplete. For example, how does one select the number that falls within the final interval so that it can be transmitted in the least number of bits? Let [low, high) denote the final interval. One scheme described in [6] which works quite well in the majority of cases is to carry out the binary expansion of low and high until they differ. Since low < high, at the first place they differ there will be a 0 in the expansion for low and a 1 in the expansion for high. That is:

low  = 0.b_1 b_2 ... b_{k-1} 0 ...
high = 0.b_1 b_2 ... b_{k-1} 1 ...

The number 0.b_1 b_2 ... b_{k-1} 1 falls within the interval and requires the least number of bits of any number within the same interval; so it is selected and transmitted as the k-bit code sequence b_1 b_2 ... b_{k-1} 1.
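A direct way to implement this selection (our own Python sketch, assuming 0 ≤ low < high < 1 and ignoring the rare boundary case where the chosen value coincides with high) is to generate the binary digits of low and high until they first disagree:

def shortest_codeword(low, high):
    # Return the bits b1 ... b_{k-1} 1 of the shortest binary fraction
    # inside [low, high), following the scheme described above.
    bits = []
    while True:
        low, high = 2 * low, 2 * high
        b_low, b_high = int(low), int(high)     # next binary digit of each bound
        if b_low != b_high:                     # first disagreement: emit the final 1
            bits.append("1")
            return "".join(bits)
        bits.append(str(b_low))                 # shared digit
        low -= b_low
        high -= b_high

print(shortest_codeword(0.3, 0.35))   # '0101' as in Example 3.20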

3.8.1 Encoding and Decoding Algorithms

We have described how the message is encoded and transmitted but have not yet discussed how the received code is decoded back to the original message. Figure 3.16 lists an algorithm for encoding a message using arithmetic coding to a binary integer code, and Figure 3.17 lists an algorithm for decoding the received binary integer back to the original message.

NOTE In both the encoding and decoding algorithms mention is made of the EOF or end-of-file. Consider Example 3.20. The code sequence 0101 corresponds to the binary fraction 0.0101 and decimal fraction 0.3125. The number 0.3125 not only falls within the final interval for the message s_2 s_2 s_1 but also for the longer

[Figure 3.15 shows the successive subdivision of the probability line for Example 3.20: the intervals [0, 0.2), [0.2, 0.7) and [0.7, 1.0) for the first letter, and the selected intervals [0.2, 0.7), [0.3, 0.55) and finally [0.3, 0.35) as the letters s_2, s_2, s_1 are processed.]

FIGURE 3.15
Arithmetic coding for Example 3.20.

Definitions

1. Let {s_1, s_2, ..., s_q} denote the q distinct source symbols or letters of the alphabet.

2. Let x_1 x_2 ... x_N denote the N-length message that is to be encoded where x_k = s_i indicates that the kth letter in the message is the symbol s_i.

3. Assume P(x_k = s_i | x_1 x_2 ... x_{k-1}) for i = 1, 2, ..., q and k = 1, 2, ..., N are available or can be estimated from the underlying source model.

Initialisation

low = 0.0
high = 1.0
k = 1

Iteration

While k ≤ N do

1. get next letter of message, x_k = s_i
2. range = high − low
3. cum_low = Σ_{j=1}^{i-1} P(s_j | x_1 x_2 ... x_{k-1})
4. cum_high = cum_low + P(s_i | x_1 x_2 ... x_{k-1})
5. high = low + range × cum_high
6. low = low + range × cum_low
7. k = k + 1

done

Termination

Let low = 0.b_1 b_2 ... b_{j-1} 0 ... and high = 0.b_1 b_2 ... b_{j-1} 1 ..., where b_1 b_2 ... b_{j-1} are the leading bits shared by low and high. Then transmit the code as the binary integer b_1 b_2 ... b_{j-1} 1.

FIGURE 3.16
Encoding algorithm for arithmetic codes.
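A floating-point transcription of the encoding algorithm into Python might look as follows. This is our own sketch: it assumes the conditional probabilities are supplied as a function, and it ignores the underflow problem addressed in Section 3.8.2.

def arithmetic_encode(message, symbols, cond_prob):
    # symbols:   ordered list of the source symbols s_1 ... s_q
    # cond_prob: function (symbol, history) -> P(x_k = symbol | history)
    # Returns the final interval [low, high); the transmitted code word is then
    # the shortest binary fraction inside this interval.
    low, high = 0.0, 1.0
    for k, letter in enumerate(message):
        history = message[:k]
        rng = high - low
        i = symbols.index(letter)
        cum_low = sum(cond_prob(s, history) for s in symbols[:i])   # sum over j < i
        cum_high = cum_low + cond_prob(letter, history)             # sum over j <= i
        high = low + rng * cum_high      # update high first, using the old low
        low = low + rng * cum_low
    return low, high

# zero-memory source of Example 3.20: the final interval is approximately [0.3, 0.35)
P = {"s1": 0.2, "s2": 0.5, "s3": 0.3}
print(arithmetic_encode(["s2", "s2", "s1"], ["s1", "s2", "s3"], lambda s, h: P[s]))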

Definitions

1. Let {s_1, s_2, ..., s_q} denote the q distinct source symbols or letters of the alphabet.

2. Let code denote the binary integer that is to be decoded to the original message x_1 x_2 ... x_N, where x_k = s_i indicates that the kth letter in the message is the symbol s_i.

3. Assume P(x_k = s_i | x_1 x_2 ... x_{k-1}) for i = 1, 2, ..., q and k = 1, 2, ... are available or can be estimated from the underlying source model.

Initialisation

low = 0.0
high = 1.0
k = 1
get code and store value as the binary fraction 0.code

Iteration

Repeat

1. Starting with i = 0, repeat:
   (a) i = i + 1
   (b) low_i = low + (high − low) × Σ_{j=1}^{i-1} P(s_j | x_1 x_2 ... x_{k-1})
   (c) high_i = low + (high − low) × Σ_{j=1}^{i} P(s_j | x_1 x_2 ... x_{k-1})
   until low_i ≤ value < high_i
2. output x_k as the symbol s_i
3. low = low_i
4. high = high_i
5. k = k + 1

until EOF is reached

FIGURE 3.17
Decoding algorithm for arithmetic codes.
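A matching Python sketch of the decoding algorithm, again our own and using the equivalent test on (value − low)/range that is tabulated in Example 3.22 below, is:

def arithmetic_decode(value, symbols, cond_prob, eof):
    # value: the received code word interpreted as a binary fraction in [0, 1).
    # Decoding stops when the designated EOF symbol is produced.
    low, high = 0.0, 1.0
    message = []
    while True:
        rng = high - low
        t = (value - low) / rng            # position of value inside [low, high)
        cum = 0.0
        for s in symbols:                  # find the symbol interval containing t
            p = cond_prob(s, message)
            if t < cum + p:
                break
            cum += p
        message.append(s)
        high = low + rng * (cum + p)       # narrow [low, high) to that symbol interval
        low = low + rng * cum
        if s == eof:
            return message

# decode the 16-bit code of Example 3.21 below (s5, with probability 1/15, is the EOF)
P = {"s1": 6/15, "s2": 2/15, "s3": 2/15, "s4": 4/15, "s5": 1/15}
value = int("0110111100101011", 2) / 2 ** 16        # = 0.434249878...
print(arithmetic_decode(value, list(P), lambda s, h: P[s], eof="s5"))
# -> ['s2', 's1', 's3', 's4', 's3', 's1', 's2', 's5']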

length messages such as s_2 s_2 s_1 s_2, and so on. Thus there is a need for the decoder to know when to stop decoding. Possible solutions to this problem include:

• Defining a special or extra symbol called EOF which is placed at the end of each message.

• Transmitting the length of the message, N, along with the code.

• Encoding and decoding messages in fixed-sized blocks and only transmitting the size of the block to the decoder at the beginning of the code sequence.

A demonstration of the encoding and decoding algorithms depicted in Figures 3.16 and 3.17 is provided by Examples 3.21 and 3.22.

EXAMPLE 3.21

Consider a zero-memory source with the following source alphabet and probabilities:

Symbol   P(s_i)   Cumulative   Interval
s_1      6/15     6/15         [0.0, 6/15)
s_2      2/15     8/15         [6/15, 8/15)
s_3      2/15     10/15        [8/15, 10/15)
s_4      4/15     14/15        [10/15, 14/15)
s_5      1/15     15/15        [14/15, 1.0)

Suppose the message s_2 s_1 s_3 s_4 s_3 s_1 s_2 s_5 is generated by the source where s_5 is the EOF. Applying the encoding algorithm of Figure 3.16, where P(x_k = s_i | x_1 x_2 ... x_{k-1}) = P(s_i), yields the following:

Next letter   range         symbol interval    [low, high)
Init          -             -                  [0.0, 1.0)
s_2           1.0           [6/15, 8/15)       [0.4, 0.533333333)
s_1           0.133333333   [0.0, 6/15)        [0.4, 0.453333333)
s_3           0.053333333   [8/15, 10/15)      [0.428444444, 0.435555556)
s_4           0.007111111   [10/15, 14/15)     [0.433185185, 0.435081481)
s_3           0.001896296   [8/15, 10/15)      [0.434196543, 0.434449383)
s_1           0.000252840   [0.0, 6/15)        [0.434196543, 0.434297679)
s_2           0.000101136   [6/15, 8/15)       [0.434236998, 0.434250482)
s_5           0.000013485   [14/15, 1.0)       [0.434249583, 0.434250482)

The binary representation of the final low = 0.434249583 and high = 0.434250482 is:

low  = 0.0110111100101010...
high = 0.0110111100101011...

and hence the integer code that uniquely identifies the interval that is transmitted is the 16-bit code 0110111100101011.

EXAMPLE 3.22

Consider decoding the received code 0110111100101011 produced by the encoding process in Example 3.21. Applying the decoding algorithm of Figure 3.17, given the source alphabet and probabilities from Example 3.21, to the stored value of 0.434249878 yields the following:

[low, high)                    range         (value − low)/range   decoded symbol
[0.0, 1.0)                     1.0           0.434249878           s_2 : [6/15, 8/15)
[0.4, 0.533333333)             0.133333333   0.256874149           s_1 : [0, 6/15)
[0.4, 0.453333333)             0.053333333   0.642185373           s_3 : [8/15, 10/15)
[0.428444444, 0.435555556)     0.007111111   0.816389094           s_4 : [10/15, 14/15)
[0.433185185, 0.435081481)     0.001896296   0.561459102           s_3 : [8/15, 10/15)
[0.434196543, 0.434449383)     0.000252840   0.210943262           s_1 : [0, 6/15)
[0.434196543, 0.434297679)     0.000101136   0.527358154           s_2 : [6/15, 8/15)
[0.434236998, 0.434250482)     0.000013485   0.955186157           s_5 : [14/15, 1)

and the decoded message is s_2 s_1 s_3 s_4 s_3 s_1 s_2 s_5 where s_5 is the EOF used to terminate the decoding process.

3.8.2 Encoding and Decoding with Scaling

The encoding and decoding algorithms depicted in Figures 3.16 and 3.17 iteratively reduce the length of the interval with each successive letter from the message. Thus with longer messages, smaller intervals are produced. If the interval range becomes too small, then underflow is a very real problem. In Example 3.21 the interval range is already down to 1.011e-04 after only the 8th letter in the message and in the binary representation of the final [low, high) interval the first 15 (most significant) bits of low and high are identical. With modern 64-bit CPUs the low and high will become indistinguishable when their first 64 bits are identical. The practical implementation of the encoding and decoding algorithms requires an added scaling operation. The scaling operation rescales the current [low, high) interval based on the pseudo-algorithm depicted in Figure 3.18.
The scaling operation not only solves the underflow problem but permits transmission of the coded value before the encoding operation is complete. Thus a long message block can be encoded without having to wait until the end of the message before transmitting the coded value. Similarly, the decoder can begin decoding the

Rescale

If low = 0.b_1 b_2 ... b_j b^{low}_{j+1} b^{low}_{j+2} ... and high = 0.b_1 b_2 ... b_j b^{high}_{j+1} b^{high}_{j+2} ... share the same j leading bits b_1 b_2 ... b_j then rescale to:

low  = 0.b^{low}_{j+1} b^{low}_{j+2} ...
high = 0.b^{high}_{j+1} b^{high}_{j+2} ...

If encoding then code = code & b_1 b_2 ... b_j (initially code is the NULL string), where & is the string concatenation operator.

If decoding then value = Frac(2^j × value), where Frac(·) returns the fractional component of the product.

FIGURE 3.18
Scaling algorithm to avoid underflow.

most significant bits of the coded value (which are transmitted first by the encoder
when rescaling the intervals) before all the bits have been received. Examples 3.23
and 3.24 illustrate the use of the scaling algorithm of Figure 3.18 for the encoding
and decoding of arithmetic codes.
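The rescale test of Figure 3.18 can be sketched in Python as below (our own illustration). Checking whether low and high lie in the same half of [0, 1) is equivalent to checking whether they share their leading bit, so repeating the test strips all shared leading bits; for decoding, the stored value would be rescaled in the same way.

def rescale(low, high, emit):
    # Strip the leading binary digits shared by low and high, passing them to
    # emit(), and expand the interval accordingly (the operation of Figure 3.18).
    while high <= 0.5 or low >= 0.5:
        if high <= 0.5:                    # both bounds in [0, 0.5): shared bit is 0
            emit("0")
            low, high = 2 * low, 2 * high
        else:                              # both bounds in [0.5, 1): shared bit is 1
            emit("1")
            low, high = 2 * low - 1, 2 * high - 1
    return low, high

bits = []
print(rescale(0.4, 0.453333, bits.append), bits)
# -> approximately (0.2, 0.626667) and ['0', '1', '1'], as in the first rescale of Example 3.23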

EXAMPLE 3.23

The encoding from Example 3.21 is now repeated with the additional rescaling operation of Figure 3.18 to prevent underflow from occurring. The encoding with scaling is shown in the following table, where the [low, high) interval is given both in base 10 and in base 2:

Next letter   range     symbol interval   [low, high) base 10   [low, high) base 2   output
Init - - [0.0, 1.0) [0.0, 1.0)
 1.0 [6/15, 8/15) [.4, .533333) [.01100110, .10001000)
 .133333 [0, 6/15) [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.00110..., .10100...) 011
 .426667 [8/15, 10/15) [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) 011
.455111 [10/15, 14/15) [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) 1
 .242726 [8/15, 10/15) [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) 1001
 .517815 [0, 6/15) [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) 0
 .414252 [6/15, 8/15) [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) 101
 .441869 [14/15, 1) [.490339, .519797) [.01111101, .10000101)
(terminate) 1

When the last or EOF letter of the message is read a final 1 is transmitted. Concatenating the intermediate bits that were output yields the final code 0110111100101011, which is the same code as in Example 3.21.

EXAMPLE 3.24

The decoding from Example 3.22 is repeated with the additional rescaling operation of Figure 3.18. The decoding process with scaling is shown in the following table, where the columns give (value − low)/range, the decoded symbol, the [low, high) interval in base 10 and base 2, the updated range and the rescaled value:
- - [0.0, 1.0) [0.0, 1.0) 1.0 .434250
.434250  [.400000,.533333) [.01100110, .10001000) .133333 .434250
.256874 [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.001100..., .10100...) .426667 .473999
.642185  [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) .455111 .791992
.816389 [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) .242726 .583984
.561459  [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) .517815 .343750
.210944 [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) .414252 .687500
.527360  [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) .441870 .500000
.955197   s_5 : EOF (terminate)

NOTE The scaling operation not only expands the interval but repositions it around 0.5. That is, the rescaled interval has low < 0.5 and high ≥ 0.5. Thus, rescaling is needed whenever low and high are both less than or both greater than 0.5. One pathological condition that may arise and defeat the scaling operation is when low is just below 0.5 and high is just above 0.5. Consider:

low  = 0.0111111110...
high = 0.1000000001...

The interval cannot be rescaled and low and high will eventually become indistinguishable (low = high) when the CPU bit-size is exceeded and underflow occurs.

3.8.3 Is Arithmetic Coding Better Than Huffman Coding?

Shannon's noiseless coding theorem states that if one designs a compact binary code, using Huffman coding, for the nth extension of a source then the average length of the code per source symbol, L_n/n, is no greater than H(S) + 1/n. A similar theorem outlined in [6] states that for the arithmetic coding scheme just described the average length per source symbol exceeds the entropy by only a term of order 1/N when encoding a message of length N. Thus Huffman coding and arithmetic coding exhibit similar performance in theory. However the arithmetic code encoding and decoding algorithms with scaling can be implemented for messages of any length N. Huffman coding for the nth extension of a source, however, becomes computationally prohibitive with increasing n since the computational complexity is of the order q^n.

3.9 Higher-order Modelling


Shannon's Noiseless Coding Theorem discussed in Section 3.5 states that code efficiency can be improved by coding the nth extension of the source. Huffman coding is then used to derive the compact code for S^n, the nth extension of the source. The arithmetic codes from Section 3.8 directly code the N-length message sequence to the code word. In both cases source statistics are needed to provide the required conditional and joint probabilities which we obtain with the appropriate model. But what model do we use? Result 3.1 alluded to the requirement that we use the "true" model for the source, except that in practice the order of the "true" model is not known, it may be too high to be computationally feasible or, indeed, the "true" model may not be a Markov model of any order! We now formally define what we mean by the different orders of modelling power.


DEFINITION 3.12 mth-order Modelling

In mth-order modelling the joint probabilities P(x_1 x_2 ... x_{m+1}) of all possible (m+1)-length message sequences are given. Furthermore:

• Since

P(x_1 x_2 ... x_{k+1}) = Σ_{x_{k+2}} Σ_{x_{k+3}} ... Σ_{x_{m+1}} P(x_1 x_2 ... x_{m+1})      (3.38)

then the joint probabilities for the kth-order model, where k < m, are also given.

• Since

P(x_{m+1} | x_1 x_2 ... x_m) = P(x_1 x_2 ... x_{m+1}) / P(x_1 x_2 ... x_m)      (3.39)

then the stationary state and conditional probabilities for the kth order Markov model, where k ≤ m, are also provided.

We need to consider what effect the model order has on the Huffman and arithmetic coding schemes we have discussed. In Chapter 1 it was shown that higher-order Markov models yield lower values of entropy, and since the lower bound on the average length of a code is the entropy, higher-order modelling would be expected to produce more efficient codes (see Result 3.1). Thus it is not only important to consider the implementation of the encoder and decoder for both the Huffman and arithmetic algorithms but also how the source statistics are calculated.

Ideally for encoding an n-length message or the nth extension of a source the highest-order model, of order m = n − 1, should be used. Thus coding of longer messages and source extensions will indirectly benefit from higher orders of modelling power which, from Shannon's Noiseless Coding Theorem and Result 3.1, will produce more efficient codes as the source entropy decreases. Intuitively, longer messages and extensions are able to better exploit any redundant or repetitive patterns in the data.

3.9.1 Higher-order Huffman Coding

With higher-order Huffman coding we consider the case of deriving the Huffman code for the nth extension of the source, S^n, using an mth order model of the source, S.

• If m ≥ n − 1 then the joint probabilities P(x_1 x_2 ... x_n) of the (n − 1)th order (Markov) model can be used.

• If m < n − 1 then the product of the joint (m + 1)-gram probabilities can be used.

In practice the (m + 1)-gram model probabilities P(x_1 x_2 ... x_{m+1}) are directly estimated from the frequency counts:

P(x_1 x_2 ... x_{m+1}) = n(x_1 x_2 ... x_{m+1}) / Σ n(x_1 x_2 ... x_{m+1})      (3.40)

where n(x_1 x_2 ... x_{m+1}) is the number of times the (m + 1)-length message, x_1 x_2 ... x_{m+1}, is seen and Σ n(x_1 x_2 ... x_{m+1}) is the total number of (m + 1)-length messages that have been seen so far.

When building the Huffman coding tree only the relative order of the nth extension probabilities is important. Thus a fixed scaling of the probabilities can be applied without affecting the Huffman coding algorithm. Thus practical implementations use the model counts, n(x_1 x_2 ... x_{m+1}), directly instead of the model probabilities, P(x_1 x_2 ... x_{m+1}).
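For instance, the (m + 1)-gram counts for a binary message can be accumulated with a few lines of Python (our own sketch; the example message is made up and the counts are of overlapping blocks):

from collections import Counter

def ngram_counts(message, m):
    # Count every (m+1)-length block seen in the message (overlapping blocks).
    n = m + 1
    return Counter(message[i:i + n] for i in range(len(message) - n + 1))

print(ngram_counts("0001011000010000", 2))     # 2nd order model -> 3-gram counts
# Counter({'000': 5, '001': 2, '010': 2, '100': 2, ...})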

EXAMPLE 3.25
After observing a typical 100 length binary message the following 2nd order model counts are produced:

n(000) = 47     n(100) = 12
n(001) = 12     n(101) = 3
n(010) = 7      n(110) = 7
n(011) = 8      n(111) = 4

The binary Huffman code for the 3rd extension of the source is to be designed based on the above 2nd order model. The binary Huffman code tree is shown in Figure 3.19 where the counts are directly used for construction of the tree.
The Huffman code is given in the following table:

3rd extension   Huffman code
000 1
001 010
010 0010
011 0000
100 011
101 00011
110 0011
111 00010

3.9.2 Higher-order Arithmetic Coding

When performing arithmetic coding on a message of length N we require knowledge of the conditional probabilities P(x_k | x_1 x_2 ... x_{k-1}) for k = 1, 2, ..., N as shown by Figure 3.16. Assuming the mth order model is given then:

• If k ≤ m + 1 then P(x_k | x_1 x_2 ... x_{k-1}) = P(x_1 x_2 ... x_{k-1} x_k) / P(x_1 x_2 ... x_{k-1}) for k = 2, 3, ..., m + 1 (and P(x_1 | ·) = P(x_1) for k = 1), where the joint probabilities of the lower order models can be used.

• If k > m + 1 then for k = m + 2, ..., N we have P(x_k | x_1 x_2 ... x_{k-1}) = P(x_k | x_{k-m} ... x_{k-1}) = P(x_{k-m} ... x_{k-1} x_k) / P(x_{k-m} ... x_{k-1}).

[Figure 3.19 shows the Huffman coding tree built directly from the 2nd order counts: the root (count 100) splits into the leaf 000 (count 47, code 1) and a subtree containing the remaining seven 3-bit blocks, whose further splits give 001 (12, code 010), 100 (12, code 011), 011 (8, code 0000), 010 (7, code 0010), 110 (7, code 0011), 111 (4, code 00010) and 101 (3, code 00011).]

FIGURE 3.19
3rd order Huffman coding tree.

The mth order probabilities are derived from the respective frequency counts of the data seen thus far using Equation 3.40. Unlike Huffman coding, arithmetic coding requires the absolute probabilities so both the individual counts n(x_1 x_2 ... x_{m+1}) and the total count Σ n(x_1 x_2 ... x_{m+1}) are needed.

EXAMPLE 3.26
Consider performing arithmetic coding on the message 01011 based on the 2nd order model counts from Example 3.25. From the counts the 3rd extension, 2nd extension and single binary probabilities can be calculated as shown in Figure 3.20.

P(000)=0.47
P(00)=0.59
P(001)=0.12 P(0)=0.74
P(010)=0.07
P(01)=0.15
P(011)=0.08
P(100)=0.12 P(10)=0.15
P(101)=0.03 P(1)=0.26
P(110)=0.07 P(11)=0.11
P(111)=0.04

FIGURE 3.20
Joint probabilities up to 3rd order.
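The conditional probabilities needed by the arithmetic coder follow directly from these counts; for instance (our own Python sketch, using the counts of Example 3.25 that underlie Figure 3.20):

counts = {"000": 47, "001": 12, "010": 7, "011": 8,
          "100": 12, "101": 3, "110": 7, "111": 4}     # 2nd order counts of Example 3.25

def cond_prob(bit, context):
    # P(x_k = bit | x_{k-2} x_{k-1} = context) from the 3-gram counts
    total = counts[context + "0"] + counts[context + "1"]
    return counts[context + bit] / total

print(cond_prob("1", "01"))     # P(1|01) = 8/15 = P(011)/P(01) = 0.08/0.15
print(cond_prob("0", "10"))     # P(0|10) = 12/15 = P(100)/P(10) = 0.12/0.15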

Applying the encoding algorithm of Figure 3.16, where P(x_2 | x_1) = P(x_1 x_2)/P(x_1) and P(x_k | x_{k-2} x_{k-1}) = P(x_{k-2} x_{k-1} x_k)/P(x_{k-2} x_{k-1}) for k = 3, 4, 5, yields:

Next letter / context   range   P(x_k | context)         symbol interval   [low, high)
Init                    -       -                        -                 [0.0, 1.0)
0 / -                   1.0     P(0) = 0.74              [0.0, 0.74)       [0.0, 0.74)
1 / 0                   0.74    P(01)/P(0) = 0.203       [0.797, 1.0)      [0.590, 0.740)
0 / 01                  0.15    P(010)/P(01) = 0.467     [0.0, 0.467)      [0.590, 0.660)
1 / 10                  0.070   P(101)/P(10) = 0.200     [0.800, 1.0)      [0.646, 0.660)
1 / 01                  0.014   P(011)/P(01) = 0.533     [0.467, 1.0)      [0.6525, 0.6600)

The corresponding interval sub-division of the probability line [0.0, 1.0) is shown in Figure 3.21.
The final interval for the message 01011 is [0.6525, 0.6600) which in binary is
[0.101001110000, 0.101010001111) and hence the code word that is transmitted
is 10101. It should be noted that for this example the same number of bits is used for
both the code word and the message.

[Figure 3.21 shows the subdivision of the probability line for the message 01011, narrowing [0.0, 1.0) to [0.0, 0.74), then [0.59, 0.74), [0.59, 0.66), [0.646, 0.66) and finally [0.6525, 0.66).]

FIGURE 3.21
Subdivision of probability line.

3.10 Exercises
1. For the following binary codes, determine:

   (a) Whether the code is uniquely decodable. If not, exhibit two source messages with the same code.
   (b) Whether the code is instantaneous. If not, can you design an instantaneous code with the same lengths of code words?

          code A   code B    code C    code D   code E   code F   code G   code H
   s_1    000      0         0         0        0        0        01       1010
   s_2    001      01        10        10       10       100      011      001
   s_3    010      011       110       110      1100     101      10       101
   s_4    011      0111      1110      1110     1101     110      1000     0001
   s_5    100      01111     11110     1011     1110     111      1100     1101
   s_6    101      011111    111110    1101     1111     001      0111     1011

2. Consider a block code based on an r-symbol code alphabet and designed for
a source with q possible messages. Derive the expression for the lower bound
on the block code word length.

3. We are going to devise a code for the decimal digits 0-9 using a binary code.
Analysis shows that we should use the shortest codes for 0 and 1. If we code:

digit 0 to code 00

digit 1 to code 11

find the minimum code word length for the remaining digits 2-9 assuming
we want them to be the same code word length.

4. For the following code word lengths:

   Code   Lengths
   A      2 2 2 4 4 4
   B      1 1 2 3 3
   C      1 1 2 2 2 2

(a) Can an instantaneous binary code be formed? If so, give an example of


such a code.
(b) Can an instantaneous ternary code be formed? If so, give an example of
such a code.

(c) If neither a binary nor ternary code can be formed find the smallest num-
ber of code symbols that will allow a code to be formed. Give an example
of such a code.

*5. Find all possible combinations of the individual code word lengths l_1, l_2, ..., l_q when coding the messages s_1, s_2, ..., s_q which result in uniquely decodable binary codes with words not more than 3 bits long (i.e., l_i ≤ 3).
6. A 6-symbol source has the following statistics and suggested binary and ternary codes:

   Symbol   P(s_i)   Code A   Code B
   s_1      0.3      0        00
   s_2      0.2      10       01
   s_3      0.1      1110     02
   s_4      0.1      1111     10
   s_5      0.2      1100     11
   s_6      0.1      1101     12

(a) What is the efficiency of binary code A?


(b) What is the efficiency of ternary code B?
(c) Can you design a more efficient binary code and, if so, what is the effi-
ciency of your code?
(d) Can you design a more efficient ternary code and, if so, what is the effi-
ciency of your code?
(e) Which is the most efficient code: binary or ternary?

7. Consider the following information source:

   Symbol   s_1   s_2   s_3    s_4    s_5   s_6
   P(s_i)   0.1   0.1   0.45   0.05   0.2   0.1

   (a) What is H(S)?
(b) Derive a compact binary code using Huffman coding. What is the aver-
age length of your code?
(c) Can you find another compact code with different individual code word
lengths? If so, what is the average length of this code?
(d) What is the efficiency of the code?
(e) What coding strategy can be used to improve the efficiency?

8. Consider the following source:



   Symbol   s_1   s_2   s_3   s_4   s_5
   P(s_i)   0.1   0.2   0.2   0.4   0.1

Design a compact binary code for the source with minimum variance between
code word lengths. What is the efficiency of your code? Now design the
compact binary code with maximum variance and compare.
9. Derive the compact ternary code using Huffman coding for the following
source and compute the efficiency:

   Symbol   s_1    s_2   s_3    s_4   s_5    s_6    s_7    s_8
   P(s_i)   0.07   0.4   0.05   0.2   0.08   0.05   0.12   0.03

10. Consider the following binary source:

   Symbol   s_1   s_2
   P(s_i)   0.1   0.9

   (a) Find H(S).
   (b) What is the compact (indeed trivial!) binary code for this source? Compare the average length L with H(S). What is the efficiency?
   (c) Repeat (b) for S^n (the nth extension of S) for n = 2, 3. Find L_n/n and compare this with H(S). What is happening?

11. A binary source has the following 2nd extension probabilities:

Symbol   P
00 0.75
01 0.1
10 0.1
11 0.05

(a) Design the source encoder by devising a compact code for the source.
(b) What is the efficiency of the code you designed in (a)?
(c) What is the entropy of the source encoder output (the code entropy), H(C)? Compare your answer with the source entropy, H(S). Will this always be the case?

*12. Suppose a long binary message contains half as many 1’s as 0’s. Find a binary
Huffman coding strategy which uses:

(a) at most 0.942 bits per symbol.


(b) at most 0.913 bits per symbol.

13. What is the efficiency of a binary source S in which 1 has a probability of 0.85? Find an extension of S with efficiency at least 95%.
*14. A binary source emits 0 with probability 3/8. Find a ternary coding scheme
that is at least 90% efficient.
15. A source emits symbol x a third of the time and symbol y the rest of the time. Devise a ternary coding scheme which is:

(a) at most 0.65 ternary units per symbol


(b) at most 0.60 ternary units per symbol
(c) at most 0.55 ternary units per symbol

What is the efficiency of the code you devised for the above cases?
16. You are required to design a ternary source encoder that is at least 97% efficient. Observations of the ternary source over a certain time interval reveal that symbol s_1 occurred 15 times, s_2 occurred 6 times and s_3 occurred 9 times. Design the coding scheme. What is the efficiency of your code?
17. A binary message has been modeled as a 1st order Markov source with transition probabilities P(0|0) = 0.7, P(1|0) = 0.3, P(1|1) = 0.9 and P(0|1) = 0.1 (the state diagram has self-loops of 0.7 and 0.9 on states 0 and 1, with cross transitions 0.3 and 0.1).

(a) Derive a binary compact code for the 2nd extension of the source. What
is the efficiency of your code?
(b) Repeat (a) for the 3rd extension of the source.

18. A binary source emits an equal number of 0’s and 1’s but emits the same
symbol as the previous symbol twice as often as emitting a different symbol
than the previous symbol. Derive a binary encoding which is at least 95%
efficient.
19. The output of an unknown binary information source transmitting at 100 bps
(bits per second) produces a different output (from the previous output) three
times as often as producing the same output. Devise a binary source coding
strategy for each of the following cases, so that the channel bit rate is:

(a) at most 95 bps.


(b) at most 80 bps.

*20. Analysis of a binary information source reveals that the source is three times
more likely to emit the same bit if the last two bits were the same; otherwise it
is equally likely to produce a 0 or a 1.

(a) Design a coding scheme which is at least 85% efficient.


(b) Design a coding scheme which is at least 90% efficient.

*21. Consider the source of Qu. 10:

   Symbol   s_1   s_2
   P(s_i)   0.1   0.9

   (a) Perform arithmetic coding on the sequence s_2 s_1 s_2 to produce the coded output. Compare your answer with the Huffman code of the 3rd extension of the source from 10(c).
   (b) Repeat (a) for the sequence s_1 s_1 s_1.
   (c) Repeat (a) for the sequence s_1 s_2 s_1.
(d) From the above cases, is arithmetic coding of a 3 letter message compa-
rable to, superior to or inferior to Huffman coding of the corresponding
3rd extension of the source?

NOTE Do not use any scaling. To convert from base 10 to base 2 (and vice
versa) you will need a number base converter that can deal with non-integer
values. One such tool available from the Internet is
http://www.math.com/students/converters/source/base.htm.

22. Perform arithmetic decoding of your coded output values from Qu. 21(a),(b),(c)
and confirm that the original 3 letter messages are produced.
23. Repeat Qu. 21 but this time use scaling.
24. Repeat Qu. 22 but this time use scaling.
*25. Huffman codes are guaranteed to be instantaneous codes. Can the same state-
ment be made about arithmetic codes?
26. Consider Example 3.26:

(a) Perform arithmetic coding of the sequence 011 and compare with the
corresponding Huffman code from Example 3.25.
(b) Repeat (a) for the sequence 000.

(c) Repeat (a) for the sequence 001.


(d) From the above cases, is arithmetic coding comparable to, superior to or
inferior to Huffman coding?

27. Consider Example 3.26:

(a) Perform arithmetic decoding of the coded binary integer value 10101
assuming the original message was 3 bits long.
(b) Repeat (a) but for the coded binary integer value of 1.

*28. After observing a typical sample of a binary source the following 2nd order model counts are produced:

  
  
  
  
  
  
  
  

(a) Derive the binary Huffman code table for the 3rd extension of the source
such that there is minimum variance in the individual code word lengths.
What is the average length of your code?
(b) Perform arithmetic coding on the following 3 letter messages and com-
pare with the corresponding Huffman code:
i. 110
ii. 111
iii. 100

3.11 References
[1] ASCII, retrieved January 25, 2002 from
http://www.webopedia.com/TERM/A/ASCII.html
