
Chapter 3

Source Coding

3.1 Introduction
An important problem in digital communications and computer storage is the effi-
cient transmission and storage of information. Furthermore, the transmission and storage
of digital data require the information to be represented in a digital or binary form.
Thus there is a need to perform source coding, the process of encoding the informa-
tion source message to a binary code word, and lossless and faithful decoding from
the binary code word to the original information source message. The goal of source
coding for digital systems is two-fold:

1. Uniquely map any arbitrary source message to a binary code and back again
2. Efficiently map a source message to the most compact binary code and back
again

This is shown in Figure 3.1 for a source transmitting arbitrary text through a binary
(digital) communication channel. The source encoder efficiently maps the text to
a binary representation and the source decoder performs the reverse operation. It
should be noted that the channel is assumed noiseless. In the presence of noise,
channel codes are also needed and these are introduced in Chapter 5.

[Figure 3.1: Source → Source Encoder → binary channel → Source Decoder → Receiver; for example, the text "abccde" is encoded to the binary sequence "010010", transmitted over the binary (digital) channel, and decoded back to "abccde".]

FIGURE 3.1
Noiseless communication system.

We first present the notation and terminology for the general source coding problem
and establish the fundamental results and properties of codes for their practical use


and implementation. We then show how entropy is the benchmark used to determine
the efficiency of a particular coding strategy and then proceed to detail the algorithms
of several important binary coding strategies.
Consider a mapping from the source alphabet, S, of size q to the code alphabet, X,
of size r, and define the source coding process as follows:

DEFINITION 3.1 Source Coding Let the information source be described
by the source alphabet of size q, S = {s_1, s_2, ..., s_q}, and define
X = {x_1, x_2, ..., x_r} as the code alphabet of size r. A source message of length n
is an n-length string of symbols from the source alphabet, that is, s_{i1} s_{i2} ... s_{in}.
A code word, X_i, is a finite-length string of, say, l_i symbols from the code alphabet,
that is, X_i = x_{i1} x_{i2} x_{i3} ... x_{i l_i}. Encoding is the mapping from a source symbol,
s_i, to the code word, X_i, and decoding is the reverse process of mapping a
code word, X_i, to the source symbol, s_i. A code table or simply source code
completely describes the source code by listing the code word encodings of all the
source symbols, {s_i → X_i : i = 1, 2, ..., q}.

DEFINITION 3.2 nth Extension of a Code The nth extension of a code
maps the source messages of length n, σ_i, which are the symbols of S^n, the
nth extension of S, to the corresponding sequence of code words, X_i =
X_{i1} X_{i2} ... X_{in}, from the code table for source S.
A source code is identified as:

• a non-singular code if all the code words are distinct.

• a block code of length n if the code words are all of fixed length n.

All practical codes must be uniquely decodable.

DEFINITION 3.3 Unique Decodability A code is said to be uniquely decodable
if, and only if, the nth extension of the code is non-singular for every finite value
of n. Informally, a code is said to be uniquely decodable if there are no instances
of non-unique (i.e., ambiguous) source decodings for any and all possible code
messages.

NOTE The following observations can be made:

1. A block code of length n which is non-singular is uniquely decodable.

2. It is a non-trivial exercise to prove whether a code is uniquely decodable in general.

3. A code is proven to be NOT uniquely decodable if there is at least one instance of a non-unique decoding.

The code alphabet of most interest is the binary code, X = {0, 1}, of size r = 2,
since this represents the basic unit of storage and transmission in computers and
digital communication systems. The ternary code, X = {0, 1, 2}, of size r = 3,
is also important in digital communication systems utilising a tri-level form of line
coding. In general we can speak of r-ary codes to refer to code alphabets of arbitrary
size r.

EXAMPLE 3.1
Consider the source alphabet, S = {s_1, s_2, s_3, s_4}, and binary code, X = {0, 1}.
The following are three possible binary source codes.

Source   Code A   Code B   Code C
s_1      0        0        00
s_2      11       11       01
s_3      00       00       10
s_4      11       010      11

We note that:

• Code A is NOT non-singular since X_2 = X_4 = 11.

• Code B is non-singular but it is NOT uniquely decodable since the code sequence 00 can be decoded as either s_3 or s_1 s_1, that is X_3 = X_1 X_1 where X_1 X_1 is the code word for σ = s_1 s_1 from the second extension of the code.

• Code C is a non-singular block code of length 2 and is thus uniquely decodable.
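Whether a given code is uniquely decodable can also be tested mechanically with the Sardinas-Patterson procedure, which repeatedly forms "dangling suffixes" and declares the code not uniquely decodable if a suffix ever coincides with a code word. This test is not part of the chapter, so the following Python sketch is supplementary rather than taken from the text:

```python
def dangling_suffixes(a_set, b_set):
    """Suffixes w such that some word in a_set equals a word of b_set followed by w."""
    return {a[len(b):] for a in a_set for b in b_set
            if len(a) > len(b) and a.startswith(b)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: True if the code is uniquely decodable."""
    c = set(codewords)
    if len(c) != len(codewords):          # not even non-singular
        return False
    suffixes = dangling_suffixes(c, c)
    seen = set()
    while suffixes and not (suffixes & c):
        key = frozenset(suffixes)
        if key in seen:                   # suffix sets repeat without hitting a code word
            return True
        seen.add(key)
        suffixes = dangling_suffixes(c, suffixes) | dangling_suffixes(suffixes, c)
    return not (suffixes & c)

# Codes A, B and C of Example 3.1
print(is_uniquely_decodable(['0', '11', '00', '11']))    # False (not non-singular)
print(is_uniquely_decodable(['0', '11', '00', '010']))   # False (00 decodes two ways)
print(is_uniquely_decodable(['00', '01', '10', '11']))   # True  (block code)
```

Applied to the codes of Example 3.1 it reports codes A and B as not uniquely decodable and code C as uniquely decodable.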

3.2 Instantaneous Codes

Although usable codes have to be at least uniquely decodable, there is a subclass of


uniquely decodable codes, called instantaneous codes, that exhibit extremely useful
properties for the design and analysis of such codes and the practical implementation
of the encoding and decoding processes.

DEFINITION 3.4 Instantaneous Codes A uniquely decodable code is said to be


instantaneous if it is possible to decode each message in an encoded string without
reference to succeeding code symbols.

A code that is uniquely decodable but not instantaneous requires the past and current
code symbols to be buffered at the receiver in order to uniquely decode the source
message. An instantaneous code allows the receiver to decode the current code word
to the correct source message immediately upon receiving the last code symbol of
that code word (e.g., the decoding is performed “on the fly”).

EXAMPLE 3.2
Consider the following three binary codes for the source S = {s_1, s_2, s_3, s_4}:

Source   Code A   Code B   Code C
s_1      0        0        0
s_2      10       01       01
s_3      110      011      011
s_4      1110     0111     111

Codes A, B and C are uniquely decodable (since no instance of a non-unique decoding
can be found) but only code A is instantaneous. Why? Consider the following
encoded string from code A:

110 | 0 | 10 | 1110 | 0

where the vertical bars mark the boundaries between the encoded source
symbols. It is apparent that the "0" code symbol acts as a code word terminator or
separator. Hence the moment a "0" is received it represents the last code symbol
of the current code word and the receiver can immediately decode the word to the
correct source symbol; thus code A is instantaneous.
Now consider the following encoded string from code B:

0 | 01 | 011 | 0111 | 0

Here too the "0" acts as a code word separator but, unlike code A, the "0" now
represents the first code symbol of the next code word. This means the current code
word cannot be decoded until the first symbol of the next code word, the symbol "0,"
is read. For example assume we have received the string 01. We cannot decode this
to the symbol s_2 until we see the "0" of the next code word since if the next character
we read is in fact a "1" then the current code word is 011 (s_3) or 0111 (s_4), not 01 (s_2).
Although we can verify that the code is uniquely decodable (since the "0" acts as a
separator) the code is not instantaneous.

Now consider the following encoded string from code C:

0111111 ...

Surprisingly we cannot yet decode this sequence! In fact until we receive a "0"
or an EOF (end-of-file or code string termination) the sequence cannot be decoded
since we do not know if the first code word is 0, 01 or 011. Furthermore once we
are in a position to decode the sequence the decoding process is itself not at all as
straightforward as was the case for code A and code B. Nevertheless this code is
uniquely decodable since the "0" still acts as a separator.

From Example 3.2 one can see that instantaneous codes are so named because decoding
is a very fast and easy process (i.e., instantaneous!). However the practicality
of instantaneous codes also lies in the prefix condition which allows such codes to be
efficiently analysed and designed.

DEFINITION 3.5 Prefix of a Code Let X_i = x_{i1} x_{i2} ... x_{in} be a code word
of length n. A sub-string of code characters from X_i, x_{i1} x_{i2} ... x_{ij}, where
j ≤ n, is called a prefix of the code word X_i.

DEFINITION 3.6 Prefix Condition A necessary and sufficient condition for a


code to be instantaneous is that no complete code word of the code be a prefix of
some other code word.

Since the prefix condition is both necessary and sufficient the instantaneous codes
are also referred to as prefix codes since such codes obey the above prefix condition.

NOTE Since an instantaneous code is uniquely decodable the prefix condition is


a sufficient condition for uniquely decodable codes. However it is not a necessary
condition since a code can be uniquely decodable without being instantaneous.

EXAMPLE 3.3
Consider the codes from Example 3.2. We can now immediately state whether the
codes are instantaneous or not:

Code A is instantaneous since no code word is a prefix of any other code word (code
A obeys the prefix condition). Since it is instantaneous it is also uniquely
decodable.
Code B is not instantaneous since X_1 = 0 is a prefix of X_2 = 01, X_2 = 01 is a prefix
of X_3 = 011, etc. However it is uniquely decodable since the "0" acts as a
separator.

Code C is not instantaneous since X_1 = 0 is a prefix of X_2 = 01, etc. However it is
uniquely decodable since the "0" can be used as a separator.

Figure 3.2 graphically depicts the universe of all codes and how the different classes
of codes we have described are subsets of one another. That is, the class of all
instantaneous codes is a subset of the class of all uniquely decodable codes which in
turn is a sub-class of all non-singular codes.

[Figure 3.2 depicts nested classes of codes: Instantaneous (Prefix) Codes ⊂ Uniquely Decodable Codes ⊂ Non-singular Codes ⊂ All Codes.]

FIGURE 3.2
Classes of codes.

3.2.1 Construction of Instantaneous Codes

The prefix condition not only makes it easy to determine whether a given code is
instantaneous or not but it can also be used to systematically design an instantaneous
code with specified lengths for the individual code words. The problem can be stated
as follows. For a source with q symbols it is required to design the q individual
code words with specified code word lengths of l_1, l_2, ..., l_q such that the code is
instantaneous (i.e., it satisfies the prefix condition). To design the code the code
word lengths are sorted in order of increasing length. The code words are derived
in sequence such that at each step the current code word does not contain any of the
other code words as a prefix. A systematic way to do this is to enumerate or count
through the code alphabet.

EXAMPLE 3.4

Problem 1: Design an instantaneous binary code with lengths of 3, 2, 3, 2, 2.

Solution 1: The lengths are re-ordered in increasing order as 2, 2, 2, 3, 3. For the first
three code words of length 2 a count from 00 to 10 is used:

00, 01, 10

For the next two code words of length 3, we count to 11, form 110, and then start
counting from the right most symbol to produce the complete code:

00, 01, 10, 110, 111

Problem 2: Design an instantaneous ternary code with lengths 2, 3, 1, 1, 2.

Solution 2: The code word lengths are re-ordered as 1, 1, 2, 2, 3 and noting that a
ternary code has symbols {0, 1, 2} we systematically design the code as follows:

0, 1, 20, 21, 220

Problem 3: Design an instantaneous binary code with lengths 2, 3, 2, 2, 2.

Solution 3: The code word lengths are re-ordered as 2, 2, 2, 2, 3 and we immediately
see that an instantaneous code cannot be designed with these lengths since the four
code words of length 2 use up all of 00, 01, 10 and 11, leaving no code word of
length 3 that does not have one of them as a prefix.

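The counting construction of Example 3.4 can be automated. The following Python sketch (an illustrative implementation, not taken from the text) sorts the requested lengths, rejects length sets that violate the Kraft Inequality of Section 3.3, and assigns code words by counting through the r-ary alphabet:

```python
from fractions import Fraction

def to_base_r(value, length, r):
    """Write `value` as a base-r string padded to `length` digits."""
    digits = []
    for _ in range(length):
        digits.append(str(value % r))
        value //= r
    return ''.join(reversed(digits))

def construct_instantaneous_code(lengths, r=2):
    """Systematic construction of Section 3.2.1: returns code words (in the order
    the lengths were given) or None if the Kraft Inequality is violated."""
    if sum(Fraction(1, r ** l) for l in lengths) > 1:
        return None
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value *= r ** (lengths[i] - prev_len)   # extend the counter to the new length
        codes[i] = to_base_r(value, lengths[i], r)
        value += 1
        prev_len = lengths[i]
    return codes

print(construct_instantaneous_code([3, 2, 3, 2, 2]))        # Problem 1
print(construct_instantaneous_code([2, 3, 1, 1, 2], r=3))   # Problem 2
print(construct_instantaneous_code([2, 3, 2, 2, 2]))        # Problem 3 -> None
```

For the three problems above it returns the same code words as derived by hand for Problems 1 and 2, and None for Problem 3.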

3.2.2 Decoding Instantaneous Codes

Since an instantaneous code has the property that the source symbol or message can
be immediately decoded upon reception of the last code character in the current code
word the decoding process can be fully described by a decoding tree or state machine
which can be easily implemented in logic.

EXAMPLE 3.5

Figure 3.3 is the decoding tree that corresponds to the binary code:
Source   Code
s_1      00
s_2      01
s_3      10
s_4      110
s_5      111


[Figure 3.3 shows the decoding tree: from the Start (root) node, reading 0 then 0 gives s_1, 0 then 1 gives s_2, 1 then 0 gives s_3, 1 1 0 gives s_4 and 1 1 1 gives s_5.]

FIGURE 3.3
Decoding tree for an instantaneous code.

The receiver simply jumps to the next node of the tree in response to the current
code character and when the leaf node of the tree is reached the receiver produces
the corresponding source symbol, s_i, and immediately returns to the root of the tree.
Thus there is no need for the receiver to buffer any of the received code characters in
order to uniquely decode the sequence.
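A decoding tree of this kind is easily realised in software as a nested dictionary (a trie). The sketch below is one possible implementation, written for illustration using the code of Example 3.5; it is not code from the text:

```python
def build_decoding_tree(code_table):
    """Build a decoding trie for an instantaneous code.
    code_table maps source symbols to code words, e.g. {'s1': '00', ...}."""
    root = {}
    for symbol, word in code_table.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['symbol'] = symbol            # a leaf stores the decoded symbol
    return root

def decode(bits, root):
    """Decode a received code sequence symbol by symbol ("on the fly")."""
    decoded, node = [], root
    for ch in bits:
        node = node[ch]
        if 'symbol' in node:               # last code symbol of the current code word
            decoded.append(node['symbol'])
            node = root                    # return to the root of the tree
    return decoded

# The instantaneous code of Example 3.5
tree = build_decoding_tree({'s1': '00', 's2': '01', 's3': '10', 's4': '110', 's5': '111'})
print(decode('0011011100', tree))          # ['s1', 's4', 's5', 's1']
```

No buffering of past code characters is needed: each received symbol simply moves the decoder one node further down the tree.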

3.2.3 Properties of Instantaneous Codes

Property 1 Easy to prove whether a code is instantaneous by inspection of whether


the code satisfies the prefix condition.

Property 2 The prefix code permits a systematic design of instantaneous codes


based on the specified code word lengths.

Property 3 Decoding based on a decoding tree is fast and requires no memory stor-
age.

Property 4 Instantaneous codes are uniquely decodable codes and, where the length
of a code word is the main consideration in the design and selection of codes,
there is no advantage in ever considering the general class of uniquely decodable
codes which are not instantaneous.

Property 4 arises because of McMillan’s Theorem which will be discussed in the


next section.

3.2.4 Sensitivity to Bit Errors

The instantaneous codes we have been discussing are variable-length codes since the
code words can be of any length. Variable-length codes should be contrasted with
block codes of length n that restrict code words to be of the same length n. As we
will see variable-length codes provide greater flexibility for sources to be encoded
more efficiently. However a serious drawback to the use of variable-length codes is
their sensitivity to bit or code symbol errors in the code sequence. If a single bit
error causes the decoder to interpret a shorter or longer code word than is the case
then this will create a synchronisation error between the first bit of the code word
generated by the encoder and the root of the decoding tree. Subsequent code words
will be incorrectly decoded, including the possibility of insertion errors, until the
synchronisation is re-established (which may take a very long time). With block
codes a single bit error will only affect the current block code word and the effect will not
propagate to subsequent code words. In fact block codes only suffer a synchronisation
problem initially, when the transmission is established and the decoder has to "lock" onto the
first bit of the code, or where there are unmanaged timing discrepancies or clock
skew between the transmitter and receiver.

EXAMPLE 3.6
Consider the following variable-length and block instantaneous binary codes for the
source S = {s_1, s_2, s_3, s_4}.

Source   Variable-length code   Block code
s_1      0                      00
s_2      10                     01
s_3      110                    10
s_4      111                    11

Consider the following source sequence:

s_{1,1} s_{3,2} s_{2,3} s_{4,4} s_{2,5} s_{1,6}

where s_{i,t} indicates that symbol s_i is being transmitted at time t. The corresponding
variable-length code sequence is:

011010111100

Assume the code sequence is now transmitted through an information channel which
introduces a bit error in the 2nd bit. The source decoder sees:

001010111100

and generates the decoded message sequence:

s_1 s_1 s_2 s_2 s_4 s_2 s_1

The single bit error in the encoded sequence produces both a substitution error (2nd
symbol s_3 is substituted by s_1) and an insertion error (an s_2 is inserted after it and subsequent
source symbols appear one time slot late) in the decoded message sequence and 7 rather
than 6 source symbols are produced.
Now assume there is a bit error in the 6th bit. The source decoder sees:

011011111100

and generates the decoded message sequence:

s_1 s_3 s_4 s_4 s_1 s_1

The single bit error now causes two isolated substitution errors (3rd symbol s_2 substituted
by s_4 and 5th symbol s_2 substituted by s_1). Now consider the corresponding
block code sequence:

s_{1,1} s_{3,2} s_{2,3} s_{4,4} s_{2,5} s_{1,6} → 001001110100

For errors in both the 2nd and 6th bits, a bit error in the 2nd bit causes a single
substitution error (1st symbol s_1 substituted by s_2) and a bit error in the 6th bit also
causes a single substitution error (3rd symbol s_2 substituted by s_1). In fact any bit
errors will only affect the current code word and will not have any effect on subsequent
code words, and all errors will be substitution errors.
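The error propagation in Example 3.6 is easy to reproduce with a few lines of Python. The sketch below (illustrative only) greedily decodes a prefix code and shows that flipping the 2nd bit of the variable-length code sequence yields 7 decoded symbols instead of 6:

```python
def prefix_decode(bits, code):
    """Greedy decoder for an instantaneous (prefix) code given as {symbol: code word}."""
    inverse = {w: s for s, w in code.items()}
    decoded, word = [], ''
    for b in bits:
        word += b
        if word in inverse:
            decoded.append(inverse[word])
            word = ''
    return decoded

vl_code = {'s1': '0', 's2': '10', 's3': '110', 's4': '111'}
message = ['s1', 's3', 's2', 's4', 's2', 's1']
sequence = ''.join(vl_code[s] for s in message)                      # 011010111100
corrupted = sequence[0] + ('0' if sequence[1] == '1' else '1') + sequence[2:]  # flip 2nd bit
print(prefix_decode(sequence, vl_code))    # the original 6 symbols
print(prefix_decode(corrupted, vl_code))   # 7 symbols: substitution + insertion errors
```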

From Example 3.6 it is apparent that variable-length codes are very sensitive to bit
errors, with errors propagating to subsequent code words and symbols being inserted
as well as being substituted. If the main goal of the source coder is to map the
source symbols to the code alphabet then a block coding scheme should be used. For
example, the ASCII code [1] is a 7-bit (or 8-bit in the case of extended ASCII) block
code mapping letters of the English alphabet and keyboard characters to a binary
block code of length 7 or 8. Furthermore, a channel coder (see Chapter 5) with an
appropriate selection of an error-correcting or error-detecting code (see Chapters 6
to 9) is mandatory to ensure the final code sequence is close to error-free and must
always be present when using variable-length codes.

The loss of synchronisation arising from code symbol errors with variable-length
codes is a specific example of the more general problem of word synchronisation
between the source and receiver which is discussed by Golomb et al. [5].

3.3 The Kraft Inequality and McMillan’s Theorem

3.3.1 The Kraft Inequality

If the individual code word lengths are specified there is no guarantee that an in-
stantaneous code can be designed with those lengths (see Example 3.4). The Kraft
Inequality theorem [8] provides a limitation on the code word lengths for the design
of instantaneous codes. Although not of real use in practice (the coding strategies we
will later discuss will guarantee codes that are instantaneous) the Kraft Inequality is
a precursor to the more important McMillan’s Theorem [10] which states that where
code word lengths are the only consideration, an instantaneous code will always be
as good as any uniquely decodable code which is not necessarily instantaneous.

THEOREM 3.1 Kraft Inequality


A necessary and sufficient condition for the existence of an instantaneous code with
alphabet size r and q code words with individual code word lengths of l_1, l_2, ..., l_q
is that the following inequality be satisfied:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.1)

Conversely, given a set of code word lengths that satisfy this inequality, then there
exists an instantaneous code with these word lengths.

The proof of the Kraft Inequality is interesting in that it is based on a formal de-
scription of how instantaneous codes are constructed (see Section 3.2.1) given the
code word lengths, where the Kraft Inequality needs to be satisfied for the code to be
successfully constructed.

PROOF Let s_i be the ith source message or symbol and X_i the corresponding
code word of length l_i. The proof requires the code word lengths to be arranged in
ascending order, that is, we assume that the code word lengths are arranged such that
l_1 ≤ l_2 ≤ ... ≤ l_q. The number of possible code words for X_i is r^{l_i}. To ensure
the code is instantaneous we need to consider the number of permissible code words
for X_i such that the prefix condition is satisfied.
Consider the shortest code word, X_1. Then the number of permissible code words
for X_1 is simply r^{l_1}. Next consider code word X_2. The number of possible
X_2 with X_1 as a prefix is given by the expression r^{l_2 - l_1} (since with X_1 as the
prefix the first l_1 symbols are fixed and one can choose the remaining l_2 - l_1 symbols
arbitrarily). To ensure the prefix condition is satisfied the number of permissible code
words for X_2 is the number of possible code words, r^{l_2}, less those code words
which have X_1 as a prefix, r^{l_2 - l_1}. That is, the number of permissible code words
for X_2 is r^{l_2} - r^{l_2 - l_1}. Similarly for X_3 the number of permissible code words
is the number of possible code words, r^{l_3}, less those code words which have X_2
as a prefix, r^{l_3 - l_2}, and less those code words which have X_1 as a prefix, r^{l_3 - l_1},
that is r^{l_3} - r^{l_3 - l_2} - r^{l_3 - l_1}. For code word X_q, the expression for the number of
permissible code words is:

r^{l_q} - r^{l_q - l_{q-1}} - ... - r^{l_q - l_1}    (3.2)

To be able to construct the code we want to ensure that there is at least one permissible
code word for all source messages, i = 1, 2, ..., q. That is we require the following
inequalities to be simultaneously satisfied:

r^{l_1} \geq 1
r^{l_2} - r^{l_2 - l_1} \geq 1
...
r^{l_q} - r^{l_q - l_{q-1}} - ... - r^{l_q - l_1} \geq 1    (3.3)

By multiplying the ith equation by r^{-l_i} and rearranging we get:

r^{-l_1} \leq 1
r^{-l_1} + r^{-l_2} \leq 1
...
r^{-l_1} + r^{-l_2} + ... + r^{-l_q} \leq 1    (3.4)

We note that if the last inequality holds then all the preceding inequalities will also
hold. Thus the following inequality expression must hold to ensure we can design an
instantaneous code:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.5)

which is the Kraft Inequality.

The Kraft Inequality of Equation 3.1 only indicates whether an instantaneous code
can be designed from the given code word lengths. It does not provide any indica-
tion of what the actual code is, nor whether a code we have designed which satisfies
Equation 3.1 is instantaneous, but it does tell us that an instantaneous code with the
given code word lengths can be found. Only by checking the prefix condition of the
given code can we determine whether the code is instantaneous. However if a code
we have designed does not satisfy Equation 3.1 then we know that the code is not
instantaneous and that it will not satisfy the prefix condition. Furthermore, we will
not be able to find a code with the given code word lengths that is instantaneous. A
less apparent property of the Kraft Inequality is that the minimum code word lengths
for the given alphabet size and number of code words that can be used for designing
an instantaneous code are those that make \sum_{i=1}^{q} r^{-l_i} as close as possible to 1. Obviously
shorter overall code word lengths intuitively yield more efficient codes. This
is examined in the next section.

EXAMPLE 3.7
Consider the following binary (r = 2) codes:

Source   Code A   Code B   Code C
s_1      0        0        0
s_2      100      100      10
s_3      110      110      110
s_4      111      11       11

Code A satisfies the prefix condition and is hence instantaneous. Calculating the
Kraft Inequality yields:

K = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = 7/8 \leq 1

which is as expected.

Code B does not satisfy the prefix condition since 11 is a prefix of 110; hence the code
is not instantaneous. Calculating the Kraft Inequality yields:

K = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-2} = 1 \leq 1

which implies that an instantaneous code is possible with the given code word lengths.
Thus a different code can be derived with the same code word lengths as code B which
does satisfy the prefix condition. One example of an instantaneous code with the same
code word lengths is:

s_1 → 0, s_2 → 100, s_3 → 101, s_4 → 11

Code C does not satisfy the prefix condition since 11 is a prefix of 110; hence the code
is not instantaneous. Calculating the Kraft Inequality yields:

K = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-2} = 9/8 > 1

which implies that an instantaneous code cannot be designed with these code word
lengths, and hence we shouldn't even try.
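The three Kraft sums of Example 3.7 can be checked numerically. A small illustrative sketch, using exact fractions to avoid rounding:

```python
from fractions import Fraction

def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft Inequality (3.1) for the given code word lengths."""
    return sum(Fraction(1, r ** l) for l in lengths)

# Code word lengths of codes A, B and C in Example 3.7
for name, lengths in [('A', [1, 3, 3, 3]), ('B', [1, 3, 3, 2]), ('C', [1, 2, 3, 2])]:
    total = kraft_sum(lengths)
    verdict = 'instantaneous code possible' if total <= 1 else 'no instantaneous code'
    print(f'Code {name}: K = {total} -> {verdict}')
```

It prints 7/8, 1 and 9/8, confirming that only code C's lengths make an instantaneous code impossible.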

3.3.2 McMillan’s Theorem

As we will discuss in the next section, the use of shorter code word lengths creates
more efficient codes. Since the class of uniquely decodable codes is larger than the
class of instantaneous codes, one would expect greater efficiencies to be achieved
considering the class of all uniquely decodable codes rather than the more restric-
tive class of instantaneous codes. However, instantaneous codes are preferred over
uniquely decodable codes given that instantaneous codes are easier to analyse, sys-
tematic to design and can be decoded using a decoding tree (state machine) structure.
McMillan’s Theorem assures us that we do not lose out if we only consider the class
of instantaneous codes.

THEOREM 3.2 McMillan’s Theorem


The code word lengths of any uniquely decodable code must satisfy the Kraft
Inequality:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.6)

Conversely, given a set of code word lengths that satisfy this inequality, then there
exists a uniquely decodable code (Definition 3.3) with these code word lengths.

The proof of McMillan’s Theorem is presented as it is instructive to see the way it


uses the formal definition of unique decodability to prove that the inequality must be
satisfied. The proof presented here is based on that from [2].

PROOF Assume a uniquely decodable code and consider the quantity:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{i_1=1}^{q} \sum_{i_2=1}^{q} \cdots \sum_{i_n=1}^{q} r^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_n})}    (3.7)

When written out, the quantity will consist of the q^n terms arising from the nth
extension of the code, each of the form:

r^{-(l_{i_1} + l_{i_2} + \cdots + l_{i_n})} = r^{-k}    (3.8)

where we define k = l_{i_1} + l_{i_2} + \cdots + l_{i_n}.
Then l_{i_j} is the length of the code word for the i_j th symbol and k is the length of
the corresponding sequence of code words in the nth extension of the code. Let
l = max{l_1, l_2, ..., l_q} be the maximum code word length over the q code
words. The minimum code word length is, of course, a length of 1. Then k can
assume any value from n to nl. Let N_k be the number of terms of the form r^{-k}.
Then:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{k=n}^{nl} N_k r^{-k}    (3.9)

Thus N_k represents the number of code word sequences in the nth extension of the
code with a length of k. If the code is uniquely decodable then the nth extension of
the code must be non-singular. That is, N_k must be no greater than r^k, the number
of distinct sequences of length k. Thus for any value of n we must have:

( \sum_{i=1}^{q} r^{-l_i} )^n = \sum_{k=n}^{nl} N_k r^{-k} \leq \sum_{k=n}^{nl} r^k r^{-k} = nl - n + 1 \leq nl    (3.10)

or:

\sum_{i=1}^{q} r^{-l_i} \leq (nl)^{1/n}    (3.11)

For n → ∞ we have that (nl)^{1/n} → 1.
Since the above inequality has to hold for all values of n, then this will be true only if:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.12)


The implication of McMillan’s Theorem is that for every non-instantaneous uniquely


decodable code that we derive, an instantaneous code with the same code word
lengths will always be found since both codes satisfy the same Kraft Inequality.
Thus we can restrict ourselves to the class of instantaneous codes since we will not
gain any efficiencies based on code word lengths by considering the larger class of
uniquely decodable codes.

EXAMPLE 3.8
It is required to design a uniquely decodable ternary (r = 3) code with code word
lengths 1, 1, 2, 2, 3, 3. Since a uniquely decodable code satisfies the Kraft Inequality
by McMillan's Theorem we check whether this is the case. Calculating the Kraft
Inequality:

\sum_{i=1}^{6} 3^{-l_i} = 3^{-1} + 3^{-1} + 3^{-2} + 3^{-2} + 3^{-3} + 3^{-3} = 26/27 \leq 1

shows that a uniquely decodable code with these lengths can be found.
BUT the same Kraft Inequality is satisfied for instantaneous codes and since instantaneous
codes can be systematically constructed following the procedure described
in Section 3.2.1 the following instantaneous code, which is uniquely decodable, is
designed:

0, 1, 20, 21, 220, 221

3.4 Average Length and Compact Codes

3.4.1 Average Length

When considering a collection of possible r-ary codes for the same source a choice
needs to be made between the different codes by comparing their performance based
on a criterion of interest. For storage and communication purposes the main criterion
of interest is the average length of a code, where codes with smaller average
length are preferred. To calculate the average length the source symbol (or message)
probabilities are needed.

DEFINITION 3.7 Average Length Define {p_1, p_2, ..., p_q} as the individual
source symbol probabilities for a source with q possible symbols. Define
{l_1, l_2, ..., l_q} as the lengths of the corresponding code words for a given source coding.
Then the average length of the code, L, is given by:

L = \sum_{i=1}^{q} p_i l_i    (3.13)

Consider all possible r-ary codes for the same source, S. The number of source
symbols, q, and source probabilities, p_1, p_2, ..., p_q, are constant, but the code
word lengths l_1, l_2, ..., l_q vary with each code. The best code or compact
code will be the one with the smallest average length.

DEFINITION 3.8 Compact Codes Consider a uniquely decodable code that
maps the symbols of a source S to code words from an r-ary code alphabet. The
code will be a compact code if its average length is less than or equal to the average
length of all other uniquely decodable codes for the same source and code alphabet.

EXAMPLE 3.9
Consider the following two binary codes for the same source. Which code is better?
Source   P(s_i)   Code A   Code B
s_1      0.5      00       1
s_2      0.1      01       000
s_3      0.2      10       001
s_4      0.2      11       01

The average length of code A is obviously L_A = 2 bits per symbol. The average
length of code B is L_B = 1 × 0.5 + 3 × 0.1 + 3 × 0.2 + 2 × 0.2 = 1.8 bits per symbol.

Code B is better than code A since L_B < L_A. But is there another code, call it code
C, which has L_C < L_B? Or is code B the compact code for this source? And how
small can the average length get?

The fundamental problem when coding information sources and the goal of source
coding for data compression is to find compact codes. This obviously requires the
individual code word lengths to be made as small as possible, as long as we still
end up with a uniquely decodable (i.e., instantaneous) code. Intuitively the average
length will be reduced when the shorter length code words are assigned to the most
probable symbols and the longer code words to the least probable symbols. This
concept is shown by Example 3.9 where code B assigns the shortest code word (of
length 1) to the most probable symbol (with probability 0.5). Formally, the problem
of searching for compact codes is a problem in constrained optimisation. We state
the problem as follows.

REMARK 3.1 Given {p_1, p_2, ..., p_q} for a source with q symbols the compact
r-ary code is given by the set of integer-valued code word lengths {l_1, l_2, ..., l_q}
that minimise:

L = \sum_{i=1}^{q} p_i l_i    (3.14)

such that the Kraft Inequality constraint is satisfied:

\sum_{i=1}^{q} r^{-l_i} \leq 1    (3.15)

3.4.2 Lower Bound on Average Length

From Chapter 1 the information content of a source is given by its entropy. When
using logarithms to base 2 the entropy is measured in units of “(information) bits per
source symbol.” Similarly when using binary source codes the average length is also
measured in units of “(code) bits per source symbol.” Since the entropy provides a
measure of the intrinsic information content of a source it is perhaps not surprising
that, for there to be no losses in the coding process, the average length must be at least
the value of the entropy (no loss of information in the coding representation) and
usually more to compensate for the inefficiencies arising from the coding process.
The following theorem establishes this lower bound on the average length of any
possible coding of the source based on the entropy of the source.

THEOREM 3.3
Every instantaneous r-ary code of the source, S = {s_1, s_2, ..., s_q}, will have an
average length, L, which is at least the entropy, H_r(S), of the source, that is:

L \geq H_r(S)    (3.16)

with equality when p_i = r^{-l_i} for i = 1, 2, ..., q, where p_i is the probability of source
symbol s_i, H_r(S) = H(S)/\log_2 r is the entropy of the source S using logarithms to the
base r and H(S) is the entropy of the source S using logarithms to the base 2.

PROOF Consider the difference between the entropy H_r(S) and the average
length L and simplify the expression as follows:

H_r(S) - L = \sum_{i=1}^{q} p_i \log_r (1/p_i) - \sum_{i=1}^{q} p_i l_i
           = \sum_{i=1}^{q} p_i ( \log_r (1/p_i) - \log_r r^{l_i} )
           = \sum_{i=1}^{q} p_i \log_r ( r^{-l_i} / p_i )    (3.17)

Using the inequality \log_r x \leq (x - 1) \log_r e (with equality when x = 1), where
x = r^{-l_i}/p_i, gives:

H_r(S) - L \leq \log_r e \sum_{i=1}^{q} p_i ( r^{-l_i}/p_i - 1 )
           = \log_r e ( \sum_{i=1}^{q} r^{-l_i} - \sum_{i=1}^{q} p_i )
           = \log_r e ( \sum_{i=1}^{q} r^{-l_i} - 1 )    (3.18)

Since the code is instantaneous the code word lengths will obey the Kraft Inequality
so that \sum_{i=1}^{q} r^{-l_i} \leq 1 and hence:

H_r(S) - L \leq 0, that is, L \geq H_r(S)

with equality when r^{-l_i}/p_i = 1, or p_i = r^{-l_i}, for i = 1, 2, ..., q.

The definition of entropy in Chapter 1 now makes practical sense. The entropy is
a measure of the intrinsic amount of average information (in r-ary units) and the
average length of the code must be at least equal to the entropy of the source to ensure
no loss in coding (lossless coding). Thus the smallest average length, L, for any code
we design will be L = H_r(S). However there is no guarantee that such a code
for a particular source and code alphabet can be found. Indeed, unless p_i = r^{-l_i} for
i = 1, 2, ..., q we will always have that L > H_r(S), but the closer L is to H_r(S)
then the better or more efficient the code.

DEFINITION 3.9 Code Efficiency The efficiency of the code is given by:

η = H_r(S) / L    (3.19)

where, for example, a code with η = 0.9 is said to be 90% efficient.

EXAMPLE 3.10
The entropy for the source in Example 3.9 is:

H(S) = 0.5 \log_2 (1/0.5) + 0.1 \log_2 (1/0.1) + 0.2 \log_2 (1/0.2) + 0.2 \log_2 (1/0.2) = 1.76

Thus for this source we must have L \geq 1.76 no matter what instantaneous binary
code we design. From Example 3.9 we have:

L_A = 2.0 and code A has an efficiency of η_A = 1.76/2.0 = 88%.

L_B = 1.8 and code B has an efficiency of η_B = 1.76/1.8 = 98%.

Since code B is already at 98% efficiency we can probably state with some confidence
that code B is a compact code for source S. And even if it wasn't, at best a compact
code would only provide us with no more than 2% improvement in coding efficiency.
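The entropy, average length and efficiency calculations of Examples 3.9 and 3.10 are easily scripted. A minimal illustrative sketch:

```python
import math

def entropy(probs, r=2):
    """H_r(S): source entropy in base-r units (Chapter 1)."""
    return sum(p * math.log(1 / p, r) for p in probs if p > 0)

def average_length(probs, lengths):
    """L = sum_i p_i l_i (Equation 3.13)."""
    return sum(p * l for p, l in zip(probs, lengths))

# Source of Example 3.9 with the code word lengths of codes A and B
probs = [0.5, 0.1, 0.2, 0.2]
for name, lengths in [('A', [2, 2, 2, 2]), ('B', [1, 3, 3, 2])]:
    L = average_length(probs, lengths)
    print(f'Code {name}: L = {L:.2f} bits/symbol, efficiency = {entropy(probs) / L:.1%}')
```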

Ideally compact codes should exhibit 100% efficiency. This requires that
L = H_r(S), and from the condition for equality, p_i = r^{-l_i}, it implies l_i = \log_r(1/p_i)
for i = 1, 2, ..., q. The problem lies in the fact that the l_i have to be integer values.
If this is the case then we have a special source.

DEFINITION 3.10 Special Source A source with symbol probabilities p_i,
i = 1, 2, ..., q, such that the l_i = \log_r(1/p_i) are integers is a special source
for r-ary codes since an instantaneous code with code word lengths l_i = \log_r(1/p_i) for
a code alphabet of size r can be designed which is 100% efficient with L = H_r(S).

EXAMPLE 3.11
Consider the following 4-symbol source:

Source A   P(s_i)
s_1        0.125
s_2        0.25
s_3        0.5
s_4        0.125

We note that the symbol probabilities are of the form 2^{-l_i} with l_1 = 3, l_2 = 2,
l_3 = 1, and l_4 = 3. Thus a 100% efficient compact binary code can be designed with
code word lengths of 3, 2, 1, 3 with L = H(S) = 1.75. For example:

Source A   Code A
s_1        110
s_2        10
s_3        0
s_4        111

Now consider the following 9-symbol source:

Source B   P(s_i)
s_1        1/9
s_2        1/9
s_3        1/3
s_4        1/27
s_5        1/27
s_6        1/9
s_7        1/9
s_8        1/27
s_9        1/9

We note that the symbol probabilities are of the form 3^{-l_i} with l_1 = l_2 = 2, l_3 = 1,
l_4 = l_5 = 3, l_6 = l_7 = 2, l_8 = 3, l_9 = 2. Thus a 100% efficient ternary code can be designed
with code word lengths of 2, 2, 1, 3, 3, 2, 2, 3, 2 with L = H_3(S) = 16/9 ≈ 1.78.

3.5 Shannon’s Noiseless Coding Theorem


3.5.1 Shannon’s Theorem for Zero-Memory Sources

Equation 3.16 states that the average length of any instantaneous code is bounded
below by the entropy of the source. In this section we show that the average length,
L, of a compact code is also bounded above by the entropy plus 1 unit of information
(1 bit in the case of a binary code). The lower bound was important for establishing
how the efficiency of a source code is measured. Of practical importance is the exis-
tence of both a lower and upper bound which, as we will see, leads to the important

Shannon’s Noiseless Coding Theorem [11]. The theorem states that the coding effi-
ciency (which will always be less than 100% for compact codes that are not special)
can be improved by coding the extensions of the source (message blocks generated
by the source) rather than just the source symbols themselves.

THEOREM 3.4
Let L be the average length of a compact r-ary code for the source S. Then:

H_r(S) \leq L < H_r(S) + 1    (3.20)

The proof of the theorem as presented makes mention of a possible coding strategy
that one may adopt in an attempt to systematically design compact codes. Although
such a strategy, in fact, designs codes which are not compact it establishes the theo-
rem which can then be extended to the case of compact codes.

PROOF Let P = {p_1, p_2, ..., p_q} and consider the sub-optimal coding scheme
where the individual code word lengths are chosen according to:

\log_r (1/p_i) \leq l_i < \log_r (1/p_i) + 1    (3.21)

We can justify that this is a reasonable coding scheme by noting from Theorem 3.3
that a 100% efficient code with L = H_r(S) is possible if we select l_i = \log_r(1/p_i),
and where \log_r(1/p_i) does not equal an integer we "round up" to provide integer length
assignments, yielding the coding scheme just proposed. Multiplying Equation 3.21
by p_i and summing over i gives:

\sum_{i=1}^{q} p_i \log_r (1/p_i) \leq \sum_{i=1}^{q} p_i l_i < \sum_{i=1}^{q} p_i \log_r (1/p_i) + \sum_{i=1}^{q} p_i    (3.22)

which yields, using the definitions for entropy and average length:

H_r(S) \leq L' < H_r(S) + 1    (3.23)

where L' is the average length using this sub-optimal coding scheme. Consider
a compact code for the same source S with average length L. Then by definition
L \leq L' and from Theorem 3.3 we have L \geq H_r(S). Thus we must also have:

H_r(S) \leq L < H_r(S) + 1    (3.24)

NOTE One coding scheme resulting in code word lengths given by Equation 3.21
is the Shannon code [9]. Due to the sub-optimal nature of this coding scheme it will
not be elaborated further upon.

Equation 3.20 indicates the average length of a compact code will be no more than 1
unit away from the average length of a 100% efficient code. However we now show
that by taking extensions and applying Equation 3.20 we can improve the efficiency of the
code.

Let S^n = {σ_1, σ_2, ..., σ_{q^n}} be the nth extension of the source S = {s_1, s_2, ..., s_q},
where σ_i = s_{i1} s_{i2} ... s_{in}. Consider the compact code for S^n with average length
L_n. Then from Theorem 3.4 we have that:

H_r(S^n) \leq L_n < H_r(S^n) + 1    (3.25)

The H_r(S^n) can be considered the joint entropy of n independent sources since:

H_r(S^n) = H_r(S_1, S_2, ..., S_n)  with  S_1 = S_2 = ... = S_n = S    (3.26)

and from Section 1.18 we have the result that H_r(S^n) = n H_r(S),
which when substituted into Equation 3.25 and dividing by n yields:

H_r(S) \leq L_n / n < H_r(S) + 1/n    (3.27)

NOTE The term L_n/n is the average length of the code words per symbol s_i when
coding the nth extension which should not be confused with L_n which is the average
length of the code words per symbol σ_i when coding the source S^n. However L_n/n can
be used as the average length of a coding scheme for the source S (based on coding
the nth extension of the source).

From Equation 3.27 the average length of a compact code for S based on coding
the nth extension of S is no more than 1/n units away from the average length of a
100% efficient code and this overhead decreases with larger extensions. Intuitively
we expect that coding the nth extension of the source will provide more efficient
codes, approaching 100% efficiency as n → ∞. We formally state these results in
the following theorem.

THEOREM 3.5 Shannon’s Noiseless Coding Theorem


Let L be the average length of a compact code for the source S, then:

H_r(S) \leq L < H_r(S) + 1

Now let L_n be the average length of a compact code for the nth extension of S, then:

H_r(S) \leq L_n / n < H_r(S) + 1/n

and thus:

\lim_{n \to \infty} L_n / n = H_r(S)

that is, in the limit the code becomes 100% efficient.

NOTE Since we can't get something for nothing there is a price to be paid in
improving coding efficiency by taking extensions. The price is the increased cost of
encoding and decoding a code with q^n code words, which represents an exponential
increase in complexity.

EXAMPLE 3.12
Taking extensions to improve efficiency is apparent when the source and code alphabets
are the same. Consider the case of a binary coding scheme for a binary source.
Without extensions we would have the trivial result:

Source   P(s_i)   Compact Code
s_1      0.8      0
s_2      0.2      1

The efficiency of this compact code (which is no code at all!) is η = H(S)/L where
H(S) = 0.72 bits per symbol and L = 1, yielding a compact code which is only
72% efficient. Consider taking the second extension of the code where S^2 = {σ_1, σ_2, σ_3, σ_4}
and P(σ_k) = P(s_i) P(s_j) for σ_k = s_i s_j with i = 1, 2 and j = 1, 2:

Source    P(σ_k)   Compact Code
s_1 s_1   0.64     0
s_1 s_2   0.16     10
s_2 s_1   0.16     110
s_2 s_2   0.04     111

The compact code for the second extension is as shown in the above table. The
average length of this code is L_2 = 0.64 × 1 + 0.16 × 2 + 0.16 × 3 + 0.04 × 3 = 1.56
bits per symbol pair and L_2/2 = 0.78 bits per symbol, yielding a compact code for the
second extension which is η = H(S)/(L_2/2) = 0.72/0.78 ≈ 93% efficient, a significant improvement
over 72% efficient.
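The improvement with increasing extension order can be computed directly. The sketch below is an illustration, not part of the original example: it relies on the fact that the Huffman procedure of Section 3.7 yields a compact code, together with the standard observation that the average length of a binary Huffman code equals the sum of the probabilities created by its merges.

```python
import heapq
import itertools
import math

def compact_average_length(probs):
    """Average length of a binary compact (Huffman) code, obtained as the sum
    of the probabilities created by merging the two least probable nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Binary source of Example 3.12: P(s1) = 0.8, P(s2) = 0.2
p = [0.8, 0.2]
H = entropy(p)
for n in (1, 2, 3):
    extension = [math.prod(c) for c in itertools.product(p, repeat=n)]
    Ln = compact_average_length(extension)
    print(f'n = {n}: L_n/n = {Ln / n:.3f}, efficiency = {H / (Ln / n):.1%}')
```

For the source of Example 3.12 it reports efficiencies of about 72%, 93% and 99% for n = 1, 2, 3, approaching 100% as Theorem 3.5 predicts.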

3.5.2 Shannon’s Theorem for Markov Sources

The statement of Shannon’s Noiseless Coding Theorem was proven for zero-memory
sources. We now show that a similar statement applies to the general case of Markov
sources. Consider an mth order Markov source, S, and the corresponding entropy,
H_r(S), based on the mth order Markov model of the source. Define H_r(\bar{S}) as the
entropy of the equivalent zero-memory model of the source (where \bar{S} is the adjoint
source). Since Equation 3.20 applies to zero-memory sources it also applies to the
adjoint source, hence:

H_r(\bar{S}) \leq L < H_r(\bar{S}) + 1    (3.28)

and also:

H_r(\bar{S^n}) \leq L_n < H_r(\bar{S^n}) + 1    (3.29)

where \bar{S^n} is the adjoint of the nth extension of the source, S^n (not to be confused
with \bar{S}^n which is the nth extension of the adjoint source, \bar{S}).
We know from our results from Chapter 1 that H(\bar{S^n}) → H(S^n) as n → ∞, or equivalently
that H_r(\bar{S^n}) → H_r(S^n) (for n → ∞). We also have that H_r(S^n) = n H_r(S).
Substituting these into Equation 3.29 gives:

n H_r(S) \leq L_n < n H_r(S) + 1    (3.30)

and

H_r(S) \leq L_n / n < H_r(S) + 1/n    (3.31)

and hence we can see that

\lim_{n \to \infty} L_n / n = H_r(S)    (3.32)

That is, the lower bound on L_n/n is the entropy of the mth order Markov model, H_r(S).
Consider a sequence of symbols from an unknown source that we are coding. The
entropy of the source depends on the model we use for the source. From Chapter 1 we
had that, in general, H(S_m) \leq H(S_{m-1}) where S_m is used to indicate a Markov
source of order m. Thus higher order models generally yield lower values for the
entropy. And the lower the entropy is, then from Equation 3.32, the smaller the

average length we expect as we take extensions to the source. This realisation does
not affect how we design the compact codes for extensions to the source, but affects
how we calculate the corresponding th extension symbol probabilities (which are
based on the underlying model of the source) and the resulting code efficiency.

RESULT 3.1
When coding the th extension of a source, the probabilities should be derived based
on the “best” or “true” model of the source, and the efficiency of the code should
be based on the same model of the source.

EXAMPLE 3.13

Consider the following typical sequence from an unknown binary source:

01010010110101010101

If we assume a zero-memory model of the source then we can see that P(0) = 0.5,
P(1) = 0.5 and thus H(S) = 1 bit. This implies that the original binary source is 100%
efficient.

However the “true” model is the first order Markov model from Figure 3.4.

[Figure 3.4: two-state first order Markov model with transition probabilities P(1|0) = P(0|1) = 0.9 and P(0|0) = P(1|1) = 0.1.]

FIGURE 3.4
True model of source.

The "true" entropy is calculated to be H(S) = 0.47 bits and the original binary source
is, in fact, only 47% efficient! To improve the efficiency we find a binary code for
the second extension. Noting that P(s_i s_j) = P(s_i) P(s_j | s_i) this gives the second
extension symbol probabilities:

Source    P(σ_i)   Compact Code
s_1 s_1   0.05     110
s_1 s_2   0.45     0
s_2 s_1   0.45     10
s_2 s_2   0.05     111

The compact code is shown in the above table. The average length of this code is
L_2 = 1.65 bits per symbol pair or L_2/2 = 0.825 bits per symbol which improves the
coding efficiency to η = 0.47/0.825 ≈ 57%.
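The figures in Example 3.13 follow directly from the first order model. A short illustrative sketch computes the Markov entropy and the efficiency achieved by the second-extension code:

```python
import math

def markov_entropy(transition, stationary):
    """Entropy of a first order Markov source:
    H(S) = sum_i P(i) sum_j P(j|i) log2(1/P(j|i))."""
    return sum(stationary[i] *
               sum(p * math.log2(1 / p) for p in transition[i] if p > 0)
               for i in range(len(stationary)))

# The model of Figure 3.4: P(0|0) = P(1|1) = 0.1, P(1|0) = P(0|1) = 0.9
transition = [[0.1, 0.9], [0.9, 0.1]]
stationary = [0.5, 0.5]
H = markov_entropy(transition, stationary)

# Second extension probabilities P(s_i s_j) = P(s_i) P(s_j | s_i)
probs = [stationary[i] * transition[i][j] for i in range(2) for j in range(2)]
lengths = [3, 1, 2, 3]                     # code words 110, 0, 10, 111 of Example 3.13
L2 = sum(p * l for p, l in zip(probs, lengths))
print(f'H(S) = {H:.2f} bits, L2/2 = {L2 / 2:.3f}, efficiency = {H / (L2 / 2):.0%}')
```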

3.5.3 Code Efficiency and Channel Capacity

Consider the communication system shown in Figure 3.5 for the case of a noiseless
channel.

[Figure 3.5: Source (source entropy H(S)) emitting the q-ary source message → Source Encoder → r-input noiseless channel carrying the r-ary code message (code entropy H(X)) → Source Decoder → Receiver.]

FIGURE 3.5
Noiseless communication system.

The source entropy H(S) is interpreted as the average number of bits transmitted per
source symbol. Each source symbol is encoded to a code word of L code symbols on
average. Thus the ratio H(S)/L represents the average number of bits transmitted per
code symbol. By definition this gives the entropy of the encoded sequence or code
entropy, H(X). That is:

H(X) = H(S) / L    (3.33)

Since H_r(S) = H(S)/\log_2 r and η = H_r(S)/L then we have:

H(X) = η \log_2 r    (3.34)

Thus if the code is 100% efficient then H(X) = \log_2 r which represents the case
of maximum entropy for a source with r symbols. Consider the r-input noiseless
channel. From Chapter 2 we established that the mutual information of a noiseless
channel is I(X;Y) = H(X) = H(Y), where X and Y are the channel input and output,
respectively, and hence for a noiseless channel C = max I(X;Y) = max H(X) = \log_2 r.
RESULT 3.2
We make the following observations:
(a) A 100% efficient source coding implies maximum code entropy and hence
equiprobable code symbols.
(b) A 100% efficient source coding implies maximum or “best” use of the channel.

This result has important implications when we consider the inclusion of a channel
code with the source code. As we will see in Chapter 5 one of the key assumptions
is that the (binary) inputs to the channel coder are equally likely, and with efficient
source coding this will definitely be the case.

EXAMPLE 3.14
Consider the binary source sequence:

0001100100

Assuming a zero-memory model of the source gives P(0) = 0.7, P(1) = 0.3 and
H(S) = 0.88. We find the compact binary code of the second extension of the source
is as shown in the following table:

Source   P(σ_i)   Compact Code
00       0.49     0
01       0.21     10
10       0.21     110
11       0.09     111

The average length of the code is L_2 = 1.81 and L_2/2 = 0.905. The code
entropy is H(X) = H(S)/(L_2/2) = 0.97. Consider the corresponding code sequence given the
above compact code and initial source sequence:

0 10 110 10 0

We assume a zero-memory model and this gives P(0) = 5/9, P(1) = 4/9 and H(X) =
0.99. Although contrived, this still shows that the code sequence exhibits a higher
entropy than the source.

3.6 Fano Coding


We now describe a coding scheme for the design of instantaneous codes called the
Fano coding scheme, or Shannon-Fano code in reference to the fact that the scheme
was independently published by Shannon [11] and Fano [4], that can yield close to
optimal codes in most cases. Although mainly of historical interest because of its
lack of optimality, Fano coding can be considered the precursor to optimal Huff-
man coding discussed in the next section as it is based on the same principle but
implemented less rigidly.
Fano coding is predicated on the idea that equiprobable symbols should lead to code
words of equal length. We describe Fano's method for the design of r-ary instantaneous
codes. Consider the source S with q symbols, {s_1, s_2, ..., s_q}, and associated
symbol probabilities {p_1, p_2, ..., p_q}. Let the symbols be numbered
so that p_1 ≥ p_2 ≥ ... ≥ p_q. In Fano's method the q symbols are divided
or split into r equiprobable or close to equiprobable groups of symbols. Each
symbol in the jth group is then assigned the code symbol x_j and this is done for all
j = 1, 2, ..., r groups. Each of the r groups is then further split into r sub-groups
(or into less than r sub-groups of one symbol if there are less than r symbols in the
group) with each symbol in the kth sub-group having the code symbol x_k appended
to it for all k = 1, 2, ..., r sub-groups. The process is repeated until the groups can be
split no further.
For the important case of binary codes, Fano coding splits each group into two
equiprobable sub-groups appending a 0 to one group and a 1 to the other group.
Groups are successively split until no more groups can be split (i.e., each group
has only one symbol in it).
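A binary Fano coder can be written in a few lines: order the symbols by decreasing probability and recursively split each group into two parts with totals as close to equal as possible, appending a 0 to one part and a 1 to the other. The Python sketch below is an illustration only (ties are broken by taking the first best split, so it reproduces just one of the possible Fano codes); the source of Example 3.15 below is used as a check:

```python
def fano_code(probs):
    """Binary Fano coding (sketch). `probs` maps symbols to probabilities; returns
    {symbol: code word}. Each group is split into two parts whose total
    probabilities are as close to equal as possible."""
    code = {s: '' for s in probs}

    def split(symbols):
        if len(symbols) <= 1:
            return
        total = sum(probs[s] for s in symbols)
        run, best_k, best_diff = 0.0, 1, float('inf')
        for k in range(1, len(symbols)):
            run += probs[symbols[k - 1]]
            diff = abs(2 * run - total)            # |first part - second part|
            if diff < best_diff:
                best_diff, best_k = diff, k
        first, second = symbols[:best_k], symbols[best_k:]
        for s in first:
            code[s] += '0'
        for s in second:
            code[s] += '1'
        split(first)
        split(second)

    split(sorted(probs, key=probs.get, reverse=True))
    return code

# The 8-symbol source of Example 3.15
probs = {'s1': 1/2, 's2': 1/8, 's3': 1/8, 's4': 1/16,
         's5': 1/16, 's6': 1/16, 's7': 1/32, 's8': 1/32}
print(fano_code(probs))
```

For that source it returns the code of Figure 3.6(e): 0, 100, 101, 1100, 1101, 1110, 11110, 11111.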

EXAMPLE 3.15
Consider the design of a binary code using the Fano coding scheme for an 8-symbol
source with the following ordered probability assignments:

P(s_1) = 1/2, P(s_2) = 1/8, P(s_3) = 1/8, P(s_4) = 1/16, P(s_5) = 1/16, P(s_6) = 1/16, P(s_7) = 1/32, P(s_8) = 1/32

The regrouping and code symbol assignment in each step of the Fano coding scheme
are shown in Figure 3.6. The dashed lines represent the splitting of each set of symbols
into two equiprobable groups. For each division a 0 is appended to each symbol in
one group and a 1 is appended to each symbol in the other group. The number in
front of each dashed line indicates the level of grouping. Thus the level 1 grouping
shown in Figure 3.6(a) splits the original source into two groups with one group of
probability 1/2 containing {s_1} being assigned a 0 and the other group of probability
1/2 containing {s_2, s_3, s_4, s_5, s_6, s_7, s_8} being assigned a 1. Each of these groups is
split into the level 2 groups as shown in Figure 3.6(b). This continues for each level of
grouping until the level 5 grouping of Figure 3.6(e), when there are no more groups left
to split, is reached. Figure 3.6(e) is the binary Fano code for this source.

[Figure 3.6: the successive groupings (a)-(e); the final code of panel (e) is: 1/2 → 0, 1/8 → 100, 1/8 → 101, 1/16 → 1100, 1/16 → 1101, 1/16 → 1110, 1/32 → 11110, 1/32 → 11111.]

FIGURE 3.6
Binary Fano coding.

The average length of the Fano code is L = 2.3125 bits and since L = H(S) the Fano code
provides the compact code (with efficiency of 100%). This result can also be verified
by noting that the source probabilities are such that this is a special source for binary
codes.

In Example 3.15 the symbols are successively divided into equiprobable groups of
1/2, 1/4, 1/8, 1/16 and 1/32. In the majority of cases equal groupings are not possible,
so closest to equal groupings are used and since there may be many such “quasi”
equiprobable groups different Fano codes will result, and not all may yield compact
codes.

EXAMPLE 3.16

Consider the design of a binary code using the Fano coding scheme for the 6-symbol
source with the following probability assignment:

P(s_1) = 4/9, P(s_2) = P(s_3) = P(s_4) = P(s_5) = P(s_6) = 1/9

There are two possible Fano codes depending upon how we perform the level 1
grouping. One grouping creates a group of probability 4/9 containing {s_1} and the
other group of probability 5/9 containing {s_2, s_3, s_4, s_5, s_6} as shown by Figure
3.7(a). The alternative grouping creates a group of probability 5/9 containing {s_1, s_2}
and the other group of probability 4/9 containing {s_3, s_4, s_5, s_6} as shown by Figure
3.7(b). Both groupings are equally valid. The average lengths of the two Fano codes

[Figure 3.7: (a) the first grouping yields the code 0, 1000, 1001, 101, 110, 111; (b) the alternative grouping yields the code 00, 01, 100, 101, 110, 111.]

FIGURE 3.7
Different binary Fano codes.

shown in Figure 3.7 are L_a = 21/9 ≈ 2.33 and L_b = 22/9 ≈ 2.44, so the code of
Figure 3.7(a) is the compact code. Thus a sub-optimal code may result depending on how the symbols
are grouped.

Example 3.16 demonstrates that Fano codes are not necessarily compact and differ-
ent Fano codes may yield different average lengths. Thus Fano codes are of limited
use since such codes cannot be guaranteed to be compact.

3.7 Huffman Coding

3.7.1 Huffman Codes

We now describe an important class of instantaneous codes known as Huffman codes


which are attributed to the pioneering work of Huffman [7]. Huffman codes arise
when using the Huffman algorithm for the design of instantaneous codes given the
source symbols, s_i, and corresponding source probabilities, P(s_i). The Huffman
algorithm attempts to assign each symbol a code word of length proportional to the
amount of information conveyed by that symbol. Huffman codes are important be-
cause of the following result.

RESULT 3.3
Huffman codes are compact codes. That is, the Huffman algorithm produces a code
with an average length, L, which is the smallest possible to achieve for the given
number of source symbols, code alphabet and source statistics.

We now proceed to outline the basic principles of the Huffman algorithm and then
detail the steps and provide examples for specific cases.
The Huffman algorithm operates by first successively reducing a source with q symbols
to a source with r symbols, where r is the size of the code alphabet.

DEFINITION 3.11 Reduced Source Consider the source S with q symbols,
{s_1, s_2, ..., s_q}, and associated symbol probabilities {p_1, p_2, ..., p_q}.
Let the symbols be renumbered so that p_1 ≥ p_2 ≥ ... ≥ p_q. By combining
the last r symbols of S, {s_{q-r+1}, ..., s_q}, into one symbol, s'_{q-r+1},
with probability p'_{q-r+1} = p_{q-r+1} + ... + p_q, we obtain a new source, termed a
reduced source of S, containing only q - r + 1 symbols, {s_1, s_2, ..., s_{q-r}, s'_{q-r+1}}.
Call this reduced source S_1. Successive reduced sources S_2, S_3, ... can be formed
by a similar process of renumbering and combining until we are left with a source
with only r symbols.

It should be noted that we will only be able to reduce a source to exactly r symbols if
the original source has r + n(r - 1) symbols where n is a non-negative integer.
For a binary (r = 2) code this will hold for any value of q ≥ 2. For non-binary codes
if n = (q - r)/(r - 1) is not an integer value then "dummy" symbols with zero probability
are appended to create a source with r + ⌈n⌉(r - 1) symbols, where ⌈n⌉ is the
smallest integer greater than or equal to n.
The trivial r-ary compact code for the reduced source with r symbols is then used
to design the compact code for the preceding reduced source as described by the
following result.

RESULT 3.4
Assume that we have a compact code for the reduced source S_j. Designate the last r
symbols of the preceding source S_{j-1} as the symbols which were combined
to form the combined symbol of S_j. We assign to each symbol of S_{j-1},
except the last r symbols, the code word used by the corresponding symbol of S_j.
The code words for the last r symbols of S_{j-1} are formed by appending 0, 1, ..., (r - 1)
to the code word of the combined symbol of S_j to form r new code words.

The Huffman algorithm then operates by back-tracking through the sequence of reduced
sources, S_k, ..., S_2, S_1, S, designing the compact code for each source, until the
compact code for the original source S is designed.

3.7.2 Binary Huffman Coding Algorithm

For the design of binary Huffman codes the Huffman coding algorithm is as follows:

1. Re-order the source symbols in decreasing order of symbol probability.

2. Successively reduce the source S to S_1, then S_2 and so on, by combining the
last two symbols of S_j into a combined symbol and re-ordering the new set
of symbol probabilities for S_{j+1} in decreasing order. For each source keep
track of the position of the combined symbol. Terminate the source
reduction when a two symbol source is produced. For a source with q symbols
the reduced source with two symbols will be S_{q-2}.

3. Assign a compact code for the final reduced source. For a two symbol source
the trivial code is {0, 1}.

4. Backtrack to the original source S assigning a compact code for the jth reduced
source by the method described in Result 3.4. The compact code assigned
to S is the binary Huffman code.

The operation of the binary Huffman coding algorithm is best shown by the following
example.

EXAMPLE 3.17
Consider a 5 symbol source with the following probability assignments:

P(s_1) = 0.2, P(s_2) = 0.4, P(s_3) = 0.1, P(s_4) = 0.1, P(s_5) = 0.2

Re-ordering in decreasing order of symbol probability produces {s_2, s_1, s_5, s_3, s_4}.
The re-ordered source is then reduced to the source S_3 with only two symbols as
shown in Figure 3.8, where the arrow-heads point to the combined symbol created in
S_j by the combination of the last two symbols from S_{j-1}. Starting with the trivial
compact code of {0, 1} for S_3 and working back to S a compact code is designed for
each reduced source following the procedure described by Result 3.4 and shown
in Figure 3.8. In each S_j the code word for the last two symbols is produced by taking
the code word of the symbol pointed to by the arrow-head and appending a 0 and 1
to form two new code words. The Huffman coding algorithm of Figure 3.8 can be
depicted graphically as the binary Huffman coding tree of Figure 3.9. The Huffman
coding tree is of similar form to the decoding tree for instantaneous codes shown in
Figure 3.3 and can thus be used for decoding Huffman codes. The Huffman code
itself is the bit sequence generated by the path from the root to the corresponding leaf
node.

      S           S_1         S_2        S_3
s_2   0.4  1      0.4  1      0.4  1     0.6  0
s_1   0.2  01     0.2  01     0.4  00    0.4  1
s_5   0.2  000    0.2  000    0.2  01
s_3   0.1  0010   0.2  001
s_4   0.1  0011

FIGURE 3.8
Binary Huffman coding table.

The binary Huffman code is:

Symbol   P(s_i)   Huffman Code
s_1      0.2      01
s_2      0.4      1
s_3      0.1      0010
s_4      0.1      0011
s_5      0.2      000

The average length of the code is L = 2.2 bits/symbol and the efficiency of the
Huffman code is η = H(S)/L = 2.12/2.2 = 96.5%. Since the Huffman code is a compact
code then any other instantaneous code we may design will have an average length,
L', such that L' ≥ L = 2.2 bits/symbol.

From Figure 3.8 in Example 3.17 different reduced sources S_1 and S_2 are possible
by inserting the combined symbol in a different position when re-ordering.

[Figure 3.9: binary Huffman coding tree built from the root (probability 1.0) by the merges of Figure 3.8; the branch labels from the root to each leaf give the code words 1 (s_2), 01 (s_1), 000 (s_5), 0010 (s_3) and 0011 (s_4).]

FIGURE 3.9
Binary Huffman coding tree.

This may or may not result in a compact code with different individual code word
lengths. Either way the compact code will have the same average length. However
the average length variance:

σ_L² = Σ_{i=1}^{q} P(s_i) (l_i − L)²      (3.35)

may be different.

EXAMPLE 3.18
The average length variance for the Huffman code of Example 3.17 is σ_L² = 1.36. Consider a different Huffman code tree for the same source shown by Figures 3.10 and 3.11.

         S                 S_1              S_2            S_3
s_2   0.4   00       0.4   00       0.4   1        0.6   0
s_1   0.2   10       0.2   01       0.4   00       0.4   1
s_5   0.2   11       0.2   10       0.2   01
s_3   0.1   010      0.2   11
s_4   0.1   011

FIGURE 3.10
Different binary Huffman code table.

The binary Huffman code is now:


Symbol   P(s_i)   Huffman Code
s_1      0.2      10
s_2      0.4      00
s_3      0.1      010
s_4      0.1      011
s_5      0.2      11

Although this different Huffman code for the same source possesses different individual code word lengths the average length is still L = 2.2 bits/symbol; however, the average length variance is now σ_L² = 0.16.

[Figure 3.11 shows the alternative binary Huffman coding tree: the root (1.0) branches into two internal nodes of probability 0.6 and 0.4, below which sit the leaves s_2 (0.4, code 00), s_1 (0.2, code 10), s_5 (0.2, code 11), s_3 (0.1, code 010) and s_4 (0.1, code 011).]
FIGURE 3.11
Different binary Huffman code tree.

The Huffman code from Example 3.18 has a smaller average length variance than
the code produced in Example 3.17. Codes with smaller average length variance are
preferable since they produce a more constant code bit rate. The following result
can be used to ensure the Huffman coding tree produces a compact code with the
smallest average length variance.
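As a quick numerical check (our own illustration in Python), the average length and the variance of Equation (3.35) for the two codes of Examples 3.17 and 3.18 can be computed directly:

probs = {"s1": 0.2, "s2": 0.4, "s3": 0.1, "s4": 0.1, "s5": 0.2}
code_a = {"s1": "01", "s2": "1", "s3": "0010", "s4": "0011", "s5": "000"}   # Example 3.17
code_b = {"s1": "10", "s2": "00", "s3": "010", "s4": "011", "s5": "11"}     # Example 3.18

def length_stats(code):
    avg = sum(probs[s] * len(w) for s, w in code.items())          # average length L
    var = sum(probs[s] * (len(w) - avg) ** 2 for s, w in code.items())  # variance of (3.35)
    return avg, var

print(length_stats(code_a))   # approximately (2.2, 1.36)
print(length_stats(code_b))   # approximately (2.2, 0.16)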

RESULT 3.5
If the combined symbol, σ_{j+1}, is placed at the highest available position for that probability assignment when the reduced source is re-ordered then the resulting compact code will possess the smallest average length variance.

3.7.3 Software Implementation of Binary Huffman Coding

The binary Huffman coding algorithm can be implemented in software as a sequence of merge and sort operations on a binary tree. The Huffman coding algorithm is a greedy algorithm that builds the Huffman decoding tree (e.g., Figures 3.9 and 3.11) by initially assigning each symbol as the root node of a single-node tree. The decoding tree is then constructed by successively merging the last two nodes, labeling the edges of the left-right child pairs of the merge operation with a 0 and 1, respectively, and sorting the remaining nodes until only one root node is left. The code word for s_i is then the sequence of labels on the edges connecting the root to the leaf node of s_i. The details of the algorithm including a proof of the optimality of Huffman codes can be found in [3].
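A minimal sketch of this merge-and-sort construction in Python, using a heap as the sorted pool of nodes, is given below. This is our own illustration of the idea rather than the reference implementation of [3]; the symbol names and the tie-breaking counter are incidental choices.

import heapq
from itertools import count

def huffman_code(probs):
    # Build the Huffman decoding tree by repeatedly merging the two least
    # probable nodes; return a {symbol: code word} dictionary.
    tick = count()                                # tie-breaker so trees are never compared
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)         # two least probable nodes
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tick), (left, right)))
    _, _, root = heap[0]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):               # internal node: label edges 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"            # leaf: path from root is the code word
    walk(root, "")
    return code

print(huffman_code({"s1": 0.2, "s2": 0.4, "s3": 0.1, "s4": 0.1, "s5": 0.2}))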

3.7.4 r-ary Huffman Codes

For the design of general r-ary Huffman codes the Huffman coding algorithm is as follows:

1. Calculate α = (q − r)/(r − 1). If α is a non-integer value then append "dummy" symbols to the source with zero probability until there are q′ = r + ⌈α⌉(r − 1) symbols.

2. Re-order the source symbols in decreasing order of symbol probability.

3. Successively reduce the source S to S_1, then S_2 and so on, by combining the last r symbols of S_j into a combined symbol and re-ordering the new set of symbol probabilities for S_{j+1} in decreasing order. For each source keep track of the position of the combined symbol, σ_{j+1}. Terminate the source reduction when a source with exactly r symbols is produced. For a source with q′ symbols the reduced source with r symbols will be S_{⌈α⌉}.

4. Assign a compact r-ary code for the final reduced source. For a source with r symbols the trivial code is {0, 1, ..., r − 1}.

5. Backtrack to the original source S, assigning a compact code for each reduced source in turn by the method described in Result 3.4. The compact code assigned to S, minus the code words assigned to any "dummy" symbols, is the r-ary Huffman code.
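Step 1 simply pads the source so that repeatedly removing r − 1 symbols per reduction finishes with exactly r symbols. A small helper function (our own, hypothetical) that returns the required number of dummy symbols is:

def num_dummy_symbols(q, r):
    # Smallest number of zero-probability symbols to append so that
    # q' = r + alpha*(r-1) for some integer alpha >= 0.
    if q <= r:
        return r - q
    return (r - 1 - (q - r) % (r - 1)) % (r - 1)

print(num_dummy_symbols(11, 4))   # 2, as in Example 3.19 which follows (q' = 13)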

The operation of the r-ary Huffman coding algorithm is demonstrated in the following example.

EXAMPLE 3.19
We want to design a compact quaternary (r = 4) code for a source with originally 11 (q = 11) symbols and the following probability assignments (re-ordered in decreasing order of probability for convenience):

P(s_1) = 0.16, P(s_2) = 0.14, P(s_3) = 0.13, P(s_4) = 0.12, P(s_5) = 0.10, P(s_6) = 0.10,
P(s_7) = 0.06, P(s_8) = 0.06, P(s_9) = 0.05, P(s_10) = 0.04, P(s_11) = 0.04

Now α = (q − r)/(r − 1) = 7/3 and since α is not an integer value we need to append "dummy" symbols so that we have a source with q′ = r + ⌈α⌉(r − 1) = 4 + 3(3) = 13 symbols. Thus we append the symbols s_12, s_13 with P(s_12) = P(s_13) = 0.
The source is then reduced to the source S_3 with only r = 4 symbols as shown in Figure 3.12, where the arrow-heads point to the combined symbol created in S_{j+1} by the combination of the last r = 4 symbols from S_j. Starting with the trivial compact code of {0, 1, 2, 3} for S_3 and working back, a compact code is designed for each reduced source following the procedure described by Result 3.4. The code word for the last r symbols is produced by taking the code word of the symbol pointed to by the arrow-head and appending 0, 1, 2, 3 to form r new code words.
The r-ary Huffman code is (ignoring the "dummy" symbols s_12 and s_13):
Symbol   P(s_i)   Huffman Code
s_1      0.16     2
s_2      0.14     3
s_3      0.13     00
s_4      0.12     01
s_5      0.10     02
s_6      0.10     03
s_7      0.06     11
s_8      0.06     12
s_9      0.05     13
s_10     0.04     100
s_11     0.04     101

The average length of the code is L = 1.78 quaternary units per symbol and the Huffman code has an efficiency of η = H_4(S)/L = 1.654/1.78 ≈ 93%.

S               S_1             S_2             S_3
0.16 2 0.16 2 0.25 1 0.45 0
0.14 3 0.14 3 0.16 2 0.25 1
0.13 00 0.13 00 0.14 3 0.16 2
0.12 01 0.12 01 0.13 00 0.14 3
0.10 02 0.10 02 0.12 01
0.10 03 0.10 03 0.10 02
0.06 11 0.08 10 0.10 03
0.06 12 0.06 11
0.05 13 0.06 12
0.04 100 0.05 13
0.04 101
0.00 102
0.00 103

FIGURE 3.12
r-ary Huffman code table.

[Figure 3.13 shows the quaternary Huffman coding tree: the root (1.0) has four branches labelled 0, 1, 2, 3 to nodes of probability 0.45, 0.25, 0.16 and 0.14; the 0.45 and 0.25 nodes branch four ways again, and one further four-way branch reaches the two least probable symbols (0.04, 0.04) and the two dummy symbols (0.00, 0.00).]

FIGURE 3.13
r-ary Huffman code tree.

3.8 Arithmetic Coding

The Huffman coding method described in Section 3.7 is guaranteed to be optimal in the sense that it will generate a compact code for the given source alphabet and associated probabilities. However a compact code may be anything but optimal if it is not very efficient. Only for special sources, where l_i = log_r(1/P(s_i)) is an integer for i = 1, 2, ..., q, will the compact code also be 100% efficient. Inefficiencies are introduced when l_i is a non-integer and it is required to "round-up" to the nearest integer value. The solution to this, as a consequence of Shannon's Noiseless Coding Theorem (see Section 3.5), is to consider the nth extension of the source for n large enough and build the Huffman code for all possible blocks of length n. Although this does yield optimal codes, the implementation of this approach can easily become unwieldy or unduly restrictive. Problems include:

• The size of the Huffman code table is q^n, representing an exponential increase in memory and computational requirements.

• The code table needs to be transmitted to the receiver.

• The source statistics are assumed stationary. If there are changes an adaptive scheme is required which re-estimates the probabilities and recalculates the Huffman code table.

• Encoding and decoding is performed on a per block basis; the code is not produced until a block of n symbols is received. For large n this may require the last segment to be padded with dummy data.

One solution to using Huffman coding on increasingly larger extensions of the source is to directly code the source message to a code sequence using arithmetic coding which we will describe here. Rather than deriving and transmitting a code table of size q^n, segmenting the incoming source message into consecutive blocks of length n, and encoding each block by consulting the table, arithmetic coding directly processes the incoming source message and produces the code "on-the-fly." Thus there is no need to keep a code table, making arithmetic coding potentially much more computationally efficient than Huffman coding. The basic concept of arithmetic coding can be traced back to the 1960's; however, it wasn't until the late 1970's and mid 1980's that the method started receiving much more attention, with the paper by Witten et al. [12] often regarded as one of the seminal papers in the area.
Arithmetic coding, like Huffman coding, does require reliable source statistics to be available. Consider the N-length source message x_1 x_2 ... x_N where x_k = s_i (for some i = 1, 2, ..., q) indicates that the kth character in the message is the source symbol s_i. Arithmetic coding assumes that the probabilities, P(x_k | x_1 x_2 ... x_{k-1}), for k = 1, 2, ..., N, can be calculated. The origin of these probabilities depends on the underlying source model that is being used (e.g., zero-memory source or mth order Markov model) and how such models are arrived at and probabilities estimated is not discussed here. Rather we assume that the required probabilities are available and concentrate on the encoding and decoding process.
The goal of arithmetic coding is to assign a unique interval along the unit number line or "probability line" [0, 1) of length equal to the probability of the given source message, P(x_1 x_2 ... x_N), with its position on the number line given by the cumulative probability of the given source message. However there is no need for direct calculation of either the message probability or its cumulative probability. The basic operation of arithmetic coding is to produce this unique interval by starting with the interval [0, 1) and iteratively subdividing it by P(x_k | x_1 x_2 ... x_{k-1}) for k = 1, 2, ..., N.
The interval subdivision operation of arithmetic coding proceeds as follows. Consider the first letter of the message, x_1. The individual symbols, s_i, are each assigned the interval [low_i^(1), high_i^(1)) where low_i^(1) = Σ_{j=1}^{i-1} P(s_j) and high_i^(1) = low_i^(1) + P(s_i) = Σ_{j=1}^{i} P(s_j). That is, the length of each interval is high_i^(1) − low_i^(1) = P(s_i) and the end of the interval is given by the cumulative probability Σ_{j=1}^{i} P(s_j). The interval corresponding to the symbol x_1 = s_i, that is [low^(1), high^(1)) = [low_i^(1), high_i^(1)), is then selected. Next we consider the second letter of the message, x_2. The individual symbols, s_i, are now assigned the interval [low_i^(2), high_i^(2)) where:

low_i^(2) = low^(1) + (high^(1) − low^(1)) Σ_{j=1}^{i-1} P(s_j | x_1)      (3.36)

high_i^(2) = low^(1) + (high^(1) − low^(1)) Σ_{j=1}^{i} P(s_j | x_1)      (3.37)

That is, the length of each interval is high_i^(2) − low_i^(2) = (high^(1) − low^(1)) P(s_i | x_1) = P(x_1) P(s_i | x_1). The interval corresponding to the symbol x_2 = s_i, that is [low^(2), high^(2)) = [low_i^(2), high_i^(2)), is then selected. The length of the interval corresponding to the message seen so far, x_1 x_2, is high^(2) − low^(2) = P(x_1) P(x_2 | x_1) = P(x_1 x_2). This interval subdivision operation is depicted graphically in Figure 3.14.
We continue in this way until the last letter of the message, x_N, is processed and the final interval of length P(x_1 x_2 ... x_N) is produced. Any number that falls within that interval can be chosen and transmitted to identify the interval (and hence the original message) to the receiver. Arithmetic codes are typically binary codes. Thus the binary representation of the number is considered. Optimal coding is assured by selecting a number that requires the minimum number of bits to be transmitted to the receiver (by only transmitting the significant bits and ignoring
[Figure 3.14 depicts the interval subdividing operation: the probability line [0, 1) is first divided into intervals of length P(s_i) for the first letter x_1; the interval selected for x_1 is then subdivided in proportion to P(s_i | x_1) for the second letter x_2, and so on.]

FIGURE 3.14
Interval subdividing operation of arithmetic coding.

the trailing 0's). Since the interval occurs with probability P(x_1 x_2 ... x_N) then approximately −log_2 P(x_1 x_2 ... x_N) bits are needed and where this becomes equality for all message sequences a 100% efficient coding is possible. A simple demonstration of arithmetic coding is given in Example 3.20.

EXAMPLE 3.20
Consider the message x_1 x_2 x_3 = s_2 s_2 s_1 originating from the 3-symbol source with the following individual and cumulative probabilities:

Symbol   P(s_i)   Cumulative
s_1      0.2      0.2
s_2      0.5      0.7
s_3      0.3      1.0

We further assume that the source is zero-memory, thus P(x_k | x_1 x_2 ... x_{k-1}) = P(x_k). As shown by Figure 3.15 initially the probability line [0, 1) is divided into three consecutive, adjoint intervals of [0, 0.2), [0.2, 0.7) and [0.7, 1.0) corresponding to s_1, s_2 and s_3 and the length ratios 0.2 : 0.5 : 0.3, respectively. The length ratios and interval lengths both correspond to P(s_i). When the first letter of the message, s_2, is received the interval [0.2, 0.7) is selected. The interval [0.2, 0.7) is further divided into three consecutive, adjoint subintervals of length ratios 0.2 : 0.5 : 0.3, that is [0.2, 0.3), [0.3, 0.55) and [0.55, 0.7). The length ratios correspond to P(s_i | x_1 = s_2) = P(s_i) and the subinterval lengths are equal to P(x_1 = s_2) P(s_i | x_1 = s_2) = P(s_2) P(s_i).
When the second letter of the message, s_2, is received the subinterval [0.3, 0.55) is selected. The interval [0.3, 0.55) is then further subdivided into the three subintervals [0.3, 0.35), [0.35, 0.475) and [0.475, 0.55) with length ratios 0.2 : 0.5 : 0.3, and when the third and last letter of the message, s_1, is received the corresponding and final interval [0.3, 0.35) of length P(s_2) P(s_2) P(s_1) = 0.5 × 0.5 × 0.2 = 0.05 is selected.
The interval selection process can be summarised by the following table:

Next Letter   Interval
s_2           [0.2, 0.7)
s_2           [0.3, 0.55)
s_1           [0.3, 0.35)

In binary the final interval is [0.01001100..., 0.01011001...). We need to select a number from within the interval that can be represented with the least number of significant bits. The number 0.0101 falls within the interval and only requires the 4 bits 0101 to be transmitted. We note that we need to transmit −log_2 P(x_1 = s_2, x_2 = s_2, x_3 = s_1) = −log_2 0.05 = 4.32 bits of information and we have been able to do this with 4 bits.
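The subdivision in this example can be reproduced with a few lines of Python (our own sketch for this zero-memory source; floating-point rounding means the printed bounds are only approximately 0.3 and 0.35):

# cumulative intervals for s1, s2, s3 with P = 0.2, 0.5, 0.3
cum = {"s1": (0.0, 0.2), "s2": (0.2, 0.7), "s3": (0.7, 1.0)}

low, high = 0.0, 1.0
for letter in ["s2", "s2", "s1"]:
    rng = high - low
    c_low, c_high = cum[letter]
    low, high = low + rng * c_low, low + rng * c_high
    print(letter, (low, high))
# prints [0.2, 0.7), then [0.3, 0.55), then the final interval [0.3, 0.35)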

The above description of arithmetic coding is still incomplete. For example, how does one select the number that falls within the final interval so that it can be transmitted in the least number of bits? Let [low, high) denote the final interval. One scheme described in [6] which works quite well in the majority of cases is to carry out the binary expansion of low and high until they differ. Since low < high, at the first place they differ there will be a 0 in the expansion for low and a 1 in the expansion for high. That is:

low  = 0.b_1 b_2 ... b_{k-1} 0 ...
high = 0.b_1 b_2 ... b_{k-1} 1 ...

The number 0.b_1 b_2 ... b_{k-1} 1 falls within the interval and requires the least number of bits of any number within the same interval; so it is selected and transmitted as the k-bit code sequence b_1 b_2 ... b_{k-1} 1.
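A direct way to implement this selection (our own Python sketch, assuming 0 ≤ low < high < 1 and ignoring the rare boundary case where the chosen value coincides with high) is to generate the binary digits of low and high until they first disagree:

def shortest_codeword(low, high):
    # Return the bits b1 ... b_{k-1} 1 of the shortest binary fraction
    # inside [low, high), following the scheme described above.
    bits = []
    while True:
        low, high = 2 * low, 2 * high
        b_low, b_high = int(low), int(high)     # next binary digit of each bound
        if b_low != b_high:                     # first disagreement: emit the final 1
            bits.append("1")
            return "".join(bits)
        bits.append(str(b_low))                 # shared digit
        low -= b_low
        high -= b_high

print(shortest_codeword(0.3, 0.35))   # '0101' as in Example 3.20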

3.8.1 Encoding and Decoding Algorithms

We have described how the message is encoded and transmitted but have not yet discussed how the received code is decoded back to the original message. Figure 3.16 lists an algorithm for encoding a message using arithmetic coding to a binary integer code, and Figure 3.17 lists an algorithm for decoding the received binary integer back to the original message.

NOTE In both the encoding and decoding algorithms mention is made of the EOF or end-of-file. Consider Example 3.20. The code sequence 0101 corresponds to the binary fraction 0.0101 and decimal fraction 0.3125. The number 0.3125 not only falls within the final interval for the message s_2 s_2 s_1 but also for the longer

[Figure 3.15 shows the successive subdivision of the probability line for Example 3.20: the intervals [0, 0.2), [0.2, 0.7) and [0.7, 1.0) for the first letter, and the selected intervals [0.2, 0.7), [0.3, 0.55) and finally [0.3, 0.35) as the letters s_2, s_2, s_1 are processed.]

FIGURE 3.15
Arithmetic coding for Example 3.20.

Definitions

1. Let {s_1, s_2, ..., s_q} denote the q distinct source symbols or letters of the alphabet.

2. Let x_1 x_2 ... x_N denote the N-length message that is to be encoded where x_k = s_i indicates that the kth letter in the message is the symbol s_i.

3. Assume P(x_k = s_i | x_1 x_2 ... x_{k-1}) for i = 1, 2, ..., q and k = 1, 2, ..., N are available or can be estimated from the underlying source model.

Initialisation

low = 0.0
high = 1.0
k = 1

Iteration

While k ≤ N do

1. get next letter of message, x_k = s_i
2. range = high − low
3. cum_low = Σ_{j=1}^{i-1} P(s_j | x_1 x_2 ... x_{k-1})
4. cum_high = cum_low + P(s_i | x_1 x_2 ... x_{k-1})
5. high = low + range × cum_high
6. low = low + range × cum_low
7. k = k + 1

done

Termination

Let low = 0.b_1 b_2 ... b_{j-1} 0 ... and high = 0.b_1 b_2 ... b_{j-1} 1 ..., where b_1 b_2 ... b_{j-1} are the leading bits shared by low and high. Then transmit the code as the binary integer b_1 b_2 ... b_{j-1} 1.

FIGURE 3.16
Encoding algorithm for arithmetic codes.
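A floating-point transcription of the encoding algorithm into Python might look as follows. This is our own sketch: it assumes the conditional probabilities are supplied as a function, and it ignores the underflow problem addressed in Section 3.8.2.

def arithmetic_encode(message, symbols, cond_prob):
    # symbols:   ordered list of the source symbols s_1 ... s_q
    # cond_prob: function (symbol, history) -> P(x_k = symbol | history)
    # Returns the final interval [low, high); the transmitted code word is then
    # the shortest binary fraction inside this interval.
    low, high = 0.0, 1.0
    for k, letter in enumerate(message):
        history = message[:k]
        rng = high - low
        i = symbols.index(letter)
        cum_low = sum(cond_prob(s, history) for s in symbols[:i])   # sum over j < i
        cum_high = cum_low + cond_prob(letter, history)             # sum over j <= i
        high = low + rng * cum_high      # update high first, using the old low
        low = low + rng * cum_low
    return low, high

# zero-memory source of Example 3.20: the final interval is approximately [0.3, 0.35)
P = {"s1": 0.2, "s2": 0.5, "s3": 0.3}
print(arithmetic_encode(["s2", "s2", "s1"], ["s1", "s2", "s3"], lambda s, h: P[s]))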

Definitions

1. Let {s_1, s_2, ..., s_q} denote the q distinct source symbols or letters of the alphabet.

2. Let code denote the binary integer that is to be decoded to the original message x_1 x_2 ... x_N, where x_k = s_i indicates that the kth letter in the message is the symbol s_i.

3. Assume P(x_k = s_i | x_1 x_2 ... x_{k-1}) for i = 1, 2, ..., q and k = 1, 2, ... are available or can be estimated from the underlying source model.

Initialisation

low = 0.0
high = 1.0
k = 1
get code and store value as the binary fraction 0.code

Iteration

Repeat

1. Starting with i = 0, repeat:
   (a) i = i + 1
   (b) low_i = low + (high − low) × Σ_{j=1}^{i-1} P(s_j | x_1 x_2 ... x_{k-1})
   (c) high_i = low + (high − low) × Σ_{j=1}^{i} P(s_j | x_1 x_2 ... x_{k-1})
   until low_i ≤ value < high_i
2. output x_k as the symbol s_i
3. low = low_i
4. high = high_i
5. k = k + 1

until EOF is reached

FIGURE 3.17
Decoding algorithm for arithmetic codes.
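A matching Python sketch of the decoding algorithm, again our own and using the equivalent test on (value − low)/range that is tabulated in Example 3.22 below, is:

def arithmetic_decode(value, symbols, cond_prob, eof):
    # value: the received code word interpreted as a binary fraction in [0, 1).
    # Decoding stops when the designated EOF symbol is produced.
    low, high = 0.0, 1.0
    message = []
    while True:
        rng = high - low
        t = (value - low) / rng            # position of value inside [low, high)
        cum = 0.0
        for s in symbols:                  # find the symbol interval containing t
            p = cond_prob(s, message)
            if t < cum + p:
                break
            cum += p
        message.append(s)
        high = low + rng * (cum + p)       # narrow [low, high) to that symbol interval
        low = low + rng * cum
        if s == eof:
            return message

# decode the 16-bit code of Example 3.21 below (s5, with probability 1/15, is the EOF)
P = {"s1": 6/15, "s2": 2/15, "s3": 2/15, "s4": 4/15, "s5": 1/15}
value = int("0110111100101011", 2) / 2 ** 16        # = 0.434249878...
print(arithmetic_decode(value, list(P), lambda s, h: P[s], eof="s5"))
# -> ['s2', 's1', 's3', 's4', 's3', 's1', 's2', 's5']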

length messages such as s_2 s_2 s_1 s_2, and so on. Thus there is a need for the decoder to know when to stop decoding. Possible solutions to this problem include:

• Defining a special or extra symbol called EOF which is placed at the end of each message.

• Transmitting the length of the message, N, along with the code.

• Encoding and decoding messages in fixed-sized blocks and only transmitting the size of the block to the decoder at the beginning of the code sequence.

A demonstration of the encoding and decoding algorithms depicted in Figures 3.16 and 3.17 is provided by Examples 3.21 and 3.22.

EXAMPLE 3.21

Consider a zero-memory source with the following source alphabet and probabilities:

Symbol   P(s_i)   Cumulative   Interval
s_1      6/15     6/15         [0.0, 6/15)
s_2      2/15     8/15         [6/15, 8/15)
s_3      2/15     10/15        [8/15, 10/15)
s_4      4/15     14/15        [10/15, 14/15)
s_5      1/15     15/15        [14/15, 1.0)

Suppose the message s_2 s_1 s_3 s_4 s_3 s_1 s_2 s_5 is generated by the source where s_5 is the EOF. Applying the encoding algorithm of Figure 3.16, where P(x_k = s_i | x_1 x_2 ... x_{k-1}) = P(s_i), yields the following:

Next letter   range         symbol interval    [low, high)
Init          -             -                  [0.0, 1.0)
s_2           1.0           [6/15, 8/15)       [0.4, 0.533333333)
s_1           0.133333333   [0.0, 6/15)        [0.4, 0.453333333)
s_3           0.053333333   [8/15, 10/15)      [0.428444444, 0.435555556)
s_4           0.007111111   [10/15, 14/15)     [0.433185185, 0.435081481)
s_3           0.001896296   [8/15, 10/15)      [0.434196543, 0.434449383)
s_1           0.000252840   [0.0, 6/15)        [0.434196543, 0.434297679)
s_2           0.000101136   [6/15, 8/15)       [0.434236998, 0.434250482)
s_5           0.000013485   [14/15, 1.0)       [0.434249583, 0.434250482)

The binary representation of the final low = 0.434249583 and high = 0.434250482 is:

low  = 0.0110111100101010...
high = 0.0110111100101011...

and hence the integer code that uniquely identifies the interval that is transmitted is the 16-bit code 0110111100101011.

EXAMPLE 3.22

Consider decoding the received code 0110111100101011 produced by the encoding process in Example 3.21. Applying the decoding algorithm of Figure 3.17, given the source alphabet and probabilities from Example 3.21, to the stored value of 0.434249878 yields the following:

[low, high)                    range         (value − low)/range   decoded symbol
[0.0, 1.0)                     1.0           0.434249878           s_2 : [6/15, 8/15)
[0.4, 0.533333333)             0.133333333   0.256874149           s_1 : [0, 6/15)
[0.4, 0.453333333)             0.053333333   0.642185373           s_3 : [8/15, 10/15)
[0.428444444, 0.435555556)     0.007111111   0.816389094           s_4 : [10/15, 14/15)
[0.433185185, 0.435081481)     0.001896296   0.561459102           s_3 : [8/15, 10/15)
[0.434196543, 0.434449383)     0.000252840   0.210943262           s_1 : [0, 6/15)
[0.434196543, 0.434297679)     0.000101136   0.527358154           s_2 : [6/15, 8/15)
[0.434236998, 0.434250482)     0.000013485   0.955186157           s_5 : [14/15, 1)

and the decoded message is s_2 s_1 s_3 s_4 s_3 s_1 s_2 s_5 where s_5 is the EOF used to terminate the decoding process.

3.8.2 Encoding and Decoding with Scaling

The encoding and decoding algorithms depicted in Figures 3.16 and 3.17 iteratively reduce the length of the interval with each successive letter from the message. Thus with longer messages, smaller intervals are produced. If the interval range becomes too small, then underflow is a very real problem. In Example 3.21 the interval range is already down to 1.011e-04 after only the 8th letter in the message and in the binary representation of the final [low, high) interval the first 15 (most significant) bits of low and high are identical. With modern 64-bit CPUs the low and high will become indistinguishable when their first 64 bits are identical. The practical implementation of the encoding and decoding algorithms requires an added scaling operation. The scaling operation rescales the current [low, high) interval based on the pseudo-algorithm depicted in Figure 3.18.
The scaling operation not only solves the underflow problem but permits transmission of the coded value before the encoding operation is complete. Thus a long message block can be encoded without having to wait until the end of the message before transmitting the coded value. Similarly, the decoder can begin decoding the

Rescale

If low = 0.b_1 b_2 ... b_j b^{low}_{j+1} b^{low}_{j+2} ... and high = 0.b_1 b_2 ... b_j b^{high}_{j+1} b^{high}_{j+2} ... share the same j leading bits b_1 b_2 ... b_j then rescale to:

low  = 0.b^{low}_{j+1} b^{low}_{j+2} ...
high = 0.b^{high}_{j+1} b^{high}_{j+2} ...

If encoding then code = code & b_1 b_2 ... b_j (initially code is the NULL string), where & is the string concatenation operator.

If decoding then value = Frac(2^j × value), where Frac(·) returns the fractional component of the product.

FIGURE 3.18
Scaling algorithm to avoid underflow.

most significant bits of the coded value (which are transmitted first by the encoder
when rescaling the intervals) before all the bits have been received. Examples 3.23
and 3.24 illustrate the use of the scaling algorithm of Figure 3.18 for the encoding
and decoding of arithmetic codes.
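The rescale test of Figure 3.18 can be sketched in Python as below (our own illustration). Checking whether low and high lie in the same half of [0, 1) is equivalent to checking whether they share their leading bit, so repeating the test strips all shared leading bits; for decoding, the stored value would be rescaled in the same way.

def rescale(low, high, emit):
    # Strip the leading binary digits shared by low and high, passing them to
    # emit(), and expand the interval accordingly (the operation of Figure 3.18).
    while high <= 0.5 or low >= 0.5:
        if high <= 0.5:                    # both bounds in [0, 0.5): shared bit is 0
            emit("0")
            low, high = 2 * low, 2 * high
        else:                              # both bounds in [0.5, 1): shared bit is 1
            emit("1")
            low, high = 2 * low - 1, 2 * high - 1
    return low, high

bits = []
print(rescale(0.4, 0.453333, bits.append), bits)
# -> approximately (0.2, 0.626667) and ['0', '1', '1'], as in the first rescale of Example 3.23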

EXAMPLE 3.23

The encoding from Example 3.21 is now repeated with the additional rescaling operation of Figure 3.18 to prevent underflow from occurring. The encoding with scaling is shown in the following table, where the [low, high) interval is given both in base 10 and in base 2:

Next letter   range     symbol interval   [low, high) base 10   [low, high) base 2   output
Init - - [0.0, 1.0) [0.0, 1.0)
 1.0 [6/15, 8/15) [.4, .533333) [.01100110, .10001000)
 .133333 [0, 6/15) [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.00110..., .10100...) 011
 .426667 [8/15, 10/15) [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) 011
.455111 [10/15, 14/15) [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) 1
 .242726 [8/15, 10/15) [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) 1001
 .517815 [0, 6/15) [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) 0
 .414252 [6/15, 8/15) [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) 101
 .441869 [14/15, 1) [.490339, .519797) [.01111101, .10000101)
(terminate) 1

When the last or EOF letter of the message is read a final 1 is transmitted. Concatenating the intermediate bits that were output yields the final code 0110111100101011, which is the same code as in Example 3.21.

EXAMPLE 3.24

The decoding from Example 3.22 is repeated with the additional rescaling operation of Figure 3.18. The decoding process with scaling is shown in the following table, where the columns give (value − low)/range, the decoded symbol, the [low, high) interval in base 10 and base 2, the updated range and the rescaled value:
- - [0.0, 1.0) [0.0, 1.0) 1.0 .434250
.434250  [.400000,.533333) [.01100110, .10001000) .133333 .434250
.256874 [.4, .453333) [.01100110, .01110100)
(rescale) [.2, .626667) [.001100..., .10100...) .426667 .473999
.642185  [.427556, .484444) [.01101101, .01111100)
(rescale) [.420444, .875556) [.01101..., .11100...) .455111 .791992
.816389 [.723852, .845215) [.10111001, .11011000)
(rescale) [.447704, .690430) [.0111001..., .1011000...) .242726 .583984
.561459  [.577158, .609521) [.10010011, .10011100)
(rescale) [.234520, .752335) [.0011..., .1100...) .517815 .343750
.210944 [.234520, .441646) [.00111100, .01110001)
(rescale) [.469040, .883292) [.0111100..., .1110001...) .414252 .687500
.527360  [.634741, .689975) [.10100010, .10110000)
(rescale) [.077928, .519797) [.00010..., .10000...) .441870 .500000
.955197   s_5 : EOF (terminate)

NOTE The scaling operation not only expands the interval but repositions it around 0.5. That is, the rescaled interval has low < 0.5 and high ≥ 0.5. Thus, rescaling is needed whenever low and high are both less than or both greater than 0.5. One pathological condition that may arise and defeat the scaling operation is when low is just below 0.5 and high is just above 0.5. Consider:

low  = 0.0111111110...
high = 0.1000000001...

The interval cannot be rescaled and low and high will eventually become indistinguishable (low = high) when the CPU bit-size is exceeded and underflow occurs.

3.8.3 Is Arithmetic Coding Better Than Huffman Coding?

Shannon's noiseless coding theorem states that if one designs a compact binary code, using Huffman coding, for the nth extension of a source then the average length of the code per source symbol, L_n/n, is no greater than H(S) + 1/n. A similar theorem outlined in [6] states that for the arithmetic coding scheme just described the average length per source symbol exceeds the entropy by only a term of order 1/N when encoding a message of length N. Thus Huffman coding and arithmetic coding exhibit similar performance in theory. However the arithmetic code encoding and decoding algorithms with scaling can be implemented for messages of any length N. Huffman coding for the nth extension of a source, however, becomes computationally prohibitive with increasing n since the computational complexity is of the order q^n.

3.9 Higher-order Modelling


Shannon's Noiseless Coding Theorem discussed in Section 3.5 states that code efficiency can be improved by coding the nth extension of the source. Huffman coding is then used to derive the compact code for S^n, the nth extension of the source. The arithmetic codes from Section 3.8 directly code the N-length message sequence to the code word. In both cases source statistics are needed to provide the required conditional and joint probabilities which we obtain with the appropriate model. But what model do we use? Result 3.1 alluded to the requirement that we use the "true" model for the source, except that in practice the order of the "true" model is not known, it may be too high to be computationally feasible or, indeed, the "true" model may not be a Markov model of any order! We now formally define what we mean by the different orders of modelling power.


DEFINITION 3.12 mth-order Modelling

In mth-order modelling the joint probabilities P(x_1 x_2 ... x_{m+1}) of all possible (m+1)-length message sequences are given. Furthermore:

• Since

P(x_1 x_2 ... x_{k+1}) = Σ_{x_{k+2}} Σ_{x_{k+3}} ... Σ_{x_{m+1}} P(x_1 x_2 ... x_{m+1})      (3.38)

then the joint probabilities for the kth-order model, where k < m, are also given.

• Since

P(x_{m+1} | x_1 x_2 ... x_m) = P(x_1 x_2 ... x_{m+1}) / P(x_1 x_2 ... x_m)      (3.39)

then the stationary state and conditional probabilities for the kth order Markov model, where k ≤ m, are also provided.

We need to consider what effect the model order has on the Huffman and arithmetic coding schemes we have discussed. In Chapter 1 it was shown that higher-order Markov models yield lower values of entropy, and since the lower bound on the average length of a code is the entropy, higher-order modelling would be expected to produce more efficient codes (see Result 3.1). Thus it is not only important to consider the implementation of the encoder and decoder for both the Huffman and arithmetic algorithms but also how the source statistics are calculated.

Ideally for encoding an n-length message or the nth extension of a source the highest-order model, of order m = n − 1, should be used. Thus coding of longer messages and source extensions will indirectly benefit from higher orders of modelling power which, from Shannon's Noiseless Coding Theorem and Result 3.1, will produce more efficient codes as the source entropy decreases. Intuitively, longer messages and extensions are able to better exploit any redundant or repetitive patterns in the data.

3.9.1 Higher-order Huffman Coding

With higher-order Huffman coding we consider the case of deriving the Huffman code for the nth extension of the source, S^n, using an mth order model of the source, S.

• If m ≥ n − 1 then the joint probabilities P(x_1 x_2 ... x_n) of the (n − 1)th order (Markov) model can be used.

• If m < n − 1 then the product of the joint (m + 1)-gram probabilities can be used.

In practice the (m + 1)-gram model probabilities P(x_1 x_2 ... x_{m+1}) are directly estimated from the frequency counts:

P(x_1 x_2 ... x_{m+1}) = n(x_1 x_2 ... x_{m+1}) / Σ n(x_1 x_2 ... x_{m+1})      (3.40)

where n(x_1 x_2 ... x_{m+1}) is the number of times the (m + 1)-length message, x_1 x_2 ... x_{m+1}, is seen and Σ n(x_1 x_2 ... x_{m+1}) is the total number of (m + 1)-length messages that have been seen so far.

When building the Huffman coding tree only the relative order of the nth extension probabilities is important. Thus a fixed scaling of the probabilities can be applied without affecting the Huffman coding algorithm. Thus practical implementations use the model counts, n(x_1 x_2 ... x_{m+1}), directly instead of the model probabilities, P(x_1 x_2 ... x_{m+1}).
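For instance, the (m + 1)-gram counts for a binary message can be accumulated with a few lines of Python (our own sketch; the example message is made up and the counts are of overlapping blocks):

from collections import Counter

def ngram_counts(message, m):
    # Count every (m+1)-length block seen in the message (overlapping blocks).
    n = m + 1
    return Counter(message[i:i + n] for i in range(len(message) - n + 1))

print(ngram_counts("0001011000010000", 2))     # 2nd order model -> 3-gram counts
# Counter({'000': 5, '001': 2, '010': 2, '100': 2, ...})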

EXAMPLE 3.25
After observing a typical 100 length binary message the following 2nd order model counts are produced:

n(000) = 47     n(100) = 12
n(001) = 12     n(101) = 3
n(010) = 7      n(110) = 7
n(011) = 8      n(111) = 4

The binary Huffman code for the 3rd extension of the source is to be designed based on the above 2nd order model. The binary Huffman code tree is shown in Figure 3.19 where the counts are directly used for construction of the tree.
The Huffman code is given in the following table:

3rd extension   Huffman code
000 1
001 010
010 0010
011 0000
100 011
101 00011
110 0011
111 00010

3.9.2 Higher-order Arithmetic Coding

When performing arithmetic coding on a message of length N we require knowledge of the conditional probabilities P(x_k | x_1 x_2 ... x_{k-1}) for k = 1, 2, ..., N as shown by Figure 3.16. Assuming the mth order model is given then:

• If k ≤ m + 1 then P(x_k | x_1 x_2 ... x_{k-1}) = P(x_1 x_2 ... x_{k-1} x_k) / P(x_1 x_2 ... x_{k-1}) for k = 2, 3, ..., m + 1 (and P(x_1 | ·) = P(x_1) for k = 1), where the joint probabilities of the lower order models can be used.

• If k > m + 1 then for k = m + 2, ..., N we have P(x_k | x_1 x_2 ... x_{k-1}) = P(x_k | x_{k-m} ... x_{k-1}) = P(x_{k-m} ... x_{k-1} x_k) / P(x_{k-m} ... x_{k-1}).

[Figure 3.19 shows the Huffman coding tree built directly from the 2nd order counts: the root (count 100) splits into the leaf 000 (count 47, code 1) and a subtree containing the remaining seven 3-bit blocks, whose further splits give 001 (12, code 010), 100 (12, code 011), 011 (8, code 0000), 010 (7, code 0010), 110 (7, code 0011), 111 (4, code 00010) and 101 (3, code 00011).]

FIGURE 3.19
3rd order Huffman coding tree.

The mth order probabilities are derived from the respective frequency counts of the data seen thus far using Equation 3.40. Unlike Huffman coding, arithmetic coding requires the absolute probabilities so both the individual counts n(x_1 x_2 ... x_{m+1}) and the total count Σ n(x_1 x_2 ... x_{m+1}) are needed.

EXAMPLE 3.26
Consider performing arithmetic coding on the message 01011 based on the 2nd order model counts from Example 3.25. From the counts the 3rd extension, 2nd extension and single binary probabilities can be calculated as shown in Figure 3.20.

P(000)=0.47
P(00)=0.59
P(001)=0.12 P(0)=0.74
P(010)=0.07
P(01)=0.15
P(011)=0.08
P(100)=0.12 P(10)=0.15
P(101)=0.03 P(1)=0.26
P(110)=0.07 P(11)=0.11
P(111)=0.04

FIGURE 3.20
Joint probabilities up to 3rd order.
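The conditional probabilities needed by the arithmetic coder follow directly from these counts; for instance (our own Python sketch, using the counts of Example 3.25 that underlie Figure 3.20):

counts = {"000": 47, "001": 12, "010": 7, "011": 8,
          "100": 12, "101": 3, "110": 7, "111": 4}     # 2nd order counts of Example 3.25

def cond_prob(bit, context):
    # P(x_k = bit | x_{k-2} x_{k-1} = context) from the 3-gram counts
    total = counts[context + "0"] + counts[context + "1"]
    return counts[context + bit] / total

print(cond_prob("1", "01"))     # P(1|01) = 8/15 = P(011)/P(01) = 0.08/0.15
print(cond_prob("0", "10"))     # P(0|10) = 12/15 = P(100)/P(10) = 0.12/0.15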

Applying the encoding algorithm of Figure 3.16, where P(x_2 | x_1) = P(x_1 x_2)/P(x_1) and P(x_k | x_{k-2} x_{k-1}) = P(x_{k-2} x_{k-1} x_k)/P(x_{k-2} x_{k-1}) for k = 3, 4, 5, yields:

Next letter / context   range   P(x_k | context)         symbol interval   [low, high)
Init                    -       -                        -                 [0.0, 1.0)
0 / -                   1.0     P(0) = 0.74              [0.0, 0.74)       [0.0, 0.74)
1 / 0                   0.74    P(01)/P(0) = 0.203       [0.797, 1.0)      [0.590, 0.740)
0 / 01                  0.15    P(010)/P(01) = 0.467     [0.0, 0.467)      [0.590, 0.660)
1 / 10                  0.070   P(101)/P(10) = 0.200     [0.800, 1.0)      [0.646, 0.660)
1 / 01                  0.014   P(011)/P(01) = 0.533     [0.467, 1.0)      [0.6525, 0.6600)

The corresponding interval sub-division of the probability line [0.0, 1.0) is shown in Figure 3.21.
The final interval for the message 01011 is [0.6525, 0.6600) which in binary is
[0.101001110000, 0.101010001111) and hence the code word that is transmitted
is 10101. It should be noted that for this example the same number of bits is used for
both the code word and the message.

[Figure 3.21 shows the subdivision of the probability line for the message 01011, narrowing [0.0, 1.0) to [0.0, 0.74), then [0.59, 0.74), [0.59, 0.66), [0.646, 0.66) and finally [0.6525, 0.66).]

FIGURE 3.21
Subdivision of probability line.

3.10 Exercises
1. For the following binary codes, determine:

   (a) Whether the code is uniquely decodable. If not, exhibit two source messages with the same code.
   (b) Whether the code is instantaneous. If not, can you design an instantaneous code with the same lengths of code words?

          code A   code B    code C    code D   code E   code F   code G   code H
   s_1    000      0         0         0        0        0        01       1010
   s_2    001      01        10        10       10       100      011      001
   s_3    010      011       110       110      1100     101      10       101
   s_4    011      0111      1110      1110     1101     110      1000     0001
   s_5    100      01111     11110     1011     1110     111      1100     1101
   s_6    101      011111    111110    1101     1111     001      0111     1011

2. Consider a block code based on an r-symbol code alphabet and designed for
a source with q possible messages. Derive the expression for the lower bound
on the block code word length.

3. We are going to devise a code for the decimal digits 0-9 using a binary code.
Analysis shows that we should use the shortest codes for 0 and 1. If we code:

digit 0 to code 00

digit 1 to code 11

find the minimum code word length for the remaining digits 2-9 assuming
we want them to be the same code word length.

4. For the following code word lengths:

   Code   Lengths
   A      2 2 2 4 4 4
   B      1 1 2 3 3
   C      1 1 2 2 2 2

(a) Can an instantaneous binary code be formed? If so, give an example of


such a code.
(b) Can an instantaneous ternary code be formed? If so, give an example of
such a code.

(c) If neither a binary nor ternary code can be formed find the smallest num-
ber of code symbols that will allow a code to be formed. Give an example
of such a code.

*5. Find all possible combinations of the individual code word lengths l_1, l_2, ..., l_q when coding the messages s_1, s_2, ..., s_q which result in uniquely decodable binary codes with words not more than 3 bits long (i.e., l_i ≤ 3).
6. A 6-symbol source has the following statistics and suggested binary and ternary codes:

   Symbol   P(s_i)   Code A   Code B
   s_1      0.3      0        00
   s_2      0.2      10       01
   s_3      0.1      1110     02
   s_4      0.1      1111     10
   s_5      0.2      1100     11
   s_6      0.1      1101     12

(a) What is the efficiency of binary code A?


(b) What is the efficiency of ternary code B?
(c) Can you design a more efficient binary code and, if so, what is the effi-
ciency of your code?
(d) Can you design a more efficient ternary code and, if so, what is the effi-
ciency of your code?
(e) Which is the most efficient code: binary or ternary?

7. Consider the following information source:

   Symbol   s_1   s_2   s_3    s_4    s_5   s_6
   P(s_i)   0.1   0.1   0.45   0.05   0.2   0.1

   (a) What is H(S)?
(b) Derive a compact binary code using Huffman coding. What is the aver-
age length of your code?
(c) Can you find another compact code with different individual code word
lengths? If so, what is the average length of this code?
(d) What is the efficiency of the code?
(e) What coding strategy can be used to improve the efficiency?

8. Consider the following source:



   Symbol   s_1   s_2   s_3   s_4   s_5
   P(s_i)   0.1   0.2   0.2   0.4   0.1

Design a compact binary code for the source with minimum variance between
code word lengths. What is the efficiency of your code? Now design the
compact binary code with maximum variance and compare.
9. Derive the compact ternary code using Huffman coding for the following
source and compute the efficiency:

   Symbol   s_1    s_2   s_3    s_4   s_5    s_6    s_7    s_8
   P(s_i)   0.07   0.4   0.05   0.2   0.08   0.05   0.12   0.03

10. Consider the following binary source:

   Symbol   s_1   s_2
   P(s_i)   0.1   0.9

   (a) Find H(S).
   (b) What is the compact (indeed trivial!) binary code for this source? Compare the average length L with H(S). What is the efficiency?
   (c) Repeat (b) for S^n (the nth extension of S) for n = 2, 3. Find L_n/n and compare this with H(S). What is happening?

11. A binary source has the following 2nd extension probabilities:

Symbol   P
00 0.75
01 0.1
10 0.1
11 0.05

(a) Design the source encoder by devising a compact code for the source.
(b) What is the efficiency of the code you designed in (a)?
(c) What is the entropy of the source encoder output (the code entropy), H(C)? Compare your answer with the source entropy, H(S). Will this always be the case?

*12. Suppose a long binary message contains half as many 1’s as 0’s. Find a binary
Huffman coding strategy which uses:

(a) at most 0.942 bits per symbol.


(b) at most 0.913 bits per symbol.

13. What is the efficiency of a binary source S in which 1 has a probability of 0.85? Find an extension of S with efficiency at least 95%.
*14. A binary source emits 0 with probability 3/8. Find a ternary coding scheme
that is at least 90% efficient.
15. A source emits symbol x a third of the time and symbol y the rest of the time. Devise a ternary coding scheme which is:

(a) at most 0.65 ternary units per symbol


(b) at most 0.60 ternary units per symbol
(c) at most 0.55 ternary units per symbol

What is the efficiency of the code you devised for the above cases?
16. You are required to design a ternary source encoder that is at least 97% efficient. Observations of the ternary source over a certain time interval reveal that symbol s_1 occurred 15 times, s_2 occurred 6 times and s_3 occurred 9 times. Design the coding scheme. What is the efficiency of your code?
17. A binary message has been modeled as a 1st order Markov source with transition probabilities P(0|0) = 0.7, P(1|0) = 0.3, P(1|1) = 0.9 and P(0|1) = 0.1 (the state diagram has self-loops of 0.7 and 0.9 on states 0 and 1, with cross transitions 0.3 and 0.1).

(a) Derive a binary compact code for the 2nd extension of the source. What
is the efficiency of your code?
(b) Repeat (a) for the 3rd extension of the source.

18. A binary source emits an equal number of 0’s and 1’s but emits the same
symbol as the previous symbol twice as often as emitting a different symbol
than the previous symbol. Derive a binary encoding which is at least 95%
efficient.
19. The output of an unknown binary information source transmitting at 100 bps
(bits per second) produces a different output (from the previous output) three
times as often as producing the same output. Devise a binary source coding
strategy for each of the following cases, so that the channel bit rate is:

(a) at most 95 bps.


(b) at most 80 bps.

*20. Analysis of a binary information source reveals that the source is three times
more likely to emit the same bit if the last two bits were the same; otherwise it
is equally likely to produce a 0 or a 1.

(a) Design a coding scheme which is at least 85% efficient.


(b) Design a coding scheme which is at least 90% efficient.

*21. Consider the source of Qu. 10:

   Symbol   s_1   s_2
   P(s_i)   0.1   0.9

   (a) Perform arithmetic coding on the sequence s_2 s_1 s_2 to produce the coded output. Compare your answer with the Huffman code of the 3rd extension of the source from 10(c).
   (b) Repeat (a) for the sequence s_1 s_1 s_1.
   (c) Repeat (a) for the sequence s_1 s_2 s_1.
(d) From the above cases, is arithmetic coding of a 3 letter message compa-
rable to, superior to or inferior to Huffman coding of the corresponding
3rd extension of the source?

NOTE Do not use any scaling. To convert from base 10 to base 2 (and vice
versa) you will need a number base converter that can deal with non-integer
values. One such tool available from the Internet is
http://www.math.com/students/converters/source/base.htm.

22. Perform arithmetic decoding of your coded output values from Qu. 21(a),(b),(c)
and confirm that the original 3 letter messages are produced.
23. Repeat Qu. 21 but this time use scaling.
24. Repeat Qu. 22 but this time use scaling.
*25. Huffman codes are guaranteed to be instantaneous codes. Can the same state-
ment be made about arithmetic codes?
26. Consider Example 3.26:

(a) Perform arithmetic coding of the sequence 011 and compare with the
corresponding Huffman code from Example 3.25.
(b) Repeat (a) for the sequence 000.

(c) Repeat (a) for the sequence 001.


(d) From the above cases, is arithmetic coding comparable to, superior to or
inferior to Huffman coding?

27. Consider Example 3.26:

(a) Perform arithmetic decoding of the coded binary integer value 10101
assuming the original message was 3 bits long.
(b) Repeat (a) but for the coded binary integer value of 1.

*28. After observing a typical sample of a binary source the following 2nd order model counts are produced:

  
  
  
  
  
  
  
  

(a) Derive the binary Huffman code table for the 3rd extension of the source
such that there is minimum variance in the individual code word lengths.
What is the average length of your code?
(b) Perform arithmetic coding on the following 3 letter messages and com-
pare with the corresponding Huffman code:
i. 110
ii. 111
iii. 100

3.11 References
[1] ASCII, retrieved January 25, 2002 from
http://www.webopedia.com/TERM/A/ASCII.html
