Coding For Discrete Sources
ETE/EEE 422
Saif Ahmed (SfA)
Discrete and Continuous Data
Continuous Data:
A set of data is said to be continuous if the
values belonging to the set can take
on ANY value within a finite or infinite interval.
Examples:
• The height of a horse (could be any value
within the range of horse heights).
• Time to complete a task (which could be
measured to fractions of seconds).
• The outdoor temperature at noon (any value
within possible temperatures ranges.)
• The speed of a car on Route 3 (assuming
legal speed limits).
Discrete and Continuous Data
Discrete Data:
A set of data is said to be discrete if the values
belonging to the set are distinct and separate
(unconnected values).
Examples:
• The number of people in your class (no
fractional parts of a person).
• The number of TV sets in a home (no
fractional parts of a TV set).
• The number of puppies in a liter (no
fractional puppies).
• The number of questions on a math test (no
incomplete questions).
Discrete sources
The output of a discrete source is a sequence
of symbols from a known discrete alphabet X .
This alphabet could be the alphanumeric
characters, the characters on a computer
keyboard, English letters, Chinese characters,
the symbols in sheet music (arranged in some
systematic fashion), binary digits, etc.
A, B, C, D…… Z
a, b, c, …….z
X 1, 2, 3…..0
! ? . / , ….
Continuous Sources
The output of an analog source, in the simplest
case, is an analog real waveform, representing,
for example, a speech waveform. The word
analog is used to emphasize that the waveform
can be arbitrary and is not restricted to taking
on amplitudes from some discrete set of
values.
Discrete-time sources with analog
values (analog sequence sources)
These sources are halfway between discrete and analog sources.
The source output is a sequence of real numbers (or perhaps
complex numbers). Encoding such a source is of interest in its own
right, but is of interest primarily as a sub problem in encoding
analog sources. That is, analog waveform sources are almost
invariably encoded by first either sampling the analog waveform or
representing it by the coefficients in a series expansion. Either
way, the result is a sequence of numbers, which is then encoded.
Discrete-time sources with analog
values (analog sequence sources)
Fixed-length codes for discrete sources
The simplest approach to encoding a discrete
source into binary digits is to create a code C
that maps each symbol x of the alphabet X into C(a) = 000
a distinct codeword C(x), where C(x) is a block C(b) = 001
of binary digits. Each such block is restricted to C(c) = 010
have the same block length L, which is why the C(d) = 011
code is called a fixed-length code. C(e) = 100
C(f) = 101
C(g) = 110.
For example, if the alphabet X consists of the 7
symbols {a, b, c, d, e, f, g}, then the following
fixed-length code of block length L = 3 could be
used.
Fixed-length codes for discrete sources
if the alphabet X consists of the 7 symbols
{a, b, c, d, e, f, g}, then the following fixed-
length code of block length L = 3 could be used C(a) = 000
C(b) = 001
C(c) = 010
C(d) = 011
C(e) = 100
if the source alphabet has size M, then this C(f) = 101
coding method requires L bits to encode each C(g) = 110.
source symbol
Encoded Bits per original Symbol
Variable-length codes for discrete
sources
Variable-length codes for discrete
sources
Successive codeword of a variable-length code are assumed to be transmitted as a
continuing sequence of bits, with no demarcations of codeword boundaries (i.e., no
commas or spaces). The source decoder, given an original starting point, must
determine where the codeword boundaries are; this is called parsing.
A potential system issue with variable-length coding is the requirement for buffering. If
source symbols arrive at a fixed rate and the encoded bit sequence must be transmitted
at a fixed bit rate, then a buffer must be provided between input and output. This
requires some sort of recognizable ‘fill’ to be transmitted when the buffer is empty and
the possibility of lost data when the buffer is full.
Unique decodability
The major property that is usually required from any variable-length code is that of
unique decodability. This essentially means that for any sequence of source symbols,
that sequence can be reconstructed unambiguously from the encoded bit sequence.
Here initial synchronization is assumed: the source decoder knows which is the first bit in
the coded bit sequence. Note that without initial synchronization, not even a fixed-
length code can be uniquely decoded.
Unique decodability and non-unique
decodability
Prefix free coding for variable length
coding
Decoding the output from a uniquely-decodable code, and even
determining whether it is uniquely decodable, can be quite complicated.
However, there is a simple class of uniquely decodable codes called prefix-
free codes. As shown later, these have the following advantages over other
uniquely-decodable codes:
• If a uniquely-decodable code exists with a certain set of codeword
lengths, then a prefix-free code can easily be constructed with the same set
of lengths.
• The decoder can decode each codeword of a prefix-free code
immediately on the arrival of the last bit in that codeword.
• Given a probability distribution on the source symbols, it is easy to
construct a prefix-free code of minimum expected length.
The Kraft inequality for prefix-free
codes
Discrete memoryless sources