
COMS3100/COMS7100

Introduction to Communications
Lecture 31: Source Coding
This lecture:
1. Information Theory.
2. Entropy.
3. Source Coding.
4. Huffman Coding.
Ref: CCR pp. 697–709, A Mathematical Theory of Communication.



Information Theory

Claude Shannon’s paper, A Mathematical Theory of Communication, showed that reliable communication is possible at a non-zero rate.
- Shannon proposed a general model for point-to-point communication.



Information Theory (2)
- Shannon showed that there is a very widely applicable, quantitative definition of information.
- The source generates information at a certain rate.
- The noisy channel can be shown to have a capacity at which it can reliably transmit information.
- Reliable transmission is possible if and only if the source’s rate is not greater than the channel’s capacity.



The Mathematical Model

The source produces symbols from an alphabet.
- The alphabet is the (discrete) set of all possible symbols.
  E.g., the Roman alphabet, the Unicode character set or the range of possible pixel intensities from a camera sensor.
- The source produces these symbols in a discrete-time sequence.
- The channel has its own alphabet or, rather, alphabets: one at the input and one at the output.
  E.g., consider a channel including a polar NRZ modulator and demodulator, so the input and output alphabets are {−A, +A}.



The Mathematical Model (2)
- At each use of the channel, the channel output is random (because of noise) but dependent on the input.
- The rate at which the source generates symbols may be different to the rate at which the channel is used.
- Some mechanism is needed to translate between the source and the channel alphabets and back again.
- Shannon found that this can always be broken into two independent processes at the transmitter: source and channel coding.
- Similarly, at the receiver, source and channel decoding.
The Mathematical Model (3)

- Without loss of generality, the common language of the source and channel (de)coder is a bitstream.
- Source symbol generation, intermediate bitstream and channel use may all be at different rates.


Information & Entropy
We measure information and entropy in terms of probability and random variables.
Self-Information
- In order to use random variables, map the source alphabet to the numbers {0, . . . , M − 1}, where M is the size of the alphabet.
- Let the source symbol selected for transmission at a certain time instant be represented as a discrete r.v. X.
- Let i represent one possible value for X and define pi = P(X = i).


Self-Information
The amount of information (or surprise) at learning that X = i is log(1/pi) = − log pi.
- Hence, the rarer the event, i.e., the less probable, the more surprising and the more informative it is.
- Shannon calls this self-information.
- The base of the logarithm has not been specified, but it is usually taken to be 2, in which case the unit is bits.
  - If we take the natural logarithm, the unit is nats.
  - (Technically, self-information is dimensionless.)
- Measurement in bits is natural: if eight symbols are equally likely, it makes sense that they have 3 bits of information each.
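As a quick numerical check, here is a minimal Python snippet (my illustration, not part of the lecture notes) that evaluates the self-information −log2 p, confirming the eight-equally-likely-symbols example above.

```python
import math

def self_information(p: float) -> float:
    """Self-information -log2(p), in bits, of an event with probability p."""
    return -math.log2(p)

# Eight equally likely symbols: each carries log2(8) = 3 bits.
print(self_information(1 / 8))   # 3.0
# A rarer event is more surprising, hence more informative.
print(self_information(0.01))    # ~6.64 bits
```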
Entropy
The expected self-information is called the entropy of X:

    H(X) = E[log(1/pi)] = − Σ_{i=0}^{M−1} pi log pi.

- Shannon used this name because of the similar expression that arises in statistical thermodynamics.
- It can be regarded as a measure of randomness or disorder.
- Degenerate r.v.s have zero entropy.
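To make the formula concrete, here is a small Python helper (my own sketch, not from the lecture); the second call illustrates the uniform-distribution bound discussed on the next slide.

```python
import math

def entropy(probs):
    """Entropy H(X) = -sum(p * log2 p) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The distribution used in the Huffman example later in this lecture:
print(entropy([0.25, 0.25, 0.2, 0.15, 0.15]))   # ~2.29 bits
# A uniform distribution over M = 8 symbols attains the maximum, log2(8) = 3 bits:
print(entropy([1 / 8] * 8))                     # 3.0
```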
Entropy (2)

- It can be shown that uniformly distributed r.v.s have the highest entropy for a given M, so that

      0 ≤ H(X) ≤ log M.

- We’ll use Hb(X) when we need to be explicit about using a logarithm to the base b.



The Source Coding Theorem
If, each time the source emits a symbol, it is independent of previous symbols and identically distributed, we call it a discrete memoryless source (DMS).
- Suppose the DMS emits r symbols per second.
- Shannon showed that it is possible to use source coding to encode the symbols into a bitstream at any rate arbitrarily close to (but above) rH2(X) bits per second.
  - Conversely, he showed it is impossible to have a uniquely decodable bitstream at a lower rate.
- This is Shannon’s source coding theorem (and its converse).
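As an illustrative number (mine, not from the lecture): a DMS emitting r = 1000 symbols per second with entropy H2(X) ≈ 2.29 bits, as in the Huffman example later, can be represented losslessly by a bitstream at any rate just above 1000 × 2.29 ≈ 2290 bits per second, but not below.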



Shannon’s Source Coding Procedure

In proving the source coding theorem, Shannon devised a simple but impractical source coding scheme.
- We group N symbols together into a block.
- All likely sequences (in a certain sense) are identified.
- These sequences are enumerated using binary words of NH2(X) bits (rounded down; ignoring some sequences if too many).
- This constitutes the codebook.



Coding Procedure (2)

- To perform source coding, compare a given symbol sequence against those in the codebook & output the code, if there is one.
- The probability that this scheme doesn’t work → 0 as N → ∞.
- This scheme is impractical because there is not necessarily much structure in the codebook.
  ⇒ The codebook may require massive storage space.
  ⇒ The codebook may need to be exhaustively searched.
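The following Python sketch (my own illustration of the idea, not Shannon’s construction verbatim) builds such a codebook for a tiny alphabet and block length, interpreting “likely sequences” simply as the most probable ones; it also shows why encoding amounts to an exhaustive lookup.

```python
import math
from itertools import product

def block_codebook(probs, N):
    """Toy version of Shannon's block-coding scheme: index up to 2**floor(N*H2(X))
    of the most probable length-N sequences with fixed-length binary words."""
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    n_bits = math.floor(N * H)
    # Rank all M**N sequences by probability (only feasible for tiny M and N).
    seqs = sorted(product(range(len(probs)), repeat=N),
                  key=lambda s: math.prod(probs[i] for i in s),
                  reverse=True)
    # Keep at most 2**n_bits of the likeliest sequences; the rest get no code.
    return {seq: format(idx, f"0{n_bits}b")
            for idx, seq in enumerate(seqs[:2 ** n_bits])}

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
book = block_codebook(probs, N=4)
print(len(book))               # 512 codewords of 9 bits each (floor(4 * 2.29) = 9)
print(book.get((0, 0, 1, 2)))  # a 9-bit word, or None if the sequence was dropped
# Encoding a block = searching the codebook; unlikely blocks simply have no code.
```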



Variable-Length Codes

An ideal source-coding scheme is theoretically easy but practically difficult.
- Also, for finite N, the scheme is unreliable (not all sequences have codes!).
- How do we make the best possible codes for finite block sizes?
- We’ll start with codes for a single symbol, i.e., N = 1.
- Consider a variable-length source code, where each symbol maps to a variable number of bits.



Variable-Length Codes (2)

- Suppose symbol i is assigned a code of ni bits.
- It turns out that the code can be made uniquely decodable if and only if the Kraft inequality is satisfied:

      Σ_{i=0}^{M−1} 2^(−ni) ≤ 1.
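A quick numerical check of the inequality (my own illustration), using the codeword lengths that the Huffman example on the following slides will produce:

```python
def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality for codeword lengths n_i."""
    return sum(2 ** -n for n in lengths)

# Lengths {2, 2, 2, 3, 3} from the Huffman example: satisfied with equality.
print(kraft_sum([2, 2, 2, 3, 3]))   # 1.0
# Lengths {1, 2, 2, 2} sum to 1.25 > 1, so no uniquely decodable code exists for them.
print(kraft_sum([1, 2, 2, 2]))      # 1.25
```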



Huffman Coding

In 1952, David Huffman discovered a simple method of constructing an optimal variable-length, uniquely decodable source code.
- The method constructs a binary tree, i.e., a tree in which each node has at most two children.
- It proceeds from the bottom up, combining leaves into ‘twigs’, twigs into ‘branches’ and so on until the tree is built.
- Let’s call any partially assembled portion of the tree a twig.



Huffman Coding Technique

- To start with, there are M twigs, which consist only of the leaves themselves, the symbols.
- The probability of a twig is the sum of the probabilities of all of its leaves.
- The code construction algorithm is simply the following step, iterated (M − 1 times) until only one twig remains:
  - Choose the two twigs with the least probability and assemble them together to make a larger twig.
- To read off the codes, descend through the tree towards the symbol’s leaf.
  - Each time we take a left branch, output a ‘0’; otherwise, a ‘1’.



Huffman Code Example
Consider a source with M = 5 for which the probabilities are p0 = p1 = 0.25, p2 = 0.2, p3 = p4 = 0.15.

[Figure: Huffman tree construction in four steps. Step 1 merges p3 and p4 into a twig of probability 0.3; Step 2 merges p1 and p2 (probability 0.45); Step 3 merges p0 with the 0.3 twig (probability 0.55); Step 4 joins the 0.55 and 0.45 twigs at the root (probability 1). Symbols 0, 1 and 2 receive 2-bit codes; symbols 3 and 4 receive 3-bit codes.]

- Average code length is 2.3 bits and the entropy is 2.29 bits.
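Below is a minimal Python sketch of the bottom-up construction just described (my own code, not from the lecture), applied to the example distribution. Tie-breaking is an arbitrary choice here, so the exact bit patterns may differ from the figure, but the code lengths and the 2.3-bit average match.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for symbols 0..M-1 with the given probabilities.
    Returns a dict mapping symbol -> bit string. Minimal sketch: no special
    handling of edge cases such as M == 1."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry is (twig probability, tiebreak, {symbol: partial code}).
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two least probable twigs into a larger twig.
        p_left, _, left = heapq.heappop(heap)
        p_right, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p_left + p_right, next(tiebreak), merged))
    return heap[0][2]

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
code = huffman_code(probs)
print(code)  # one valid code; lengths are 2, 2, 2, 3 and 3 bits
avg_len = sum(p * len(code[sym]) for sym, p in enumerate(probs))
print(round(avg_len, 2))  # 2.3, matching the slide
```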



Developments of Source Coding

Source coding is also known as data compression.
- More particularly, lossless data compression, since the input and output symbols are identical.
- Our exposition required that the probability distribution is known in advance.
- If we don’t know it, we can use universal source coding.
  - Examples: the Lempel-Ziv (LZ77) & Lempel-Ziv-Welch (LZW) algorithms & derivatives such as DEFLATE in ZIP & gzip software.
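As a concrete illustration of universal coding in everyday software, Python’s standard zlib module exposes DEFLATE; this short snippet (my example) compresses repetitive text losslessly without being given any probability model.

```python
import zlib

# DEFLATE (used in ZIP and gzip) combines an LZ77-style dictionary coder with Huffman coding.
text = ("the quick brown fox jumps over the lazy dog " * 50).encode()
compressed = zlib.compress(text)
print(len(text), "->", len(compressed), "bytes")  # repetitive input shrinks dramatically
assert zlib.decompress(compressed) == text        # lossless: exact reconstruction
```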



Source Coding Applications

- For sources like English text, lossless coding is very important.
- In other applications, like audio, images and video, we may be able to put up with some distortion for a lower bit rate.
- In 1963, Shannon developed rate-distortion theory, the basis of modern lossy data compression.
  - Examples: voice coding in mobile phones, MP3 for music, JPEG for images, MPEG for video.

