The Virtues of Redundancy
The Virtues of Redundancy
An Introduction
to
Error-Correcting Codes
Paul H. Siegel
Director, CMRR
University of California, San Diego
2/28/03 1
Outline
• Information Storage and Transmission
• Repetition Codes
• Hamming Code (with a live demo!)
• Shannon’s Coding Theorem
• Compact Audio Disk Code (another live demo!)
• Summary
2/28/03 ECE Staff Luncheon 2
Information
• We all generate and use information:
Text Images
Scientific
Music Measurements
2/28/03 ECE Staff Luncheon 3
Information Storage
• We often store the information for later use:
• Disk drive
• Tape drive
• Zip/diskette drive
• CD-R/W
2/28/03 ECE Staff Luncheon 4
Information Storage
• And we want our storage devices to be:
• Reliable (no errors!)
• Compact (lots of storage capacity in a small space!)
• Inexpensive (almost free!)
2/28/03 ECE Staff Luncheon 5
Information Transmission
• We also send our information to others:
• Email
• Fax
• Wireless telephony
• Satellite broadcast
2/28/03 ECE Staff Luncheon 6
Information Transmission
• And we want the transmission system to be:
• Reliable (no errors!)
• Fast (instant downloads!)
• Inexpensive (almost free!)
2/28/03 ECE Staff Luncheon 7
Data Storage and Transmission
• Data storage and data transmission are two sides of
the same coin – communication systems.
A data storage system communicates information
through time, i.e. , “from now to then.”
A data transmission system communicates information
through space, i.e., “from here to there.”
2/28/03 ECE Staff Luncheon 8
Communication System
INFORMATION
SOURCE TRANSMITTER RECEIVER DESTINATION
SIGNAL RECEIVED
SIGNAL
MESSAGE MESSAGE
CHANNEL
2/28/03 ECE Staff Luncheon 9
Digitized Information
• We often represent the information digitally, as a
sequence of numbers.
• Each number is converted to a unique string of bits
(0’s and 1’s).
• The 0’s and 1’s are converted to an electrical signal
that is used to store or transmit the information.
2/28/03 ECE Staff Luncheon 10
“Noisy” Channels
• When we retrieve or receive the signal, it gets converted back
to a sequence of 0’s and 1’s.
• But an evil force is at work in the channel…NOISE!
• Electrical and magnetic devices can distort the electrical
signal, and the recovered sequence of bits may have errors:
1111111111 1111101111
sent received
2/28/03 ECE Staff Luncheon 11
A Noisy Communication System
INFORMATION
SOURCE TRANSMITTER RECEIVER DESTINATION
CHANNEL
SIGNAL RECEIVED
SIGNAL
MESSAGE
MESSAGE
NOISE
SOURCE
2/28/03 ECE Staff Luncheon 12
Coding to the Rescue
• To combat the effects of noise, we use an
Error Correction Code (ECC).
• The ECC adds redundancy in a special way that allows us
to correct the bit errors caused by the noisy channel.
Data Error Noisy
Compression Correction Channel
Code Code
Remove redundancy Add redundancy
2/28/03 ECE Staff Luncheon 13
Repetition
The simplest form of redundancy is: Repetition!
Sender Receiver
“0” Did she say “1” ?
I said “0” Sounded like “0”
One more time: “0” Sounded like “0” again
She was sending “0”
2/28/03 ECE Staff Luncheon 14
3-Repetition Code
• Encoding rule: Repeat each bit 3 times
Example:
1 . 0 . 1 . 111 . 000 . 111 .
• Decoding rule: Majority vote!
Examples of received codewords:
110 . 000 . 111 . 1 . 0 . 1 . Error-free!
111 . 000 . 010 . 1 . 0 . 0 . Error!
2/28/03 ECE Staff Luncheon 15
How good is this 3-repetition code?
• The code can correct 1 bit error per 3-bit codeword.
• The price we pay in redundancy is measured by the
efficiency or rate of the code, denoted by R:
R= #information bits / # bits in codeword
• For the 3-repetition code: R=33%
2/28/03 ECE Staff Luncheon 16
How good is this 3-repetition code?
Suppose that, on average, the noisy channel flips
1 code bit in 100
Then, on average, the 3-repetition code makes
only 1 information bit error in 3333 bits!
2/28/03 ECE Staff Luncheon 17
Can we do better?
How about repeat 5 times?
On average, only 1 bit error in 100,000 bits.
How about repeat 7 times?
On average, only 1 bit error in 2,857,142 bits
If we let the number of repetitions grow and grow, we can
approach perfect reliability !
2/28/03 ECE Staff Luncheon 18
What’s the catch????
The catch is:
As the number of repetitions grows to infinity, the
transmission rate shrinks to zero!!!
This means: slow data transmission / low storage density.
Is there a better (more efficient) error correcting code?
2/28/03 ECE Staff Luncheon 19
Hamming Code
• Invented by Richard W. Hamming in 1948.
• 7-bit codewords: 4 information bits, 3 redundant bits.
• Corrects 1 bit error per codeword
(like 3-repetition code)
• Code efficiency: R = 4/7 57%
(compared to rate R 33% for 3-repetition code)
2/28/03 ECE Staff Luncheon 20
Hamming Code
2 3
1
7 6
4
2/28/03 ECE Staff Luncheon 21
Hamming Code: Encoder
• Simple encoding rule:
5
1. Insert data bits in 1,2,3,4. 2 3
1
2. Insert “parity” bits in 5,6,7 7 6
to ensure an even number 4
of 1’s in each circle.
2/28/03 ECE Staff Luncheon 22
Hamming Code
5
Information: 0 1 0 0 1
2 3
Parity bit 5: 1 1 0
0 1
Parity bit 6: 0 0
1
Parity bit 7: 1
0
7 4 6
Codeword: 0 1 0 0 1 0 1
2/28/03 ECE Staff Luncheon 23
Hamming Code: Decoder
• Simple decoding rule:
1. Check the parity of each circle – 5
is the number of 1’s even or odd ?
2 3
The pattern of parities is called 1
the syndrome . 7 6
4
2. Look up the syndrome in the
decoding table.
2/28/03 ECE Staff Luncheon 24
Hamming Code: Decoding Table
Syndrome Flipped bit
even even even None!
odd odd odd 1 5
odd even odd 2
odd odd even 3 2 3
1
even odd odd 4 7 6
odd even even 5 4
even odd even 6
even even odd 7
2/28/03 ECE Staff Luncheon 25
How good is the Hamming code?
Suppose that, on average, the noisy channel flips
1 bit in 100
Then, on average, the Hamming code makes
On average, only 1 error in 1111 !!
(Not as good as 3-repetition, but efficiency is 57% compared to 33%)
2/28/03 ECE Staff Luncheon 26
Matrix description of Hamming Code
Bit # 1 2 3 4 5 6 7 5
Red circle 1 1 1 0 1 0 0
2 3
Green circle 1 0 1 1 0 1 0 1
Blue circle 1 1 0 1 0 0 1 7 6
4
This is the “parity-check matrix” for the code.
Encoding and decoding can be described algebraically .
2/28/03 ECE Staff Luncheon 27
Practical ECC
• Lots of “algebraic” codes can be designed by
choosing different parity-check matrices.
• Many useful and powerful codes have been found
and are used in many data transmission and data
storage applications.
2/28/03 ECE Staff Luncheon 28
Compact Disc ECC
• The compact disk uses a powerful code with 75%
efficiency
• It can correct a burst of about 4000 bits, which
corresponds to a scratch of length about 1-tenth of
an inch (2.5 mm).
• Using the redundancy in musical signals, the CD
player can “fix” a burst of 12,300 bits - a scratch
of length about 3-tenths of an inch (7.5 mm)!
2/28/03 ECE Staff Luncheon 29
CD Demonstration
(Warning: do not attempt this at home!)
8 mm thick line
2.5 mm thick 4 mm thick line
2/28/03 ECE Staff Luncheon 30
How “good” can a code be?
• What is the best tradeoff between efficiency
(transmission speed / storage density) and error-
correcting capability (reliability)?
• Does the efficiency have to go to zero in order for
the average number of decoder errors to approach
zero (as in the repetition codes)?
Amazingly, the answer is NO !
2/28/03 ECE Staff Luncheon 31
Shannon’s Coding Theorem
• A mathematical result proved by Claude E. Shannon (Bell
Labs) - in 1948 (coincidentally!)
• For any noisy channel, there is a maximum achievable
efficiency (larger than 0), called the channel capacity, and
denoted by C.
• For any rate R smaller than C, there exist codes that can
approach perfect reliability!
• For any rate R greater than C, perfect reliability is
IMPOSSIBLE !
2/28/03 ECE Staff Luncheon 32
Shannon’s Coding Theorem
For the channel that flips 1 bit in 100 bits,
the capacity is C = 91.92% !!!
Achievable
Region
More
Errors H
C=0.9192
3R Impossible
5R
Region
7R
Perfect
Reliability
Higher rate R
2/28/03 ECE Staff Luncheon 33
Claude E. Shannon
(CMRR Lobby)
2/28/03 ECE Staff Luncheon 34
The Inscription
2/28/03 ECE Staff Luncheon 35
The Formula on the “Paper”
The paper shows the formula
for the capacity of a discrete
noisy channel [Shannon, 1948]
C Max (H(x) – Hy (x))
2/28/03 ECE Staff Luncheon 36
Capacity-approaching codes
• Shannon proved that lots of such “good” codes exist…
but he did not show how to construct the encoders and
decoders!
• Finding such practical good codes has been the “holy
grail” of coding theorists since 1948.
• Turbo codes (1993) and Low-Density Parity-Check codes
(2001) finally have come close to achieving Shannon’s
theoretical limits !!! Hooray!!
• Are coding theorists out of work? No way….
2/28/03 ECE Staff Luncheon 37
Summary
• Coding theory is fun and useful!
• You’ve been a nice audience – Thanks!
2/28/03 ECE Staff Luncheon 38