Data Compression - Chapter 7

Why Data Compression?
• Make optimal use of limited storage space
• Save time and help to optimize resources
• If compression and decompression are done in the I/O processor, less time is required to move data to or from the storage subsystem, freeing the I/O bus for other work
• When sending data over a communication line: less time to transmit and less storage at the host

Data Compression - Entropy
• Entropy is the measure of the information content in a message.
    - Messages with higher entropy carry more information than messages with lower entropy.
• How to determine the entropy:
    - Find the probability p(x) of symbol x in the message
    - The entropy H(x) of the symbol x is:

          H(x) = -p(x) * log2 p(x)
• The average entropy over the entire message is the sum of the entropies of all n symbols in the message:

          H = H(x1) + H(x2) + ... + H(xn) = -Σ p(xi) * log2 p(xi)
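To make the formula concrete, here is a minimal Python sketch (mine, not the slides') that computes the average entropy of a message from its symbol probabilities:

    from collections import Counter
    from math import log2

    def message_entropy(message):
        """Average entropy: H = -sum over symbols of p(x) * log2(p(x))."""
        n = len(message)
        return -sum((c / n) * log2(c / n) for c in Counter(message).values())

    print(message_entropy("AAAABBCD"))  # 1.75 bits per symbol on average

For "AAAABBCD", p(A) = 0.5, p(B) = 0.25, p(C) = p(D) = 0.125, giving H = 0.5*1 + 0.25*2 + 2*(0.125*3) = 1.75 bits.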
Data Compression Methods
• Data compression is about storing and sending a smaller number of bits.
• There are two major categories of compression methods: lossless and lossy.
[Figure: taxonomy of compression methods - lossless (run-length, Huffman, Lempel Ziv) and lossy (JPEG, MPEG, MP3)]

Lossless Compression Methods
• In lossless methods, the original data and the data after compression and decompression are exactly the same.
• Redundant data is removed in compression and added back during decompression.
• Lossless methods are used when we cannot afford to lose any data: legal and medical documents, computer programs.

Run-length Encoding
• The simplest method of compression.
• How: replace consecutive repeating occurrences of a symbol by one occurrence of the symbol itself, followed by the number of occurrences.
    Original data:   BBBBBBBBBAAAAAAAAAAAAAAAAMMMMMMMMMM
    Compressed data: B9A16M10
• The method is more efficient if the data uses only two symbols (0s and 1s) in its bit patterns and one symbol is more frequent than the other.
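A minimal Python sketch (mine, not the slides') of the rule above; it emits each symbol once, followed by the run length:

    def rle_encode(data):
        """Replace each run with the symbol followed by its number of occurrences."""
        out = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                     # extend the current run
            out.append(f"{data[i]}{j - i}")
            i = j
        return "".join(out)

    print(rle_encode("B" * 9 + "A" * 16 + "M" * 10))  # B9A16M10

Huffman Coding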
• Assign fewer bits to symbols that occur more frequently and more bits to symbols that appear less often.
• There is no unique Huffman code, but every Huffman code for a given source has the same average code length.
• Algorithm (a code sketch follows the tree-construction steps below):
    (1) Make a leaf node for each code symbol
        - Add the generation probability of each symbol to the leaf node
    (2) Take the two leaf nodes with the smallest probabilities and connect them into a new node
        - Add 1 or 0 to each of the two branches
        - The probability of the new node is the sum of the probabilities of the two connecting nodes
    (3) If there is only one node left, the code construction is completed. If not, go back to (2).
Huffman Coding
• Example - Table 15.1: Frequency of characters

    Character | A  | B  | C  | D  | E
    Frequency | 17 | 12 | 12 | 27 | 32
Huffman Code - Creating the Tree
• Algorithm
    - Place each symbol in a leaf
        » Weight of leaf = symbol frequency
    - Select two trees L and R (initially leaves) such that L and R have the lowest frequencies in the tree
    - Create a new (internal) node
        » Left child → L
        » Right child → R
        » New frequency = frequency(L) + frequency(R)
    - Repeat until all nodes are merged into one tree
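The merge loop maps naturally onto a priority queue. Below is a minimal Python sketch (mine, not the slides' code) using heapq; ties among equal frequencies can be broken either way and branch labels are arbitrary, so the exact code words may differ from the slides' tree, but the average code length is the same:

    import heapq
    from itertools import count

    def build_huffman_codes(freqs):
        """Repeatedly merge the two lowest-frequency trees, then read codes off the paths."""
        tiebreak = count()          # unique counter so the heap never compares tree nodes
        heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # lowest frequency
            f2, _, right = heapq.heappop(heap)   # second lowest
            heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node: 0 on left, 1 on right
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                                # leaf: record the accumulated path
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    # Frequencies from Table 15.1
    print(build_huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32}))
    # {'A': '00', 'B': '010', 'C': '011', 'D': '10', 'E': '11'}

With these frequencies the average code length is (17*2 + 12*3 + 12*3 + 27*2 + 32*2) / 100 = 2.24 bits per symbol.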
[Figures: Huffman tree construction, steps 1-5 - the lowest-frequency nodes are merged pairwise until a single tree remains; the final code words read off the tree are 01, 00, 10, 111, and 110.]
Huffman Coding Example
• Huffman code (read from the tree above): A = 111, C = 10, E = 01
  (the full code set on the tree is 01, 00, 10, 111, 110)
• Input: ACE
• Output: (111)(10)(01) = 1111001
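Given a code table, encoding is a straight lookup-and-concatenate. This tiny sketch (mine) carries only the three mappings the example confirms:

    # Code words confirmed by the example (the other symbols are omitted here)
    codes = {"A": "111", "C": "10", "E": "01"}

    def huffman_encode(text, codes):
        """Concatenate the code word of each input symbol."""
        return "".join(codes[ch] for ch in text)

    print(huffman_encode("ACE", codes))  # 1111001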
Huffman Code Algorithm Overview
• Decoding
    - Read the compressed file and the binary tree
    - Use the binary tree to decode the file
        » Follow the path from root to leaf
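A minimal sketch (mine, not the slides' code) of tree-walk decoding; the tree is rebuilt from the same code table used above, and each bit moves one step down from the root until a leaf is reached:

    def build_decode_tree(codes):
        """Turn a code table into a nested-dict binary tree keyed by '0'/'1'."""
        root = {}
        for symbol, code in codes.items():
            node = root
            for bit in code[:-1]:
                node = node.setdefault(bit, {})
            node[code[-1]] = symbol            # last bit lands on the leaf
        return root

    def huffman_decode(bits, root):
        """Follow the path from root to leaf; emit the symbol and restart at the root."""
        out, node = [], root
        for bit in bits:
            node = node[bit]
            if isinstance(node, str):          # reached a leaf
                out.append(node)
                node = root
        return "".join(out)

    codes = {"A": "111", "C": "10", "E": "01"}
    print(huffman_decode("1111001", build_decode_tree(codes)))  # ACE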
[Figures: Huffman decoding, steps 1-7 - the bit string 1111001 is traced through the tree, following a path from the root to a leaf for each code word.]
Huffman Coding
• Encoding
• Decoding

[Figure: Huffman encoder and decoder - the encoder maps the text EAEBAECDEA through the Huffman code table (partially legible: A = 00, B = 010) into a bit stream; the decoder applies the same table to recover the text EAEBAECDEA.]

Lempel Ziv Encoding
• It is a dictionary-based encoding.
• Basic idea:
    - Create a dictionary (a table) of strings used during communication.
    - If both the sender and the receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary.

Lempel Ziv Compression
• It has two phases:
    - Building an indexed dictionary
    - Compressing a string of symbols
• Algorithm (a sketch follows this list):
    - Extract the smallest substring that cannot be found in the remaining uncompressed string.
    - Store that substring in the dictionary as a new entry and assign it an index value.
    - The substring is replaced with the index found in the dictionary.
    - Insert the index and the last character of the substring into the compressed string.
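The steps above describe an LZ78-style scheme: each output token is the dictionary index of a known prefix plus one new character. Here is a minimal Python sketch (mine, using an arbitrary input string, since the slides' worked example is not legible in this copy):

    def lz78_compress(text):
        """Emit (index of longest known prefix, next character) pairs, growing the dictionary."""
        dictionary = {"": 0}                   # index 0 is the empty prefix
        out, prefix = [], ""
        for ch in text:
            if prefix + ch in dictionary:
                prefix += ch                   # still a known substring, keep extending
            else:
                out.append((dictionary[prefix], ch))
                dictionary[prefix + ch] = len(dictionary)   # new entry, next free index
                prefix = ""
        if prefix:                             # input ended inside a known substring
            out.append((dictionary[prefix[:-1]], prefix[-1]))
        return out

    print(lz78_compress("AABABBBAABB"))
    # [(0, 'A'), (1, 'B'), (2, 'B'), (0, 'B'), (1, 'A'), (4, 'B')]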
Lempel Ziv Compression
• Compression example:

[Figure: worked Lempel Ziv compression example, building the dictionary step by step]

Audio Encoding
• Predictive encoding
    - Encode only the differences between consecutive samples

Lempel Ziv Decompression
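As a sketch of the difference idea (the sample values below are made up): store the first sample, then only each sample's difference from its predecessor, which stays small for slowly varying audio and needs fewer bits:

    def delta_encode(samples):
        """Keep the first sample; every later value becomes a difference."""
        return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

    def delta_decode(deltas):
        """Running sum restores the original samples."""
        out = [deltas[0]]
        for d in deltas[1:]:
            out.append(out[-1] + d)
        return out

    samples = [100, 102, 105, 105, 103]
    print(delta_encode(samples))                            # [100, 2, 3, 0, -2]
    print(delta_decode(delta_encode(samples)) == samples)   # True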
• It is just the inverse of the compression process.

[Figure: worked Lempel Ziv decompression example, rebuilding the dictionary entry by entry from the compressed stream]
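Mirroring the compressor sketched earlier (mine, not the slides' code), the decompressor rebuilds the same dictionary as it reads the (index, character) pairs, so the dictionary itself never has to be transmitted:

    def lz78_decompress(pairs):
        """Each pair expands to dictionary[index] + character; the entry joins the dictionary."""
        dictionary = [""]                      # index 0 is the empty prefix, as on the encoder side
        out = []
        for index, ch in pairs:
            entry = dictionary[index] + ch
            dictionary.append(entry)           # gets the same index the compressor assigned
            out.append(entry)
        return "".join(out)

    pairs = [(0, "A"), (1, "B"), (2, "B"), (0, "B"), (1, "A"), (4, "B")]
    print(lz78_decompress(pairs))  # AABABBBAABB

Lossy Compression Methods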
• Used for compressing images and video files (our eyes cannot distinguish subtle changes, so lossy data is acceptable).
• These methods are cheaper: they take less time and space.
• Several methods:
    - JPEG: compress pictures and graphics
    - MPEG: compress video
    - MP3: compress audio