0% found this document useful (0 votes)
16 views18 pages

Data Compression

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views18 pages

Data Compression

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Data Compression

Hae-sun Jung
CS146 Dr. Sin-Min Lee
Spring 2004

1
2
Introduction

 Compression is used to reduce the volume


of information to be stored into storages or
to reduce the communication bandwidth
required for its transmission over the
networks

3
4
Compression Principles
 Entropy Encoding
 Run-length encoding
 Lossless & Independent of the type of source i
nformation
 Used when the source information comprises l
ong substrings of the same character or binar
y digit
(string or bit pattern, # of occurrences), as FA
X
e.g) 000000011111111110000011……
 0,7 1, 10, 0,5 1,2……  7,10,5,2……

5
Compression Principles

 Entropy Encoding
 Statistical encoding
 Based on the probability of occurrence of a pat
tern
 The more probable, the shorter codeword
 “Prefix property”: a shorter codeword must not
form the start of a longer codeword

6
Compression Principles

 Huffman Encoding
 Entropy, H: theoretical min. avg. # of bits that are required
to transmit a particular stream

H = -Σ i=1 n Pi log2Pi

where n: # of symbols, Pi: probability of symbol i

 Efficiency, E = H/H’
where, H’ = avr. # of bits per codeword = Σ i=1 n Ni P
i

Ni: # of bits of symbol i


7
 E.g) symbols M(10), F(11), Y(010), N(011), 0(0
00), 1(001) with probabilities 0.25, 0.25, 0.125,
0.125, 0.125, 0.125

 H’ = Σ i=1 6 Ni Pi = (2(20.25) + 4(30.125)) = 2.5 bit


s/codeword
 H = -Σ i=1 6 Pi log2Pi = - (2(0.25log20.25) + 4(0.125log20.
125)) = 2.5
 E = H/H’ =100 %
 3-bit/codeword if we use fixed-length codewords for six
symbols

8
Huffman Algorithm

Method of construction for an encoding tree


• Full Binary Tree Representation
• Each edge of the tree has a value,
(0 is the left child, 1 is the right child)
• Data is at the leaves, not internal nodes
• Result: encoding tree
• “Variable-Length Encoding”

9
Huffman Algorithm

• 1. Maintain a forest of trees


• 2. Weight of tree = sum frequency of
leaves
• 3. For 0 to N-1
– Select two smallest weight trees
– Form a new tree

10
• Huffman coding

• variable length code whose length is inversely prop


ortional to that character’s frequency
• must satisfy nonprefix property to be uniquely deco
dable
• two pass algorithm
– first pass accumulates the character frequency
and generate codebook
– second pass does compression with the codeb
ook

11
Huffman coding

• create codes by constructing a binary tree


1. consider all characters as free nodes
2. assign two free nodes with lowest frequency to
a parent nodes with weights equal to sum of th
eir frequencies
3. remove the two free nodes and add the newly
created parent node to the list of free nodes
4. repeat step2 and 3 until there is one free node
left. It becomes the root of tree

12
• Right of binary tree :1
• Left of Binary tree :0
• Prefix (example)
– e:”01”, b: “010”
– “01” is prefix of “010” ==> “e0”
• same frequency : need consistency o
f left or right

13
• Example(64 data)
• R K K K K K K K
• K K K R R K K K
• K K R R R R G G
• K K B C C C R R
• G G G M C B R R
• B B B M Y B B R
• G G G G G G G R
• G R R R R G R R

14
• Color frequency Huffman code
• =========================
========
• R 19 00
• K 17 01
• G 14 10
• B 7 110
• C 4 1110
• M 2 11110
• Y 1 11111

15
16
Static Huffman Coding
 Huffman (Code) Tree
 Given : a number of symbols (or characters) and their relative
probabilities in prior
 Must hold “prefix property” among codes

Symbol Occurrence
Root node 0 8 1
A 4/8
0 4 1 A
B 2/8 Branch node
0 2 1 B
C 1/8
Leaf node D C
D 1/8

Symbol Code
A 1 41 + 22 + 13
B 01 + 13 = 14 bits are
C 001 required to transmit
“AAAABBCD”
D 000
Prefix Property !
17
The end

18

You might also like