Huffman Code Greedy Approach

The document discusses Huffman coding, an efficient technique for lossless data compression that uses a greedy algorithm to create optimal prefix codes, resulting in significant savings in storage space. It also covers the naive string matching algorithm, which finds all occurrences of a pattern in a text through character-by-character comparison, with a running time of O(mn). The analysis highlights the efficiency of Huffman coding using a priority queue and the time complexity involved in both algorithms.

Greedy Algorithms

Lecture 22
29th November, 2013
Huffman Codes
 Suppose we have a 100,000-character data file that we wish to store
compactly
 Frequencies of the characters are given in the table
 A fixed-length code requires 3 bits per character, so 300,000 bits are
required to code the entire file

                          a    b    c    d    e    f
Frequency (in thousands)  45   13   12   16   9    5
Fixed-length code         000  001  010  011  100  101

November, 2013
Huffman Codes
 A widely used and very efficient technique for compressing data

 Savings of 20% to 90% are typical, depending on the characteristics
of the data being compressed

 Huffman coding is an entropy-encoding algorithm used for lossless
data compression

 We consider the data to be a sequence of characters

 The algorithm uses a greedy strategy
Huffman Codes
 Consider a variable-length code

 If we give frequent characters short codewords and infrequent characters
long codewords, we can do better

 For example, using the coding scheme in the table, the bits required are
(45*1+13*3+12*3+16*3+9*4+5*4)*1000 = 224,000 bits

                          a    b    c    d    e     f
Frequency (in thousands)  45   13   12   16   9     5
Fixed-length code         000  001  010  011  100   101
Variable-length code      0    101  100  111  1101  1100
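The arithmetic above can be checked with a short Python sketch (not from the slides); the frequencies and codeword lengths are taken from the table:

```python
# Bit counts for the example file (frequencies in thousands, from the table).
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
fixed_len = {ch: 3 for ch in freq}                        # every codeword is 3 bits
var_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}  # variable codeword lengths

fixed_bits = sum(freq[ch] * fixed_len[ch] for ch in freq) * 1000
var_bits = sum(freq[ch] * var_len[ch] for ch in freq) * 1000
print(fixed_bits)  # 300000
print(var_bits)    # 224000
```

The variable-length code saves 76,000 bits, about 25%, on this file.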
Huffman Codes
 Codes in which no codeword is a prefix of another codeword are called
prefix codes

 Encoding is simple: the 3-character file abc is coded as 0.101.100, which
is 0101100

 Prefix codes simplify decoding

 The codeword that begins an encoded file is unambiguous. We can
simply identify the initial codeword, translate it back to the original
character, and repeat the decoding process on the remainder of the
encoded file

 e.g. 001011101 parses uniquely to aabe
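The decoding process described above can be sketched in Python (an illustration, not from the slides): because no codeword is a prefix of another, we can emit a character the moment the accumulated bits match a codeword.

```python
# Codewords from the variable-length column of the table.
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
decode_table = {v: k for k, v in code.items()}  # codeword -> character

def decode(bits):
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_table:       # unambiguous: prefix property guarantees
            out.append(decode_table[buf])  # no shorter codeword matched earlier
            buf = ''
    return ''.join(out)

print(decode('001011101'))  # aabe
print(decode('0101100'))    # abc
```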

Huffman Codes
 Decoding requires an appropriate representation of the prefix
code

 A binary tree whose leaves are the given characters provides one
such representation

[Figure: two binary code trees over {a, b, c, d, e, f} with 0/1 edge labels
and frequency sums at internal nodes — the fixed-length code tree and the
variable-length (Huffman) code tree; image garbled in extraction]

Huffman Pseudocode
 Huffman invented a greedy algorithm that constructs an optimal prefix
code called a Huffman code
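The pseudocode did not survive extraction; as a substitute, here is a minimal Python sketch of the greedy construction using the standard `heapq` min-heap (the function and variable names are mine, not from the slides):

```python
import heapq
from itertools import count

def huffman(freq):
    """Greedy Huffman construction: repeatedly merge the two least-frequent
    subtrees until one tree remains, then read codewords off the tree."""
    tie = count()  # tiebreaker so heap tuples compare even on equal frequencies
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 left, 1 right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                                # leaf: record the codeword
            codes[node] = prefix
    walk(root, '')
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freq)
```

For the table's frequencies this yields an optimal code costing 224 (thousand) bits, matching the variable-length code above (the exact 0/1 labels may differ, but the codeword lengths are the same).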

Analysis
 The Huffman code tree is generated by using a priority queue(heap data
structure).

 Initially, all the characters are stored in the heap. A data element is
extracted from a heap in O(lg n) time.

 Each cycle, extracts two nodes and frequencies from the queue, and
adds one node and one frequency to the queue. This operation is
repeated n times. Thus, altogether 3n times the queue is enqueued
/dequeued.

 The total time for extracting and storing is 3n.lg n.

 Therefore, running time for the creation of code tree is O(nlg n)

November, 2013
String Matching
 Finding all occurrences of a pattern in a text

 More formally, assume the text is an array T[1..n] of length n, and the
pattern is an array P[1..m] of length m <= n. We further assume that the
elements of P and T are characters drawn from a finite alphabet Σ. For
example, we may have Σ={0, 1} or Σ={a, b, …, z}

 We say that P occurs with shift s in text T if 0 <= s <= n-m and
T[s+1..s+m] = P[1..m]. If P occurs with shift s in T, we call s a
valid shift; otherwise we call s an invalid shift

 The string matching problem is the problem of finding all valid shifts
for which a given pattern P occurs in a given text T
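The shift formulation translates directly into Python (a small illustration with an example text of my choosing; note Python strings are 0-indexed, so T[s+1..s+m] becomes the slice T[s:s+m]):

```python
# s is a valid shift iff the length-m window of T starting at s equals P.
T = "abaab"
P = "ab"
n, m = len(T), len(P)
shifts = [s for s in range(n - m + 1) if T[s:s+m] == P]
print(shifts)  # [0, 3]
```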

Naïve String Matching Algorithm
 The brute-force algorithm makes a character-by-character
comparison between the pattern and the given text block

 The pattern is moved along the text to examine all possible
alignments

 The figure shows the initial and terminal placements of the
pattern

Naïve String Matching Algorithm

[Figure: the pattern slid along the text, showing its initial and terminal
placements — image not preserved]
Naïve String Matching Algorithm
 The naïve algorithm finds all valid shifts using a loop that checks
the condition P[1..m] = T[s+1..s+m] for each of the n-m+1
possible values of s

NAIVE-STRING-MATCHER(T, P)
1. n ← length[T]
2. m ← length[P]
3. for s ← 0 to n-m
4.     do if P[1..m] = T[s+1..s+m]
5.         then print "pattern occurs with shift" s
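A direct Python translation of the pseudocode above (0-indexed strings, so the window comparison becomes a slice; the function name is mine):

```python
def naive_string_matcher(T, P):
    """Check every shift s in 0..n-m; report those where the window matches P."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):      # lines 3-5 of the pseudocode
        if T[s:s+m] == P:           # character-by-character window comparison
            print("pattern occurs with shift", s)
            shifts.append(s)
    return shifts
```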

Analysis
 The code consists of two nested loops
for i =1 to n-m+1 do
for j = 1 to m do

 The outer loop iterates n-m+1 times. The inner loop iterates m
times. Thus, in worst case, altogether m(n-m+1) iterations are
performed.

 Thus, running time for the algorithm is O(m(n-m+1)), which is


simplified as O(mn)
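The m(n-m+1) worst case is easy to exercise: a text of all a's against a pattern of a's ending in b forces the inner loop to run all m comparisons at every shift. A small instrumented sketch (my own, not from the slides):

```python
def comparisons(T, P):
    """Count character comparisons made by the naive matcher."""
    n, m = len(T), len(P)
    total = 0
    for s in range(n - m + 1):       # n-m+1 shifts
        for j in range(m):           # up to m comparisons per shift
            total += 1
            if T[s + j] != P[j]:     # stop this shift at the first mismatch
                break
    return total

n, m = 20, 5
print(comparisons("a" * n, "a" * (m - 1) + "b"))  # 80 = m*(n-m+1) = 5*16
```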
