DATA COMPRESSION
WHY COMPRESS DATA?
• Uncompressed graphics, audio & video data require
– large storage space and
– huge bandwidth.
• So digital video & audio data must be compressed.
Data compression & coding are almost synonymous.
• Text data requires the least storage space, images
require more, and audio & video data require still
more.
• Data compression is needed because:
• Large data volumes on secondary storage may
make a system too expensive & sometimes
infeasible.
• Relatively slow storage media may not allow
data to be transferred from secondary storage
devices to output devices in real time.
• Bandwidth (BW) limitations may not allow real-time
audio/video (A/V) transmission over networks.
Data compression technology
• It is an example of a coding technique.
• Coding means transforming the input bit stream
into a new bit stream according to some
principle. The source data is passed through a
compressor (coder) & then stored on a
storage medium such as disk, CD or tape. It is later
decompressed, or expanded, in an expander
(decoder).
• The combination of coder & decoder is called a
codec (coder-decoder) or compander
(compressor-expander).
• The compression ratio (CR) = size of
uncompressed data set in bits/bytes : size of
compressed data set in bits/bytes
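The CR definition above can be sketched in a few lines of Python. The function name and the sample sizes are hypothetical, chosen only for illustration; both sizes must be given in the same unit.

```python
def compression_ratio(uncompressed_size: int, compressed_size: int) -> float:
    """Return CR = uncompressed size / compressed size (same unit for both)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size


# Hypothetical sizes: a 1000-byte data set compressed to 250 bytes.
cr = compression_ratio(1000, 250)
print(f"CR = 1000:250 = {cr:.1f}:1")
```

A CR above 1 means the compressed data set is smaller than the original; a CR below 1 means the "compressed" form actually grew.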
Entropy encoding principle
Entropy:
• The total information content (TIC) of any
information object
• = useful information content (entropy) + redundant
information content (RIC)
• In ideal compression, all redundant information is
removed & all useful information is retained. This
happens in ENTROPY ENCODING, which is lossless
compression: some RIC may be left, but no
entropy may be lost.
• Source encoding may be lossless or lossy.
Lossless encoding is reversible.
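The split TIC = entropy + RIC can be made concrete under one assumption that the slides do not state: that entropy is estimated with Shannon's formula over single-symbol frequencies, and that the uncompressed data uses a fixed 8 bits per symbol. The function names below are hypothetical, and the result is only a rough estimate because it ignores dependencies between neighbouring symbols.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data: str) -> float:
    """Shannon entropy H = -sum(p * log2(p)) over the symbol frequencies."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def redundancy_bits(data: str, bits_per_symbol: int = 8) -> float:
    """Estimate RIC = TIC - entropy, assuming fixed-width storage (assumption)."""
    total_bits = len(data) * bits_per_symbol                 # TIC as stored
    useful_bits = entropy_bits_per_symbol(data) * len(data)  # entropy part
    return total_bits - useful_bits


sample = "ABAB"
print(entropy_bits_per_symbol(sample))  # bits of useful information per symbol
print(redundancy_bits(sample))          # bits an ideal coder could remove
```

Under these assumptions, a lossless coder cannot go below the entropy figure; anything it removes comes out of the redundancy figure.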
• Data Compression Technique:
• Entropy encoding: It is of two types: repetitive
sequence encoding & statistical encoding.
• a) Repetitive sequence encoding: the oldest technique
of data compression. A sequence of repeated bits or
bytes is represented by the number of repetitions & a
special character. The general form of this technique is
called run-length encoding (RLE), a special case being
zero/blank character replacement.
• If a character C is repeated r times in the input data,
the sequence is represented by the character C, followed
by a special character and then by the number of
repetitions r. RLE is beneficial for runs of 4 or more
characters, since the encoded form itself takes up to
three characters; for such runs a considerable CR is
achieved.
• Ex: The sequence of characters A BB CCC DDDD EEEEE
……J can be replaced by A BB CCC D4 E5 F6 …… I9
J10.
• The original sequence is 64 characters, the compressed
one is 36 characters. So, CR = 64:36 (about 1.78:1).
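The run-length scheme described above can be sketched as follows. This is a minimal illustration, not a standard format: the flag character "!" is a hypothetical choice assumed never to occur in the input, and runs are split into chunks of at most 9 so that the count is always a single digit and decoding stays unambiguous.

```python
FLAG = "!"    # hypothetical special character; assumed absent from the input
MIN_RUN = 4   # shorter runs are copied literally (encoding them saves nothing)
MAX_RUN = 9   # keeps the count to one digit (an assumption of this sketch)


def rle_encode(text: str) -> str:
    """Replace each run of MIN_RUN..MAX_RUN equal characters by C + FLAG + count."""
    if FLAG in text:
        raise ValueError("input must not contain the flag character")
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < MAX_RUN:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out.append(f"{text[i]}{FLAG}{run}")
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Expand every C + FLAG + digit triple back into the original run."""
    out = []
    i = 0
    while i < len(encoded):
        if i + 2 < len(encoded) and encoded[i + 1] == FLAG:
            out.append(encoded[i] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


original = "ABBCCCDDDDEEEEE"
packed = rle_encode(original)
print(packed)                          # ABBCCCD!4E!5
print(rle_decode(packed) == original)  # True
```

Because every encoded run costs exactly three characters here, a run of three breaks even and only runs of four or more shrink, which matches the rule of thumb in the text.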
• Blanking is a special case of RLE. Blank spaces
are often encountered in a text document. In some data
streams, the repeated character may represent zero
signal amplitude, e.g. in a communication channel an
audio signal may have periods of silence. Blanking
works by specifying the number of zeros/blanks.
• Ex: S.P. = 000052.00 C.P. = 000034.00 Profit = 000018.00
is replaced by
• S.P. 452.00 C.P. 434.00 Profit 418.00, where the leading 4
in each number is the count of suppressed zeros.
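The zero-suppression example above can be sketched under one assumption the slides leave implicit: every numeric field always starts with a one-digit count of suppressed zeros (0 to 9), so the decoder knows that the first character is a count and not part of the value. The function names are hypothetical.

```python
def suppress_leading_zeros(field: str) -> str:
    """Replace up to 9 leading zeros of a numeric field by a one-digit count."""
    zeros = len(field) - len(field.lstrip("0"))
    zeros = min(zeros, 9)  # the count must fit in a single digit
    return f"{zeros}{field[zeros:]}"


def restore_leading_zeros(packed: str) -> str:
    """Inverse step: the first character is the number of zeros to put back."""
    return "0" * int(packed[0]) + packed[1:]


for field in ("000052.00", "000034.00", "000018.00"):
    packed = suppress_leading_zeros(field)
    print(field, "->", packed, "->", restore_leading_zeros(packed))
```

A field without leading zeros grows by one character under this convention, so the scheme only pays off when zero runs are common.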
• Huffman coding is a method of data compression that is
independent of the data type, that is, the data could
represent an image, audio or a spreadsheet. This
compression scheme is used in JPEG and MPEG-2.
• Huffman coding works by looking at the data stream
that makes up the file to be compressed.
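The slides do not spell out how the data stream is examined, so the following is only a sketch of the textbook construction: count how often each symbol occurs, then repeatedly merge the two least frequent groups so that frequent symbols end up with short codes. The function name and the sample string are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(data: str) -> dict:
    """Build a prefix-free code table {symbol: bit string} from symbol counts."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)    # least frequent group
        w2, _, high = heapq.heappop(heap)   # second least frequent group
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "AAAABBC"
codes = huffman_codes(text)
encoded = "".join(codes[s] for s in text)
print(codes)
print(len(encoded), "bits instead of", 8 * len(text))
```

The code table has to be stored or transmitted along with the encoded bits so that the decoder can reverse the mapping.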