Chapter 8
Image Compression
• Digital images require huge amounts of data
• Storage, processing and communication requirements might
be impractical
• A more efficient representation of digital images is necessary
• Image compression: reduces the amount of data required to
represent a digital image by removing redundant data
• Theoretical foundations (Shannon, 1948): information, its
representation, transmission and compression
• Image compression is an enabling technology: multimedia,
video conferencing, remote sensing, fax, ….
Fundamentals
• Data compression: process of reducing the amount of data
required to represent a given quantity of information
• Data and information are not the same.
• Data is used to convey information.
• Various amounts of data may be used to represent the same
information
Fundamentals
• Suppose that two representations carry the same amount of
information, one using n1 units of data and the other
n2 units of data.
• Compression ratio: CR=n1/n2
• Relative data redundancy of the first set is defined as:
• RD=1-1/ CR
• If n2<<n1, CR approaches infinity and RD approaches one. This
means that the first representation (n1 units) is highly redundant.
• If n2>>n1, CR approaches zero and RD approaches minus
infinity. This means that the second representation contains more
data than the first (expansion instead of compression).
Fundamentals
• A typical compression ratio: 10
• CR=10, RD=1-1/ 10=0.9
• 90% of the data in the first set is redundant
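These relationships are easy to check numerically; a small Python sketch (the unit counts below are illustrative):

```python
def compression_ratio(n1, n2):
    """Compression ratio CR = n1 / n2."""
    return n1 / n2

def relative_redundancy(n1, n2):
    """Relative data redundancy RD = 1 - 1/CR."""
    return 1.0 - 1.0 / compression_ratio(n1, n2)

# Example from the text: CR = 10 implies RD = 0.9,
# i.e. 90% of the data in the first representation is redundant.
print(compression_ratio(1000, 100))    # 10.0
print(relative_redundancy(1000, 100))  # 0.9
```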
Coding redundancy
• Suppose that gray level rk, which occurs with probability pr(rk),
is represented by l(rk) bits
• Average length of the code words used for the image:
Lavg = Σk l(rk) pr(rk)
• Total bits for an M x N image = MNLavg
Elements of Information Theory
• Self-information of an event E occurring with probability P(E):
I(E) = -log P(E) units of information
• Average information per source output, for a source whose
symbols aj occur with probabilities P(aj):
H = -Σj P(aj) log P(aj)
• H: uncertainty or entropy of the source
• H is the average amount of information obtained by observing
a single source output
• If the source symbols are equally probable, the entropy is
maximized
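The entropy definition above can be sketched in a few lines of Python (the probability vectors are illustrative):

```python
import math

def entropy(probs):
    """Entropy H = -sum p*log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Equally probable symbols maximize the entropy:
# for 4 symbols, H = log2(4) = 2 bits/symbol.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 (skewed source: lower H)
```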
Noiseless coding theorem
What is the minimum average code word length
that can be achieved?
Noiseless coding theorem or Shannon’s first
theorem.
The minimum average code word length that can
be achieved is equal to the entropy of the source.
One of the ways that we can get closer and
closer to the entropy is to group the source
outputs together and build n-tuples (the nth
extension of the source)
Noiseless coding theorem
Entropy: H=0.918 bits/symbol
Average code word length per symbol (first extension) = 1
Average code word length per symbol (second extension) =
1.89/2=0.945
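The figures above are consistent with a two-symbol source with probabilities 2/3 and 1/3; those probabilities are an assumption here, chosen because they reproduce H = 0.918. A quick check in Python (the exact second-extension average is 17/18 ≈ 0.944, which the slide rounds to 0.945):

```python
import math

p = [2/3, 1/3]  # assumed source probabilities (they reproduce H = 0.918)
H = -sum(x * math.log2(x) for x in p)
print(round(H, 3))  # 0.918 bits/symbol

# Second extension: pairs of symbols with product probabilities.
pairs = [a * b for a in p for b in p]  # 4/9, 2/9, 2/9, 1/9
# A Huffman code for these four pairs uses lengths 1, 2, 3, 3 bits:
lengths = [1, 2, 3, 3]
L2 = sum(q * l for q, l in zip(sorted(pairs, reverse=True), lengths))
print(round(L2 / 2, 3))  # 0.944 bits per original symbol
```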
Fidelity Criteria
• Since information may be lost during compression, a means
of quantifying the information loss is desirable.
• Two types of criteria:
1. Objective fidelity criteria
2. Subjective fidelity criteria
• Objective fidelity criteria: the information loss is expressed
as a function of the input (original) image and the output
(compressed and subsequently decompressed) image
Fidelity Criteria
• Root mean square (rms) error:
erms = [ (1/MN) Σx Σy ( f^(x,y) - f(x,y) )^2 ]^(1/2)
• Decoder: symbol decoder followed by an inverse mapper
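A minimal sketch of the rms error computation (the 2x2 images below are made-up values):

```python
def rms_error(f, g):
    """Root-mean-square error between original image f and
    reconstructed image g (both 2-D lists of gray levels)."""
    m, n = len(f), len(f[0])
    total = sum((g[x][y] - f[x][y]) ** 2
                for x in range(m) for y in range(n))
    return (total / (m * n)) ** 0.5

original = [[10, 20], [30, 40]]
decoded  = [[12, 20], [30, 36]]
print(rms_error(original, decoded))  # sqrt(5) ~ 2.236
```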
Image Formats, Compression Standards
Huffman coding
• Huffman coding is the most popular variable-length coding
(VLC) technique for removing coding redundancy.
• The first step in Huffman coding is to create a series of source
reductions by ordering the probabilities of the symbols and
combining the lowest probabilities into a single symbol that
replaces them in the next source reduction.
• The second step is to code each reduced source.
Huffman coding
Symbol  Prob.  Code   (source reductions →)
A2      0.4    1      0.4  1     0.4  1    0.4  1   0.6  0
A6      0.3    00     0.3  00    0.3  00   0.3  00  0.4  1
A1      0.1    011    0.1  011   0.2  010  0.3  01
A4      0.1    0100   0.1  0100  0.1  011
A3      0.06   01010  0.1  0101
A5      0.04   01011
Lavg = 0.4x1 + 0.3x2 + 0.1x3 + 0.1x4 + 0.06x5 + 0.04x5 = 2.2 bits/symbol
H=2.14 bits/symbol
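The construction above can be sketched with Python's heapq. Huffman codes are not unique, so the code words produced below may differ from the table while still achieving the same average length of 2.2 bits/symbol:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a dict of symbol: probability."""
    tiebreak = count()  # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # prepend a bit to every code word in each merged group
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"A2": 0.4, "A6": 0.3, "A1": 0.1, "A4": 0.1, "A3": 0.06, "A5": 0.04}
code = huffman_code(probs)
lavg = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(lavg)  # 2.2 bits/symbol, close to the entropy H = 2.14
```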
Bit-plane slicing
• Example: an image with gray levels such as 17 64 128,
15 63 132, 11 60 142 is decomposed into its binary bit
planes, from the most significant bit (MSB) plane down to
the least significant bit (LSB) plane.
Bit-plane decomposition
• Disadvantage: a small change in gray level can have a
significant impact on the bit planes.
• Example: two adjacent pixels with values of 127 and
128.
– 127: 01111111
– 128: 10000000
– There is a transition (0 to 1 or 1 to 0) in every bit plane at the location
of these pixels.
– This transition reduces the coding efficiency.
Bit-plane decomposition
• Alternative method: first represent the image by an m-bit
Gray code.
• Property of Gray code: successive code words differ only in
one bit position.
• Small changes in the gray level are less likely to affect all bit
planes.
• Example: two adjacent pixels with values of 127 and 128.
– 127: 01000000
– 128: 11000000
• To convert the binary code am-1 am-2 … a0 to the Gray code
gm-1 gm-2 … g0:
gi = ai XOR ai+1, 0 ≤ i ≤ m-2
gm-1 = am-1
Transforms used in transform coding
• Discrete Fourier Transform (DFT)
• Discrete Cosine Transform (DCT)
• Walsh-Hadamard Transform (WHT), whose kernel is defined
using the bit functions:
p0(u) = bm-1(u)
p1(u) = bm-1(u) + bm-2(u)
…
pm-1(u) = b1(u) + b0(u)
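The binary-to-Gray conversion reduces to a single XOR of an integer with itself shifted right by one bit; the 127/128 example from the text:

```python
def binary_to_gray(n):
    """Gray code of an integer: g = n XOR (n >> 1)."""
    return n ^ (n >> 1)

# Adjacent gray levels 127 and 128 differ in every bit position in
# natural binary, but in only one bit position in Gray code:
print(format(binary_to_gray(127), "08b"))  # 01000000
print(format(binary_to_gray(128), "08b"))  # 11000000
```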
Transform Coding
• Transformations that pack the most information into the
fewest coefficients provide the best approximation,
and consequently the smallest reconstruction error.
• Karhunen-Loeve Transform (KLT) is the optimal
transformation in an information packing sense.
• It minimizes the mean-square-error for any input image and
any number of retained coefficients.
• KLT is data dependent: obtaining the KLT basis images is
computationally complex.
Transform Coding
• KLT is seldom used in practice; instead the DCT, whose basis
images are fixed, is normally selected.
• DCT has been used in international standards for image
compression.
• It has been implemented in a single IC.
• Subimage size: affects transform coding error and
computational complexity.
– Normally it is selected as a power of 2 (simplifies
the computations)
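A small illustration of the information-packing idea, using an orthonormal 1-D DCT-II on a made-up smooth signal:

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    out = []
    for u in range(n):
        c = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        out.append(c * sum(x[k] * math.cos((2 * k + 1) * u * math.pi / (2 * n))
                           for k in range(n)))
    return out

# A smooth ramp: most of the energy lands in the first coefficients.
x = [10, 11, 12, 13, 14, 15, 16, 17]
X = dct(x)
energy = sum(v * v for v in X)  # equals sum(x^2) = 1500 (Parseval)
print(X[0] * X[0] / energy)     # fraction of energy in the DC term
```

For this signal over 97% of the energy sits in the DC coefficient alone, which is why discarding high-order coefficients causes little reconstruction error.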
Bit allocation
• Bit allocation: which transform coefficients are kept and
what is the precision that is used to represent them.
• Retaining coefficients is equivalent to applying a mask to the
coefficients. The mask is one where the coefficients are
kept and zero where the coefficients are discarded.
• Precision of representation: number of levels of scalar
quantizer used for a coefficient
• Retaining the coefficients:
1. Maximum variance (zonal coding)
2. Maximum magnitude (threshold coding)
Zonal Coding
• Uses the information theory concept of viewing information
as uncertainty.
• Transform coefficients of maximum variance carry the most
picture information and should be retained.
• How to calculate the variance:
1. From the (N/n)(N/n) transformed subimage arrays
2. An assumed image model
• Coefficients of maximum variance are located around the
origin of the transform
• The retained coefficients must be quantized and coded.
Threshold coding
• In zonal coding a single, fixed mask is used for all subimages.
• Threshold coding is adaptive: the locations of the transform
coefficients retained for each subimage vary from one
subimage to another.
• For each subimage, the transform coefficients of largest
magnitude make the most significant contribution to
reconstructed subimage quality.
• When the mask is applied to the subimage for which it was
derived, the resulting nxn array is reordered (zigzag scanning)
and the 1-D sequence which contains long runs of zeros is
run-length coded.
Threshold coding
• What is the threshold and how is it obtained?
1. A single global threshold for all subimages
2. A single threshold for each subimage
3. The threshold can be varied as a function of the location of each
coefficient within the subimage.
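The zigzag reordering and zero-run coding described above can be sketched as follows (a simplified illustration, not the exact JPEG run-length format):

```python
def zigzag(block):
    """Reorder an n x n block into the 1-D zigzag sequence."""
    n = len(block)
    coords = [(i, j) for i in range(n) for j in range(n)]
    # traverse anti-diagonals; alternate direction on odd/even diagonals
    coords.sort(key=lambda ij: (ij[0] + ij[1],
                                ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))
    return [block[i][j] for i, j in coords]

def threshold_and_rle(seq, threshold):
    """Zero out small coefficients, then run-length code the zeros
    as (zero_run, value) pairs; None marks the trailing-zero run."""
    out, run = [], 0
    for v in seq:
        if abs(v) < threshold:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append((run, None))
    return out

print(zigzag([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(threshold_and_rle([50, 0, 0, -3, 8, 0, 0, 0], 4))
```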
Lossless predictive coding
In this technique there is no need to decompose
the image into bit planes.
This technique eliminates the interpixel
redundancies of closely spaced pixels.
How? By extracting and coding only the new
information in each pixel
New information: the difference between actual
and predicted value of that pixel
Lossless predictive coding
• Predictors in the encoder and decoder are the same.
• Various local, global and adaptive methods can be used to
generate the prediction
Lossless predictive coding
• A linear predictor is common:
• Previous pixels are used to estimate the value of the current
pixel
• The previous pixels could be on the same row (column) with
the current pixel (1-D prediction) or around the current
pixel (2-D)
f^(n) = round[ Σ(i=1..m) αi f(n-i) ]
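A minimal sketch of lossless predictive coding with a first-order linear predictor (the pixel row and alpha = 1 are illustrative):

```python
def encode(pixels, alpha=1.0):
    """First-order predictor f^(n) = round(alpha * f(n-1));
    transmit the first pixel plus the prediction errors."""
    errors = [pixels[0]]
    for n in range(1, len(pixels)):
        errors.append(pixels[n] - round(alpha * pixels[n - 1]))
    return errors

def decode(errors, alpha=1.0):
    """Rebuild the pixels by adding each error to the prediction."""
    pixels = [errors[0]]
    for n in range(1, len(errors)):
        pixels.append(errors[n] + round(alpha * pixels[n - 1]))
    return pixels

row = [100, 102, 103, 103, 101, 99]
e = encode(row)
print(e)                 # small residuals: [100, 2, 1, 0, -2, -2]
print(decode(e) == row)  # lossless: True
```

The residuals cluster around zero, so they can be entropy coded with far fewer bits than the raw gray levels.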
Lossy compression
• Lossy compression: compromising the accuracy of the
reconstructed image in exchange for increased compression
• Good reconstructed images even with a compression factor of
30:1.
• The principal difference between lossy and lossless
compression is the presence or absence of a quantizer.
Lossy predictive coding
• The prediction at the decoder and encoder should
be the same.
• This will prevent error buildup at the decoder
output
Delta modulator
Optimal quantizer
t = q(s)
• The quantizer is completely described by the ti and si in
the first quadrant.
• si: decision points
• ti: reconstruction levels
• Design problem: select the si and ti for a particular
optimization criterion and input probability
density function p(s)
Optimal quantizer
If mean square error is used and p(s) is assumed to
be even, then (Lloyd-Max quantizer):
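The design problem can be solved iteratively; below is a sketch of the Lloyd iteration on empirical samples (the sample values are made up, and `levels` must be at least 2). Reconstruction levels move to the centroids of their regions; decision points sit midway between adjacent levels:

```python
def lloyd_max(samples, levels, iters=50):
    """1-D Lloyd iteration: ti = centroid of its region,
    si = midpoint between adjacent reconstruction levels."""
    samples = sorted(samples)
    # initialize reconstruction levels spread across the data range
    t = [samples[int(k * (len(samples) - 1) / (levels - 1))]
         for k in range(levels)]
    for _ in range(iters):
        s = [(t[k] + t[k + 1]) / 2 for k in range(levels - 1)]
        for k in range(levels):
            lo = s[k - 1] if k > 0 else float("-inf")
            hi = s[k] if k < levels - 1 else float("inf")
            cell = [x for x in samples if lo <= x < hi]
            if cell:  # keep the old level if the region is empty
                t[k] = sum(cell) / len(cell)
    s = [(t[k] + t[k + 1]) / 2 for k in range(levels - 1)]
    return s, t

s, t = lloyd_max([0, 0, 0, 1, 1, 1, 10, 10, 10, 11, 11, 11], levels=2)
print(s, t)  # decision point near 5.5, levels near 0.5 and 10.5
```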
Wavelet coding
• Image is wavelet transformed
• Many of the computed coefficients carry little visual
information; they can be coarsely quantized or discarded.
• The quantization can exploit positional correlation across
decomposition levels.
Wavelet coding
• The wavelet functions selected for compression affect
the performance of the wavelet compression system
• The most widely used wavelet functions for compression are
Daubechies wavelets.
• Another factor affecting wavelet coding computational
complexity and reconstruction error is the number of levels.
• In many applications, the resolution of image and the scale of
the lowest useful approximation normally determine the
number of transform levels.
Wavelet coding
• The largest factor affecting wavelet compression and
reconstruction error is coefficient quantization
• The effectiveness of the quantization can be improved
significantly by
1. introducing an enlarged quantization interval around zero, called a dead
zone
2. adapting the size of the quantization interval from scale to scale
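A sketch of a dead-zone scalar quantizer (the step size and coefficient values are illustrative):

```python
import math

def deadzone_quantize(c, delta):
    """Dead-zone scalar quantizer: the zero bin is twice as wide
    (2*delta) as the other bins, so small coefficients map to 0."""
    sign = -1 if c < 0 else 1
    return sign * math.floor(abs(c) / delta)

def dequantize(q, delta):
    """Reconstruct at the middle of the bin (0 stays 0)."""
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    return sign * (abs(q) + 0.5) * delta

coeffs = [0.3, -0.9, 2.7, -5.1, 0.0]
print([deadzone_quantize(c, 1.0) for c in coeffs])  # [0, 0, 2, -5, 0]
```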
JPEG 2000
• JPEG2000 extends JPEG to provide increased flexibility in
compression and access to compressed data
• The first step of the encoding process is to DC level shift the samples
of the image by subtracting 2^(Ssiz-1) (Ssiz is the number of bits
representing each pixel)
• After the image has been level shifted it is optionally divided
into tiles
• Tiles are rectangular arrays of pixels
• One dimensional discrete wavelet transform of rows and
columns of each tile is then computed
• For lossless compression, transform is based on a
biorthogonal 5-3 coefficient scaling and wavelet vector
JPEG 2000
• A rounding procedure is defined for non-integer valued
transform coefficients
• In lossy compression, a 9-7 coefficient scaling and wavelet
vector is employed
• The transform produces four subbands, an approximation,
horizontal, vertical and diagonal
• Transform might be repeated on the approximation
component
• Coefficients in subbands are quantized
• The final steps of encoding are coefficient bit modeling,
arithmetic coding, bit stream layering and packetizing.
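The reversible 5-3 transform is usually computed by lifting; the sketch below is a simplified 1-D version (boundary handling is approximate, so it is not a bit-exact implementation of the standard):

```python
def fwd_53(x):
    """Forward reversible 5/3 lifting on a 1-D signal of even length."""
    n = len(x)

    def xe(i):  # whole-sample symmetric extension of x
        return x[2 * (n - 1) - i] if i >= n else x[abs(i)]

    # predict step: detail (high-pass) coefficients from the odd samples
    d = [x[2 * k + 1] - (xe(2 * k) + xe(2 * k + 2)) // 2
         for k in range(n // 2)]
    # update step: approximation (low-pass) coefficients from the evens
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4
         for k in range(n // 2)]
    return s, d

def inv_53(s, d):
    """Invert fwd_53 exactly by undoing update, then predict."""
    n = 2 * len(s)
    even = [s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4
            for k in range(len(s))]

    def xe(i):  # same extension rule, applied to recovered even samples
        return even[(2 * (n - 1) - i) // 2] if i >= n else even[i // 2]

    odd = [d[k] + (xe(2 * k) + xe(2 * k + 2)) // 2 for k in range(len(d))]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

x = [5, 7, 3, 4, 10, 12, 11, 9]
s, d = fwd_53(x)
print(inv_53(s, d) == x)  # integer arithmetic: reconstruction is exact
```

Because every lifting step is integer-valued and exactly invertible, the transform supports lossless compression; the 9/7 transform used for lossy coding has the same structure but irrational coefficients.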
JPEG 2000
• Coefficients of each transformed tile subbands are arranged
into rectangular blocks called code blocks
• Code blocks are individually coded a bit plane at a time
• Each bit plane is processed in three passes: significance
propagation, magnitude refinement, and cleanup
• Outputs are arithmetically coded and grouped with similar
passes from other code blocks to form layers
Video compression standards
• Video compression standards can be grouped into two
categories:
1. video teleconferencing standards (H.261, H.262, H.263)
2. multimedia standards (MPEG-1, MPEG-2, MPEG-4)
• Both groups are built around a hybrid block-based DPCM/
DCT coding scheme
• There are three basic types of encoded output frames
1. Frames that are coded without any reference to past frames (I
frames)
Video compression standards
– Because I frames do not use temporal correlation, the compression
rate is low
– I frames provide the highest degree of random access, ease of editing,
and greatest resistance to errors
2. Predictive coded (P frames): are coded using motion
compensation prediction from the last I or P frame,
whichever happens to be closest.
– Compression efficiency of P frames is substantially higher than I
frames.
Video compression standards
3. Bidirectionally predictive coded (B frames): achieve a high
level of compression by using motion compensation
prediction from the most recent frame and the closest future
frame.
– By using both past and future for prediction, generally we can get
better compression than if we only use prediction based on the past.
Lossy compression
• In a lossless compression, the reconstructed signal is identical
to the original signal
• Only a limited amount of compression can be obtained with
lossless compression
• There is a floor (entropy of the source) below which we
cannot drive the size of the compressed signal
• In some applications the consequences of information loss
prohibit any loss of information (bank records, medical
images)
Lossy compression
• If the resources are limited and we do not require absolute
integrity, we can improve the amount of compression by
accepting a certain degree of loss during compression
• Lossless compression: rate is the only performance
measure
• In lossy compression rate by itself is not enough
• A distortion measure (difference between original and
reconstructed data) is also required
• Goal in lossy compression: incur minimum amount of
distortion while compressing to the lowest possible rate
• There is a tradeoff between minimizing rate and keeping
distortion small
Source Coding Theorems
• What is the smallest rate, subject to a given distortion, at which the
information of the source can be represented?
• For every source there is a Rate-Distortion (RD) function that gives the
minimum rate achievable for a given distortion.
• The RD function can be computed analytically for simple sources and
distortion measures, and iteratively for more complex situations.
• RD functions have typical form shown below.
• The RD function is a lower bound, and the source coding theorem tells us
that we can get as close to this lower bound as we want.
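The classical case where the RD function has a closed form is the Gaussian source with squared-error distortion, R(D) = (1/2) log2(sigma^2 / D) for 0 < D <= sigma^2:

```python
import math

def rate_distortion_gaussian(variance, d):
    """R(D) for a Gaussian source with squared-error distortion:
    R(D) = 0.5 * log2(variance / D) for 0 < D <= variance, else 0."""
    if d >= variance:
        return 0.0
    return 0.5 * math.log2(variance / d)

# Halving the allowed distortion costs an extra half bit per sample:
print(rate_distortion_gaussian(1.0, 0.25))   # 1.0 bit
print(rate_distortion_gaussian(1.0, 0.125))  # 1.5 bits
```

The curve is monotone decreasing and hits zero at D = sigma^2, matching the typical RD shape described above.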
Error-free (lossless) compression
• Applications: archiving of medical or business documents
where lossy compression is prohibited for legal reasons.
• Typical compression ratio: 2 to 10
• Two operations are involved:
1. Reduce the interpixel redundancies using a mapping or
transformation
2. Coding the representation to eliminate coding
redundancies
G3 two-dimensional coding
• Changing elements: a0, a1, a2 lie on the current (coding) line;
b1, b2 lie on the previous (reference) line.
• If b1 and b2 lie between a0 and a1 (pass mode): no transition
until b2. Then a0 moves to b2 and the coding continues.
• If a1 is detected before b2
– If distance between a1 and b1 (number of pixels) is less than or equal
to 3, we send the location of a1 with respect to b1, move a0
to a1 and the coding continues (vertical mode)
– If the distance between a1 and b1 is larger than 3, we go back to 1-D
run length coding and send the distance between a0 and a1 and
a1 and a2 (horizontal mode)
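The mode-selection rules above can be sketched as follows (a hedged illustration using changing-element positions, not a complete G3 coder):

```python
def choose_mode(a0, a1, b1, b2):
    """Select the 2-D coding mode from the changing-element
    positions, following the rules in the text."""
    if b2 < a1:                # b1 and b2 both lie before a1: pass mode
        return "pass"
    if abs(a1 - b1) <= 3:      # a1 close enough to b1: vertical mode
        return "vertical"
    return "horizontal"        # fall back to 1-D run lengths

print(choose_mode(a0=0, a1=10, b1=3, b2=6))   # pass
print(choose_mode(a0=0, a1=5,  b1=4, b2=12))  # vertical
print(choose_mode(a0=0, a1=9,  b1=2, b2=12))  # horizontal
```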
Still image compression
• JPEG: the most popular and comprehensive continuous-tone, still
image compression standard.
• JPEG defines 3 different coding systems:
1. Baseline coding (lossy, very common)
2. Extended coding, for greater compression, higher precision or
progressive reconstruction applications
3. Lossless coding