IMAGE PROCESSING (CS)
SRI KAILASH WOMEN’S COLLEGE
(Affiliated to Periyar University)
Periyeri (Village), Thalaivasal (Via), Attur (Tk), Salem (Dt) - 636112.
DEPARTMENT OF COMPUTER SCIENCE
III CS - ODD SEM (2023-2026)
23UCSE07 - IMAGE PROCESSING
2024-2025
SRI KAILASH WOMEN’S COLLEGE
IMAGE PROCESSING
UNIT-I
Digital Image Fundamentals: Image representation - Basic relationship between pixels, Elements of
DIP system -Applications of Digital Image Processing - 2D Systems - Classification of 2D Systems –
Mathematical Morphology- Structuring Elements- Morphological Image Processing - 2D Convolution - 2D
Convolution Through Graphical Method -2D Convolution Through Matrix Analysis
UNIT-II
2D Image transforms: Properties of 2D-DFT - Walsh transform - Hadamard transform- Haar
transform- Discrete Cosine Transform Karhunen- Loeve Transform -Singular Value Decomposition
UNIT-III
Image Enhancement: Spatial domain methods- Point processing Intensity transformations - Histogram
processing- Spatial filtering smoothing filter- Sharpening filters - Frequency domain methods: low pass
filtering, high pass Filtering- Homomorphic filter.
UNIT-IV
Image segmentation: Classification of Image segmentation techniques - Region approach – Clustering
techniques - Segmentation based on thresholding - Edge based segmentation - Classification of edges- Edge
detection - Hough transform- Active contour.
UNIT-V
Image Compression: Need for compression - Redundancy - Classification of image compression
schemes - Huffman coding - Arithmetic coding - Dictionary based compression - Transform based compression.
Text Books:
1. S. Jayaraman, S. Esakkirajan, T. Veerakumar, Digital Image Processing, Tata McGraw Hill, 2015.
2. Gonzalez, Rafael C., Digital Image Processing, Pearson Education, 2009.
Reference Books:
1. Jain, Anil K., Fundamentals of Digital Image Processing, PHI, 1988.
2. Kenneth R. Castleman, Digital Image Processing, Pearson Education, 2/e, 2003.
3. William K. Pratt, Digital Image Processing, John Wiley, 4/e, 2007.
UNIT-1
INTRODUCTION:
1. Digital Image
An image can be defined as a two-dimensional function f(x, y), where x and y are spatial coordinates
and the amplitude of f at any pair of coordinates is called the gray level (or intensity) of the image at that
point. If x, y, and the intensity values are all finite, discrete quantities, the image is called a digital image.
A digital image is a representation of a visual scene or object in a digital format, typically
consisting of a two-dimensional array of pixels. Digitalization implies that a digital image is
an approximation of a real scene.
Pixel values typically represent specific colors or shades, heights, opacities etc.
Image Formats:
Levels of image processing:
Low-level processing: Input: Image, Output: Image
Mid-level processing: Input: Image, Output: Attributes
High-level processing: Input: Attributes, Output: Understanding
1963: The first computerized digital image processing system, called the SAGE (Semi-
Automatic Ground Environment) system is developed by IBM for the US Air Force.
1965: NASA’s Jet Propulsion Laboratory develops the first digital image processing system
for satellite imagery.
1970s: The field of digital image processing begins to be used in medical
applications, with the development of new algorithms and techniques for image
enhancement, compression, and analysis.
2020s: The increasing use of computer vision and deep learning leads to a new wave of
applications in industry and society.
Applications of Digital Image Processing
Image Enhancement: Techniques such as noise reduction, deblurring, and color correction
can be used.
Object or feature detection: Algorithms can be used to identify and extract specific objects
or features within an image, such as faces, vehicles, or text.
Image compression: Image compression techniques can be used to reduce the file size
while maintaining visual quality.
Medical imaging: Digital image processing is used extensively in medical applications,
such as X-ray and MRI image analysis, to help diagnose and treat illnesses.
Computer vision: Digital image processing is used for self-driving cars, security systems,
and robotics.
Augmented reality: Digital image processing is used to overlay virtual elements on real-
world images.
Image enhancement: Includes any operations that are applied to the image to improve its
overall visual quality, for example image sharpening, contrast enhancement, and edge detection.
Image restoration: Includes any operations that are applied to the image to restore it to its
original form, for example, deblurring, inpainting, and denoising.
Morphological processing: These techniques are used to analyze and manipulate the shape
and structure of objects within an image.
Image segmentation: Involves partitioning the image into multiple regions, or segments,
which correspond to different objects or features in the scene.
1. Image Representation: Images can be represented in various formats, including binary, grayscale, and
color.
2. Pixel: A pixel is the smallest unit of an image, representing a single point in the image.
3. Resolution: The resolution of an image refers to the number of pixels in the image.
4. Bit Depth: The bit depth of an image refers to the number of bits used to represent each pixel.
Image Types
1. Binary Images: Binary images are images where each pixel is represented by a single bit (0 or 1).
2. Grayscale Images: Grayscale images are images where each pixel is represented by a range of gray levels.
3. Color Images: Color images are images where each pixel is represented by a combination of red, green, and blue (RGB) components.
Image Processing
1. Image Enhancement: Image enhancement techniques are used to improve the quality of an image.
2. Image Restoration: Image restoration techniques are used to restore an image to its original state.
Applications
1. Medical Imaging: Digital image processing is widely used in medical imaging applications, such as MRI
and CT scans.
2. Surveillance: Digital image processing is used in surveillance systems to detect and track objects.
3. Entertainment: Digital image processing is used in the entertainment industry to create special effects and
animation.
An image processing system is a combination of hardware, software, and algorithms that work
together to manipulate and analyze images. Its key components are described below.
1. Image Sensors
Two elements are required to acquire a digital image. The first is a physical device that is
sensitive to the energy radiated by the object we wish to image.
The second, called a digitizer, is a device for converting the output of the physical sensing
device into digital form.
2. Image Processing Hardware
Usually consists of the digitizer, plus hardware that performs other primitive operations,
such as an arithmetic logic unit (ALU),that performs arithmetic and logical operations in
parallel on entire images.
This type of hardware sometimes is called a front-end subsystem, and its most
distinguishing characteristic is speed.
This unit performs functions that require fast data throughputs that the typical main
computer can’t handle.
3. Computer
A general-purpose computer can range from a PC to a supercomputer. In dedicated
applications, custom computers are sometimes used to achieve a required level of performance.
In general-purpose systems, almost any well-equipped PC-type machine is suitable for off-line image
processing.
4. Image Processing Software
More sophisticated software packages allow the integration of those modules and general-purpose software commands from at least one computer language.
5. Mass storage
A must in image processing applications. An image of size 1024 x 1024 pixels, in which the
intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed.
When dealing with thousands, or even millions, of images, providing adequate storage in an image processing system can be a challenge.
Digital storage for image processing applications falls into three principal categories: short-term storage for use during processing, on-line storage for relatively fast recall, and archival storage characterized by infrequent access.
Storage is measured in:
— Bytes (8 bits)
— Kilobytes (one thousand bytes)
— Megabytes (one million bytes)
— Gigabytes (one billion bytes)
6. Image display
Mainly color TV monitors, driven by the outputs of image and graphics display cards
that are an integral part of the computer system.
Seldom are there requirements for image display applications that cannot be met by display cards
available commercially as part of the computer system.
In some cases it is necessary to have stereo displays, and these are implemented in the form of
headgear containing two small displays embedded in goggles worn by the user.
7. Hardcopy
Devices for recording images include laser printers, film cameras, heat-sensitive devices,
inkjet units and digital units, such as optical and CD-ROM disks.
Film provides the highest possible resolution, but paper is the obvious medium if image
projection equipment is used.
The latter approach is gaining acceptance as the standard for image presentations.
8. Networking
Networking is almost a default function in any computer system in use today. Because of the large amount of data
inherent in image processing applications, the key consideration in image transmission is
bandwidth.
Fortunately, this situation is improving quickly as a result of optical fiber and other
broadband technologies.
Neighborhood
Adjacency
Paths
Connectivity
Regions
Boundaries
Neighbors of a pixel – N4(p)
Any pixel p(x, y) has two vertical and two horizontal neighbors, given by
(x+1, y),
(x-1, y),
(x, y+1),
(x, y-1)
This set of pixels is called the 4-neighbors of p, and is denoted by N4(p). Diagrammatically, the 4-neighborhood of (x, y) is:
            (x, y+1)
(x-1, y)    (x, y)    (x+1, y)
            (x, y-1)
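As a small illustration, the sketch below (plain Python written for these notes; the helper name n4_neighbors is ours, not from any library) lists the 4-neighbors of a pixel and keeps only the coordinates that fall inside the image.

```python
# Return the 4-neighbors N4(p) of pixel p = (x, y), clipped to the image bounds.
def n4_neighbors(x, y, width, height):
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(u, v) for (u, v) in candidates if 0 <= u < width and 0 <= v < height]

print(n4_neighbors(0, 0, 5, 5))  # corner pixel: only (1, 0) and (0, 1) remain
```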
Adjacency
Path
A path is a sequence of pixels that are successively adjacent under some adjacency definition:
4-adjacency → 4-path
8-adjacency → 8-path
m-adjacency → m-path
The path length is the number of pixels involved.
Connectivity
Let S be a subset of pixels in an image.
Two pixels p and q are said to be connected in S if there exists a path between them consisting entirely
of pixels in S.
For any pixel p in S, the set of pixels that are connected to it in S is called a connected component of S.
If S has only one connected component, then S is called a connected set.
Region
A connected set is also called a Region.
Two regions Ri and Rj are said to be adjacent if their union forms a connected set (adjacent or joint regions).
Regions that are not adjacent are said to be disjoint regions.
Only 4- and 8-adjacency are considered when referring to regions (per the textbook author).
When discussing a particular region, the type of adjacency must be specified.
In Fig. 2.25(d) of the textbook, the two regions are adjacent only if 8-adjacency is considered.
Foreground and Background
Suppose an image contain K disjoint regions Rk , k=1,2,3,…K, none of which touches the image
border
Let Ru denote the union of all the K regions.
Let (Ru)^c denote its complement.
We call all the points in Ru the foreground and all the points in (Ru)c the background
Boundary
The boundary (border or contour) of a region R is the set of points that are adjacent to the points
in the complement of R.
Set of pixels in the region that have at least one background neighbor.
The boundary of the region R is the set of pixels in the region that have one or more neighbors
that are not in R.
Inner Border: Border of Foreground
Outer Border: Border of Background
If R happens to be the entire image, its boundary is defined as the set of pixels in the first and last rows and columns of the image.
There is a difference between a boundary and an edge in the digital image paradigm. The author defers this
discussion to Chapter 10.
A system is said to be linear when it satisfies the superposition and homogeneity principles. Consider two systems
with inputs x1(t), x2(t), and outputs y1(t), y2(t) respectively. Then, according to the superposition and
homogeneity principles,
T[a1 x1(t) + a2 x2(t)] = a1 T[x1(t)] + a2 T[x2(t)]
∴ T[a1 x1(t) + a2 x2(t)] = a1 y1(t) + a2 y2(t)
From the above expression, it is clear that the response to a weighted sum of inputs is the same weighted sum of the individual responses.
Example: y(t) = x²(t)
Solution:
y1(t) = T[x1(t)] = x1²(t)
y2(t) = T[x2(t)] = x2²(t)
T[a1 x1(t) + a2 x2(t)] = [a1 x1(t) + a2 x2(t)]²
which is not equal to a1 y1(t) + a2 y2(t). Hence the system is non-linear.
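A quick numerical check of the superposition test can make this concrete. The NumPy sketch below is illustrative only; the function T stands in for the system y(t) = x²(t), and the input sequences and weights are arbitrary.

```python
import numpy as np

def T(x):                              # the example system: pointwise squaring
    return x ** 2

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, -1.0, 4.0])
a1, a2 = 2.0, 3.0

lhs = T(a1 * x1 + a2 * x2)             # response to the weighted sum of inputs
rhs = a1 * T(x1) + a2 * T(x2)          # weighted sum of the individual responses
print(np.allclose(lhs, rhs))           # False: squaring violates superposition, so it is non-linear
```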
A system is said to be time variant if its input and output characteristics vary with time. Otherwise, the system
is considered as time invariant.
Example:
y(n) = x(-n)
y(n, t) = T[x(n-t)] = x(-n-t)
y(n-t) = x(-(n-t)) = x(-n + t)
∴ y(n, t) ≠ y(n - t). Hence, the system is time variant.
Linear Time Variant (LTV) and Linear Time Invariant (LTI) Systems
If a system is both linear and time variant, then it is called linear time variant (LTV) system.
If a system is both linear and time Invariant then that system is called linear time invariant (LTI) system.
Example: y(t) = 2x(t). For the present value t = 0, the system output is y(0) = 2x(0). Here, the output depends only upon the present input.
Hence the system is memoryless (static).
Example: y(t) = 2x(t) + 3x(t - 3). For the present value t = 0, the system output is y(0) = 2x(0) + 3x(-3).
Here x(-3) is a past value of the input, for which the system requires memory to produce this output. Hence,
the system is a dynamic system.
A system is said to be causal if its output depends upon present and past inputs, and does not depend upon
future input.
For non causal system, the output depends upon future inputs also.
Example: y(t) = 2x(t) + 3x(t - 3). For the present value t = 1, the system output is y(1) = 2x(1) + 3x(-2).
Here, the system output depends only upon present and past inputs. Hence, the system is causal.
Example: y(t) = 2x(t) + 3x(t - 3) + 6x(t + 3). For the present value t = 1, the system output is y(1) = 2x(1) + 3x(-2) + 6x(4). Here, the system output depends
upon a future input. Hence the system is a non-causal system.
A system is said to be invertible if the input of the system appears at the output.
The system is said to be stable only when the output is bounded for bounded input. For a bounded input, if
the output is unbounded in the system then it is said to be unstable.
For example, for the system y(t) = x²(t): let the input be u(t) (the unit step, a bounded input); then the output y(t) = u²(t) = u(t), which is bounded. Hence the system is stable.
MATHEMATICAL MORPHOLOGY
Mathematical Morphology is a tool for extracting image components that are useful for
representation and description. The technique was originally developed by Matheron and
Serra at the Ecole des Mines in Paris.
It is a set-theoretic method of image analysis providing a quantitative description of
geometrical structures. (At the Ecole des Mines they were interested in analysing geological
data and the structure of materials).
Morphology can provide boundaries of objects, their skeletons, and their convex hulls. It is
also useful for many pre- and post-processing techniques, especially in edge thinning and
pruning.
Generally speaking most morphological operations are based on simple expanding and
shrinking operations.
The primary application of morphology occurs in binary images, though it is also used on grey
level images. It can also be useful on range images.
(A range image is one where grey levels represent the distance from the sensor to the objects
in the scene rather than the intensity of light reflected from them).
Set operations
The two basic morphological set transformations are erosion and dilation
These transformations involve the interaction between an image A (the object of interest) and a
structuring set B, called the structuring element.
Typically the structuring element B is a circular disc in the plane, but it can be any shape. The image
and structuring element sets need not be restricted to sets in the 2D plane, but could be defined in 1,
2, 3 (or higher) dimensions.
The complement of A is denoted Ac, and the difference of two sets A and B is denoted A - B.
Dilation
The dilation of A by the structuring element B, written A ⊕ B, is the set of all points x generated by obtaining the reflection of B about its origin and
then shifting this reflection by x so that it overlaps A by at least one element:
A ⊕ B = { x : (B̂)x ∩ A ≠ ∅ },
where B̂ denotes the reflection of B.
Consider the example where A is a rectangle and B is a disc centred on the origin. (Note that if B is not centred
on the origin we will get a translation of the object as well.) Since B is symmetric, B̂ = B.
Figure 3: A is dilated by the structuring element B.
This definition becomes very intuitive when the structuring element B is viewed as a convolution mask.
Erosion
The erosion of A by B, written A ⊖ B, is the set of all points x such that B, translated by x, is contained entirely within A:
A ⊖ B = { x : (B)x ⊆ A }.
Figure 4: A is eroded by the structuring element B to give the internal dashed shape.
Dilation and erosion are duals of each other with respect to set complementation and reflection. That is,
(A ⊖ B)^c = A^c ⊕ B̂.
But the complement of the set of all x that satisfy (B)x ⊆ A is just the set of all x such
that (B)x ∩ A^c ≠ ∅. Thus
(A ⊖ B)^c = { x : (B)x ∩ A^c ≠ ∅ } = A^c ⊕ B̂.
Two very important transformations are opening and closing. Now intuitively, dilation expands an
image object and erosion shrinks it.
Opening generally smooths a contour in an image, breaking narrow isthmuses and eliminating thin
protrusions.
Closing also tends to smooth sections of contours but, as opposed to opening, it generally fuses narrow breaks and long thin gulfs,
eliminates small holes, and fills gaps in contours.
The opening of A by B, denoted A ∘ B, is given by the erosion by B, followed by the dilation by B, that
is
A ∘ B = (A ⊖ B) ⊕ B.
Figure 5: The opening (given by the dark dashed lines) of A (given by the solid lines). The structuring
element B is a disc. The internal dashed structure is A eroded by B.
Opening is like `rounding from the inside': the opening of A by B is obtained by taking the union of all
translates of B that fit inside A. Parts of A that are smaller than B are removed. Thus
A ∘ B = ∪ { (B)x : (B)x ⊆ A }.
Closing is the dual operation of opening and is denoted by A • B. It is produced by the dilation of A by B,
followed by the erosion by B:
A • B = (A ⊕ B) ⊖ B.
This is like `smoothing from the outside'. Holes are filled in and narrow valleys are `closed'.
Just as with dilation and erosion, opening and closing are dual operations. That is,
(A • B)^c = A^c ∘ B̂.
Similarly, closing satisfies the following properties:
1. A is a subset of A • B.
2. If C is a subset of D, then C • B is a subset of D • B.
3. (A • B) • B = A • B.
(Opening satisfies the dual properties: A ∘ B is a subset of A; if C is a subset of D, then C ∘ B is a subset of D ∘ B; and (A ∘ B) ∘ B = A ∘ B.)
Property 3, in both cases, is known as idempotency. It means that any application of the operation
more than once will have no further effect on the result.
The morphological filter can be used to eliminate `salt and pepper' noise. Salt and
pepper noise is random, uniformly distributed small noisy elements often found corrupting real
images. The important thing to note is that morphological operations preserve the main geometric
structures of the object. Only features `smaller than' the structuring element are affected by
transformations. All other features at `larger scales' are not degraded. (This is not the case with linear
transformations, such as convolution).
The boundary of a set A, denoted β(A), can be obtained by first eroding A with B, where B is a suitable
structuring element, and then performing the set difference between A and its erosion. That is,
β(A) = A - (A ⊖ B).
Region filling can be accomplished iteratively using dilations, complementation, and intersections.
Suppose we have an image A containing a subset whose elements are 8-connected boundary points of
a region. Beginning with a point p inside the boundary, the objective is to fill the entire region with
1s.
Since, by assumption, all non-boundary points are labeled 0, we begin by assigning 1 to p, and then construct
Xk = (Xk-1 ⊕ B) ∩ A^c,   k = 1, 2, 3, ...
where X0 = p, and B is the `cross' structuring element shown in figure 8. The algorithm terminates
when Xk = Xk-1. The set union of Xk and A contains the filled set and its boundary.
Likewise, connected components can also be extracted using morphological operations. If Y represents
a connected component in an image A and a point p in Y is known, then the following iterative
expression yields all the elements of Y:
Xk = (Xk-1 ⊕ B) ∩ A,   k = 1, 2, 3, ...
where X0 = p and B is a 3 × 3 matrix of 1s. If Xk = Xk-1 the algorithm has converged and we let Y = Xk.
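Both iterative constructions translate almost directly into code. The sketch below assumes NumPy and SciPy are available; the helper names fill_region and extract_component are ours. Each step dilates the current set with the appropriate structuring element and intersects it with A^c (for filling) or A (for component extraction) until the result stops changing.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def fill_region(A, seed):
    """A: binary image of boundary points; seed: (row, col) inside the boundary."""
    cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)   # the 'cross' SE
    X = np.zeros_like(A, dtype=bool)
    X[seed] = True                                         # X0 = p
    while True:
        X_next = binary_dilation(X, structure=cross) & ~A  # dilate X(k-1) by B, intersect with A^c
        if np.array_equal(X_next, X):
            return X | A                                   # filled set together with its boundary
        X = X_next

def extract_component(A, seed):
    """Grow from a seed point inside A using a 3x3 matrix of 1s as the SE."""
    ones = np.ones((3, 3), dtype=bool)
    X = np.zeros_like(A, dtype=bool)
    X[seed] = True                                         # X0 = p
    while True:
        X_next = binary_dilation(X, structure=ones) & A    # dilate X(k-1) by B, intersect with A
        if np.array_equal(X_next, X):
            return X                                       # converged: Y = Xk
        X = X_next
```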
An important step in representing the structural shape of a planar region is to reduce it to a graph. This
is very commonly used in robot path planning. This reduction is most commonly achieved by reducing
the region to its skeleton.
The skeleton of a region is defined by the medial axis transformation (MAT). The MAT of a
region R with border B is defined as follows: for each point p in R, we find its closest neighbour in B.
If p has more than one such closest neighbour, then p belongs to the medial axis (or skeleton) of R. Of
course, closest depends on the metric used. Figure 9 shows some examples with the usual Euclidean
metric.
Direct implementation of the MAT is computationally prohibitive. However, the skeleton of a set can be
expressed in terms of erosions and openings. Thus, it can be shown that
S(A) = ∪ (k = 0 to K) Sk(A),   with   Sk(A) = (A ⊖ kB) - [(A ⊖ kB) ∘ B],
where B is a structuring element, (A ⊖ kB) indicates k successive erosions of A by B, and K is the last iterative step
before A erodes to an empty set.
Thus A can be reconstructed from its skeleton subsets Sk(A) using the equation
A = ∪ (k = 0 to K) [Sk(A) ⊕ kB],
where (Sk(A) ⊕ kB) denotes k successive dilations of Sk(A) by B.
STRUCTURING ELEMENTS
Morphological operators that change the shape of particles process a pixel based on its number of
neighbors and the values of those neighbors.
A neighbor is a pixel whose value affects the values of nearby pixels during certain image processing
functions.
Morphological transformations use a 2D binary mask called a structuring element to define the size
and effect of the neighborhood on each pixel, controlling the effect of the binary morphological
functions on the shape and the boundary of a particle.
When to Use
Use a structuring element when you perform any primary binary morphology operation or the Separation
advanced binary morphology operation. You can modify the size and the values of a structuring element
to alter the shape of particles in a specific way.
Concepts
The size and contents of a structuring element specify which pixels a morphological operation takes
into account when determining the new value of the pixel being processed.
A structuring element must have an odd-sized axis to accommodate a center pixel, which is the pixel
being processed.
The contents of the structuring element are always binary, composed of 1 and 0 values. The most
common structuring element is a 3 × 3 matrix containing values of 1.
This matrix, shown below, is the default structuring element for most binary and grayscale
morphological transformations.
1 1 1
1 1 1
1 1 1
Three factors influence how a structuring element defines which pixels to process during a
morphological transformation: the size of the structuring element, the values (contents) of the structuring element, and the shape of the pixel frame (square or hexagonal).
Using structuring elements requires an image border. A 3 × 3 structuring element requires a minimum
border size of 1. In the same way, structuring elements of 5 × 5 and 7 × 7 require a minimum border
size of 2 and 3, respectively. Bigger structuring elements require corresponding increases in the image
border size.
Note NI Vision images have a default border size of 3. This border size enables you to
use structuring elements as large as 7 × 7 without any modification. If you plan to use
structuring elements larger than 7 × 7, specify a correspondingly larger border when
creating your image.
The size of the structuring element determines the speed of the morphological transformation. The smaller
the structuring element, the faster the transformation.
Note Pixels in the image do not physically shift in a horizontal pixel frame.
Functions that allow you to set the pixel frame shape merely process the pixel
values differently when you specify a hexagonal frame.
The following figure illustrates the difference between a square and hexagonal pixel frame when a
3 × 3 and a 5 × 5 structuring element are applied.
(Figure: square and hexagonal pixel frames compared for 3 × 3 and 5 × 5 structuring elements.)
If a morphological function uses a 3 × 3 structuring element and a hexagonal frame mode, the
transformation does not consider the elements [2, 0] and [2, 2] when calculating the effect of the
neighbors on the pixel being processed.
If a morphological function uses a 5 × 5 structuring element and a hexagonal frame mode, the
transformation does not consider the elements [0, 0], [4, 0], [4, 1], [4, 3], [0, 4], and [4, 4].
The following figure illustrates a morphological transformation using a 3 × 3 structuring element and a
rectangular frame mode.
Structuring Element    Image Neighborhood
0 1 0                  p1 p2 p3
1 1 1                  p4 p0 p5
0 1 0                  p6 p7 p8
The following figure illustrates a morphological transformation using a 3 × 3 structuring element and a
hexagonal frame mode.
Structuring Element    Image Neighborhood
0 1 0                  p1 p2
1 1 1                  p3 p0 p4
0 1 0                  p5 p6
The following table illustrates the effect of the pixel frame shape on a neighborhood given three structuring
element sizes. The gray boxes indicate the neighbors of each black center pixel.
(Figure: neighborhoods for 3 × 3, 5 × 5, and 7 × 7 structuring elements in square and hexagonal pixel frames.)
The word ‘Morphology’ generally represents a branch of biology that deals with the form and structure
of animals and plants. However, we use the same term in ‘mathematical morphology’ to extract image
components useful in representing region shape, boundaries, etc.
Morphology is a comprehensive set of image processing operations that process images based on shapes
Morphological operations apply a structuring element to an input image, creating an output image of
the same size. In a morphological operation, the value of each pixel in the output image is based on a
comparison of the corresponding pixel in the input image with its neighbors.
Structuring Element: It is a matrix or a small-sized template that is used to traverse an image. The
structuring element is positioned at all possible locations in the image, and it is compared with the connected
pixels. It can be of any shape.
Fit: When all the pixels in the structuring element cover pixels of the object, we call it a Fit.
Hit: When at least one pixel in the structuring element covers a pixel of the object, we call it a Hit.
Miss: When no pixel in the structuring element covers any pixel of the object, we call it a Miss.
Morphological Operations
Fundamentally morphological image processing is similar to spatial filtering. The structuring element
is moved across every pixel in the original image to give a pixel in a new processed image.
The value of this new pixel depends on the morphological operation performed. The two most widely
used operations are Erosion and Dilation.
1. Erosion
Erosion shrinks the image pixels, or erosion removes pixels on object boundaries. First, we traverse
the structuring element over the image object to perform an erosion operation, as shown in Figure 4.
The output pixel values are calculated using the following equation.
Pixel (output) = 1 {if FIT}
Pixel (output) = 0 {otherwise}
Figure 4. Erosion operation on an input image using a structuring element. (Source: Image by the author)
An example of Erosion is shown in Figure 5. Figure 5(a) represents the original image; 5(b) and 5(c) show
the processed images after erosion using 3x3 and 5x5 structuring elements, respectively.
Figure 5. Results of structuring element size in erosion. (Source: Image by the author)
Properties:
2. Dilation
Dilation expands the image pixels, or it adds pixels on object boundaries. First, we traverse the structuring
element over the image object to perform a dilation operation, as shown in Figure 7. The output pixel values
are calculated using the following equation.
Pixel (output) = 1 {if HIT}
Pixel (output) = 0 {otherwise}
Figure 7. Dilation operation on an input image using a structuring element. (Source: Image by the author)
An example of Dilation is shown in Figure 8. Figure 8(a) represents the original image; 8(b) and 8(c) show
the processed images after dilation using 3x3 and 5x5 structuring elements, respectively.
Figure 8. Results of structuring element size in dilation. (Source: Image by the author)
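A minimal sketch of the FIT and HIT rules using SciPy's binary morphology routines is shown below; the 7 × 7 test object and the 3 × 3 structuring element are illustrative placeholders, not the data behind the figures above.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

obj = np.zeros((7, 7), dtype=bool)
obj[2:5, 2:5] = True                          # a 3x3 square object
se = np.ones((3, 3), dtype=bool)              # 3x3 structuring element of 1s

eroded = binary_erosion(obj, structure=se)    # output is 1 only where the SE fully FITs the object
dilated = binary_dilation(obj, structure=se)  # output is 1 wherever the SE HITs the object

print(eroded.sum(), dilated.sum())            # 1 and 25: erosion shrinks, dilation expands
```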
Properties:
Compound Operations
Most morphological operations are not performed using either dilation or erosion; instead, they are
performed by using both. The two most widely used compound operations are:
(a) Closing (by first performing dilation and then erosion), and
(b) Opening (by first performing erosion and then dilation).
Figure 10. Output of Compound operations on an input object. (Source: Image by the author)
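Both compound operations can be sketched by simply chaining the two primitives, as below (SciPy assumed; the small square with a one-pixel hole is a made-up example, not the object of Figure 10).

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def opening(img, se):
    return binary_dilation(binary_erosion(img, structure=se), structure=se)   # erode, then dilate

def closing(img, se):
    return binary_erosion(binary_dilation(img, structure=se), structure=se)   # dilate, then erode

img = np.zeros((9, 9), dtype=bool)
img[1:8, 1:8] = True
img[4, 4] = False                              # a small hole inside the object
se = np.ones((3, 3), dtype=bool)

print(opening(img, se)[4, 4])                  # False: opening leaves the hole open
print(closing(img, se)[4, 4])                  # True: closing fills the small hole
```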
Extracting the boundary is an important process for gaining information about and understanding the features of an
image. It is the first preprocessing step used to present the image's characteristics.
This process can help the researcher to acquire data from the image. We can perform boundary
extraction of an object by following the below steps.
Step 1. Create an image (E) by erosion process; this will shrink the image slightly. The kernel size of the
structuring element can be varied accordingly.
Step 2. Subtract image E from the original image. By performing this step, we get the boundary of our object.
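These two steps can be sketched in a few lines (SciPy assumed; the 5 × 5 square object is an illustrative placeholder).

```python
import numpy as np
from scipy.ndimage import binary_erosion

obj = np.zeros((9, 9), dtype=bool)
obj[2:7, 2:7] = True                            # a solid 5x5 square object
se = np.ones((3, 3), dtype=bool)                # the kernel size can be varied (Step 1)

eroded = binary_erosion(obj, structure=se)      # Step 1: image E, slightly shrunken
boundary = obj & ~eroded                        # Step 2: original minus E gives the boundary

print(boundary.sum())                           # 16 boundary pixels of the 5x5 square remain
```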
2D Convolution
The Definition of 2D Convolution
Convolution involving one-dimensional signals is referred to as 1D convolution or just convolution.
Otherwise, if the convolution is performed between two signals spanning along two mutually
perpendicular dimensions (i.e., if signals are two-dimensional in nature), then it will be referred to as
2D convolution.
This concept can be extended to involve multi-dimensional signals due to which we can have multi-
dimensional convolution.
In the digital domain, convolution is performed by multiplying and accumulating the instantaneous
values of the overlapping samples corresponding to two input signals, one of which is flipped.
This definition of 1D convolution is applicable even for 2D convolution except that, in the latter case,
one of the inputs is flipped twice.
This kind of operation is extensively used in the field of digital image processing wherein the 2D
matrix representing the image will be convolved with a comparatively smaller matrix called 2D kernel.
An Example of 2D Convolution
Let's try to compute the pixel value of the output image resulting from the convolution of 5×5 sized image
matrix x with the kernel h of size 3×3, shown below in Figure 1.
Figure 1: Input matrices, where x represents the original image and h represents the kernel. Image created
by Sneha H.L.
To accomplish this, the step-by-step procedure to be followed is outlined below.
Step 1: Matrix inversion
This step involves flipping of the kernel along, say, rows followed by a flip along its columns, as shown in
Figure 2.
Figure 3a, 3b. Convolution results obtained for the output pixels at location (1,1) and (1,2). Image created
by Sneha H.L.
Figure 3c, 3d: Convolution results obtained for the output pixels at location (1,4) and (1,7). Image created
by Sneha H.L.
Advancing similarly, all the pixel values of the first row in the output image can be computed. Two
such examples corresponding to fourth and seventh output pixels of the output matrix are shown in
the figures 3c and 3d, respectively.
If we further slide the kernel along the same row, none of the pixels in the kernel overlap with those
in the image. This indicates that we are done along the present row.
Move Down Vertically, Advance Horizontally
The next step would be to advance vertically down by a single pixel before restarting the horizontal
movement. The first overlap which would then occur is shown in Figure 4a, and by performing the
MAC operation over it we get the result 25 × 0 + 50 × 1 = 50.
Following this, we can slide the kernel in horizontal direction till there are no more values which
overlap between the kernel and the image matrices. One such case corresponding to the sixth pixel
value of the output matrix (= 49 × 0 + 130 × 1 + 70 × 1 + 100 × 0 = 200) is shown in Figure 4b.
Figure 4a, 4b. Convolution results obtained for the output pixels at location (2,1) and (2,6). Image created
by Sneha H.L.
This process of moving one step down followed by horizontal scanning has to be continued until the
last row of the image matrix. Three random examples concerned with the pixel outputs at the locations
(4,3), (6,5) and (8,6) are shown in Figures 5a-c.
Figure 5a. Convolution results obtained for the output pixels at (4,3). Image created by Sneha H.L.
Figure 5b. Convolution results obtained for the output pixels at (6,5). Image created by Sneha H.L.
Figure 5c. Convolution results obtained for the output pixels at (8,6). Image created by Sneha H.L.
Hence the resultant output matrix will be:
Figure 6. Our example's resulting output matrix. Image created by Sneha H.L.
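The whole flip, slide, multiply-and-accumulate procedure can be written compactly. The NumPy sketch below uses our own helper conv2d_full; the 5 × 5 image and 3 × 3 kernel values are placeholders rather than the data in the figures. It performs 'full' 2-D convolution with implicit zero padding, so a 5 × 5 image and a 3 × 3 kernel give a 7 × 7 output, as in the example above.

```python
import numpy as np

def conv2d_full(x, h):
    H, W = x.shape
    M, N = h.shape
    h_flipped = np.flipud(np.fliplr(h))                       # Step 1: flip rows and columns
    x_padded = np.pad(x, ((M - 1, M - 1), (N - 1, N - 1)))    # zero padding around the image
    out = np.zeros((H + M - 1, W + N - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = x_padded[i:i + M, j:j + N]               # overlapping samples at this shift
            out[i, j] = np.sum(region * h_flipped)            # multiply and accumulate (MAC)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)                  # placeholder 5x5 image
h = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]])      # placeholder 3x3 kernel
print(conv2d_full(x, h).shape)                                # (7, 7)
```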
Zero Padding
The mathematical formulation of 2-D convolution is given by
y[i, j] = Σ (m = -∞ to ∞) Σ (n = -∞ to ∞) h[m, n] · x[i - m, j - n]
where, x represents the input image matrix to be convolved with the kernel matrix h to result in a new matrix y,
representing the output image. Here, the indices i and j are concerned with the image matrices while those
of m and n deal with that of the kernel. If the size of the kernel involved in convolution is 3 × 3, then the
indices m and n range from -1 to 1. For this case, an expansion of the presented formula results in
y[i, j] = Σ (m = -∞ to ∞) { h[m, -1] · x[i - m, j + 1] + h[m, 0] · x[i - m, j] + h[m, 1] · x[i - m, j - 1] }
y[i, j] = h[-1, -1] · x[i + 1, j + 1] + h[-1, 0] · x[i + 1, j] + h[-1, 1] · x[i + 1, j - 1]
        + h[0, -1] · x[i, j + 1] + h[0, 0] · x[i, j] + h[0, 1] · x[i, j - 1]
        + h[1, -1] · x[i - 1, j + 1] + h[1, 0] · x[i - 1, j] + h[1, 1] · x[i - 1, j - 1]
Figure 7: Zero-padding shown for the first pixel of the image (Drawn by me)
This process of adding extra zeros is known as zero padding and is required to be done in each case
where there are no image pixels to overlap the kernel pixels.
For our example, zero padding needs to be carried out for every output pixel that lies along the
first two rows and columns, as well as those that appear along the last two rows and columns (these
pixels are shown in blue font in Figure 8).
In general, the number of rows or columns to be zero-padded on each side of the input image is given
by (number of rows or columns in the kernel – 1).
Figure 8
2D CONVOLUTION THROUGH GRAPHICAL METHOD
Step-by-Step Process:
1. Kernel definition: Define the kernel/filter (e.g., 3x3 matrix).
2. Input image: Define the input image (e.g., 5x5 matrix).
3. Positioning: Position the kernel over the top-left corner of the input image.
4. Element-wise multiplication: Perform element-wise multiplication between the kernel and the
corresponding region of the input image.
5. Summation: Calculate the sum of the products.
6. Output: Store the result in the output image.
7. Sliding: Slide the kernel one pixel to the right and repeat steps 4-6.
8. Repeat: Continue sliding the kernel over the entire input image.
Graphical Representation:
The graphical method can be represented as a sliding window operation, where the kernel is slid over the input
image, performing element-wise multiplication and summation at each position.
Example:
Suppose we have a 3x3 kernel and a 5x5 input image. The graphical method would involve sliding the kernel
over the input image, performing element-wise multiplication and summation at each position.
Advantages:
1. Intuitive understanding: The graphical method provides an intuitive understanding of the convolution
process.
5-MARK QUESTIONS:
10-MARK QUESTIONS:
1. Explain the concept of 2D systems in digital image processing. Discuss its classification.
2. Describe the process of 2D convolution through graphical method.
3. Explain the matrix analysis approach for 2D convolution.
4. Discuss the applications of digital image processing in various fields.
5. Explain the concept of structuring elements in mathematical morphology. Discuss its role in morphological
image processing.
MCQ
1. What is the primary purpose of image representation?
A) To compress images
B) To enhance images
C) To represent images in a digital format
D) To segment images
Answer: C) To represent images in a digital format
2. Which of the following is a type of image representation?
A) Binary
B) Grayscale
C) Color
D) All of the above
Answer: D) All of the above
3. What is the term for the number of bits used to represent each pixel?
A) Bit depth
B) Pixel depth
C) Image depth
D) None of the above
Answer: A) Bit depth
4. Which of the following image representations uses 1 bit per pixel?
A) Binary
B) Grayscale
C) Color
D) None of the above
Answer: A) Binary
5. What is the term for the number of pixels in an image?
A) Resolution
B) Size
C) Depth
D) None of the above
Answer: A) Resolution
Basic Relationship between Pixels (5)
6. What is the term for the pixels that are directly adjacent to a given pixel?
A) Neighbors
B) Adjacent pixels
C) Connected pixels
D) All of the above
Answer: D) All of the above
7. Which of the following is a type of pixel neighborhood?
A) 4-connected
B) 8-connected
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
8. What is the term for the distance between two pixels?
A) Euclidean distance
B) Manhattan distance
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
9. Which of the following is used to measure the similarity between two images?
A) Mean squared error
B) Peak signal-to-noise ratio
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
10. What is the term for the process of assigning a value to each pixel based on its neighbors?
A) Filtering
B) Thresholding
C) Segmentation
D) None of the above
Answer: A) Filtering
C) A technique for analyzing and manipulating images based on shape and structure
D) None of the above
Answer: C) A technique for analyzing and manipulating images based on shape and structure
27. Which of the following is a morphological operation?
A) Erosion
B) Dilation
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
28. What is the term for the process of shrinking an image using a structuring element?
A) Erosion
B) Dilation
C) Opening
D) Closing
Answer: A) Erosion
29. Which of the following is a type of structuring element?
A) Disk-shaped
B) Square-shaped
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
30. What is the term for the process of expanding an image using a structuring element?
A) Erosion
B) Dilation
C) Opening
D) Closing
Answer: B) Dilation
2D Convolution (5)
31. What is 2D convolution?
A) A technique for image compression
B) A technique for image enhancement
C) A mathematical operation that combines two images
A) Efficient computation
B) Easy implementation
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
43. What is the term for the matrix representation of the kernel in 2D convolution?
A) Toeplitz matrix
B) Convolution matrix
C) Both A and B
D) Neither A nor B
Answer: A) Toeplitz matrix
44. Which of the following is a type of matrix operation used in 2D convolution?
A) Matrix multiplication
B) Matrix addition
C) Both A and B
D) Neither A nor B
Answer: A) Matrix multiplication
45. What is the primary advantage of the matrix analysis approach for 2D convolution?
A) Efficient computation
B) Visual understanding
C) Both A and B
D) Neither A nor B
Answer: A) Efficient computation
Miscellaneous (5)
46. What is the term for the process of enhancing the quality of an image?
A) Image enhancement
B) Image restoration
C) Both A and B
D) Neither A nor B
Answer: A) Image enhancement
47. Which of the following is an application of digital image processing?
A) Medical imaging
B) Surveillance
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
48. What is the term for the process of analyzing and manipulating images based on shape and structure?
A) Mathematical morphology
B) Image processing
C) Both A and B
D) Neither A nor B
Answer: A) Mathematical morphology
49. Which of the following is a type of image representation?
A) Binary
B) Grayscale
C) Both A and B
D) Neither A nor B
Answer: C) Both A and B
50. What is the term for the number of pixels in an image?
A) Resolution
B) Size
C) Both A and B
D) Neither A nor B
Answer: A) Resolution
UNIT I COMPLETED
UNIT-2
2D Image transforms:
Types of 2D Image Transforms
1. Fourier Transform: Decomposes an image into its frequency components, useful for filtering, analysis,
and feature extraction.
2. Discrete Cosine Transform (DCT): Used in image and video compression (e.g., JPEG, MPEG).
3. Wavelet Transform: Represents images at multiple scales, useful for denoising, compression, and feature
extraction.
4. Hough Transform: Used for detecting lines, circles, and other shapes in images.
Applications of 2D Image Transforms
1. Image Filtering: Removing noise, enhancing features, and improving image quality.
2. Image Compression: Reducing the size of images while preserving their quality.
3. Feature Extraction: Extracting relevant features from images for object recognition, classification, and
tracking.
4. Image Registration: Aligning multiple images of the same scene taken at different times or from different
viewpoints.
Benefits of 2D Image Transforms
1. Improved Image Quality: Enhancing image features and removing noise.
2. Reduced Data Size: Compressing images while preserving their quality.
3. Efficient Feature Extraction: Extracting relevant features for object recognition and classification.
4. Robust Image Analysis: Analyzing images in the frequency domain or other transformed domains.
Some popular libraries for implementing 2D image transforms include:
1. OpenCV: A computer vision library with built-in functions for image transforms.
2. Matlab: A programming language with built-in functions for image processing and analysis.
3. Python libraries: Such as NumPy, SciPy, and scikit-image, which provide functions for image transforms.
PROPERTIES OF 2D IMAGES AND TRANSFORMS
Properties of 2D Images
1. Spatial Domain: 2D images are represented as a function of spatial coordinates (x, y).
2. Pixel-based: 2D images are composed of pixels, each with a value representing intensity or color.
3. Finite Size: 2D images have a finite size, defined by their width and height.
Properties of 2D Transforms
1. Linearity: Many 2D transforms, such as the Fourier transform, are linear, meaning that the transform of a
sum is the sum of the transforms.
2. Shift Invariance: Some 2D transforms, such as the magnitude of the Fourier transform, are shift-invariant,
meaning that the transform does not change when the image is shifted.
3. Rotation Invariance: Some 2D transforms, such as the Fourier transform magnitude, can be made
rotation-invariant, meaning that the transform does not change when the image is rotated.
4. Scalability: 2D transforms can be applied to images of various sizes and resolutions.
Properties of Specific 2D Transforms
1. Fourier Transform: Decomposes an image into its frequency components, with properties such as:
- Frequency domain representation
- Periodicity
- Symmetry
2. Discrete Cosine Transform (DCT): Used in image and video compression, with properties such as:
- Energy compaction
- Decorrelation
- Fast computation
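To make the energy compaction property concrete, the sketch below (assuming SciPy's scipy.fft.dctn and idctn are available) transforms a smooth 8 × 8 block, measures how much energy sits in the low-frequency corner, and reconstructs the block from only four coefficients.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth synthetic 8x8 block (a gentle gradient) standing in for image data.
block = np.add.outer(np.arange(8), np.arange(8)).astype(float)

coeffs = dctn(block, norm='ortho')              # forward 2-D DCT
energy = coeffs ** 2
print(energy[:2, :2].sum() / energy.sum())      # most energy lies in the low-frequency corner

kept = np.zeros_like(coeffs)
kept[:2, :2] = coeffs[:2, :2]                   # keep only the 4 lowest-frequency coefficients
approx = idctn(kept, norm='ortho')
print(np.abs(block - approx).max())             # modest error despite discarding 60 of 64 coefficients
```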
DFT
Discrete Fourier Transform (DFT): Understanding the Discrete Fourier Transform is the
essential objective here. The inverse transform is merely a mathematical rearrangement of the forward transform and is
quite simple.
The Fourier Transform converts a function from the time domain to the frequency domain. One may
assert that the Discrete Fourier Transform does the same, except for discretized signals.
The difference has been explained below:
DFTs are calculated for sequences of finite length while DTFTs are for infinite lengths. This is
why the summation in DTFTs ranges from -∞ to +∞.
DTFTs are characterized by output frequencies that are continuous in nature, i.e., ω. DFTs, on
the other hand, give an output that has discretized frequencies.
DTFTs are equal to DFTs only for sampled values of ω. That is the only way by which we
derive one from the other.
The general expressions for the DFT and IDFT are as follows. Note that k takes integer values
starting from 0 and counting up to N-1; k is simply a variable used to refer to the sampled
value of the function.
Since the IDFT is the inverse of the DFT, k is not used as the output index; instead, n is used. Many find it confusing
which is which; a simple convention is to associate the DFT with a capital 'X' and the IDFT
with a lower-case 'x'.
Equation for DFT:
X(k) = Σ (n = 0 to N-1) x[n] · e^(-j2πkn/N)
Equation for IDFT:
x(n) = (1/N) · Σ (k = 0 to N-1) X[k] · e^(j2πkn/N)
The first thing that comes to mind for coding the above expression is to start with the summation. In
practice, this is achieved by running a loop and iterating over different values of n (in the DFT)
and k (in the IDFT) to find the different values of the output. For a single value, say k = 1, one may compute X[1] quite easily;
but for tasks such as plotting the magnitude spectrum, the same must be computed for every value of k as
well. Therefore, one must introduce two loops, i.e., a pair of nested loops.
Yet another concern is how to translate the second half of the expression, which is Euler's
constant raised to a complex exponent. Readers must recall Euler's formula, which describes it in terms of sines and cosines:
e^(-jθ) = cos(θ) - j·sin(θ)
This lets us interpret the exponential term of the summation as follows:
e^(-j2πkn/N) = cos(2πkn/N) - j·sin(2πkn/N)
It is possible to import complex-number libraries (in the case of C), but code legibility can become a problem
when writing this expression directly.
A rather intuitive perspective may be implemented as well - express the sequences as matrices and
use the vector form of DFT and IDFT for calculations. This is best worked out in MATLAB.
Algorithm (DFT):
Initialize all required libraries.
Prompt the user to input the number of points in the DFT.
Now you may initialize the arrays and accordingly ask for the input sequence. This is purely
due to the inability to declare an empty array in C. Dynamic memory allocation is one of the
solutions. However, simply reordering the prompt is a fair solution in itself.
Implement two nested loops that calculate the value of X(k) for each value of k and n. Keep in
mind that Euler's formula will be used to substitute for e^(-j2πkn/N). This requires
calculating the real and imaginary parts of the expression separately.
Display the result as you run the calculation.
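A direct, unoptimized Python sketch of this algorithm is given below (it uses the standard math module rather than C, and has O(N²) complexity); Euler's formula splits the exponential into its real and imaginary parts.

```python
import math

def dft(x):
    N = len(x)
    X = []
    for k in range(N):                       # outer loop: one output coefficient per k
        re, im = 0.0, 0.0
        for n in range(N):                   # inner loop: multiply-accumulate over n
            angle = 2 * math.pi * k * n / N
            re += x[n] * math.cos(angle)     # real part of x[n] * e^(-j2*pi*k*n/N)
            im -= x[n] * math.sin(angle)     # imaginary part
        X.append(complex(re, im))
    return X

print(dft([1, 0, 0, 0]))                     # impulse input: every DFT coefficient is 1
```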
WALSH TRANSFORM
What is the Walsh Transform?
The Walsh Transform is a mathematical operation that decomposes a signal or image into a set of
orthogonal basis functions, known as Walsh functions.
These functions take on only two values, +1 and -1, making them useful for binary or logical
operations.
How does the Walsh Transform work?
The Walsh Transform works by representing a signal or image as a linear combination of Walsh
functions.
The coefficients of this linear combination are calculated using the inner product of the signal or
image with each Walsh function.
Properties of Walsh Functions
1. Orthogonality: Walsh functions are orthogonal to each other, meaning that their inner product is zero.
2. Binary: Walsh functions take on only two values, +1 and -1.
3. Sequency: Walsh functions can be ordered by sequency, which is a measure of the number of zero
crossings.
4. Completeness: Walsh functions form a complete set, meaning that any signal or image can be represented
as a linear combination of Walsh functions.
Applications of the Walsh Transform
1. Image Compression: The Walsh Transform can be used for image compression by representing
images using a subset of the Walsh coefficients.
2. Signal Processing: The Walsh Transform can be used for signal filtering and analysis by representing
signals in the Walsh domain.
3. Data Compression: The Walsh Transform can be used for compressing binary data by representing data
using a subset of Walsh functions.
4. Coding Theory: The Walsh Transform is used in coding theory to construct error-correcting codes.
Advantages of the Walsh Transform
1. Fast Computation: The Walsh Transform can be computed using fast algorithms, making it suitable for
real-time applications.
2. Simple Implementation: The Walsh Transform can be implemented using simple logical operations,
making it suitable for hardware implementation.
3. Low Computational Complexity: The Walsh Transform has low computational complexity, making it
suitable for applications with limited computational resources.
Limitations of the Walsh Transform
1. Limited Representation: The Walsh Transform is limited in its ability to represent signals or images with
complex structures.
2. Not Suitable for All Applications: The Walsh Transform is not suitable for all applications, such as those
that require representation of signals or images with continuous values.
The Walsh transform of a function f on Vn (with the values of f taken to be the real numbers 0 and 1)
is the map W(f): Vn → R defined by
W(f)(w) = Σ (x ∈ Vn) f(x) · (-1)^(w·x),    (2.1)
which gives the coefficients of f with respect to the orthonormal basis of the group
characters Q_x(w) = (-1)^(w·x); f can be recovered by the inverse Walsh transform
f(x) = 2^(-n) Σ (w ∈ Vn) W(f)(w) · (-1)^(w·x).
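For small n the definition can be evaluated directly. The sketch below (plain Python; walsh_transform is our own helper, and the truth-table ordering is the assumption stated in the comment) computes W(f)(w) for every w in Vn.

```python
import itertools

def walsh_transform(truth_table, n):
    points = list(itertools.product([0, 1], repeat=n))          # all vectors x in Vn
    spectrum = {}
    for w in points:
        total = 0
        for x, fx in zip(points, truth_table):
            dot = sum(wi * xi for wi, xi in zip(w, x)) % 2       # w . x over GF(2)
            total += fx * (-1) ** dot                            # f(x) * (-1)^(w.x)
        spectrum[w] = total
    return spectrum

# Example: f(x1, x2) = x1 AND x2 on V2, truth table ordered as 00, 01, 10, 11.
print(walsh_transform([0, 0, 0, 1], 2))      # {(0,0): 1, (0,1): -1, (1,0): -1, (1,1): 1}
```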
The Walsh spectrum of f is the list of the 2^n Walsh coefficients given by (2.1) as w varies.
The simplest Boolean functions are the constant functions 0 and 1. Obviously, W(0)(u)=0 and the Walsh
coefficients for the function 1 are given by the next lemma.
Lemma 2.6
If w∈Vn , we have
Σ (u ∈ Vn) (-1)^(u·w) = 2^n if w = 0, and 0 otherwise.
Proof
First, if w=0, then all summands are 1. Now, assume w≠0, and consider
the hyperplanes H={u∈Vn:u⋅w=0}, H̄={u∈Vn:u⋅w=1}. Obviously, these hyperplanes generate a partition
of Vn. Moreover, for any u∈H, the summand is 1, and for any u∈H̄, the summand is −1. Since
the cardinalities of H and H̄ are the same, namely 2^(n-1), we have the lemma.
HADAMARD TRANSFORM
The Hadamard transform and the Haar transform, to be considered in the next section, share a
significant computational advantage over the previously considered DFT, DCT, and DST
transforms.
Their unitary matrices consist of ±1 entries, and the transforms are computed via additions and subtractions
only, with no multiplications being involved.
Hence, for processors on which multiplication is a time-consuming operation, a substantial saving is
obtained.
The Hadamard unitary matrix of order n is the 2^n × 2^n matrix Hn, generated by the following iteration
rule:
Hn = H1 ⊗ Hn-1    (4.5.1)
where
H1 = (1/√2) [ 1  1 ; 1  -1 ]    (4.5.2)
and ⊗ denotes the Kronecker product of two matrices, A ⊗ B being the matrix whose (i, j) block is A(i, j)·B, where A(i, j) is the (i, j) element of A, i, j = 1, 2, ..., N. Thus, according to (4.5.1) and (4.5.2),
H2 = (1/2) [ 1  1  1  1 ; 1 -1  1 -1 ; 1  1 -1 -1 ; 1 -1 -1  1 ]
and so on for higher orders.
The Hadamard transform has good to very good energy packing properties, and fast algorithms for its computation are also available.
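The iteration rule can be coded in a few lines with NumPy's Kronecker product, as in the sketch below (hadamard_unitary is our own helper; SciPy also provides an unnormalized scipy.linalg.hadamard, which is not used here).

```python
import numpy as np

def hadamard_unitary(n):
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Eq. (4.5.2)
    H = H1
    for _ in range(n - 1):
        H = np.kron(H1, H)                                   # Eq. (4.5.1): Hn = H1 (x) Hn-1
    return H

H2 = hadamard_unitary(2)                                     # 4x4 Hadamard unitary matrix
print(np.allclose(H2 @ H2.T, np.eye(4)))                     # True: the matrix is orthogonal
# Transforming a signal x is just H2 @ x: additions and subtractions up to the scale factor.
```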
The starting point for the definition of the Haar transform is the set of Haar functions hk(z), which are
defined in the closed interval [0, 1]. The order k of each function is uniquely decomposed into two
integers p, q:
k = 2^p + q - 1,   k = 0, 1, ..., L - 1,  L = 2^n    (4.6.1)
where 0 ≤ p ≤ n - 1; q = 0 or 1 for p = 0, and 1 ≤ q ≤ 2^p for p ≠ 0. (A table in the textbook summarizes the respective values of p and q.) The Haar functions are
h0(z) = 1/√L,  for 0 ≤ z ≤ 1
hk(z) = (1/√L) × {  2^(p/2)   for (q - 1)/2^p ≤ z < (q - 1/2)/2^p
                   -2^(p/2)   for (q - 1/2)/2^p ≤ z < q/2^p
                    0         otherwise in [0, 1] }    (4.6.2)
The Haar transform matrix of order L consists of rows resulting from the preceding functions computed at the points z = m/L, m = 0, 1, ..., L - 1.    (4.6.3)
The energy packing properties of the Haar transform are not very good. However, its importance for
us lies beyond that. We will use it as the vehicle to take us from the world of unitary transforms to
that of multiresolution analysis.
Let us look carefully at the Haar transform matrix. We readily observe its sparse nature, with a number
of zeros whose locations reveal an underlying cyclic shift mechanism. To satisfy our curiosity as to
why this happens, let us look at the Haar transform from a different perspective.
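For a concrete look at that sparse structure, the sketch below (our own helper haar_matrix, following the reconstruction of Eqs. 4.6.1-4.6.3 above) builds the L × L Haar matrix by sampling the Haar functions at z = m/L and verifies that it is orthonormal.

```python
import numpy as np

def haar_matrix(L):
    # L must be a power of two; row k is h_k(z) sampled at z = m/L, m = 0..L-1.
    n = int(np.log2(L))
    z = np.arange(L) / L
    H = np.zeros((L, L))
    H[0, :] = 1.0 / np.sqrt(L)                       # h_0(z) = 1/sqrt(L)
    for p in range(n):
        for q in range(1, 2 ** p + 1):
            k = 2 ** p + q - 1
            lo, mid, hi = (q - 1) / 2 ** p, (q - 0.5) / 2 ** p, q / 2 ** p
            row = np.zeros(L)
            row[(z >= lo) & (z < mid)] = 2 ** (p / 2)
            row[(z >= mid) & (z < hi)] = -(2 ** (p / 2))
            H[k, :] = row / np.sqrt(L)
    return H

H = haar_matrix(8)
print(np.allclose(H @ H.T, np.eye(8)))               # True: the Haar matrix is orthonormal
print(np.count_nonzero(H == 0))                      # many zero entries: the matrix is sparse
```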
KARHUNEN-LOÈVE TRANSFORM (KLT)
The KLT is also known as the Principal Component Analysis (PCA) transform in some contexts.
Both transforms are useful in various applications, including signal processing, image compression,
and feature extraction.
What is the Karhunen-Loève Transform?
The Karhunen-Loève Transform (KLT) is a mathematical operation that represents a random process
or signal as a sum of orthogonal basis functions, optimized for the specific signal or process. It's a powerful
tool for signal processing, image analysis, and data compression.
How does the KLT work?
The KLT works by finding the eigenfunctions and eigenvalues of the covariance function of the signal or
process. These eigenfunctions form an orthogonal basis, and the signal can be represented as a linear
combination of these basis functions.
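A compact numerical sketch of this procedure is given below (NumPy only; the 4-dimensional toy data set is made up for illustration). It estimates the covariance matrix, takes its eigenvectors as the KLT basis, and checks that the transformed coefficients are nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.0, 0.1]])
X = rng.normal(size=(1000, 4)) @ mixing            # correlated toy data, 1000 samples

Xc = X - X.mean(axis=0)                            # remove the mean
C = np.cov(Xc, rowvar=False)                       # covariance matrix of the process
eigvals, eigvecs = np.linalg.eigh(C)               # eigen-decomposition (ascending order)
basis = eigvecs[:, np.argsort(eigvals)[::-1]]      # KLT basis, sorted by decreasing eigenvalue

Y = Xc @ basis                                     # KLT coefficients of every sample
print(np.round(np.cov(Y, rowvar=False), 2))        # (nearly) diagonal: coefficients are decorrelated
```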
Properties of the KLT
1. Optimality: The KLT is optimal for decorrelating signals and minimizing mean squared error.
2. Orthogonality: The KLT basis functions are orthogonal to each other.
3. Signal-specific: The KLT basis functions are optimized for the specific signal or process.
4. Uncorrelated coefficients: The KLT coefficients are uncorrelated, making it useful for analysis and
compression.
Applications of the KLT
1. Signal Processing: The KLT is used in signal processing for filtering, analysis, and compression.
2. Image Processing: The KLT is used in image processing for image compression, feature extraction, and
denoising.
3. Data Compression: The KLT is used in data compression to reduce the dimensionality of data.
4. Feature Extraction: The KLT is used in feature extraction to identify patterns and structures in data.
Benefits of the KLT
1. Efficient Representation: The KLT provides an efficient representation of signals and images.
2. Decorrelation: The KLT decorrelates signals, making it useful for analysis and compression.
3. Optimal Compression: The KLT is optimal for compressing signals and images.
Limitations of the KLT
1. Computational Complexity: The KLT can be computationally expensive to compute.
2. Signal-specific: The KLT basis functions are optimized for the specific signal or process, making it less
flexible than other transforms.
Overall, the KLT is a powerful tool in signal processing and image analysis, and its applications
continue to grow in various fields.
SINGULAR VALUE DECOMPOSITION
What is SVD?
SVD is a factorization technique that decomposes a matrix A into three matrices:
1. U (orthogonal matrix): columns are left-singular vectors of A
2. Σ (diagonal matrix): contains singular values of A
3. V (orthogonal matrix): columns are right-singular vectors of A
Mathematical Representation
A = U Σ V^T
Properties of SVD
1. Rank Reduction: SVD helps reduce the rank of a matrix.
2. Dimensionality Reduction: SVD can be used for dimensionality reduction.
3. Data Compression: SVD can be used for data compression.
4. Orthogonality: U and V are orthogonal matrices.
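The rank- and dimensionality-reduction properties can be demonstrated in a few lines. The sketch below (NumPy assumed; the random 64 × 64 matrix merely stands in for an image) keeps only the k largest singular values and reconstructs a low-rank approximation.

```python
import numpy as np

A = np.random.default_rng(1).random((64, 64))        # placeholder "image" matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)     # A = U * Sigma * V^T

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # rank-k approximation of A

print(np.linalg.matrix_rank(A_k))                    # 10: the rank has been reduced
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # relative error of the compressed version
```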
Applications of SVD
1. Image Compression: SVD can be used to compress images.
2. Latent Semantic Analysis: SVD is used in latent semantic analysis for text analysis.
3. Recommendation Systems: SVD is used in recommendation systems.
4. Data Analysis: SVD is used in data analysis for dimensionality reduction and feature extraction.
5. Signal Processing: SVD is used in signal processing for noise reduction and filtering.
Benefits of SVD
1. Robustness to Noise: SVD is robust to noise in data.
2. Efficient Computation: SVD can be computed efficiently.
3. Insight into Data Structure: SVD provides insight into the structure of data.
5 Marks Questions
10 Marks Questions
1. Derive the 2D Discrete Fourier Transform (DFT) and explain its properties.
2. Explain the Walsh transform and its applications in image processing.
3. Discuss the properties and applications of the Discrete Cosine Transform (DCT).
4. Explain the Singular Value Decomposition (SVD) and its applications in image processing.
5. Compare and contrast the different 2D image transforms (DFT, DCT, Walsh, Hadamard, Haar, KLT,
SVD).
6. Explain the properties and applications of the Hadamard transform.
7. Derive the 2D Discrete Cosine Transform (DCT) and explain its properties.
8. Discuss the applications of the Karhunen-Loève Transform (KLT) in image processing.
9. Explain the Haar wavelet transform and its applications.
10. Compare the performance of different 2D image transforms in terms of energy compaction and
decorrelation.
MCQ
1. What is the primary purpose of applying transforms to images?
a) To compress images
b) To enhance images
c) To analyze and manipulate image data
d) To store images
Answer: c) To analyze and manipulate image data
2. Which transform is useful for frequency analysis of images?
a) DFT
b) DCT
c) Walsh
d) Haar
Answer: a) DFT
3. What is the Discrete Cosine Transform (DCT) used for?
a) Image compression
b) Image enhancement
c) Image analysis
d) All of the above
Answer: d) All of the above
4. Which transform is optimal for decorrelating data?
a) KLT
b) DFT
c) DCT
d) Walsh
Answer: a) KLT
5. What is the Haar transform used for?
a) Image compression
b) Feature extraction
c) Multiresolution analysis
d) All of the above
Answer: d) All of the above
c) DCT
d) Walsh
Answer: a) SVD
17.What is the primary application of the KLT?
a) Image compression
b) Feature extraction
c) Decorrelation
d) Image analysis
Answer: c) Decorrelation
18.Which transform has properties such as separability and periodicity?
a) DFT
b) DCT
c) Walsh
d) Haar
Answer: a) DFT
19.What is the primary advantage of the SVD?
a) Energy compaction
b) Decorrelation
c) Dimensionality reduction
d) Fast computation
Answer: c) Dimensionality reduction
20.Which transform is useful for logical filtering?
a) Walsh
b) Haar
c) DFT
d) DCT
Answer: a) Walsh
21.What is the difference between the Haar transform and the Walsh transform?
a) Haar uses rectangular waveforms, Walsh uses sinusoidal waveforms
b) Haar uses sinusoidal waveforms, Walsh uses rectangular waveforms
c) Haar is used for image compression, Walsh is used for feature extraction
d) Haar is used for feature extraction, Walsh is used for image compression
Answer: Neither (a) nor (b) is strictly correct; both Haar and Walsh use rectangular (non-sinusoidal) waveforms. They differ in that the Haar basis functions are localized, supporting multiresolution analysis, while the Walsh functions are global and ordered by sequency.
22.Which transform is optimal for image compression?
a) KLT
b) DCT
c) SVD
d) DFT
Answer: a) KLT
23. What is the primary application of the DFT?
a) Image compression
b) Image enhancement
c) Frequency analysis
d) Image denoising
Answer: c) Frequency analysis
24.What is the purpose of the Singular Value Decomposition (SVD) in image processing?
a) Image compression
b) Image denoising
c) Feature extraction
d) All of the above
Answer: d) All of the above
25.Which transform is widely used in image and video compression standards due to its energy compaction
property?
a) DFT
b) DCT
c) Walsh
d) Haar
Answer: b) DCT
26.What is the primary advantage of the Karhunen-Loève Transform (KLT)?
a) Fast computation
b) Simple implementation
c) Decorrelation
d) Energy compaction
Answer: c) Decorrelation
27.Which transform is useful for multiresolution analysis of images?
a) Haar
b) Walsh
c) DFT
d) DCT
Answer: a) Haar
28.What is the primary application of the Discrete Cosine Transform (DCT)?
a) Image analysis
b) Image compression
c) Feature extraction
d) Image denoising
Answer: b) Image compression
29.Which transform has properties such as orthogonality and completeness?
a) Walsh
b) Haar
c) DFT
d) DCT
Answer: a) Walsh
30.What is the purpose of the Haar transform in image processing?
a) Image compression
b) Feature extraction
c) Multiresolution analysis
d) All of the above
Answer: d) All of the above
31.Which transform is optimal for decorrelating data?
a) KLT
b) DCT
c) SVD
d) DFT
Answer: a) KLT
32.What is the primary advantage of the Discrete Fourier Transform (DFT)?
a) Energy compaction
b) Decorrelation
c) Frequency analysis
d) Fast computation
a) Energy compaction
b) Decorrelation
c) Multiresolution analysis
d) Fast computation
Answer: c) Multiresolution analysis
39.Which transform is widely used in image compression standards?
a) DFT
b) DCT
c) Walsh
d) Haar
Answer: b) DCT
40.What is the primary application of the Karhunen-Loève Transform (KLT)?
a) Image compression
b) Feature extraction
c) Decorrelation
d) Image analysis
Answer: c) Decorrelation
41.Which transform has properties such as orthogonality and energy compaction?
a) DCT
b) DFT
c) Walsh
d) Haar
Answer: a) DCT
42.What is the purpose of the Discrete Cosine Transform (DCT) in image processing?
a) Image analysis
b) Image compression
c) Feature extraction
d) Image denoising
Answer: b) Image compression
43.Which transform is optimal for image compression?
a) KLT
b) DCT
c) SVD
d) DFT
Answer: a) KLT
44. What is the primary purpose of applying the Discrete Fourier Transform (DFT) to an image?
a) Image compression
b) Image enhancement
c) Frequency analysis
d) Image denoising
Answer: c) Frequency analysis
45. Which transform is used in JPEG image compression?
a) DFT
b) DCT
c) Walsh
d) Haar
Answer: b) DCT
46. What is the advantage of using the Singular Value Decomposition (SVD) in image processing?
a) Fast computation
b) Simple implementation
c) Dimensionality reduction
d) Energy compaction
Answer: c) Dimensionality reduction
47. Which transform is useful for image fusion?
a) DFT
b) DCT
c) SVD
d) Wavelet transform
Answer: d) Wavelet transform
48. What is the primary application of the Karhunen-Loève Transform (KLT)?
a) Image compression
b) Feature extraction
c) Decorrelation
d) Image analysis
Answer: c) Decorrelation
49. Which transform has properties such as orthogonality and completeness?
a) Walsh
b) Haar
c) DFT
d) DCT
Answer: a) Walsh
50. What is the purpose of the Discrete Cosine Transform (DCT) in video compression?
a) Image analysis
b) Motion estimation
c) Energy compaction
d) Image denoising
Answer: c) Energy compaction
UNIT II COMPLETED
SRI KAILASH WOMEN’S COLLEGE
UNIT-3
IMAGE ENHANCEMENT:
Image enhancement is the process of making images more useful (such as making images more
visually appealing, bringing out specific features, removing noise from images and highlighting
interesting details in images).
Spatial domain techniques manipulate the pixels of an image directly. This process happens in the image's coordinate system, also known as the spatial domain.
Frequency domain techniques transform an image from the spatial domain to the frequency domain. In this process, mathematical transformations (such as the Fourier transform) are used, and the image is modified by manipulating its frequency components.
Note: Spatial domain techniques are described first; grey levels are assumed to be given in the range [0.0, 1.0].
Spatial Domain
Most spatial domain enhancement operations can be reduced to the form g (x, y) = T[ f (x, y)] where f
(x, y) is the input image, g (x, y) is the processed image and T is some operator defined over some
neighbourhood of (x, y).
POINT PROCESSING INTENSITY TRANSFORMATIONS
When the neighborhood is the pixel itself, the simplest spatial domain operations occur. Point processing operations take the form s = T(r), where s refers to the processed image pixel value and r refers to the original image pixel value.
1. Negative Images
s = intensity_max - r
2. Thresholding
Thresholding transformations are useful for segmentation in which we want to isolate an object of
interest from a background.
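A minimal Python sketch of these two point operations, assuming an 8-bit image (intensity_max = 255) and an illustrative threshold value:
import numpy as np
def negative(img, intensity_max=255):
    return intensity_max - img                             # s = intensity_max - r
def threshold(img, T=128):
    return np.where(img > T, 255, 0).astype(np.uint8)      # isolate bright objects from the background
img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)    # toy 4x4 image
print(negative(img))
print(threshold(img, T=128))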
Logarithmic Transformations
The log transformation, s = c log(1 + r), maps a narrow range of low input grey level values into a wider range of output values.
Log functions are particularly useful when the input grey level values may have an extremely large range of values. For example, the Fourier transform of an image is often put through a log transform to reveal more detail. c is generally set to 1.
Power-Law (Gamma) Transformations
The power-law transformation, s = c r^γ, maps a narrow range of dark input values into a wider range of output values, or vice versa, depending on the value of γ. c is generally set to 1.
Grey-Level Slicing
Highlights a specific range of grey levels; other levels can be suppressed or maintained.
Bit-Plane Slicing
By isolating particular bits of the pixel values in an image we can highlight interesting aspects of that image.
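A minimal Python sketch of the log and power-law transformations, assuming grey levels normalized to [0.0, 1.0]; the values of c and gamma are illustrative.
import numpy as np
def log_transform(r, c=1.0):
    return c * np.log1p(r)                # s = c * log(1 + r)
def gamma_transform(r, c=1.0, gamma=0.5):
    return c * np.power(r, gamma)         # s = c * r^gamma
r = np.linspace(0.0, 1.0, 5)
print(log_transform(r))
print(gamma_transform(r, gamma=0.5))      # gamma < 1 brightens dark regions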
HISTOGRAM PROCESSING
The histogram of an image shows us the distribution of grey levels in the image. It is useful in image processing, especially in segmentation and enhancement.
Histogram Equalisation
Spreading out the frequencies in an image (or equalising the image) is a simple way to improve dark or
washed out images.
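A minimal Python sketch of histogram equalisation for an 8-bit grayscale image; the low-contrast test image is an assumption for demonstration.
import numpy as np
def equalize(img):
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each grey level through the normalised cumulative distribution
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255).astype(np.uint8)
    return lut[img]
img = np.random.randint(50, 100, (64, 64), dtype=np.uint8)   # low-contrast image
print(equalize(img).min(), equalize(img).max())              # values spread towards 0..255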
Histogram Matching
It is useful sometimes to be able to specify the shape of the histogram that we wish the processed image
to have.
Neighborhood Operations
Neighborhood operations operate on a larger neighborhood of pixels than point operations. The neighborhood is usually a square around a central pixel, although rectangles of any size and filters of any shape are possible.
Median: Set the pixel value to the median (the middle value when the neighborhood values are sorted). Sometimes the median works better than the average.
Spatial Filtering
Spatial filtering is a technique used to enhance an image based on its spatial characteristics. It can be used for image sharpening, edge detection, blurring, and noise reduction.
Linear spatial filters apply a linear operation to an image, such as convolution with a kernel or mask. They are used to enhance or extract features from an image, such as edges or textures; examples include the Sobel and Prewitt filters.
Nonlinear spatial filters apply a nonlinear operation to an image. They are used to enhance or extract
features from an image in a more complex way than linear filters. Examples include median filters,
which are used to remove noise from an image by replacing each pixel with the median value of the
pixels in its neighborhood, and morphological filters, which are used to extract specific shapes or
structures from an image.
This process is repeated for every pixel in the original image to generate the filtered image.
Smoothing spatial filters average all of the pixels in a neighbourhood around a central value.
Details begin to disappear after filtering with an averaging filter of increasing sizes (3, 5, 10 .. etc.).
More effective smoothing filters can be generated by allowing different pixels in the neighbourhood
different weights in the averaging function. Pixels closer to the central pixel are more important. This is
often referred to as a weighted averaging.
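A minimal Python sketch comparing a plain 3 x 3 averaging template with a weighted averaging template in which the centre pixel counts the most; scipy.ndimage.convolve performs the neighbourhood sums, and the test image is an assumption.
import numpy as np
from scipy.ndimage import convolve
box = np.ones((3, 3)) / 9.0                        # every neighbour weighted equally
weighted = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]]) / 16.0            # centre pixel weighted most heavily
img = np.random.rand(32, 32)
smooth_box = convolve(img, box, mode='nearest')          # replicate-style border handling
smooth_weighted = convolve(img, weighted, mode='nearest')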
Gaussian Filters
Gaussian filters remove high-frequency components from the image, so they are called low-pass filters.
Convolving a Gaussian with itself yields another Gaussian, so smoothing repeatedly with a small-width kernel gives the same result as a single pass with a larger-width kernel.
Gaussian filter
The 2D Gaussian can be expressed as the product of two functions, one a function of x and the other a function of y; that is, the Gaussian kernel is separable.
Separability example
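A minimal Python sketch of separability: filtering with a 1D Gaussian along the rows and then along the columns matches a single pass with the full 2D kernel; the kernel width and test image are assumptions.
import numpy as np
from scipy.ndimage import convolve, convolve1d
def gauss1d(sigma, radius=3):
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()
g = gauss1d(sigma=1.0)
kernel2d = np.outer(g, g)                           # 2D Gaussian as a product of two 1D Gaussians
img = np.random.rand(32, 32)
direct = convolve(img, kernel2d, mode='nearest')
separable = convolve1d(convolve1d(img, g, axis=0, mode='nearest'), g, axis=1, mode='nearest')
print(np.abs(direct - separable).max())             # ~0: the two approaches agree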
One option for handling image borders is to allow pixel indices to wrap around the image. (This can cause some strange image artefacts.)
There are two closely related concepts that must be understood clearly when performing linear spatial
filtering. One of them is correlation and the other one is convolution.
Correlation is the process of moving a filter mask over the image and computing the sum of products at
each position. The filtering so far is referred to as correlation with the filter itself referred to as
the correlation kernel.
Correlation: g(x, y) = Σ Σ w(s, t) f(x + s, y + t), with s running from -a to a and t from -b to b.
Convolution is a similar operation to correlation, with just one subtle difference: the filter is rotated by 180°. For symmetric filters this makes no difference.
Convolution: g(x, y) = Σ Σ w(s, t) f(x - s, y - t), with s running from -a to a and t from -b to b.
Correlation vs Convolution
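A minimal Python sketch showing that convolution is correlation with the kernel rotated by 180 degrees; the deliberately non-symmetric kernel and test image are assumptions.
import numpy as np
from scipy.ndimage import correlate, convolve
w = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)              # non-symmetric kernel
f = np.random.rand(16, 16)
conv = convolve(f, w, mode='nearest')
corr_rotated = correlate(f, np.rot90(w, 2), mode='nearest')   # 180-degree rotated kernel
print(np.allclose(conv, corr_rotated))              # True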
SHARPENING SPATIAL FILTERS
Sharpening spatial filters seek to highlight fine detail, remove blurring from images and highlight
edges. Sharpening filters are based on spatial differentiation.
Spatial Differentiation
The 1st derivative is just the difference between subsequent values and measures the rate of change of the function:
∂f/∂x = f(x + 1) - f(x)
2nd Derivative
The 2nd derivative takes into account the values both before and after the current value:
∂²f/∂x² = f(x + 1) + f(x - 1) - 2 f(x)
2nd order derivatives have a stronger response to fine detail e.g. thin lines.
2nd order derivatives produce a double response at step changes in grey level.
The 2nd derivative is more useful for image enhancement than the 1st derivative. It gives stronger
response to fine detail and has simpler implementation.
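A minimal Python sketch of 1st- and 2nd-order differences along a 1D intensity profile containing a ramp and an isolated bright point; the profile values are assumptions.
import numpy as np
f = np.array([5, 5, 4, 3, 2, 1, 0, 0, 0, 6, 0, 0], dtype=float)
first = f[1:] - f[:-1]                     # f(x+1) - f(x)
second = f[2:] + f[:-2] - 2 * f[1:-1]      # f(x+1) + f(x-1) - 2 f(x)
print(first)    # constant along the ramp, spikes at abrupt changes
print(second)   # zero along the ramp, double response at the isolated point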
Sharpening Filters
Laplacian Filter
There are lots of slightly different versions of the Laplacian that can be used:
The Laplacian highlights edges and other discontinuities. However, the result of Laplacian filtering is not an enhanced image by itself. The Laplacian should be subtracted from the original image to generate the final sharpened, enhanced image:
g(x, y) = f(x, y) - Laplacian(f(x, y))
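A minimal Python sketch of Laplacian sharpening using one common 3 x 3 Laplacian kernel (centre coefficient -4); the test image is an assumption.
import numpy as np
from scipy.ndimage import convolve
laplacian_kernel = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=float)
img = np.random.rand(32, 32)
lap = convolve(img, laplacian_kernel, mode='nearest')
sharpened = img - lap                      # g(x, y) = f(x, y) - Laplacian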
Sobel Operators
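A brief sketch of the commonly used 3 x 3 Sobel kernels for the horizontal and vertical gradient components:
import numpy as np
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)    # responds to vertical edges
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)  # responds to horizontal edges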
Successful image enhancement is typically not achieved using a single operation. A range of techniques are
combined in order to achieve a final result.
SHARPENING FILTERS
The filtering process is to move the filter point-by-point in the image function f (x, y) so that the
center of the filter coincides with the point (x, y). At each point (x, y), the filter’s response is
calculated based on the specific content of the filter and through a predefined relationship called
‘template’.
If the pixel in the neighborhood is calculated as a linear operation, it is also called ‘linear spatial
domain filtering’, otherwise, it’s called ‘nonlinear spatial domain filtering’. Figure 2.3.1 shows the
process of spatial filtering with a 3 × 3 template (also known as a filter, kernel, or window).
The coefficients of the filter in linear spatial filtering give a weighting pattern. For example, for Figure
2.3.1, the response ‘R’ to the template is:
R = w(-1, -1) f(x-1, y-1) + w(-1, 0) f(x-1, y) + ... + w(0, 0) f(x, y) + ... + w(1, 0) f(x+1, y) + w(1, 1) f(x+1, y+1)
In mathematics, this is an element-wise multiplication of the template coefficients with the underlying image neighborhood, followed by a summation of the products. For a filter of size (2a+1, 2b+1), the output response can be calculated as:
R(x, y) = Σ Σ w(s, t) f(x + s, y + t), where s runs from -a to a and t runs from -b to b.
Smoothing Filters
Image smoothing is a digital image processing technique that reduces and suppresses image noises.
In the spatial domain, neighborhood averaging can generally be used to achieve the purpose of
smoothing. Commonly seen smoothing filters include average smoothing, Gaussian smoothing, and
adaptive smoothing.
Average Smoothing
First, let’s take a look at the smoothing filter in its simplest form — average template and its
implementation.
The points in the 3 × 3 neighborhood centered on the point (x, y) are altogether involved in
determining the (x, y) point pixel in the new image ‘g’. All coefficients being 1 means that they
contribute the same (weight) in the process of calculating the g(x, y) value.
The factor 1/9 ensures that the sum of the entire template's elements is 1. This keeps the new image in the same grayscale range as the original image (e.g., [0, 255]). Such a 'w' is called an average template.
How does it work?
The intensity values of adjacent pixels are similar, and the noise causes grayscale jumps at noise
points.
It is reasonable to assume that occasional noises do not change the local continuity of an image.
Take the image below for example: there are two dark points in the bright area.
For the borders, we can add a padding using the “replicate” approach. When smoothing the image with a
3×3 average template, the resulting image is the following.
The two noise points are replaced with the average of their surrounding points. The process of reducing the influence of noise is called smoothing or blurring.
Gaussian Smoothing
Average smoothing treats all the pixels in the neighborhood the same. In order to reduce the blur in the smoothing process and obtain a more natural smoothing effect, it is natural to increase the weight of the template centre point and reduce the weight of distant points, so that the new centre point intensity is closer to that of its nearest neighbours. The Gaussian template is based on this consideration.
The commonly used 3 × 3 Gaussian template is (1/16) × [1 2 1; 2 4 2; 1 2 1].
Adaptive Smoothing
The average template blurs the image while eliminating the noise. Gaussian template does a better
job, but the blurring is still inevitable as it’s rooted in the mechanism. A more desirable way is
selective smoothing, that is, smoothing only in the noise area, and not smoothing in the noise-free
area. This way potentially minimizes the influence of the blur. It is called adaptive smoothing.
Sharpening Filters
Image sharpening filters highlight edges by removing blur. They enhance the grayscale transitions of an image, which is the opposite of image smoothing.
The arithmetic operators of smoothing and sharpening also testify to this fact: while linear smoothing is based on a weighted summation or integral operation over the neighborhood, sharpening is based on the derivative (gradient) or finite difference.
Some applications where sharpening filters are used include:
Photo enhancement
With the sharpening enhancement, two numbers with the same absolute value represent the same response,
so w1 is equivalent to the following template w2:
Taking a further look at the structure of the Laplacian template, we see that the template is isotropic for 90-degree rotations. The Laplace operator performs well for edges in both the horizontal and vertical directions, thus avoiding the hassle of having to filter twice.
FREQUENCY DOMAIN METHODS:
Frequency-domain methods are based on the Fourier Transform of an image. Roughly, the term frequency in an image refers to the rate of change of pixel values.
The diagram below depicts the conversion of an image from the spatial domain to the frequency domain using the Fourier Transform.
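A minimal Python sketch of frequency-domain low-pass filtering: transform the image with the FFT, suppress high-frequency components with an ideal circular mask, and transform back; the image size and cutoff radius are assumptions.
import numpy as np
img = np.random.rand(64, 64)                     # stand-in for a grayscale image
F = np.fft.fftshift(np.fft.fft2(img))            # centre the zero-frequency component
rows, cols = img.shape
r, c = np.ogrid[:rows, :cols]
radius = np.sqrt((r - rows / 2) ** 2 + (c - cols / 2) ** 2)
mask = radius <= 16                              # ideal low-pass cutoff (assumed)
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))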
5 Marks Questions
1. What is the difference between point processing and spatial filtering in image enhancement?
2. What is histogram equalization, and how is it used in image enhancement?
3. What are the different types of spatial filters used in image enhancement?
4. What is the purpose of low-pass filtering in image enhancement?
5. What is homomorphic filtering, and how is it used in image enhancement?
10 Marks Questions
MCQ
1. What is the primary goal of image enhancement?
a) Compress images
b) Improve image quality
c) Segment images
d) Detect objects
Answer: b) Improve image quality
2. Which technique adjusts image contrast?
a) Histogram equalization
b) Spatial filtering
c) Frequency domain filtering
d) Point processing
Answer: a) Histogram equalization
3. What does low-pass filtering do?
a) Enhances edges
b) Removes noise
c) Sharpens images
d) Compresses images
Answer: b) Removes noise
4. Which filter smooths an image?
a) Laplacian filter
b) Gaussian filter
c) Sobel filter
d) Prewitt filter
Answer: b) Gaussian filter
5. What is homomorphic filtering used for?
a) Image compression
b) Image denoising
c) Contrast enhancement
d) Illumination correction
Answer: d) Illumination correction
6. Which filter enhances edges?
a) Low-pass filter
b) High-pass filter
c) Histogram equalization
d) Spatial filtering
Answer: b) High-pass filter
7. What is histogram specification?
a) Adjusts contrast
b) Matches histograms
c) Removes noise
d) Sharpens images
Answer: b) Matches histograms
8. Which filter sharpens an image?
a) Average filter
b) Laplacian filter
c) Gaussian filter
d) Median filter
Answer: b) Laplacian filter
9. What is the advantage of frequency domain filtering?
a) Fast computation
b) Simple implementation
c) Flexibility
d) Accuracy
Answer: a) Fast computation
10. Which technique reduces noise?
a) Smoothing filter
b) Sharpening filter
c) Histogram equalization
d) Contrast stretching
Answer: a) Smoothing filter
11. What is contrast stretching?
a) Adjusts brightness
b) Adjusts contrast
c) Removes noise
d) Sharpens images
Answer: b) Adjusts contrast
12. Which filter removes salt and pepper noise?
a) Average filter
b) Median filter
c) Gaussian filter
d) Laplacian filter
Answer: b) Median filter
13. What is the purpose of image enhancement in medical imaging?
a) Improve image quality
b) Compress images
c) Segment images
d) Detect tumors
Answer: a) Improve image quality
14. Which technique enhances images with low contrast?
a) Histogram equalization
b) Contrast stretching
c) Spatial filtering
d) Frequency domain filtering
Answer: a) Histogram equalization
15. What is the primary application of homomorphic filtering?
a) Image compression
b) Image denoising
c) Contrast enhancement
d) Illumination correction
Answer: d) Illumination correction
16. Which filter detects edges?
a) Laplacian filter
b) Gaussian filter
c) Sobel filter
d) Prewitt filter
Answer: c) Sobel filter
22. Which technique is used to enhance images with varying illumination?
a) Histogram equalization
b) Homomorphic filtering
c) Contrast stretching
d) Spatial filtering
Answer: b) Homomorphic filtering
23. What is the purpose of frequency domain filtering?
a) Remove noise
b) Enhance edges
c) Adjust contrast
d) Analyze image frequency components
Answer: d) Analyze image frequency components
24. Which filter removes Gaussian noise?
a) Average filter
b) Median filter
c) Gaussian filter
d) Wiener filter
Answer: d) Wiener filter
25. What is the advantage of histogram equalization?
a) Fast computation
b) Simple implementation
c) Contrast enhancement
d) Edge enhancement
Answer: c) Contrast enhancement
26. What is the primary application of image enhancement in digital photography?
a) Improve image quality
b) Compress images
c) Segment images
d) Detect objects
Answer: a) Improve image quality
27. Which technique is used to enhance the details in an image?
a) Histogram equalization
b) Contrast stretching
c) Spatial filtering
d) Unsharp masking
Answer: d) Unsharp masking
28. What is the purpose of low-pass filtering in image enhancement?
a) Enhance edges
b) Remove noise
c) Sharpen images
d) Compress images
Answer: b) Remove noise
29. Which filter is used to detect edges in an image?
a) Laplacian filter
b) Gaussian filter
c) Sobel filter
d) Prewitt filter
Answer: c) Sobel filter
30. What is the primary advantage of frequency domain filtering?
a) Fast computation
b) Simple implementation
c) Flexibility
d) Accuracy
Answer: a) Fast computation
31. Which technique is used to adjust the contrast of an image?
a) Histogram equalization
b) Contrast stretching
c) Spatial filtering
d) Frequency domain filtering
Answer: a) Histogram equalization
32. What is the purpose of image enhancement in medical imaging?
a) Improve image quality
b) Compress images
c) Segment images
d) Detect tumors
Answer: a) Improve image quality
a) Fast computation
b) Simple implementation
c) Flexibility
d) Accuracy
Answer: b) Simple implementation
39. Which technique is used to adjust the brightness of an image?
a) Histogram equalization
b) Contrast stretching
c) Point processing
d) Spatial filtering
Answer: c) Point processing
40. What is the purpose of histogram specification?
a) Adjust contrast
b) Match histograms
c) Remove noise
d) Sharpen images
Answer: b) Match histograms
41. What is the primary goal of image enhancement?
a) Improve image quality
b) Compress images
c) Segment images
d) Detect objects
Answer: a) Improve image quality
42. Which technique is used to enhance the quality of images captured in low-light conditions?
a) Histogram equalization
b) Contrast stretching
c) Spatial filtering
d) Homomorphic filtering
Answer: d) Homomorphic filtering
43. What is the purpose of low-pass filtering?
a) Enhance edges
b) Remove noise
c) Sharpen images
d) Compress images
Answer: b) Remove noise
44. Which filter is used to sharpen an image?
a) Average filter
b) Laplacian filter
c) Gaussian filter
d) Median filter
Answer: b) Laplacian filter
45. What is the primary application of image enhancement in surveillance?
a) Object detection
b) Face recognition
c) Image compression
d) All of the above
Answer: d) All of the above
46. Which technique is used to enhance images with varying illumination?
a) Histogram equalization
b) Homomorphic filtering
c) Contrast stretching
d) Spatial filtering
Answer: b) Homomorphic filtering
47. What is the purpose of frequency domain filtering?
a) Remove noise
b) Enhance edges
c) Adjust contrast
d) Analyze image frequency components
Answer: d) Analyze image frequency components
48. Which filter is used to remove Gaussian noise?
a) Average filter
b) Median filter
c) Gaussian filter
d) Wiener filter
Answer: d) Wiener filter
SRI KAILASH WOMEN’S COLLEGE
UNIT-4
IMAGE SEGMENTATION
Image Segmentation divides an image into segments where each pixel in the image is mapped to an
object. This task has multiple variants such as instance segmentation, panoptic segmentation and
semantic segmentation.
Image segmentation is the process of dividing an image into sets of pixels to make the image less complex. Pixels within a set have one or more attributes (texture, intensity, color) in common.
1. THRESHOLDING METHODS-
1.1 Global Thresholding-
This method is used when the objects are easily differentiated from each other, so we can use a single value as a threshold for the entire image.
The threshold value should not be too high or too low; it must be optimal.
For a binary image, if a pixel value is less than the threshold value it is converted to black; otherwise it is converted to white.
2.CLUSTERING-BASED SEGMENTATION-
In clustering-based segmentation the pixels in the image are divided into groups, where some property
of the pixels in each group is similar.
Clustering-based segmentation commonly uses the K-means algorithm.
This algorithm helps achieve high performance and efficiency, but the user has to specify the number of clusters.
The number of clusters can be selected from the image data, for example using the frame size and the absolute difference between the cluster means.
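A minimal Python sketch of K-means segmentation on grayscale pixel intensities, using only NumPy; the number of clusters and iteration count are assumptions.
import numpy as np
def kmeans_segment(img, k=3, iters=10):
    pixels = img.reshape(-1, 1).astype(float)
    centers = np.linspace(pixels.min(), pixels.max(), k).reshape(-1, 1)   # initial cluster means
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels - centers.T), axis=1)            # assign each pixel to the nearest mean
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()                   # update the cluster means
    return labels.reshape(img.shape)
img = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
segments = kmeans_segment(img, k=3)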
3.Edge-based Segmentation-
Edges are defined as sudden changes of intensity level in a digital image. This technique is based on discontinuities in an image.
Edge detection is used to detect the boundaries or to find the size or location of an object in an image.
Edge detection techniques can be further classified as follows.
1.1 Sobel edge detection-
The Sobel operator calculates the gradient approximation of the image intensity function to detect edges. The following kernels are used for convolution with the input images.
This method detects smooth edges easily and is simple and time efficient. It does not accurately detect thick and rough edges, does not preserve diagonal direction points, and has high noise sensitivity.
1.2 Prewitt edge detection-
The orientation and magnitude of an image is detected by this method. It detects the vertical and
horizontal edges of an image. It uses following kernels for convolution with the input images.
Prewitt is quite similar to the Sobel edge detection technique, but it is a bit easier to implement than Sobel. This operator can sometimes generate noisy results.
1.3 Robert edge detection-
The sum of squares of the differences between diagonally adjacent pixels is calculated through discrete differentiation, after which the gradient approximation is obtained. The following 2x2 kernels are used for convolution with the input images.
It detects the edges and orientation easily while preserving the diagonal direction points. It has high
noise sensitivity.
Laplacian function-
4.Region-based segmentation-
REGION APPROACH
Region-Based Segmentation
In this segmentation, we grow regions by recursively including the neighboring pixels that are similar and
connected to the seed pixel. We use similarity measures such as differences in gray levels for regions with
homogeneous gray levels. We use connectivity to prevent connecting different parts of the image.
Hierarchical clustering can use different linkage criteria:
Nearest (single-link) clustering
Average clustering
Farthest (complete-link) clustering
Clustering by division, or divisive splitting, follows a top-down approach: a single cluster containing all points is constructed first and is then repeatedly split, with each pixel assigned to the closest resulting cluster.
The K-means algorithm minimizes the following objective, called the within-cluster sum of squares (WCSS) distance:
WCSS = Σj Σi ||xi - μj||²
where j indexes the clusters, i runs over the points belonging to the jth cluster, and μj is the mean of the jth cluster.
A good way to find the optimal value of K is to brute force a smaller range of values (1-10) and plot the graph
of WCSS distance vs K. The point where the graph is sharply bent downward can be considered the optimal
value of K. This method is called Elbow method.
SEGMENTATION BASED ON THRESHOLDING
Image segmentation is the technique of subdividing an image into constituent sub-regions or distinct
objects. The level of detail to which subdivision is carried out depends on the problem being solved.
That is, segmentation should stop when the objects or the regions of interest in an application have
been detected.
Segmentation of non-trivial images is one of the most difficult tasks in image processing.
Segmentation accuracy determines the eventual success or failure of computerized analysis
procedures. Segmentation procedures are usually done using two approaches - detecting discontinuity
in images and linking edges to form the region (known as edge-based segmenting), and detecting
similarity among pixels based on intensity levels (known as threshold-based segmenting).
Mathematically, we can define the problem of segmentation as follows. Let R represent the entire spatial region occupied by an image. Image segmentation tries to divide the region R into sub-regions R1, R2, ..., Rn, such that:
1. ⋃ (i = 1 to n) Ri = R.
2. Ri is a connected set for i = 1, 2, ..., n.
3. Ri ⋂ Rj = ϕ for all i and j, i ≠ j.
4. Q(Ri) = TRUE for i = 1, 2, ..., n.
5. Q(Ri ⋃ Rj) = FALSE for any adjacent regions Ri and Rj.
Here, Q(Ri) is a logical predicate defined over the points in the region Ri, and ϕ represents the null set.
Thresholding
Thresholding is one of the segmentation techniques that generates a binary image (a binary image is
one whose pixels have only two values - 0 and 1 and thus requires only one bit to store pixel intensity)
from a given grayscale image by separating it into two regions based on a threshold value.
Hence pixels having intensity values greater than the said threshold will be treated as white or 1 in the
output image and the others will be black or 0.
Suppose the above is the histogram of an image f(x,y). We can see one peak near level 40 and another
at 180. So there are two major groups of pixels - one group consisting of pixels having a darker shade
and the others having a lighter shade.
So there can be an object of interest set against the background. Using an appropriate threshold value, say 90, will divide the entire image into two distinct regions.
In other words, if we have a threshold T, then the segmented image g(x,y) is computed as shown
below:
g(x, y) = 1 if f(x, y) > T, and g(x, y) = 0 if f(x, y) ≤ T.
So the output segmented image has only two classes of pixels - one having a value of 1 and others
having a value of 0.
If the threshold T is constant in processing over the entire image region, it is said to be global
thresholding. If T varies over the image region, we say it is variable thresholding.
Multiple-thresholding classifies the image into three regions - like two distinct objects on a
background. The histogram in such cases shows three peaks and two valleys between them. The
segmented image can be completed using two appropriate thresholds T1 and T2.
g(x, y) = a if f(x, y) > T2, g(x, y) = b if T1 < f(x, y) ≤ T2, and g(x, y) = c if f(x, y) ≤ T1.
Global Thresholding
When the intensity distribution of objects and background are sufficiently distinct, it is possible to use
a single or global threshold applicable over the entire image. The basic global thresholding algorithm
iteratively finds the best threshold value for segmenting the image.
The algorithm is explained below.
1. Select an initial estimate of the threshold T.
2. Segment the image using T to form two groups G1 and G2: G1 consists of all pixels with intensity
values > T, and G2 consists of all pixels with intensity values ≤ T.
3. Compute the average intensity values m1 and m2 for groups G1 and G2.
4. Compute the new value of the threshold T as T = (m1 + m2)/2
5. Repeat steps 2 through 4 until the difference in the subsequent value of T is smaller than a pre-defined
value δ.
6. Segment the image as g(x,y) = 1 if f(x,y) > T and g(x,y) = 0 if f(x,y) ≤ T.
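A minimal Python sketch of the iterative algorithm above; the convergence tolerance δ and the test image are assumptions.
import numpy as np
def basic_global_threshold(img, delta=0.5):
    T = img.mean()                                  # initial estimate of the threshold
    while True:
        m1 = img[img > T].mean()                    # mean of group G1
        m2 = img[img <= T].mean()                   # mean of group G2
        T_new = 0.5 * (m1 + m2)                     # T = (m1 + m2) / 2
        if abs(T_new - T) < delta:
            return T_new
        T = T_new
img = np.random.randint(0, 256, (64, 64)).astype(float)
T = basic_global_threshold(img)
segmented = (img > T).astype(np.uint8)              # g(x, y) = 1 if f(x, y) > T, else 0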
This algorithm works well for images that have a clear valley in their histogram. The larger the value
of δ, the smaller will be the number of iterations. The initial estimate of T can be made equal to the
average pixel intensity of the entire image.
The above simple global thresholding can be made optimum by using Otsu's method. Otsu's method
is optimum in the sense that it maximizes the between-class variance.
The basic idea is that well-thresholded classes or groups should be distinct with respect to the intensity
values of their pixels and conversely, a threshold giving the best separation between classes in terms
of their intensity values would be the best or optimum threshold.
Variable Thresholding
There are broadly two different approaches to local thresholding. One approach is to partition the
image into non-overlapping rectangles.
Then the techniques of global thresholding or Otsu's method are applied to each of the sub-images.
Hence in the image partitioning technique, the methods of global thresholding are applied to each sub-
image rectangle by assuming that each such rectangle is a separate image in itself.
This approach is justified when the sub-image histogram properties are suitable (have two peaks with
a wide valley in between) for the application of thresholding techniques but the entire image histogram
is corrupted by noise and hence is not ideal for global thresholding.
The other approach is to compute a variable threshold at each point from the neighborhood pixel
properties. Let us say that we have a neighborhood Sxy of a pixel having coordinates (x,y). If the mean
and standard deviation of pixel intensities in this neighborhood be mxy and σxy , then the threshold at
each point can be computed as:
Txy = a·σxy + b·mxy
where a and b are arbitrary constants. The above definition of the variable threshold is just an example. Other
definitions can also be used according to the need.
The segmented image is computed as:
g(x, y) = 1 if f(x, y) > Txy, and g(x, y) = 0 if f(x, y) ≤ Txy.
Moving averages can also be used as thresholds. This technique of image thresholding is the most general one
and can be applied to widely different cases.
Example 1:
% MATLAB program to perform Otsu's thresholding
I = imread("coins.jpg");
figure(1);
imshow(I);
title("Original image.");
[counts, x] = imhist(I, 16);
thresh = otsuthresh(counts);
bw = imbinarize(I, thresh);
figure(2);
imshow(bw);
title("Image segmentation with Otsu thresholding.");
Output:
EDGE BASED SEGMENTATION
Edge-Based Segmentation is a technique in image processing used to identify and delineate the
boundaries within an image.
It focuses on detecting edges, which are areas in an image where there is a sharp contrast or change in
intensity, such as where two different objects meet.
Simply put, it's about finding the parts of the image where there's a sharp contrast, such as where an
object ends and the background begins.
How Edge-Based Segmentation Works
Edge-based segmentation techniques work by identifying areas in an image where there is a rapid
change in intensity or color. These changes often mark the edges of objects or regions within the
image.
Techniques such as gradient-based methods (like Sobel or Prewitt operators) detect changes in
intensity, while other methods like Canny edge detection apply more sophisticated filtering to get
clearer, more defined edges.
1. Image Gradient Calculation
The first step in any edge detection algorithm is calculating the gradient of the image. The gradient at
a pixel is a vector pointing in the direction of the greatest intensity change. Mathematically, this is
calculated using partial derivatives.
G_x is the gradient in the x (horizontal) direction.
G_y is the gradient in the y (vertical) direction.
These gradients are typically calculated using filters (or kernels) like Sobel or Prewitt.
2. Edge Magnitude Calculation
The next step is to calculate the magnitude of the gradient at each pixel. This tells us how strong the
edge is. The magnitude M can be calculated using the Pythagorean theorem: M = √(Gx² + Gy²).
3. Edge Direction
Once the magnitude is calculated, the direction of the edge can also be determined using: θ = arctan(Gy / Gx).
4. Thresholding
After calculating the gradient magnitude and direction, the next step is to apply thresholding. This step
helps in identifying only the strong edges by filtering out weak gradient values.
5. Non-Maximum Suppression (Optional)
To further refine the edges, non-maximum suppression is applied. This step ensures that only the local maxima
are retained as edges by looking at neighboring pixels and suppressing non-edge pixels.
Common Algorithms for Edge-Based Segmentation
There are several techniques you can use for edge-based segmentation, each offering different levels of
precision. Here are some of the most common algorithms, with their Python implementations:
1. Sobel Operator
The Sobel operator calculates the gradient of image intensity at each pixel, highlighting areas of rapid intensity
change (i.e., edges). It does so by applying convolution filters in both horizontal (x) and vertical (y) directions.
Code Example:
import cv2
import numpy as np
import matplotlib.pyplot as plt
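A minimal completion of the Sobel example, assuming a grayscale input file 'coins.jpg'; the kernel size is the common choice of 3.
import cv2
import numpy as np
img = cv2.imread('coins.jpg', cv2.IMREAD_GRAYSCALE)     # assumed input image
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)          # horizontal gradient G_x
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)          # vertical gradient G_y
magnitude = np.sqrt(gx**2 + gy**2)                      # edge strength M
edges = np.uint8(255 * magnitude / magnitude.max())     # normalise for display
cv2.imwrite('sobel_edges.jpg', edges)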
2. Canny Edge Detection
The Canny edge detector is a multi-step method that applies noise reduction, gradient calculation, non-maximum suppression, and double thresholding to produce clean, well-defined edges.
Code Example:
import cv2
import matplotlib.pyplot as plt
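A minimal completion of the Canny example, assuming a grayscale input file 'coins.jpg'; the two hysteresis thresholds are illustrative values.
import cv2
img = cv2.imread('coins.jpg', cv2.IMREAD_GRAYSCALE)                # assumed input image
edges = cv2.Canny(img, 100, 200, apertureSize=3, L2gradient=True)  # thresholds are assumptions
cv2.imwrite('canny_edges.jpg', edges)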
Parameter Description
apertureSize The size of the Sobel kernel used internally (default: 3).
L2gradient Flag to use a more accurate L2 norm for gradient magnitude calculation.
Output Cleaned-up edge-detected image with better accuracy and less noise.
3. Prewitt Operator
The Prewitt operator is another gradient-based method, similar to Sobel, but it applies a simpler kernel. It is
less sensitive to noise and can be a good choice for images with moderate noise levels.
Code Example:
import cv2
import numpy as np
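A minimal completion of the Prewitt example; OpenCV has no built-in Prewitt function, so the kernels are applied with filter2D. The input file name is an assumption.
import cv2
import numpy as np
img = cv2.imread('coins.jpg', cv2.IMREAD_GRAYSCALE)                     # assumed input image
kx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float32)   # Prewitt x kernel
ky = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=np.float32)   # Prewitt y kernel
gx = cv2.filter2D(img, cv2.CV_32F, kx)
gy = cv2.filter2D(img, cv2.CV_32F, ky)
magnitude = cv2.magnitude(gx, gy)                                       # edge strength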
4. Laplacian of Gaussian (LoG)
Laplacian of Gaussian (LoG) is a combination of Gaussian smoothing and the Laplacian operator to detect edges based on second-order derivatives. This method helps detect finer details in the image.
Code Example:
import cv2
import matplotlib.pyplot as plt
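A minimal completion of the LoG example: Gaussian smoothing followed by the Laplacian; the file name, kernel size, and sigma are assumptions.
import cv2
img = cv2.imread('coins.jpg', cv2.IMREAD_GRAYSCALE)       # assumed input image
blurred = cv2.GaussianBlur(img, (5, 5), 0)                # Gaussian smoothing (ksize assumed)
log = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)         # Laplacian of the smoothed image
cv2.imwrite('log_edges.jpg', cv2.convertScaleAbs(log))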
Parameter Description
ksize Kernel size for Gaussian smoothing (larger size = more smoothing).
3. Facial Recognition
4. Object Tracking
5. Image Compression
CLASSIFICATION OF EDGES- EDGE DETECTION
Edge detection is a fundamental image processing technique for identifying and locating the
boundaries or edges of objects in an image. It is used to identify and detect the discontinuities in the
image intensity and extract the outlines of objects present in an image.
The edges of any object in an image (e.g. flower) are typically defined as the regions in an image
where there is a sudden change in intensity.
The goal of edge detection is to highlight these regions.
There are various types of edge detection techniques, which include the following:
Sobel Edge Detection
Canny Edge Detection
Laplacian Edge Detection
Prewitt Edge Detection
Roberts Cross Edge Detection
Scharr edge detection
Edge Detection Concepts
Edge Models
Edge models are theoretical constructs used to describe and understand the different types of edges
that can occur in an image. These models help in developing algorithms for edge detection by
categorizing the types of intensity changes that signify edges. The basic edge models
are Step, Ramp and Roof. A step edge represents an abrupt change in intensity, where the image
intensity transitions from one value to another in a single step. A ramp edge describes a gradual
transition in intensity over a certain distance, rather than an abrupt change. A roof edge represents a
peak or ridge in the intensity profile, where the intensity increases to a maximum and then decreases.
From left to right, models (ideal representations) of a step, a ramp, and a roof edge, and their corresponding intensity profiles. (Source: Digital Image Processing by R. C. Gonzalez & R. E. Woods)
Image Intensity Function
The image intensity function represents the brightness or intensity of each pixel in a grayscale image.
In a color image, the intensity function can be extended to include multiple channels (e.g., red, green,
blue in RGB images).
1. Step Edges: Abrupt changes in intensity, where the intensity changes from one constant value to another.
2. Ramp Edges: Gradual changes in intensity, where the intensity changes over a distance.
3. Roof Edges: Changes in intensity that occur over a small distance, often seen in line or curve features.
4. Line Edges: Narrow, linear features that can be detected as edges.
Edge Detection:
Edge detection is a process used to identify and locate edges within an image. Edges are significant
because they often represent boundaries between different objects or regions in an image. Edge
detection is a fundamental step in many image processing and computer vision applications, such as:
1. Object recognition: Edges can help identify the shape and structure of objects.
2. Image segmentation: Edges can be used to separate objects from the background.
3. Feature extraction: Edges can provide valuable features for further analysis.
Common Edge Detection Techniques:
1. Sobel Operator: Uses two 3x3 convolution kernels to detect horizontal and vertical edges.
2. Prewitt Operator: Similar to the Sobel operator, but uses different kernels.
3. Laplacian of Gaussian (LoG): Uses the Laplacian operator to detect edges, often applied after Gaussian
smoothing.
4. Canny Edge Detector: A multi-step process that includes noise reduction, gradient calculation, non-
maximum suppression, and double thresholding.
5. Zero-Crossing: Detects edges by finding zero-crossings of the second derivative of the image intensity
function.
Applications:
1. Object detection: Edge detection can help identify objects in an image or video.
2. Image segmentation: Edge detection can be used to separate objects from the background.
3. Robotics: Edge detection can be used in robotics for obstacle detection and navigation.
4. Medical imaging: Edge detection can be used to analyze medical images and detect abnormalities.
HOUGH TRANSFORM
The Hough Transform is a feature extraction technique used in image analysis and computer vision to
detect lines, circles, and other shapes within an image. It works by transforming the image into a
parameter space, where shapes can be identified and extracted.
How it Works:
1. Edge Detection: The Hough Transform typically starts with edge detection, where edges are identified in
the image.
2. Parameter Space: The Hough Transform maps the edges in the image to a parameter space, where each
point in the image corresponds to a curve or surface in the parameter space.
3. Voting: Each edge point in the image votes for a set of parameters that could have generated it.
4. Accumulator Array: The votes are accumulated in an array, where the peaks in the array correspond to the
parameters of the shapes in the image.
Types of Hough Transforms:
1. Standard Hough Transform (SHT): Used for detecting lines and curves.
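A minimal Python sketch of line detection with the standard Hough transform in OpenCV; the file name and parameter values are assumptions.
import cv2
import numpy as np
img = cv2.imread('road.jpg', cv2.IMREAD_GRAYSCALE)        # assumed input image
edges = cv2.Canny(img, 100, 200)                          # Hough is usually applied after edge detection
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)        # rho, theta resolution and accumulator threshold
print(0 if lines is None else len(lines), "lines detected")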
5-mark questions
10-mark questions
1. Explain the different types of image segmentation techniques, including region-based, edge-based, and
thresholding-based approaches.
2. Describe the active contour model and its application in image segmentation.
3. Discuss the advantages and disadvantages of different image segmentation techniques, including region-
based, edge-based, and thresholding-based approaches.
4. Explain the concept of clustering in image segmentation and describe a clustering algorithm, such as K-
means or hierarchical clustering.
5. Describe the Hough transform and its application in detecting lines and circles in images. Explain its
advantages and limitations.
MCQ:
1.What is the primary goal of image segmentation?
a) Image compression
b) Image enhancement
c) Object detection
d) Image division
Answer: d) Image division
2. Which technique involves grouping pixels based on similarity?
a) Thresholding
b) Edge detection
c) Region-based segmentation
d) Clustering
Answer: c) Region-based segmentation
3. What is thresholding in image segmentation?
a) Separating objects from background based on intensity
b) Detecting edges in an image
c) Grouping pixels based on similarity
d) Compressing an image
Answer: a) Separating objects from background based on intensity
4. Which edge detection operator is commonly used?
a) Sobel
b) Laplacian
c) Gaussian
d) Prewitt
Answer: a) Sobel
5. What is the Hough transform used for?
a) Edge detection
b) Line detection
c) Circle detection
d) All of the above
Answer: d) All of the above
b) To segment organs
c) To analyze medical images
d) All of the above
Answer: d) All of the above
12. Which technique is used for segmenting images based on texture?
a) Thresholding
b) Edge detection
c) Region-based segmentation
d) Texture-based segmentation
Answer: d) Texture-based segmentation
13. What is the advantage of edge-based segmentation?
a) Robust to noise
b) Accurate boundary detection
c) Fast computation
d) All of the above
Answer: b) Accurate boundary detection
14. Which algorithm is used for image segmentation based on graph theory?
a) Graph cut
b) K-means
c) Hierarchical clustering
d) DBSCAN
Answer: a) Graph cut
15. What is the purpose of active contour in image segmentation?
a) To detect edges
b) To segment objects
c) To track objects
d) All of the above
Answer: d) All of the above
16. What is the difference between semantic segmentation and instance segmentation?
a) Semantic segmentation labels each pixel with a class label, while instance segmentation labels each object
instance.
b) Semantic segmentation labels each object instance, while instance segmentation labels each pixel with a
class label.
c) Semantic segmentation is used for images, while instance segmentation is used for videos.
d) None of the above.
Answer: a) Semantic segmentation labels each pixel with a class label, while instance segmentation labels
each object instance.
17. Which technique is used for image segmentation based on deep learning?
a) Convolutional neural networks (CNNs)
b) Recurrent neural networks (RNNs)
c) Long short-term memory (LSTM) networks
d) All of the above
Answer: a) Convolutional neural networks (CNNs)
18. What is the advantage of using deep learning for image segmentation?
a) High accuracy
b) Fast computation
c) Robustness to noise
d) All of the above
Answer: d) All of the above
19. Which dataset is commonly used for evaluating image segmentation algorithms?
a) PASCAL VOC
b) COCO
c) ImageNet
d) All of the above
Answer: d) All of the above
20. What is the purpose of post-processing in image segmentation?
a) To refine the segmentation results
b) To reduce noise
c) To improve accuracy
d) All of the above
Answer: d) All of the above
21. What is the difference between supervised and unsupervised image segmentation?
a) Supervised segmentation uses labeled data, while unsupervised segmentation does not use labeled data.
b) Supervised segmentation is used for images, while unsupervised segmentation is used for videos.
c) Supervised segmentation is faster, while unsupervised segmentation is more accurate.
d) None of the above.
Answer: a) Supervised segmentation uses labeled data, while unsupervised segmentation does not use labeled
data.
22. Which technique is used for image segmentation based on clustering?
a) K-means
b) Hierarchical clustering
c) DBSCAN
d) All of the above
Answer: d) All of the above
23. What is the advantage of using clustering for image segmentation?
a) Fast computation
b) Robustness to noise
c) Ability to handle high-dimensional data
d) All of the above
Answer: d) All of the above
24. Which metric is commonly used to evaluate the performance of image segmentation algorithms?
a) Accuracy
b) Precision
c) Recall
d) Intersection over Union (IoU)
Answer: d) Intersection over Union (IoU)
25. What is the purpose of image segmentation in autonomous vehicles?
a) To detect objects
b) To track objects
c) To segment roads
d) All of the above
Answer: d) All of the above
26. What is the difference between image segmentation and object detection?
a) Image segmentation labels each pixel, while object detection labels each object.
b) Image segmentation is used for images, while object detection is used for videos.
c) Image segmentation is faster, while object detection is more accurate.
a) Improved accuracy
b) Ability to focus on relevant regions
c) Reduced computational complexity
d) All of the above
Answer: d) All of the above
43. Which dataset is commonly used for evaluating image segmentation models?
a) PASCAL VOC
b) COCO
c) Cityscapes
d) All of the above
Answer: d) All of the above
44. What is the purpose of image segmentation in robotics?
a) Object recognition
b) Scene understanding
c) Navigation
d) All of the above
Answer: d) All of the above
45. Which technique is used for image segmentation based on 3D data?
a) 3D convolutional neural networks (CNNs)
b) Point cloud-based segmentation
c) Voxel-based segmentation
d) All of the above
Answer: d) All of the above
46. What is the advantage of using 3D data for image segmentation?
a) Improved accuracy
b) Ability to handle complex scenes
c) Robustness to occlusion
d) All of the above
Answer: d) All of the above
47. Which application area benefits from image segmentation?
a) Medical imaging
b) Autonomous vehicles
c) Robotics
d) All of the above
Answer: d) All of the above
48. What is the role of image segmentation in medical diagnosis?
a) Detecting diseases
b) Segmenting organs
c) Analyzing medical images
d) All of the above
Answer: d) All of the above
49. Which technique is used for image segmentation based on multimodal data?
a) Multimodal fusion
b) Multimodal registration
c) Multimodal segmentation
d) All of the above
Answer: d) All of the above
50. What is the advantage of using multimodal data for image segmentation?
a) Improved accuracy
b) Ability to handle complex scenes
c) Robustness to noise
d) All of the above
Answer: d) All of the above
UNIT IV COMPLETED
SRI KAILASH WOMEN’S COLLEGE
UNIT-5
IMAGE COMPRESSION
Image compression addresses the problem of reducing the amount of data required to represent a
digital image.
It is a process intended to yield a compact representation of an image, thereby reducing the image
storage/transmission requirements.
Compression is achieved by the removal of one or more of the three basic data redundancies:
1. Coding Redundancy
2. Interpixel Redundancy
3. Psychovisual Redundancy
Coding redundancy is present when less than optimal code words are used.
Interpixel redundancy results from correlations between the pixels of an image.
Psychovisual redundancy is due to data that is ignored by the human visual system (i.e. visually non
essential information). Image compression techniques reduce the number of bits required to represent
an image by taking advantage of these redundancies.
An inverse process called decompression (decoding) is applied to the compressed data to get the
reconstructed image. The objective of compression is to reduce the number of bits as much as possible,
while keeping the resolution and the visual quality of the reconstructed image as close to the original
image as possible. Image compression systems are composed of two distinct structural blocks: an
encoder and a decoder.
Image compression has two prime categories –
1. Lossless image compression
2. Lossy image compression.
Lossless Compression
Lossless compression refers to a process of reducing the file size of an image. This technique does not degrade the image quality.
Though it is an excellent method to resize your image files, the outcome may still not be too small.
That is because lossless compression does not eliminate any part of the image.
For example, it will convert an image of 15 MB to 10 MB. However, it will still be too large to display
on a webpage.
Lossless image compression is particularly useful when compressing text. That is because a small
change in the original version can dramatically change the text or data meaning.
Pros
Image parts remain intact
Zero loss in image quality
It is a reversible process
Cons
The image output is too large
Decoding is challenging
Why should you compress images?
Lossy Compression
Lossy compression reduces the image size by removing some of the image parts. It eliminates the tags
that are not very essential.
If you opt for this method, you can get a significantly smaller version of an image with a minimal
quality difference. Additionally, you can enjoy a faster loading speed.
Lossy compression works with a quality parameter to measure the change in quality. In most cases,
you have to set this parameter. If it is lower than 90, the images may appear low quality to the human
eye.
For example, you can convert an image of 15 MB into 2200 Kb as well as 400 Kb.
Some image optimization services (such as Gumlet) do not require you to enter the quality parameter; they use a machine-learning-based technique, perceptually lossless compression, to automatically identify the required parameter for lossy image compression.
Pros
Get a highly reduced image size
Fast load time
Ideal option for websites
Cons
Loses image components
It is irreversible
NEED FOR COMPRESSION
1. Data Storage: Compression reduces storage requirements, making it possible to store more data in a smaller
space.
2. Data Transmission: Compression reduces the amount of data to be transmitted, resulting in faster
transmission times and lower bandwidth requirements.
3. Multimedia: Compression enables efficient storage and transmission of multimedia content, such as images,
videos, and audio files.
Benefits of Compression:
1. Reduced Storage Requirements: Compression reduces the amount of storage space required, making it
possible to store more data.
2. Faster Transmission Times: Compression reduces the amount of data to be transmitted, resulting in faster
transmission times.
3. Lower Bandwidth Requirements: Compression reduces the bandwidth required for data transmission,
making it possible to transmit data over slower networks.
4. Improved Performance: Compression can improve system performance by reducing the amount of data to
be processed.
Types of Compression:
1. Lossless Compression: Compression that preserves the original data without any loss of quality.
2. Lossy Compression: Compression that discards some of the data to achieve a smaller file size, often used
for multimedia content.
Applications of Compression:
CLASSIFICATION OF IMAGES
1. Binary Images: Images whose pixels take only two values, typically black and white.
2. Grayscale Images: Images that consist of various shades of gray, ranging from black to white.
3. Color Images: Images that consist of multiple colors, typically represented using RGB (Red, Green, Blue)
color models.
4. Multispectral Images: Images that capture data across multiple spectral bands, often used in remote sensing
and medical imaging.
5. Hyperspectral Images: Images that capture detailed spectral information, often used in remote sensing,
agriculture, and mineralogy.
Image Classification based on Content:
1. Natural Images: Images of natural scenes, such as landscapes, animals, and people.
2. Medical Images: Images used in medical diagnosis and treatment, such as X-rays, CT scans, and MRI
scans.
3. Document Images: Images of documents, such as scanned papers, receipts, and invoices.
4. Satellite Images: Images captured by satellites, often used in remote sensing and earth observation.
Image Classification based on Application:
1. Medical Imaging: Images used in medical diagnosis and treatment.
2. Surveillance: Images used for security and monitoring purposes.
3. Object Recognition: Images used for object detection and recognition.
4. Scene Understanding: Images used to understand the context and content of a scene.
Image Classification Techniques:
1. Supervised Learning: Training a model using labeled data to classify images.
2. Unsupervised Learning: Training a model using unlabeled data to discover patterns and relationships.
3. Deep Learning: Using deep neural networks to classify images, often achieving state-of-the-art
performance.
Applications:
1. Image Search: Classifying images to enable efficient search and retrieval.
2. Object Detection: Classifying images to detect specific objects or patterns.
3. Medical Diagnosis: Classifying medical images to aid in diagnosis and treatment.
4. Surveillance: Classifying images to detect anomalies or suspicious activity.
COMPRESSION SCHEMES
Compression schemes are algorithms or techniques used to reduce the size of data, such as images,
videos, or text files. Here are some common compression schemes:
Lossless Compression Schemes:
1. Run-Length Encoding (RLE): Replaces sequences of identical pixels with a single pixel value and a count.
2. Huffman Coding: Assigns shorter codes to more frequently occurring symbols.
3. Lempel-Ziv-Welch (LZW) Compression: Builds a dictionary of substrings and replaces each occurrence
with a reference to the dictionary.
4. Arithmetic Coding: Encodes an entire data stream as a single fractional number, based on the probability of each symbol.
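As an illustration of the first scheme above, here is a minimal run-length encoding sketch in Python (illustrative only; real formats pack the values and counts much more compactly):

def rle_encode(pixels):
    """Replace runs of identical values with (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, count) pairs back to the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# Example: one row of binary-image pixels
row = [0, 0, 0, 0, 255, 255, 0, 0, 0]
print(rle_encode(row))   # [[0, 4], [255, 2], [0, 3]]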
Lossy Compression Schemes:
1. Discrete Cosine Transform (DCT): Used in JPEG and MPEG compression to convert spatial data into
frequency data.
2. Quantization: Reduces the precision of the data to reduce the amount of data.
3. Transform Coding: Transforms the data into a more compact representation.
4. Wavelet Compression: Uses wavelet transforms to compress data.
Image Compression Schemes:
1. JPEG (Joint Photographic Experts Group): A widely used compression scheme for photographic images.
2. PNG (Portable Network Graphics): A lossless compression scheme for images, often used for graphics and
icons.
3. GIF (Graphics Interchange Format): Uses lossless LZW compression with a palette of up to 256 colors, often used for simple graphics and animations.
Video Compression Schemes:
1. MPEG (Moving Picture Experts Group): A widely used compression scheme for video content.
2. H.264/AVC: A video compression scheme that provides high compression efficiency and is widely used in
various applications.
3. H.265/HEVC: A video compression scheme that provides even higher compression efficiency than
H.264/AVC.
Audio Compression Schemes:
1. MP3 (MPEG Audio Layer 3): A widely used compression scheme for audio content.
2. AAC (Advanced Audio Coding): A compression scheme that provides high audio quality at lower bitrates.
3. AC-3: A compression scheme used for surround sound audio.
Applications:
1. Data Storage: Compression schemes reduce storage requirements.
2. Data Transmission: Compression schemes reduce transmission times and bandwidth requirements.
3. Multimedia: Compression schemes enable efficient storage and transmission of multimedia content.
HUFFMAN CODING
This section covers the Huffman coding algorithm and its application to compressing an image: first the
algorithm itself, then a short Python sketch.
Huffman coding is a lossless compression algorithm. Its main goal is to minimize the total code length of the
data by assigning variable-length codes to its data chunks (symbols) based on their frequencies in the data.
High-frequency chunks are assigned shorter codes and lower-frequency chunks relatively longer codes, giving
a compression factor ≥ 1.
Shannon's source coding theorem states that, for an independent and identically distributed source, the code
rate (average code length per symbol) cannot be smaller than the Shannon entropy of the source.
Huffman coding is provably optimal in this sense among codes that assign a whole number of bits to each
symbol: no other such prefix code achieves a lower average code length.
Huffman Coding
Following are the two steps in Huffman Coding
Building Huffman Tree
Assigning codes to Leaf Nodes
Building Huffman Tree
First, compute the probability of each data chunk, build a node for each chunk, and push all the nodes into a
list.
Now pop the two nodes with the lowest probabilities and create a parent node from them, with probability
equal to the sum of their probabilities; add this parent node back to the list.
Repeat the process with the current set of nodes until a single root node with probability 1 remains.
Assigning codes to leaf nodes: walk from the root to each leaf, appending 0 when taking one branch and 1
when taking the other; the accumulated bit string is that symbol's code.
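A minimal Python sketch of both steps (a priority queue for tree building, then recursive code assignment); the symbol probabilities below are assumed for illustration:

import heapq
import itertools

def huffman_codes(probabilities):
    """Build a Huffman tree and return {symbol: bit string}."""
    counter = itertools.count()               # tie-breaker so heap tuples always compare
    heap = [(p, next(counter), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)     # two least-probable nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):           # internal node: recurse into both children
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                 # leaf: record the accumulated code
            codes[node] = prefix or "0"
    _, _, root = heap[0]
    assign(root, "")
    return codes

probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}   # assumed probabilities
print(huffman_codes(probs))   # e.g. {'a': '0', 'b': '10', 'c': '111', 'd': '110'}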
ARITHMETIC CODING:
Arithmetic coding is a form of entropy encoding that represents an entire stream of symbols as a single
number, typically a fraction in the interval [0, 1). Here's how it works:
1. Probability calculation: Calculate the probability of each symbol in the data.
2. Interval creation: Assign each symbol a sub-interval of [0, 1) whose width equals its probability.
3. Encoding: Process the symbols one by one, each time narrowing the current interval to the chosen symbol's
sub-interval; any number inside the final interval encodes the whole message.
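A minimal floating-point sketch of the interval-narrowing idea (real coders use integer arithmetic to avoid precision loss; the message and probabilities below are assumptions):

def arithmetic_encode(message, probs):
    """Narrow the interval [0, 1) once per symbol; return a number inside it."""
    # Build a cumulative sub-interval for each symbol.
    intervals, start = {}, 0.0
    for sym, p in probs.items():
        intervals[sym] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = intervals[sym]
        width = high - low
        high = low + width * hi               # shrink the interval to the
        low = low + width * lo                # symbol's sub-interval
    return (low + high) / 2                   # any value inside the final interval works

probs = {"a": 0.6, "b": 0.3, "c": 0.1}
print(arithmetic_encode("aab", probs))        # a single fraction representing the whole message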
Dictionary-Based Compression:
Dictionary-based compression techniques work by building a dictionary of substrings or patterns in the data
and replacing each occurrence with a reference to the dictionary.
Types of Dictionary-Based Compression:
1. Lempel-Ziv-Welch (LZW) Compression: Builds a dictionary of substrings and replaces each occurrence
with a reference to the dictionary.
2. LZ77 Compression: Uses a sliding window to find repeated patterns in the data and replaces them with a
reference to the previous occurrence.
3. LZ78 Compression: Builds a dictionary of substrings and replaces each occurrence with a reference to the
dictionary.
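A minimal LZW compression sketch in Python (byte-oriented and illustrative only; real implementations also limit the dictionary size and pack the output codes into bits):

def lzw_compress(data: bytes):
    """Replace repeated substrings with dictionary indices."""
    # Start with all single bytes in the dictionary (codes 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                    # keep growing the current match
        else:
            output.append(dictionary[current])     # emit code for the longest match
            dictionary[candidate] = next_code      # add the new substring
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output

print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))   # list of dictionary codes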
Advantages of Dictionary-Based Compression:
1. High compression ratio: Dictionary-based compression can achieve high compression ratios, especially for
data with repeated patterns.
2. Fast compression: Dictionary-based compression can be fast, especially for data with simple patterns.
Applications:
1. Text compression: Dictionary-based compression is often used for text compression, such as compressing
log files or text documents.
2. Image compression: Dictionary-based compression can be used for image compression, especially for
images with repeated patterns.
Comparison:
1. Arithmetic coding: More efficient for data with skewed probability distributions.
2. Dictionary-based compression: More efficient for data with repeated patterns.
Real-World Examples:
1. GIF images: Use LZW compression to compress images.
2. ZIP files: Use dictionary-based compression to compress files.
3. Text compression: Dictionary-based compression is often used to compress text data.
TRANSFORM-BASED COMPRESSION
Transform-based compression converts the data into a domain in which it can be represented more compactly.
The main steps are:
1. Transformation: Apply a mathematical transformation to the data, such as a discrete cosine transform
(DCT) or a wavelet transform.
2. Quantization: Quantize the transformed data to reduce the precision and amount of data.
3. Encoding: Encode the quantized data using entropy coding techniques, such as Huffman coding or
arithmetic coding.
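A minimal NumPy sketch of these three steps for a single 8×8 block. The smooth sample block, the uniform quantization step, and the level shift of 128 are illustrative assumptions in the spirit of JPEG; the final entropy-coding step is left to Huffman or arithmetic coding as described above.

import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: C[k, n] = alpha(k) * cos(pi * (2n + 1) * k / (2N))
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

x = np.arange(N)
block = 10.0 * x[None, :] + 5.0 * x[:, None] + 100.0   # assumed smooth 8x8 pixel block

# 1. Transformation: 2D DCT concentrates the block's energy in few coefficients
coeffs = C @ (block - 128.0) @ C.T        # level shift as in JPEG, then DCT

# 2. Quantization: divide by a step size and round (this is where information is lost)
step = 16                                  # assumed uniform quantization step
quantized = np.round(coeffs / step).astype(int)

# 3. Encoding: the sparse integer array 'quantized' would now be entropy coded
print(np.count_nonzero(quantized), "of", N * N, "coefficients are non-zero")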
Types of Transform-Based Compression:
1. Discrete Cosine Transform (DCT): Used in JPEG and MPEG compression to convert spatial data into
frequency data.
2. Wavelet Transform: Used in JPEG 2000 and other compression schemes to provide a more efficient
representation of the data.
3. Fourier Transform: Used in some compression schemes to convert data into the frequency domain.
Advantages:
1. High compression ratio: Transform-based compression can achieve high compression ratios, especially for
data with correlated samples.
2. Efficient encoding: Transform-based compression can encode data efficiently, especially for large datasets.
Applications:
1. Image compression: Transform-based compression is widely used in image compression, such as JPEG
and JPEG 2000.
2. Video compression: Transform-based compression is used in video compression, such as MPEG and
H.264/AVC.
3. Audio compression: Transform-based compression is used in audio compression, such as MP3 and AAC.
Examples:
1. JPEG compression: Uses DCT to convert spatial data into frequency data, followed by quantization and
Huffman coding.
2. JPEG 2000 compression: Uses wavelet transform to provide a more efficient representation of the data,
followed by quantization and arithmetic coding.
Benefits:
1. Improved compression ratio: Transform-based compression can improve the compression ratio compared
to other compression techniques.
2. Efficient encoding: Transform-based compression can encode data efficiently, especially for large datasets.
5-Mark Questions
1. What is the need for image compression? Explain with examples. (5 marks)
2. Describe the different types of redundancy in images. (5 marks)
3. Classify images based on their characteristics. (5 marks)
4. Explain the basic steps involved in Huffman coding. (5 marks)
5. Describe the concept of dictionary-based compression. (5 marks)
10-Mark Questions
1. Explain the different compression schemes used for image compression, including lossless and lossy
compression. (10 marks)
2. Describe the arithmetic coding technique and its advantages over other compression techniques. (10 marks)
3. Explain the transform-based compression technique, including the use of DCT and wavelet transforms. (10
marks)
4. Compare and contrast Huffman coding and arithmetic coding techniques. (10 marks)
5. Discuss the applications of image compression in various fields, including medical imaging, surveillance,
and entertainment. (10 marks)
MCQ
1. What is the primary goal of image compression?
a) To reduce the size of an image
b) To improve the quality of an image
c) To change the format of an image
d) To increase the size of an image
Answer: a) To reduce the size of an image
2. Which of the following is a type of lossless compression?
a) Huffman coding
b) JPEG compression
c) MPEG compression
d) MP3 compression
Answer: a) Huffman coding
3. What is the purpose of quantization in image compression?
a) To reduce the precision of the data
b) To increase the precision of the data
c) To change the format of the data
d) To encrypt the data
Answer: a) To reduce the precision of the data
4. Which of the following is a type of transform-based compression?
a) Discrete Cosine Transform (DCT)
b) Huffman coding
c) Arithmetic coding
d) Dictionary-based compression
Answer: a) Discrete Cosine Transform (DCT)
5. What is the advantage of using Huffman coding?
a) High compression ratio
b) Fast compression
c) Simple implementation
d) All of the above
Answer: d) All of the above
6. Which of the following is a type of dictionary-based compression?
d) Neither a nor b
Answer: c) Both a and b
23. What is the purpose of the discrete cosine transform (DCT) in JPEG compression?
a) To convert spatial data into frequency data
b) To compress the data
c) To encrypt the data
d) To decompress the data
Answer: a) To convert spatial data into frequency data
24. Which of the following is a type of lossless image compression standard?
a) PNG
b) JPEG
c) GIF
d) BMP
Answer: a) PNG
25. What is the advantage of using arithmetic coding over Huffman coding?
a) Higher compression ratio
b) Faster compression
c) Simpler implementation
d) None of the above
Answer: a) Higher compression ratio
26. Which of the following is a type of dictionary-based compression algorithm?
a) Lempel-Ziv-Welch (LZW) compression
b) Huffman coding
c) Arithmetic coding
d) Run-length encoding (RLE)
Answer: a) Lempel-Ziv-Welch (LZW) compression
27. What is the purpose of quantization in transform-based compression?
a) To reduce the precision of the transform coefficients
b) To increase the precision of the transform coefficients
c) To change the format of the transform coefficients
d) To encrypt the transform coefficients
Answer: a) To reduce the precision of the transform coefficients
28. Which of the following is a type of image compression technique used in medical imaging?
a) Lossless compression
b) Lossy compression
c) Both a and b
d) Neither a nor b
Answer: c) Both a and b
29. What is the advantage of using transform-based compression over dictionary-based compression?
a) Higher compression ratio
b) Faster compression
c) Simpler implementation
d) None of the above
Answer: a) Higher compression ratio
30. Which of the following is a type of image compression standard used in digital cameras?
a) JPEG
b) PNG
c) GIF
d) TIFF
Answer: a) JPEG
31. What is the purpose of image compression in digital storage systems?
a) To reduce storage requirements
b) To improve image quality
c) To aid in image retrieval
d) All of the above
Answer: d) All of the above
32. Which of the following is a type of compression technique used in image compression?
a) Lossless compression
b) Lossy compression
c) Both a and b
d) Neither a nor b
Answer: c) Both a and b
33. What is the advantage of using lossless compression over lossy compression?
a) Higher compression ratio
d) Neither a nor b
Answer: c) Both a and b
39. Which of the following is a type of lossless image compression algorithm?
a) Huffman coding
b) Arithmetic coding
c) Dictionary-based compression
d) All of the above
Answer: d) All of the above
40. What is the purpose of quantization in image compression?
a) To reduce the precision of the data
b) To increase the precision of the data
c) To change the format of the data
d) To encrypt the data
Answer: a) To reduce the precision of the data
41. Which of the following is a type of image compression standard used in medical imaging?
a) JPEG
b) JPEG 2000
c) PNG
d) DICOM
Answer: d) DICOM
42. What is the advantage of using lossless compression in medical imaging?
a) Higher compression ratio
b) Preservation of image quality
c) Faster compression
d) Simpler implementation
Answer: b) Preservation of image quality
43. Which of the following is a type of transform-based compression technique?
a) Discrete Cosine Transform (DCT)
b) Discrete Wavelet Transform (DWT)
c) Both a and b
d) Neither a nor b
Answer: c) Both a and b
44. What is the purpose of image compression in digital photography?
a) To reduce storage requirements
b) To improve image quality
UNIT V COMPLETED