Unit 4 - Notes

Image Processing Notes
Regions of Interest (ROIs)

• A region of interest (ROI) is a particular area or region in an image or video that you want to filter or operate on.
• An ROI can be represented as a binary mask image. In the mask image, pixels that belong to the ROI are set to 1 and pixels outside the ROI are set to 0 (see the sketch after this list).
• Finding regions of interest (ROIs) in images and videos is a fundamental component of computer vision. It is necessary to analyze and process certain regions of an image or video, known as ROIs, because they contain significant information.
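As a minimal sketch of the binary-mask idea (assuming NumPy and OpenCV; the file name and coordinates are placeholders):

import numpy as np
import cv2

# Load any grayscale image; the file name is just a placeholder.
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Build a binary mask: nonzero (here 255) inside the ROI, 0 outside.
mask = np.zeros(image.shape[:2], dtype=np.uint8)
mask[100:300, 150:400] = 255  # rectangular ROI: rows 100-299, cols 150-399

# Keep only the ROI pixels; everything outside becomes 0.
roi_only = cv2.bitwise_and(image, image, mask=mask)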

What is the Region of Interest in Computer Vision?

In "Computer Vision," the term "region of interest" (ROI) designates a particular area or region
in an image or video that includes "crucial information" that has to be examined and processed.
A portion of the overall picture with "special interest" is considered as the "ROI."

The ROI is frequently identified by its position, size, shape, or other visual characteristics; in all cases it is a subset of the full picture or video.

ROIs are used in various computer vision applications, including object recognition, tracking, and segmentation. In object recognition, an ROI can be defined as the area that contains an object of interest; in tracking, an ROI may be defined as the region where the object is expected to move.

Types of ROIs (rectangular, circular, polygonal, etc.)

Depending on the particular needs of the application, ROIs can be created in a variety of shapes and sizes. The following are the most common types of ROIs used in computer vision:

1. Rectangular ROIs:
The width and height of a rectangular ROI determine its shape, and its location is frequently
determined by its centre or top-left corner. Since rectangular ROIs are simple to define and
implement, they are frequently used in computer vision applications.
2. Circular ROIs:
Circular ROIs have a radius and a centre, which determine the position and size of the circle within the image. They can be used for tasks such as detecting circular objects, analyzing the properties of objects with circular shapes, or extracting features from circular patterns.

3. Polygonal ROIs:

Polygonal ROIs are defined by a group of interconnected vertices. When an object's shape is irregular or non-uniform, as in facial recognition or gesture identification, polygonal ROIs are frequently utilized.

4. Elliptical ROIs:
An elliptical region of interest (ROI) is a specific area within an image or dataset that is defined
by an elliptical shape. Elliptical ROIs are characterized by parameters such as the coordinates of
their center, major and minor axes lengths, and the angle of rotation. These parameters determine
the position, size, and orientation of the ellipse within the image.

5. Freeform ROIs:

Freeform ROIs are specified by an arbitrary user-defined shape, so they can capture regions tailored to the user's exact requirements. A masking sketch covering these ROI types follows below.
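As an illustrative sketch of how the first four ROI types can be expressed as binary masks with OpenCV drawing functions (the image size and all coordinates are arbitrary):

import numpy as np
import cv2

h, w = 480, 640

# 1. Rectangular ROI: top-left corner and bottom-right corner.
rect_mask = np.zeros((h, w), np.uint8)
cv2.rectangle(rect_mask, (150, 100), (400, 300), 255, thickness=-1)

# 2. Circular ROI: centre and radius.
circ_mask = np.zeros((h, w), np.uint8)
cv2.circle(circ_mask, (320, 240), 80, 255, thickness=-1)

# 3. Polygonal ROI: a set of interconnected vertices.
poly_mask = np.zeros((h, w), np.uint8)
vertices = np.array([[100, 50], [250, 80], [300, 250], [120, 300]], np.int32)
cv2.fillPoly(poly_mask, [vertices], 255)

# 4. Elliptical ROI: centre, axes lengths, and rotation angle.
ell_mask = np.zeros((h, w), np.uint8)
cv2.ellipse(ell_mask, (320, 240), (120, 60), 30, 0, 360, 255, thickness=-1)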

Why are ROIs Used in Computer Vision?

• The usage of ROIs in computer vision is particularly significant in applications such as object detection, tracking, and segmentation.
• Overall, the use of ROIs in computer vision allows algorithms to focus their analysis on
the most relevant and informative parts of an image or video, reducing processing time
and improving accuracy.

Importance of selecting the right ROI

The ROI specifies which region of the image or video the algorithm will examine, and picking
the incorrect ROI may produce results that are incorrect or irrelevant.
• The size of the area should be one of the main considerations when choosing an ROI.
• If the ROI is too large, the algorithm might take longer to process the data and might be more likely to include unrelated data in the analysis.
• On the other hand, if the ROI is too small, crucial features might be missed, leading to
unreliable results.

What Is Feature Extraction?

Feature extraction refers to the process of transforming raw data into numerical features that can be
processed while preserving the information in the original data set. It yields better results than
applying machine learning directly to the raw data.

Feature extraction in digital image processing refers to the process of identifying and extracting
meaningful information or features from raw images.

Feature extraction can be accomplished manually or automatically:

• Automatic Feature Extraction involves the use of algorithms and computational techniques to
identify and extract relevant features from images without human intervention. Process: In this
approach, feature detection and extraction algorithms are applied to the image data to automatically
locate and characterize specific patterns, structures, or attributes of interest.

• Manual feature extraction involves human specialists manually identifying and selecting features from the image based on their knowledge and expertise. Process: In this approach, users interactively identify and mark regions, points, or structures within the image that are relevant to the analysis task.

Color, Shape and Texture: Feature Extraction using OpenCV

Image processing extracts only the useful information from an image, reducing the amount of data while retaining the pixels that describe the image's characteristics.
1. Color

Understanding the color space in which your images are set is of utmost importance for extracting the right features.

Using OpenCV, we can convert the color space of an image to one of several options, such as HSV, LAB, grayscale, and YCrCb. A simple breakdown of each color space:

a. HSV (Hue-Saturation-Value)

• Hue: describes the dominant wavelength and is the channel that specifies the color; it is useful for categorizing and distinguishing between different colors

• Saturation: describes the purity/shades of the hue/color

• Value: describes the intensity of the color

[Figure: RGB vs HSV color space]

b. LAB

• L: describes the lightness of the color, used interchangeably with intensity

• A: color component ranging from Green to Magenta

• B: color component ranging from Blue to Yellow


[Figure: RGB vs LAB color space]

c. YCrCb

• Y: luminance obtained from RGB color space after gamma correction

• Cr: describes how far the red (R) component is away from luminance

• Cb: describes how far the blue (B) component is away from luminance

[Figure: RGB vs YCrCb color space]

The importance of these color spaces is sometimes underrated. When extracting relevant information from images, they provide the opportunity to check whether the features look more distinct in one of them than in the others. A conversion sketch follows below.
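As a minimal sketch of color space conversion with OpenCV (the file name is a placeholder; note that OpenCV loads color images in BGR order by default):

import cv2

img = cv2.imread("input.png")  # placeholder file name; loaded as BGR

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)      # Hue, Saturation, Value
lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)      # Lightness, A, B
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # Luminance, Cr, Cb
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Split into channels to inspect each component separately:
hue, sat, val = cv2.split(hsv)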
2. Shape

Extract shapes within an image. For example, suppose you are tasked with differentiating between different types of wine glasses. Color may not play an important role here, but shapes could tell us a lot about them.
Local Binary Pattern

• There are many different types of texture descriptors used to extract features from an image.
• Local Binary Patterns (LBP) is a texture descriptor that encodes the local texture patterns
of an image by comparing the intensity of a central pixel with its neighboring pixels.
• It works by thresholding the neighboring pixel intensities around a central pixel and then
representing the result as a binary number.
• LBP generates a histogram of local binary patterns, which characterizes the distribution of
different texture patterns in the image.
• These histograms can be used as features for tasks such as texture classification, face
recognition, object detection, and segmentation.
The rule for finding LBP of an image is as follows:

1. Select a pixel as the center pixel.
2. Collect its neighbourhood pixels (here we take a 3 × 3 window, so the total number of neighbourhood pixels is 8).
3. Threshold each neighbourhood pixel value to 1 if it is greater than or equal to the centre pixel value; otherwise threshold it to 0.
4. After thresholding, collect all threshold values from the neighbourhood, either clockwise or anti-clockwise. The collection gives an 8-digit binary code. Convert the binary code into decimal.
5. Replace the center pixel value with the resulting decimal value, and repeat the process for every pixel in the image.

Let's take an example to understand this properly. Suppose we pick the pixel value 149 (at the 15th row and 19th column of an example image) and its 8 neighbourhood pixels to form a 3 × 3 matrix. We collect the thresholded values clockwise from the top-left, which gives the binary code 11100001.

Then, convert the binary code into decimal and place it at the center of the matrix:
1×2^7 + 1×2^6 + 1×2^5 + 0×2^4 + 0×2^3 + 0×2^2 + 0×2^1 + 1×2^0
= 128 + 64 + 32 + 0 + 0 + 0 + 0 + 1
= 225
In the resulting matrix, the centre value 149 is therefore replaced by 225. A sketch of the whole procedure follows below.
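As a minimal NumPy sketch of the 3 × 3 LBP rule described above (the function and variable names are our own, and border pixels are simply left at 0):

import numpy as np

def lbp_image(img):
    # Compute the basic 3x3 LBP code for each interior pixel.
    img = img.astype(np.int32)
    out = np.zeros_like(img)
    # Clockwise neighbour offsets starting from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            center = img[r, c]
            code = 0
            for dr, dc in offsets:
                # Threshold: 1 if neighbour >= centre, else 0.
                bit = 1 if img[r + dr, c + dc] >= center else 0
                code = (code << 1) | bit
            out[r, c] = code
    return out

A histogram of the resulting codes (e.g. np.histogram(out, bins=256)) can then serve as the texture feature vector.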

3. Texture

We may want to extract texture features as we have exhausted the color and shape features.

Gray-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) are both texture feature descriptors.

• Grey Level Co-occurrence Matrix (GLCM) is a statistical method used to describe the texture of an image in digital image processing. It captures the spatial relationships between pixels of similar or different intensity values within an image.
• Also known as the gray-level spatial dependence matrix.
• The GLCM functions characterize the texture of an image
• After you create the GLCMs you can derive several statistics from them. These statistics
provide information about the texture of an image. The following table lists the statistics.

• Contrast: Measures the local variations in the gray-level co-occurrence matrix.
• Correlation: Measures the joint probability occurrence of the specified pixel pairs.
• Energy: Provides the sum of squared elements in the GLCM. Also known as uniformity or the angular second moment.
• Homogeneity: Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

• Gray Level Co-occurrence Matrix (GLCM) is used for texture analysis.

• We consider two pixels at a time, called the reference and the neighbour pixel.

• We define a particular spatial relationship between the reference and neighbour pixel before
calculating the GLCM.

• For example, we may define the neighbour to be 1 pixel to the right of the current pixel, or 3 pixels above, or 2 pixels diagonally (one of NE, NW, SE, SW) from the reference.

• Once a spatial relationship is defined, we create a GLCM of size (range of intensities × range of intensities), with all entries initialised to 0. For example, an 8-bit single-channel image will have a 256×256 GLCM.

• We then traverse the image and, for every pair of intensities found in the defined spatial relationship, we increment the corresponding cell of the matrix.

Each entry GLCM[i, j] holds the count of the number of times the pair of intensities (i, j) appears in the image with the defined spatial relationship.
The matrix may be made symmetrical by adding it to its transpose, and normalised so that each cell expresses the probability of that pair of intensities occurring in the image. A construction sketch follows below.
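As a minimal sketch of this traversal for the "neighbour 1 pixel to the right" relationship (assuming an 8-bit grayscale NumPy array; the function name is our own):

import numpy as np

def glcm_right(img, levels=256):
    # GLCM for the 'neighbour is 1 pixel to the right' relationship.
    glcm = np.zeros((levels, levels), dtype=np.float64)
    for r in range(img.shape[0]):
        for c in range(img.shape[1] - 1):
            i, j = img[r, c], img[r, c + 1]   # reference, neighbour
            glcm[i, j] += 1                   # increment that cell
    glcm = glcm + glcm.T        # make the matrix symmetrical
    return glcm / glcm.sum()    # normalise to probabilities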

Once the GLCM is calculated, we can find texture properties from the matrix to represent the
textures in the image.

GLCM Properties
The properties can be calculated over the entire matrix or by considering a window which is
moved along the matrix.

Mean

Variance

Correlation

Contrast

IDM (Inverse Difference Moment)

ASM (Angular Second Moment)

Entropy

Max Probability

Energy

Dissimilarity

Three parameters are used in calculating a GLCM (see the sketch after this list):

Distance (d): The displacement between two pixels.

Angle (θ): The direction in which pixel pairs are considered, typically 0°, 45°, 90°, or 135°.

Number of Gray Levels (G): The number of discrete intensity levels in the image.
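scikit-image exposes exactly these parameters; a short sketch using graycomatrix and graycoprops (named greycomatrix/greycoprops in versions before 0.19), with a toy random image:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

img = np.random.randint(0, 4, size=(64, 64), dtype=np.uint8)  # toy 4-level image

# d = 1, theta = 0 (horizontal), G = 4 gray levels.
# Angles are given in radians, e.g. np.pi/4 for 45 degrees.
glcm = graycomatrix(img, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)

contrast = graycoprops(glcm, "contrast")
correlation = graycoprops(glcm, "correlation")
energy = graycoprops(glcm, "energy")
homogeneity = graycoprops(glcm, "homogeneity")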
Understanding Gray Level Co-occurrence Matrix (GLCM)

“Gray Level” in GLCM: Pixel intensities as “grays”. Each pixel in a grayscale image holds an intensity value, typically ranging from 0 (black) to 255 (white) for 8-bit images. These intensity values are often referred to as “grays” in the context of GLCM.

“Co-occurrence Matrix” in GLCM: Capturing spatial relationships. The key concept of GLCM lies in analyzing how often gray levels (intensities) occur together within an image, specifically considering neighboring pixels.

Pixel offsets and directions: By counting co-occurrences at a specific pixel offset (often 1 or 2) and in specific directions (e.g., horizontal, vertical, diagonal), GLCM captures the spatial arrangement of textures.

Constructing the Co-occurrence Matrix: Imagine a table with rows and columns labeled
with all possible grayscale values (e.g., 0–255). Each cell represents the frequency with
which a specific pair of gray levels co-occur at a given offset and direction. For example, if
cell (50, 60) has a value of 20, it means that the gray level 50 occurs 20 times next to a pixel
with intensity 60 at the specified offset and direction.

[Worked example: GLCM calculation with 4 gray-level intensities, an offset of 1 pixel, and angle 0° (horizontal)]
Variable Length Coding (Entropy Coding)
• A method to minimize coding redundancy.

• VLC, also called entropy coding, is a technique where each symbol is assigned a code that
may have a different number of bits.
• Variable Length Coding (VLC) is a method used to represent data more efficiently by
assigning shorter codes to frequently occurring symbols and longer codes to less frequently
occurring symbols.
• Compression Technique: It's a compression technique that reduces the size of the data by replacing fixed-length codes with variable-length codes based on the statistical properties of the data.
• Codebook Generation:

VLC requires a codebook, which is a mapping of symbols (such as pixel values in an image) to
variable-length codes.

The codebook is generated by analyzing the frequency distribution of symbols in the data.

• Decoding:

To decode a variable-length encoded data stream, the receiver needs access to the same
codebook used for encoding.

The encoded data is scanned bit by bit, and the longest matching code from the codebook is
identified and decoded into its corresponding symbol.

This process continues until the entire encoded data stream is decoded back into its original
form.

Applications:

Variable Length Coding is widely used in image and video compression standards such as
JPEG, MPEG, and H.264 to achieve efficient data representation and compression.

It is also used in various data transmission and storage systems where efficient use of
bandwidth or storage space is crucial.

A major advantage of VLC is that it does not degrade the signal quality in any way. The
reconstituted signal will exactly match the input signal.
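As one concrete VLC method, here is a minimal Huffman codebook generator, a sketch using Python's heapq (the symbol stream at the bottom is illustrative):

import heapq
from collections import Counter

def huffman_codebook(symbols):
    # Build a variable-length codebook from symbol frequencies.
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        # Prefix the two subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codebook([3, 3, 3, 3, 7, 7, 1, 0])
# The frequent symbol 3 gets a shorter code than the rare symbols 1 and 0.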

Methods of VLC:
Lossless Predictive Coding
• Lossless Predictive Coding is a technique used in digital image processing for
compressing images without losing any information
• A new pixel value is obtained by finding the difference between the current pixel value and the predicted pixel value.
• The new information of a pixel is defined as the difference between the actual and
predicted value of that pixel.
• It is based on eliminating the interpixel redundancies of closely spaced pixels by
extracting and coding only the new information in each pixel.

Lossless predictive coding is a type of lossless image compression algorithm.

Lossless Coding Model:


• The predictor uses pixels from the input image f(n) to estimate the current pixel's value based on information from past inputs (previously processed pixels).
• The predicted value, denoted by f^(n) (read "f hat"), is then rounded to the nearest integer.
• The prediction error is coded using f(n) and f^(n):
• Error: e(n) = f(n) - f^(n)
The prediction error represents the difference between the actual pixel value and the
predicted value.
Encoding:
The prediction errors are then encoded using entropy coding techniques such as
Huffman coding or arithmetic coding.
Decoding:
To reconstruct the original image, the prediction errors are decoded back into their
original values.
The predicted values are then added to the decoded prediction errors to reconstruct the
original pixel values.
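A minimal sketch of this encode/decode loop with a previous-pixel predictor f^(n) = f(n-1) (the predictor choice and function names are our own; real codecs use more elaborate predictors):

import numpy as np

def encode_lossless(row):
    # e(n) = f(n) - f^(n), with f^(n) = f(n-1); first pixel sent as-is.
    row = row.astype(np.int32)
    errors = np.empty_like(row)
    errors[0] = row[0]
    errors[1:] = row[1:] - row[:-1]
    return errors          # these would then be entropy-coded (e.g. Huffman)

def decode_lossless(errors):
    # f(n) = f^(n) + e(n); a cumulative sum inverts the predictor exactly.
    return np.cumsum(errors)

row = np.array([52, 55, 61, 66, 70], dtype=np.uint8)
assert np.array_equal(decode_lossless(encode_lossless(row)), row)  # no loss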

Lossy Predictive Coding

• Lossy Predictive Coding is a compression technique used in digital image processing to reduce the size of image data by sacrificing some information, resulting in a loss of image quality. It achieves compression by predicting pixel values and encoding the prediction errors, but unlike lossless predictive coding, it introduces some loss of information in the process.
• Here's how Lossy Predictive Coding works:
Prediction:

• Similar to lossless predictive coding, each pixel value in the image is predicted based on neighboring pixels, using prediction methods such as spatial prediction.

Prediction Error:

• After prediction, the actual pixel value is compared to the predicted value to compute the
prediction error, similar to lossless predictive coding.
Quantization:

• In lossy predictive coding, quantization is applied to the prediction errors to reduce their
precision and magnitude.
• Quantization involves dividing the prediction errors by a quantization step size and
rounding the result to a limited number of bits.
• This process introduces some loss of information by discarding fine details and reducing
the dynamic range of the prediction errors.

Encoding:

• The quantized prediction errors are then encoded using entropy coding techniques such as
Huffman coding or arithmetic coding, similar to lossless predictive coding.
• However, since quantization introduces loss of information, lossy predictive coding
achieves higher compression ratios compared to lossless predictive coding.

Decoding:

• To reconstruct the original image, the encoded prediction errors are decoded back into
their quantized values.
• The quantized values are then added to the predicted values to reconstruct the
approximate original pixel values.

Lossiness:

• Lossy Predictive Coding sacrifices some image quality to achieve higher compression
ratios compared to lossless predictive coding.
• The degree of loss depends on the quantization step size and other parameters used in
the encoding process.
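Extending the lossless sketch above with a uniform quantizer (the step size q is an illustrative parameter); note that the encoder predicts from the reconstructed values, not the originals, so that quantization errors do not accumulate:

import numpy as np

def encode_lossy(row, q=4):
    # Closed-loop DPCM sketch: predict from reconstructed pixels.
    row = row.astype(np.int32)
    indices = np.empty(len(row), dtype=np.int32)
    recon_prev = 0                       # predictor state (previous recon pixel)
    for n in range(len(row)):
        e = row[n] - recon_prev          # prediction error
        k = int(round(e / q))            # quantize: fewer levels to entropy-code
        indices[n] = k
        recon_prev = recon_prev + k * q  # mirror the decoder's reconstruction
    return indices

def decode_lossy(indices, q=4):
    recon = np.cumsum(indices * q)       # f(n) = f^(n) + dequantized error
    return np.clip(recon, 0, 255).astype(np.uint8)

row = np.array([52, 55, 61, 66, 70], dtype=np.uint8)
print(decode_lossy(encode_lossy(row)))   # close to, but not exactly, the input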

Applications:

• Lossy Predictive Coding is commonly used in image and video compression standards
such as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts
Group) to achieve high compression ratios while maintaining acceptable visual quality.
