Module 7

Scene Analysis
SCOPE
What is Scene Analysis?
• Scene Analysis in image processing refers to the
process of interpreting and understanding the
content of an image or a sequence of images to
identify objects, their relationships, and the
environment in which they exist.
• It mimics human visual understanding and is a key
component of computer vision systems.

Goals of Scene Analysis:
1. Object Detection: Identify known or unknown objects in the scene.
2. Object Recognition: Classify objects (e.g., person, car, tree).
3. Object Localization: Determine the position of each object.
4. Scene Understanding: Infer relationships and interactions (e.g., a person riding a bike).
5. Semantic Segmentation: Assign a class label to each pixel.
6. 3D Scene Reconstruction: Rebuild a 3D representation from 2D images.
Components of Scene Analysis:
• Low-Level Processing: Edge detection, filtering, feature extraction (e.g., corners, textures)
• Mid-Level Processing: Grouping features into regions or objects (segmentation, object proposals)
• High-Level Processing: Interpretation using AI/ML to understand relationships and context

Techniques Used
• Feature Extraction (SIFT, ORB, etc.): Detect keypoints and descriptors
• Segmentation (e.g., Graph cuts, Watershed, U-Net): Separate regions of interest
• Object Detection (e.g., YOLO, Faster R-CNN): Locate and classify objects
• Scene Classification (e.g., CNN-based models): Determine scene type (indoor, street, etc.)
• Depth Estimation: Infer 3D structure using stereo vision or depth sensors
• Optical Flow: Track motion of pixels across frames for dynamic scenes

Example Pipeline:
1. Input: Video frame or image
2. Preprocessing: Resize, denoise, normalize
3. Feature Extraction: SIFT/ORB features
4. Segmentation/Object Detection: YOLO or DeepLab
5. Scene Interpretation: Use rules or deep learning to describe relationships
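A minimal sketch of the first three pipeline stages in Python with OpenCV is shown below; the file name frame.jpg and the parameter values are illustrative assumptions, and the detection/interpretation stages (e.g., a trained YOLO or DeepLab model) would plug in after the feature step.

# Sketch of pipeline stages 1-3 (input, preprocessing, feature extraction).
import cv2

# 1. Input: read a frame or image from disk (path is an assumption)
img = cv2.imread("frame.jpg")

# 2. Preprocessing: resize, denoise, convert to grayscale
img = cv2.resize(img, (640, 480))
img = cv2.GaussianBlur(img, (5, 5), 0)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Feature extraction: ORB keypoints and descriptors
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)
print(f"Extracted {len(keypoints)} ORB keypoints")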

Evaluation Metrics:
• Accuracy of object detection and classification
• IoU (Intersection over Union) for segmentation
• Precision / Recall / F1 Score
• Scene Classification Accuracy

Detection of Known Objects by
Linear Filters
Overview:
Linear filters detect known objects by enhancing specific patterns in an image. These
filters are kernels (matrices) that slide over the image and perform convolution.
🔹 Steps:
1. Design a filter that mimics the known object’s structure (e.g., edge, circle, specific shape).
2. Convolve the filter with the image.
3. Threshold the result to detect matching areas.
4. Post-process to refine detections (e.g., non-max suppression).
🔹 Example:
• Detect vertical bars using a vertical edge filter like the Sobel operator.
• Template matching using a matched filter (cross-correlation with object template).
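A hedged sketch of the template-matching example in Python with OpenCV: normalized cross-correlation of an object template against the image acts as a matched linear filter. The file names and the 0.8 threshold are assumptions for illustration.

import cv2
import numpy as np

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("object_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the image and score every position
response = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

# Threshold the response map to keep strong matches
ys, xs = np.where(response >= 0.8)
h, w = template.shape
for x, y in zip(xs, ys):
    cv2.rectangle(image, (x, y), (x + w, y + h), 255, 1)
cv2.imwrite("detections.png", image)

In practice a non-max suppression step would merge the overlapping boxes this produces.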

Detection of Known Objects by
Linear Filters
🔹 Equations:
• For convolution of an image f(x, y) with a kernel h(i, j):
  g(x, y) = Σi Σj f(i, j) · h(x - i, y - j)
Detection of Unknown Objects –
Detailed Description
Detection of unknown objects involves identifying
anomalies, novel patterns, or unexpected regions
in an image that differ significantly from the rest of the
scene. Unlike detection of known objects (where
templates or trained models are used), here the focus
is on unsupervised or semi-supervised methods, as
prior knowledge of object appearance is not available.
This is especially important in:
• Surveillance and security
• Medical imaging (e.g., tumor detection)
• Industrial inspection (defect detection)
• Autonomous systems (unforeseen obstacle detection)
Techniques for Detecting
Unknown Objects
1. Blob Detection
Blob detection refers to identifying regions in an image
that are significantly brighter or darker than their
surroundings and have roughly uniform texture or
intensity.
🔹 Techniques:
• Laplacian of Gaussian (LoG):
• Combines Gaussian smoothing and Laplacian edge detection.
• Highlights regions of rapid intensity change (blobs).
• Particularly useful for detecting circular blobs.

Techniques for Detecting
Unknown Objects
Difference of Gaussian (DoG):
• An approximation of LoG.
• Subtracts two Gaussian-blurred images with different standard deviations:
  DoG(x, y) = (Gσ1 * I)(x, y) - (Gσ2 * I)(x, y), with σ1 < σ2
• Faster than LoG and scale-invariant.


Applications:
• Biological cell detection
• Bright spots or defects in X-rays
• Keypoint detection (e.g., SIFT uses DoG)
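A small sketch of the DoG blob detection described above, using OpenCV; the input file, the two sigma values, and the response threshold are illustrative assumptions.

import cv2
import numpy as np

gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Blur with two different standard deviations and subtract
blur_small = cv2.GaussianBlur(gray, (0, 0), sigmaX=1.0)
blur_large = cv2.GaussianBlur(gray, (0, 0), sigmaX=2.0)
dog = blur_small - blur_large   # approximates the Laplacian of Gaussian

# Strong responses (bright or dark blobs) survive the threshold
_, blobs = cv2.threshold(np.abs(dog), 10, 255, cv2.THRESH_BINARY)
cv2.imwrite("dog_blobs.png", blobs.astype(np.uint8))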
Techniques for Detecting
Unknown Objects
2. Saliency Detection
Saliency detection focuses on identifying the most visually distinctive regions in an image that are likely to attract human
attention. These regions are usually candidates for unknown or unexpected objects.
🔹 How It Works:
• Compute contrast with respect to surroundings (local/global contrast).
• Analyze color, intensity, and orientation differences.
• Output a saliency map indicating regions of interest.
🔹 Algorithms:
• Itti-Koch-Niebur model (uses feature maps)
• Spectral Residual method (based on Fourier transform)
• Deep learning-based saliency models (SalGAN, DeepGaze)
🔹 Applications:
• Foreground object detection
• Weakly supervised object localization
• Preprocessing for object proposals
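A hedged sketch of the Spectral Residual method described above, using NumPy FFTs; the working resolution, blur sizes, and file names are assumptions, not fixed choices.

import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.resize(gray, (128, 128)).astype(np.float32)

# Fourier transform: split the spectrum into log-amplitude and phase
spectrum = np.fft.fft2(gray)
log_amplitude = np.log(np.abs(spectrum) + 1e-8)
phase = np.angle(spectrum)

# Spectral residual = log amplitude minus its local average
residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))

# Back to the image domain; squared magnitude is the raw saliency map
saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min())
cv2.imwrite("saliency.png", (saliency * 255).astype(np.uint8))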

Techniques for Detecting
Unknown Objects
3. Region Growing
Region growing is a segmentation method that starts with seed points and grows
regions by adding neighboring pixels that meet certain similarity criteria (e.g.,
intensity, texture).
🔹 Steps:
1. Select seed points (manually or automatically).
2. Examine neighboring pixels.
3. Add similar pixels to the region.
4. Stop when no more similar pixels are found.
🔹 Criteria:
• Absolute intensity difference
• Statistical similarity (mean, variance)
• Texture pattern similarity
🔹 Applications:
• Tumor segmentation in MRI
• Segmenting unknown objects from background
• Scene understanding
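A minimal region-growing sketch (breadth-first growth from a single seed); the seed coordinates, intensity tolerance, and file name are illustrative assumptions.

from collections import deque
import cv2
import numpy as np

def region_grow(image, seed, tolerance=10):
    h, w = image.shape
    region = np.zeros((h, w), dtype=np.uint8)
    seed_value = int(image[seed])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if region[y, x]:
            continue
        # Grow only if the pixel is similar enough to the seed intensity
        if abs(int(image[y, x]) - seed_value) <= tolerance:
            region[y, x] = 255
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                    queue.append((ny, nx))
    return region

gray = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)
mask = region_grow(gray, seed=(120, 150), tolerance=12)
cv2.imwrite("region_mask.png", mask)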
Techniques for Detecting
Unknown Objects
4. Clustering Techniques
Clustering groups pixels or regions with similar properties. It is unsupervised and useful for detecting unknown object-like regions.
🔹 Common Methods:
• K-means Clustering:
• Partitions the image into K groups based on color or intensity.
• The cluster with significantly different properties can represent unknown regions.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
• Groups dense areas and marks sparse/noisy points as outliers.
• Good for finding irregularly shaped unknown objects.
🔹 Features Used:
• Color/intensity
• Texture descriptors (e.g., GLCM)
• Local Binary Patterns (LBP)
🔹 Applications:
• Satellite image segmentation
• Detection of foreign objects in quality inspection
• Segmenting complex scenes
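A hedged sketch of K-means colour clustering with OpenCV; the value of K, the termination criteria, and the file names are illustrative assumptions.

import cv2
import numpy as np

img = cv2.imread("satellite.png")
pixels = img.reshape(-1, 3).astype(np.float32)

# Partition all pixels into K colour clusters
K = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)

# Rebuild the image from cluster centres; small or unusual clusters are
# candidates for unknown / foreign objects.
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
cv2.imwrite("kmeans_segments.png", segmented)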

Techniques for Detecting
Unknown Objects
Example Scenario: Surveillance System
Imagine a static CCTV camera monitoring a restricted zone. Normally, it sees
only walls and fixed objects. When a person walks into the frame, they
appear as an anomaly:
• Saliency Detection: Highlights the moving person due to contrast in motion
and appearance.
• LoG/DoG Blob Detection: Identifies the new object as a distinct region of
brightness/texture.
• Region Growing: Starts from high-saliency pixels and expands to define the
entire person.
• K-means Clustering: May group the person as a unique cluster, different
from background clusters.
• This system can alert security personnel that an unknown object (intruder)
has been detected, without ever being trained on human shapes.
Evaluation metrics
• Precision & Recall: Measure correctness and completeness of detection
• F1-Score: Harmonic mean of precision and recall
• IoU (Intersection over Union): Measures overlap between the predicted region and the ground truth
• False Positive Rate: How often background is identified as an unknown object
• Detection Time: Important for real-time applications
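A small sketch of the IoU metric for two binary masks; the example masks are made up purely for illustration.

import numpy as np

def iou(pred_mask, gt_mask):
    # Intersection over Union of two binary masks of the same shape
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / union if union > 0 else 1.0

pred = np.zeros((10, 10)); pred[2:6, 2:6] = 1
gt = np.zeros((10, 10)); gt[3:7, 3:7] = 1
print(f"IoU = {iou(pred, gt):.2f}")   # 9 / 23, roughly 0.39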

Hough Transform
• The Hough Transform is a pivotal algorithm in
computer vision and image processing, enabling the
detection of geometrical shapes such as lines, circles,
and ellipses within images.
• By transforming image space into parameter space, the
Hough Transform leverages a voting mechanism to
identify shapes through local maxima in an accumulator
array.
• Typically, this method detects lines and edges, using parameters rho (ρ) and theta (θ) to represent straight lines in polar coordinates.
• This algorithm is essential in a wide range of applications.
Hough Transform
• Hough Transform is a computer vision technique that detects shapes like lines
and circles in an image.
• It converts these shapes into mathematical representations in parameter
space, making it easier to identify them even if they’re broken or obscured.
• This method is valuable for image analysis, pattern recognition, and object
detection.
• The Hough Transform line-detection algorithm is a feature extraction method used in image analysis, computer vision, and digital image processing.
• It uses a voting mechanism to identify imperfect instances of objects within a given class of shapes.
• This voting is carried out in parameter space: the algorithm produces object candidates as local maxima in an accumulator space.
Hough Transform
• Why is it Needed?
• In many circumstances, a pre-processing stage can use an edge detector to obtain image points or pixels lying on the required curve in image space.
• However, there may be missing points or pixels on the required curves, due to flaws in either the image data or the edge detector, and spatial deviations between the ideal line/circle/ellipse and the noisy edge points produced by the edge detector.
• As a result, grouping the extracted edge characteristics
into an appropriate collection of lines, circles, or
ellipses is frequently difficult.
Figure: Original image (left) and the image after applying an edge detection technique (right). Red circles show where the line breaks.

• The Hough transform can detect lines of any orientation
and can work well in images with a large amount of
noise.
• To understand how this algorithm works we first need to
understand how lines are defined in a polar system.
• A line is described by ρ, the perpendicular distance from the origin, and θ, the angle the perpendicular makes with the horizontal axis.

• In this parameterization, a line satisfies the equation ρ = x·cos θ + y·sin θ.
From the above equation, we can say that all the points
having the same values of ρ and θ constitute a single
line.
The basis of our algorithm is computing the value of ρ for
each point in the image for all possible values of θ.
• We start by creating a parameter space (Hough Space).
• The parameter space is a 2D matrix indexed by ρ and θ, where θ ranges from 0 to 180 degrees.
• We run this algorithm after detecting the edges of the
image using an edge detection algorithm such as Canny
edges.
• Pixels with a value of 255 are considered edge pixels.

• We then scan the image pixel by pixel to find these edge pixels and, for values of θ from 0 to 180, compute ρ for each pixel.
• For pixels on the same line/edge, the values of ρ and θ will be the same. We upvote these indices in the Hough Space by 1.
• Finally, the (ρ, θ) pairs with votes above a certain threshold are considered lines. Consider the Hough Space defined by H[ρ, θ]; a small sketch of this voting procedure follows below.
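A hedged sketch of the voting procedure in Python: every edge pixel votes for each θ from 0 to 179, and peaks in H[ρ, θ] are reported as lines. The input file and the vote threshold are assumptions; cv2.HoughLines performs the same accumulation far more efficiently.

import cv2
import numpy as np

gray = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 100, 200)              # edge pixels have value 255

h, w = edges.shape
diag = int(np.ceil(np.hypot(h, w)))
thetas = np.deg2rad(np.arange(0, 180))
H = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)  # rho may be negative

# Vote: for each edge pixel, compute rho for every theta
ys, xs = np.nonzero(edges)
for x, y in zip(xs, ys):
    for t_idx, theta in enumerate(thetas):
        rho = int(round(x * np.cos(theta) + y * np.sin(theta))) + diag
        H[rho, t_idx] += 1

# Accumulator peaks above a threshold are detected lines
rho_idx, theta_idx = np.where(H > 100)
lines = [(r - diag, np.rad2deg(thetas[t])) for r, t in zip(rho_idx, theta_idx)]
print(f"{len(lines)} candidate (rho, theta) lines found")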

Corner Detection
A corner in an image is a point where the intensity
changes significantly in two or more directions. It
usually occurs at the intersection of two edges and can
be visually identified as a sharp turn or distinct
point, like the corner of a square or chessboard.
• Mathematically, corners are regions with high
gradient variations in both the x and y directions.
They are considered repeatable, stable, and
distinctive, which makes them ideal for computer
vision tasks such as matching and tracking.

Why Detect Corners?
Corners are:
• Invariant to translation, rotation, and small
changes in scale
• Good keypoints for tracking and recognition
• Easily localizable

1. Harris Corner Detector
One of the most widely used classical corner detection
methods.
The Harris detector is based on measuring how much
the image intensity changes when a window is moved
in different directions.
• It uses the second moment matrix (also called the structure tensor), built from the image gradients Ix and Iy over a local window w(x, y):
  M = Σ w(x, y) [ Ix²    Ix·Iy ]
                [ Ix·Iy  Iy²   ]
Corner Response Function:
R = det(M) - k · (trace(M))²
Where:
• det(M) = Ix²·Iy² - (Ix·Iy)²
• trace(M) = Ix² + Iy²
• k is an empirical constant (typically 0.04 to 0.06)
Interpretation:
• If R is large and positive, the point is a corner.
• If R is large and negative, the point lies on an edge; if |R| is small, the region is flat, so it is not a corner.
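A brief sketch of Harris corner detection with OpenCV; the block size, Sobel aperture, k, and the 0.01 relative threshold are the usual illustrative defaults, not fixed requirements.

import cv2
import numpy as np

img = cv2.imread("chessboard.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# R = det(M) - k * trace(M)^2 evaluated at every pixel
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Mark pixels whose response is a large positive fraction of the maximum
img[R > 0.01 * R.max()] = (0, 0, 255)
cv2.imwrite("harris_corners.png", img)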
2. Shi-Tomasi Corner Detector
An improvement over Harris; used in Kanade-Lucas-
Tomasi (KLT) trackers.
Instead of using the Harris response R, it considers the minimum eigenvalue of the matrix M:
Corner ⟺ min(λ1, λ2) > threshold
Where:
• λ1, λ2 are the eigenvalues of the matrix M
Advantages:
• More accurate and stable corner detection than Harris
• Well-suited for feature tracking (e.g., in video)
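A hedged sketch of Shi-Tomasi corners via cv2.goodFeaturesToTrack, which keeps points whose minimum eigenvalue of M exceeds a quality threshold; the parameter values and file names are assumptions.

import cv2

img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Keep up to 100 corners whose minimum eigenvalue passes the quality level
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)
if corners is not None:
    for x, y in corners.reshape(-1, 2):
        cv2.circle(img, (int(x), int(y)), 3, (0, 255, 0), -1)
cv2.imwrite("shi_tomasi_corners.jpg", img)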
Image tagging
• Image tagging simply entails assigning keywords to the elements contained in a visual.
• For example, a wedding photo will likely have the tags
‘wedding’, ‘couple’, ‘marriage’, and the like.
• But depending on the system, it may also have tags like
colors, objects, and other specific items and
characteristics in the image — including abstract terms
like ‘love’, ‘relationship’, and more.

Image tagging
Image tagging is the process of automatically
assigning descriptive labels or keywords (tags) to
an image based on its visual content. These tags can
describe objects (e.g., "dog", "car"), scenes (e.g.,
"beach", "city"), actions (e.g., "running", "eating"),
emotions, or any relevant semantic concept.
It plays a crucial role in:
• Image search and retrieval
• Content moderation
• Photo organization
• Accessibility tools (e.g., alt-text generation)
Approaches to Image Tagging
1. Manual Tagging
• Performed by human annotators.
• Time-consuming and not scalable.
• Used to create ground truth datasets for training
models.

Approaches to Image Tagging
2. Rule-Based Systems (Traditional Computer Vision)
Before deep learning, image tagging was done using handcrafted features and
classical classifiers.
Features:
• Color histograms
• Texture descriptors (e.g., GLCM, LBP)
• Shape features (e.g., edges, contours)
Classifiers:
• Support Vector Machines (SVM)
• k-NN, Decision Trees
• Naive Bayes
Limitations:
• Poor performance in complex, real-world images
• Not robust to scale, occlusion, or lighting variation

3. Auto Tagging
• AI-powered image tagging — also known as
auto tagging — is at the forefront of innovating the way
we work with visuals.
• It allows you to add contextual information to your
images, videos and live streams, making the discovery
process easier and more robust.

4. Deep Learning-Based Tagging (Modern
Approach)
Why Deep Learning?
• It learns hierarchical representations directly from
images, making it more robust and accurate.

Deep Learning Methods for Image Tagging

1. Convolutional Neural Networks (CNNs)


Usage:
•Input image → CNN → Fully connected layers → Multi-label output (sigmoid)
Pretrained Networks:
•VGGNet
•ResNet
•EfficientNet
•Inception
These models are often fine-tuned on datasets like MS-COCO, ImageNet, or Open Images for
tagging tasks.
2. Multi-Label Classification
• Unlike single-label classification (only one class), image tagging often needs multiple labels per
image.
• Output:
• Each tag has a sigmoid-activated neuron
• Tags are independent (not softmax)
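A hedged PyTorch sketch of multi-label tagging: a pretrained CNN backbone with its final layer replaced, trained with a per-tag sigmoid (binary cross-entropy) rather than softmax. The tag vocabulary size, dummy batch, and 0.5 threshold are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

num_tags = 20                                     # assumed tag vocabulary size
model = models.resnet50(weights="IMAGENET1K_V2")  # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, num_tags)

criterion = nn.BCEWithLogitsLoss()                # one sigmoid per tag

# One illustrative training step on a dummy batch
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 2, (8, num_tags)).float()
loss = criterion(model(images), targets)
loss.backward()

# At inference time, tags whose sigmoid probability exceeds a threshold
# (e.g. 0.5) are assigned to the image.
with torch.no_grad():
    probs = torch.sigmoid(model(images))
predicted_tags = probs > 0.5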
Deep Learning Methods for Image Tagging

3. Attention Mechanisms
🔹 Purpose:
• Focus the model on important image regions relevant to each tag.
🔹 Techniques:
• Class Activation Mapping (CAM)
• Grad-CAM
• Self-attention (Transformers)
4. Image Captioning + Tag Extraction
• Some systems generate captions and extract tags from them using NLP.
• Combines vision and language models (e.g., CNN + RNN or Vision Transformers + GPT).
5. Vision-Language Pretrained Models (VLPMs)
Examples:
• CLIP (Contrastive Language–Image Pretraining) by OpenAI
• BLIP, ALIGN, Flamingo
• These models jointly understand images and language and are capable of zero-shot tagging.
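A hedged sketch of zero-shot tagging with CLIP through the Hugging Face transformers library; the candidate tag list, prompt template, and probability threshold are assumptions for illustration.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_tags = ["dog", "car", "beach", "city", "wedding", "food"]
image = Image.open("photo.jpg")

# Score the image against one short prompt per candidate tag
inputs = processor(text=[f"a photo of a {t}" for t in candidate_tags],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores turned into probabilities over the tags
probs = outputs.logits_per_image.softmax(dim=1).squeeze(0)
tags = [t for t, p in zip(candidate_tags, probs.tolist()) if p > 0.2]
print(tags)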
