Module 7

Scene Analysis
SCOPE
What is Scene Analysis?
• Scene Analysis in image processing refers to the
process of interpreting and understanding the
content of an image or a sequence of images to
identify objects, their relationships, and the
environment in which they exist.
• It mimics human visual understanding and is a key
component of computer vision systems.

Goals of Scene Analysis:
1. Object Detection: Identify known or unknown objects in the scene.
2. Object Recognition: Classify objects (e.g., person, car, tree).
3. Object Localization: Determine the position of each object.
4. Scene Understanding: Infer relationships and interactions (e.g., a person riding a bike).
5. Semantic Segmentation: Assign a class label to each pixel.
6. 3D Scene Reconstruction: Rebuild a 3D representation from 2D images.
Components of Scene Analysis:
• Low-Level Processing: Edge detection, filtering, feature extraction (e.g., corners, textures)
• Mid-Level Processing: Grouping features into regions or objects (segmentation, object proposals)
• High-Level Processing: Interpretation using AI/ML to understand relationships and context

Techniques Used
• Feature Extraction (SIFT, ORB, etc.): Detect keypoints and descriptors
• Segmentation (e.g., Graph cuts, Watershed, U-Net): Separate regions of interest
• Object Detection (e.g., YOLO, Faster R-CNN): Locate and classify objects
• Scene Classification (e.g., CNN-based models): Determine scene type (indoor, street, etc.)
• Depth Estimation: Infer 3D structure using stereo vision or depth sensors
• Optical Flow: Track motion of pixels across frames for dynamic scenes

Example Pipeline:
1. Input: Video frame or image
2. Preprocessing: Resize, denoise, normalize
3. Feature Extraction: SIFT/ORB features
4. Segmentation/Object Detection: YOLO or DeepLab
5. Scene Interpretation: Use rules or deep learning to describe relationships
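A minimal sketch of the first three pipeline stages in Python with OpenCV is shown below; the file name frame.jpg and the parameter values are illustrative assumptions, and the detection/interpretation stages (e.g., a trained YOLO or DeepLab model) would plug in after the feature step.

# Sketch of pipeline stages 1-3 (input, preprocessing, feature extraction).
import cv2

# 1. Input: read a frame or image from disk (path is an assumption)
img = cv2.imread("frame.jpg")

# 2. Preprocessing: resize, denoise, convert to grayscale
img = cv2.resize(img, (640, 480))
img = cv2.GaussianBlur(img, (5, 5), 0)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Feature extraction: ORB keypoints and descriptors
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)
print(f"Extracted {len(keypoints)} ORB keypoints")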

Evaluation Metrics:
• Accuracy of object detection and classification
• IoU (Intersection over Union) for segmentation
• Precision / Recall / F1 Score
• Scene Classification Accuracy

Detection of Known Objects by
Linear Filters
Overview:
Linear filters detect known objects by enhancing specific patterns in an image. These
filters are kernels (matrices) that slide over the image and perform convolution.
🔹 Steps:
1. Design a filter that mimics the known object’s structure (e.g., edge, circle, specific shape).
2. Convolve the filter with the image.
3. Threshold the result to detect matching areas.
4. Post-process to refine detections (e.g., non-max suppression).
🔹 Example:
• Detect vertical bars using a vertical edge filter like the Sobel operator.
• Template matching using a matched filter (cross-correlation with object template).
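A hedged sketch of the template-matching example in Python with OpenCV: normalized cross-correlation of an object template against the image acts as a matched linear filter. The file names and the 0.8 threshold are assumptions for illustration.

import cv2
import numpy as np

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("object_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the image and score every position
response = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

# Threshold the response map to keep strong matches
ys, xs = np.where(response >= 0.8)
h, w = template.shape
for x, y in zip(xs, ys):
    cv2.rectangle(image, (x, y), (x + w, y + h), 255, 1)
cv2.imwrite("detections.png", image)

In practice a non-max suppression step would merge the overlapping boxes this produces.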

Detection of Known Objects by
Linear Filters
🔹 Equations:
• For convolution of an image f(x, y) with a kernel h(i, j):
  g(x, y) = Σi Σj f(i, j) · h(x - i, y - j)
Detection of Unknown Objects –
Detailed Description
Detection of unknown objects involves identifying
anomalies, novel patterns, or unexpected regions
in an image that differ significantly from the rest of the
scene. Unlike detection of known objects (where
templates or trained models are used), here the focus
is on unsupervised or semi-supervised methods, as
prior knowledge of object appearance is not available.
This is especially important in:
• Surveillance and security
• Medical imaging (e.g., tumor detection)
• Industrial inspection (defect detection)
• Autonomous systems (unforeseen obstacle detection)
Techniques for Detecting
Unknown Objects
1. Blob Detection
Blob detection refers to identifying regions in an image
that are significantly brighter or darker than their
surroundings and have roughly uniform texture or
intensity.
🔹 Techniques:
• Laplacian of Gaussian (LoG):
• Combines Gaussian smoothing and Laplacian edge detection.
• Highlights regions of rapid intensity change (blobs).
• Particularly useful for detecting circular blobs.

Techniques for Detecting
Unknown Objects
Difference of Gaussian (DoG):
• An approximation of LoG.
• Subtracts two Gaussian-blurred images with different standard deviations:
  DoG(x, y) = (Gσ1 * I)(x, y) - (Gσ2 * I)(x, y), with σ1 < σ2
• Faster than LoG and scale-invariant.


Applications:
• Biological cell detection
• Bright spots or defects in X-rays
• Keypoint detection (e.g., SIFT uses DoG)
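A small sketch of the DoG blob detection described above, using OpenCV; the input file, the two sigma values, and the response threshold are illustrative assumptions.

import cv2
import numpy as np

gray = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Blur with two different standard deviations and subtract
blur_small = cv2.GaussianBlur(gray, (0, 0), sigmaX=1.0)
blur_large = cv2.GaussianBlur(gray, (0, 0), sigmaX=2.0)
dog = blur_small - blur_large   # approximates the Laplacian of Gaussian

# Strong responses (bright or dark blobs) survive the threshold
_, blobs = cv2.threshold(np.abs(dog), 10, 255, cv2.THRESH_BINARY)
cv2.imwrite("dog_blobs.png", blobs.astype(np.uint8))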
Techniques for Detecting
Unknown Objects
2. Saliency Detection
Saliency detection focuses on identifying the most visually distinctive regions in an image that are likely to attract human
attention. These regions are usually candidates for unknown or unexpected objects.
🔹 How It Works:
• Compute contrast with respect to surroundings (local/global contrast).
• Analyze color, intensity, and orientation differences.
• Output a saliency map indicating regions of interest.
🔹 Algorithms:
• Itti-Koch-Niebur model (uses feature maps)
• Spectral Residual method (based on Fourier transform)
• Deep learning-based saliency models (SalGAN, DeepGaze)
🔹 Applications:
• Foreground object detection
• Weakly supervised object localization
• Preprocessing for object proposals
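A hedged sketch of the Spectral Residual method described above, using NumPy FFTs; the working resolution, blur sizes, and file names are assumptions, not fixed choices.

import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
gray = cv2.resize(gray, (128, 128)).astype(np.float32)

# Fourier transform: split the spectrum into log-amplitude and phase
spectrum = np.fft.fft2(gray)
log_amplitude = np.log(np.abs(spectrum) + 1e-8)
phase = np.angle(spectrum)

# Spectral residual = log amplitude minus its local average
residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))

# Back to the image domain; squared magnitude is the raw saliency map
saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min())
cv2.imwrite("saliency.png", (saliency * 255).astype(np.uint8))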

Techniques for Detecting
Unknown Objects
3. Region Growing
Region growing is a segmentation method that starts with seed points and grows
regions by adding neighboring pixels that meet certain similarity criteria (e.g.,
intensity, texture).
🔹 Steps:
1. Select seed points (manually or automatically).
2. Examine neighboring pixels.
3. Add similar pixels to the region.
4. Stop when no more similar pixels are found.
🔹 Criteria:
• Absolute intensity difference
• Statistical similarity (mean, variance)
• Texture pattern similarity
🔹 Applications:
• Tumor segmentation in MRI
• Segmenting unknown objects from background
• Scene understanding
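A minimal region-growing sketch (breadth-first growth from a single seed); the seed coordinates, intensity tolerance, and file name are illustrative assumptions.

from collections import deque
import cv2
import numpy as np

def region_grow(image, seed, tolerance=10):
    h, w = image.shape
    region = np.zeros((h, w), dtype=np.uint8)
    seed_value = int(image[seed])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if region[y, x]:
            continue
        # Grow only if the pixel is similar enough to the seed intensity
        if abs(int(image[y, x]) - seed_value) <= tolerance:
            region[y, x] = 255
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                    queue.append((ny, nx))
    return region

gray = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)
mask = region_grow(gray, seed=(120, 150), tolerance=12)
cv2.imwrite("region_mask.png", mask)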
Techniques for Detecting
Unknown Objects
4. Clustering Techniques
Clustering groups pixels or regions with similar properties. It is unsupervised and useful for detecting unknown object-like regions.
🔹 Common Methods:
• K-means Clustering:
• Partitions the image into K groups based on color or intensity.
• The cluster with significantly different properties can represent unknown regions.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
• Groups dense areas and marks sparse/noisy points as outliers.
• Good for finding irregularly shaped unknown objects.
🔹 Features Used:
• Color/intensity
• Texture descriptors (e.g., GLCM)
• Local Binary Patterns (LBP)
🔹 Applications:
• Satellite image segmentation
• Detection of foreign objects in quality inspection
• Segmenting complex scenes
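A hedged sketch of K-means colour clustering with OpenCV; the value of K, the termination criteria, and the file names are illustrative assumptions.

import cv2
import numpy as np

img = cv2.imread("satellite.png")
pixels = img.reshape(-1, 3).astype(np.float32)

# Partition all pixels into K colour clusters
K = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)

# Rebuild the image from cluster centres; small or unusual clusters are
# candidates for unknown / foreign objects.
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
cv2.imwrite("kmeans_segments.png", segmented)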

Techniques for Detecting
Unknown Objects
Example Scenario: Surveillance System
Imagine a static CCTV camera monitoring a restricted zone. Normally, it sees
only walls and fixed objects. When a person walks into the frame, they
appear as an anomaly:
• Saliency Detection: Highlights the moving person due to contrast in motion
and appearance.
• LoG/DoG Blob Detection: Identifies the new object as a distinct region of
brightness/texture.
• Region Growing: Starts from high-saliency pixels and expands to define the
entire person.
• K-means Clustering: May group the person as a unique cluster, different
from background clusters.
• This system can alert security personnel that an unknown object (intruder)
has been detected, without ever being trained on human shapes.
Evaluation metrics
• Precision & Recall: Measure correctness and completeness of detection
• F1-Score: Harmonic mean of precision and recall
• IoU (Intersection over Union): Measures overlap between the predicted region and the ground truth
• False Positive Rate: How often background is identified as an unknown object
• Detection Time: Important for real-time applications
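A small sketch of the IoU metric for two binary masks; the example masks are made up purely for illustration.

import numpy as np

def iou(pred_mask, gt_mask):
    # Intersection over Union of two binary masks of the same shape
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / union if union > 0 else 1.0

pred = np.zeros((10, 10)); pred[2:6, 2:6] = 1
gt = np.zeros((10, 10)); gt[3:7, 3:7] = 1
print(f"IoU = {iou(pred, gt):.2f}")   # 9 / 23, roughly 0.39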

Hough Transform
• The Hough Transform is a pivotal algorithm in
computer vision and image processing, enabling the
detection of geometrical shapes such as lines, circles,
and ellipses within images.
• By transforming image space into parameter space, the
Hough Transform leverages a voting mechanism to
identify shapes through local maxima in an accumulator
array.
• Typically, this method detects lines and edges, using parameters rho (ρ) and theta (θ) to represent straight lines in polar coordinates.
• This algorithm is essential in a wide range of applications.
Hough Transform
• Hough Transform is a computer vision technique that detects shapes like lines
and circles in an image.
• It converts these shapes into mathematical representations in parameter
space, making it easier to identify them even if they’re broken or obscured.
• This method is valuable for image analysis, pattern recognition, and object
detection.
• The Hough Transform line-detection algorithm is a feature extraction method used in image analysis, computer vision, and digital image processing.
• It uses a voting mechanism to identify imperfect instances of objects within a given class of shapes.
• This voting is carried out in parameter space: the algorithm produces object candidates as local maxima in an accumulator space.
Hough Transform
• Why is it Needed?
• In many circumstances, a pre-processing stage can use an edge detector to obtain image points or pixels lying on the required curve in image space.
• However, there may be missing points or pixels on the required curves, due to flaws in either the image data or the edge detector, and spatial deviations between the ideal line/circle/ellipse and the noisy edge points produced by the edge detector.
• As a result, grouping the extracted edge characteristics
into an appropriate collection of lines, circles, or
ellipses is frequently difficult.
Figure: Original image (left) and the image after applying an edge detection technique (right). Red circles show where the line breaks.

• The Hough transform can detect lines of any orientation
and can work well in images with a large amount of
noise.
• To understand how this algorithm works we first need to
understand how lines are defined in a polar system.
• A line is described by ρ, the perpendicular distance from the origin, and θ, the angle the perpendicular makes with the horizontal axis.

• In this parameterization, a line satisfies the equation ρ = x·cos θ + y·sin θ.
From the above equation, we can say that all the points
having the same values of ρ and θ constitute a single
line.
The basis of our algorithm is computing the value of ρ for
each point in the image for all possible values of θ.
• We start by creating a parameter space (Hough Space).
• The parameter space is a 2D matrix indexed by ρ and θ, where θ ranges from 0 to 180 degrees.
• We run this algorithm after detecting the edges of the
image using an edge detection algorithm such as Canny
edges.
• Pixels with a value of 255 are considered edge pixels.

• We then scan the image pixel by pixel to find these edge pixels and, for values of θ from 0 to 180, compute ρ for each pixel.
• For pixels on the same line/edge, the values of ρ and θ will be the same. We upvote these indices in the Hough Space by 1.
• Finally, the (ρ, θ) pairs with votes above a certain threshold are considered lines. Consider the Hough Space defined by H[ρ, θ]; a small sketch of this voting procedure follows below.
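A hedged sketch of the voting procedure in Python: every edge pixel votes for each θ from 0 to 179, and peaks in H[ρ, θ] are reported as lines. The input file and the vote threshold are assumptions; cv2.HoughLines performs the same accumulation far more efficiently.

import cv2
import numpy as np

gray = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 100, 200)              # edge pixels have value 255

h, w = edges.shape
diag = int(np.ceil(np.hypot(h, w)))
thetas = np.deg2rad(np.arange(0, 180))
H = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)  # rho may be negative

# Vote: for each edge pixel, compute rho for every theta
ys, xs = np.nonzero(edges)
for x, y in zip(xs, ys):
    for t_idx, theta in enumerate(thetas):
        rho = int(round(x * np.cos(theta) + y * np.sin(theta))) + diag
        H[rho, t_idx] += 1

# Accumulator peaks above a threshold are detected lines
rho_idx, theta_idx = np.where(H > 100)
lines = [(r - diag, np.rad2deg(thetas[t])) for r, t in zip(rho_idx, theta_idx)]
print(f"{len(lines)} candidate (rho, theta) lines found")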

Corner Detection
A corner in an image is a point where the intensity
changes significantly in two or more directions. It
usually occurs at the intersection of two edges and can
be visually identified as a sharp turn or distinct
point, like the corner of a square or chessboard.
• Mathematically, corners are regions with high
gradient variations in both the x and y directions.
They are considered repeatable, stable, and
distinctive, which makes them ideal for computer
vision tasks such as matching and tracking.

Why Detect Corners?
Corners are:
• Invariant to translation, rotation, and small
changes in scale
• Good keypoints for tracking and recognition
• Easily localizable

1. Harris Corner Detector
One of the most widely used classical corner detection
methods.
The Harris detector is based on measuring how much
the image intensity changes when a window is moved
in different directions.
• It uses the second moment matrix (also called the structure tensor), built from the image gradients Ix and Iy over a local window w(x, y):
  M = Σ w(x, y) [ Ix²    Ix·Iy ]
                [ Ix·Iy  Iy²   ]
Corner Response Function:
R = det(M) - k · (trace(M))²
Where:
• det(M) = Ix²·Iy² - (Ix·Iy)²
• trace(M) = Ix² + Iy²
• k is an empirical constant (typically 0.04 to 0.06)
Interpretation:
• If R is large and positive, the point is a corner.
• If R is large and negative, the point lies on an edge; if |R| is small, the region is flat, so it is not a corner.
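A brief sketch of Harris corner detection with OpenCV; the block size, Sobel aperture, k, and the 0.01 relative threshold are the usual illustrative defaults, not fixed requirements.

import cv2
import numpy as np

img = cv2.imread("chessboard.png")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# R = det(M) - k * trace(M)^2 evaluated at every pixel
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Mark pixels whose response is a large positive fraction of the maximum
img[R > 0.01 * R.max()] = (0, 0, 255)
cv2.imwrite("harris_corners.png", img)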
2. Shi-Tomasi Corner Detector
An improvement over Harris; used in Kanade-Lucas-
Tomasi (KLT) trackers.
Instead of using the Harris response R, it considers the minimum eigenvalue of the matrix M:
Corner ⟺ min(λ1, λ2) > threshold
Where:
• λ1, λ2 are the eigenvalues of the matrix M
Advantages:
• More accurate and stable corner detection than Harris
• Well-suited for feature tracking (e.g., in video)
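A hedged sketch of Shi-Tomasi corners via cv2.goodFeaturesToTrack, which keeps points whose minimum eigenvalue of M exceeds a quality threshold; the parameter values and file names are assumptions.

import cv2

img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Keep up to 100 corners whose minimum eigenvalue passes the quality level
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)
if corners is not None:
    for x, y in corners.reshape(-1, 2):
        cv2.circle(img, (int(x), int(y)), 3, (0, 255, 0), -1)
cv2.imwrite("shi_tomasi_corners.jpg", img)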
Image tagging
• Image tagging simply entails assigning keywords to the elements contained in a visual.
• For example, a wedding photo will likely have the tags
‘wedding’, ‘couple’, ‘marriage’, and the like.
• But depending on the system, it may also have tags like
colors, objects, and other specific items and
characteristics in the image — including abstract terms
like ‘love’, ‘relationship’, and more.

Image tagging
Image tagging is the process of automatically
assigning descriptive labels or keywords (tags) to
an image based on its visual content. These tags can
describe objects (e.g., "dog", "car"), scenes (e.g.,
"beach", "city"), actions (e.g., "running", "eating"),
emotions, or any relevant semantic concept.
It plays a crucial role in:
• Image search and retrieval
• Content moderation
• Photo organization
• Accessibility tools (e.g., alt-text generation)
Approaches to Image Tagging
1. Manual Tagging
• Performed by human annotators.
• Time-consuming and not scalable.
• Used to create ground truth datasets for training
models.

Approaches to Image Tagging
2. Rule-Based Systems (Traditional Computer Vision)
Before deep learning, image tagging was done using handcrafted features and
classical classifiers.
Features:
• Color histograms
• Texture descriptors (e.g., GLCM, LBP)
• Shape features (e.g., edges, contours)
Classifiers:
• Support Vector Machines (SVM)
• k-NN, Decision Trees
• Naive Bayes
Limitations:
• Poor performance in complex, real-world images
• Not robust to scale, occlusion, or lighting variation

3. Auto Tagging
• AI-powered image tagging — also known as
auto tagging — is at the forefront of innovating the way
we work with visuals.
• It allows you to add contextual information to your
images, videos and live streams, making the discovery
process easier and more robust.

4. Deep Learning-Based Tagging (Modern
Approach)
Why Deep Learning?
• It learns hierarchical representations directly from
images, making it more robust and accurate.

Deep Learning Methods for Image Tagging

1. Convolutional Neural Networks (CNNs)


Usage:
•Input image → CNN → Fully connected layers → Multi-label output (sigmoid)
Pretrained Networks:
•VGGNet
•ResNet
•EfficientNet
•Inception
These models are often fine-tuned on datasets like MS-COCO, ImageNet, or Open Images for
tagging tasks.
2. Multi-Label Classification
• Unlike single-label classification (only one class), image tagging often needs multiple labels per
image.
• Output:
• Each tag has a sigmoid-activated neuron
• Tags are independent (not softmax)
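A hedged PyTorch sketch of multi-label tagging: a pretrained CNN backbone with its final layer replaced, trained with a per-tag sigmoid (binary cross-entropy) rather than softmax. The tag vocabulary size, dummy batch, and 0.5 threshold are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

num_tags = 20                                     # assumed tag vocabulary size
model = models.resnet50(weights="IMAGENET1K_V2")  # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, num_tags)

criterion = nn.BCEWithLogitsLoss()                # one sigmoid per tag

# One illustrative training step on a dummy batch
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, 2, (8, num_tags)).float()
loss = criterion(model(images), targets)
loss.backward()

# At inference time, tags whose sigmoid probability exceeds a threshold
# (e.g. 0.5) are assigned to the image.
with torch.no_grad():
    probs = torch.sigmoid(model(images))
predicted_tags = probs > 0.5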
Deep Learning Methods for Image Tagging

3. Attention Mechanisms
🔹 Purpose:
• Focus the model on important image regions relevant to each tag.
🔹 Techniques:
• Class Activation Mapping (CAM)
• Grad-CAM
• Self-attention (Transformers)
4. Image Captioning + Tag Extraction
• Some systems generate captions and extract tags from them using NLP.
• Combines vision and language models (e.g., CNN + RNN or Vision Transformers + GPT).
5. Vision-Language Pretrained Models (VLPMs)
Examples:
• CLIP (Contrastive Language–Image Pretraining) by OpenAI
• BLIP, ALIGN, Flamingo
• These models jointly understand images and language and are capable of zero-shot tagging.
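A hedged sketch of zero-shot tagging with CLIP through the Hugging Face transformers library; the candidate tag list, prompt template, and probability threshold are assumptions for illustration.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_tags = ["dog", "car", "beach", "city", "wedding", "food"]
image = Image.open("photo.jpg")

# Score the image against one short prompt per candidate tag
inputs = processor(text=[f"a photo of a {t}" for t in candidate_tags],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores turned into probabilities over the tags
probs = outputs.logits_per_image.softmax(dim=1).squeeze(0)
tags = [t for t, p in zip(candidate_tags, probs.tolist()) if p > 0.2]
print(tags)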
