The document discusses object detection in deep learning, defining the task as identifying and localizing objects within an RGB image using category labels and bounding boxes. It highlights challenges such as variable outputs, the need for higher resolution images, and the evaluation of detection performance using metrics like Intersection over Union (IoU) and Mean Average Precision (mAP). Various detection methods, including single-stage and two-stage approaches, are also outlined, along with techniques for handling overlapping detections.


Deep Learning

Object Detection and Segmentation


Huỳnh Văn Thống
FPT Univ.
Object Detection: Task Definition
• Input: A single RGB image.
• Output: A set of detected objects. For each object:
   • Category label (from a fixed, known set of categories).
   • Bounding box (four numbers: x, y, width, height).
Object Detection: Challenges
• Multiple outputs: Need to output a variable number of objects per image.
• Multiple types of output: Need to predict “what” (category label) as well as “where” (bounding box).
• Large images: Classification works at 224x224; detection needs higher resolution, often ~800x600.
Object Detection: Bounding Boxes

• Bounding boxes are typically axis-aligned.
• Oriented boxes are much less common.
Object Detection: Bounding Boxes

• Modal detection: bounding boxes (usually) cover only the visible portion of the object.
• Amodal detection: the box covers the entire extent of the object, even occluded parts.
Object Detection: Comparing Boxes
Intersection over Union (IoU), also called “Jaccard similarity” or “Jaccard index”:

IoU = Area of Intersection / Area of Union

• IoU > 0.5 is “decent”
• IoU > 0.7 is “pretty good”
• IoU > 0.9 is “almost perfect”
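A minimal Python sketch of this formula for axis-aligned boxes in the (x, y, width, height) encoding used above; the function name iou is illustrative:

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / 175 ≈ 0.14: not even “decent”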
Detecting a Single Object
Detecting Multiple Objects
Detecting Multiple Objects – Sliding Window
• Apply a CNN to many different crops of the image; the CNN classifies each crop as object or background.
• How many possible boxes are there in an image of size H x W? Even for a single fixed crop size h x w there are (W – w + 1) x (H – h + 1) placements, and every crop size must be considered: an 800 x 600 image has ~58M boxes! No way we can evaluate them all (a quick count for one crop size is sketched below).

⇒ Split the problem into object proposal and object classification.
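A quick sanity check of the placement count for a single crop size; the function name and stride parameter are illustrative:

def num_placements(H, W, h, w, stride=1):
    """Number of h x w crop placements in an H x W image."""
    return ((H - h) // stride + 1) * ((W - w) // stride + 1)

print(num_placements(600, 800, 224, 224))  # 217,529 placements for one crop size alone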
Detecting Multiple Objects
• Object detection relies on object proposal and object classification.
   • Object proposal: find regions of interest (RoIs) in the image.
   • Object classification: classify the object in these regions.
• Pipeline: object proposal → feature extraction → classifier.
• Two main families:
   • Single-stage: a grid over the image where each cell is a proposal (SSD, YOLO, RetinaNet).
   • Two-stage: region proposal, then classification (Faster R-CNN).
YOLO [Redmon et al., 2016]
• Divide the image into an S × S grid.
• For each cell, predict 5 + k quantities:
    Probability (confidence) that this cell is contained in a true bounding box.
    Width of the bounding box.
    Height of the bounding box.
    Center (x, y) of the bounding box.
    Probability of the object in the bounding box belonging to the k-th class (k values).
• The output layer thus contains S × S × (5 + k) elements.
• Retain the most confident bounding boxes and the corresponding object labels.
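A minimal sketch (not the original YOLO code) of decoding an S × S × (5 + k) grid into detections. The per-cell layout (confidence, x, y, w, h, then k class scores) and all names are assumptions for illustration; the real YOLO parameterizes boxes relative to each cell:

import numpy as np

def decode_yolo_grid(grid, conf_thresh=0.5):
    """grid: array of shape (S, S, 5 + k) -> list of (conf, box, class_id)."""
    S, _, depth = grid.shape
    k = depth - 5
    detections = []
    for i in range(S):
        for j in range(S):
            conf = grid[i, j, 0]
            if conf < conf_thresh:
                continue  # keep only confident cells
            x, y, w, h = grid[i, j, 1:5]
            class_id = int(np.argmax(grid[i, j, 5:]))
            detections.append((conf, (x, y, w, h), class_id))
    return detections

# A 7 x 7 grid with 20 classes gives 7 x 7 x 25 output elements, as on the slide.
print(len(decode_yolo_grid(np.random.rand(7, 7, 25), conf_thresh=0.9)))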
Overlapping Boxes
• Problem: Object detectors often output many overlapping detections.
• Solution: Post-process raw detections using Non-Max Suppression (NMS):
   1. Select the next highest-scoring box.
   2. Eliminate lower-scoring boxes with IoU > threshold (e.g. 0.7).
   3. If any boxes remain, GOTO 1.

• Problem: NMS may eliminate “good” boxes when objects are highly overlapping… there is no good solution.
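A minimal sketch of the greedy loop above, reusing the iou helper sketched earlier; the nms name and the (score, box) tuple format are assumptions:

def nms(detections, iou_thresh=0.7):
    """Greedy Non-Max Suppression.

    detections: list of (score, box) with box = (x, y, width, height).
    Returns the surviving detections, highest score first.
    """
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:                       # step 3: loop while boxes remain
        best = remaining.pop(0)            # step 1: next highest-scoring box
        kept.append(best)
        remaining = [d for d in remaining  # step 2: drop lower-scoring overlaps
                     if iou(best[1], d[1]) <= iou_thresh]
    return kept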
Evaluating Object Detectors:
Mean Average Precision (mAP)
1. Run the object detector on all test images (with NMS).
2. For each category, compute Average Precision (AP) = area under the Precision vs. Recall curve:
   1. For each detection (from highest score to lowest score):
      1. If it matches some GT box with IoU > 0.5, mark it as positive and eliminate that GT.
      2. Otherwise mark it as negative.
      3. Plot a point on the PR curve.
   2. Average Precision (AP) = area under the PR curve.
      Example: Car AP = 0.65, Cat AP = 0.80, Dog AP = 0.86.
3. Mean Average Precision (mAP) = average of AP over all categories.
   Example: mAP@0.5 = (0.65 + 0.80 + 0.86) / 3 = 0.77.
4. For “COCO mAP”: compute mAP@thresh for each IoU threshold (0.5, 0.55, 0.6, …, 0.95) and take the average.
   Example: mAP@0.5 = 0.77, mAP@0.55 = 0.71, mAP@0.6 = 0.65, …, mAP@0.95 = 0.2 ⇒ COCO mAP = 0.4.
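A minimal sketch of the per-category AP computation described above, approximating the area under the PR curve by summing precision at each new true positive (one common approximation; COCO and PASCAL use interpolated variants, and all names here are illustrative):

def average_precision(matches, num_gt):
    """matches: detection outcomes ordered from highest to lowest score;
    True where the detection matched an unclaimed GT box with IoU > 0.5.
    num_gt: number of ground-truth boxes for this category."""
    tp, ap = 0, 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            tp += 1
            # precision at this PR-curve point, times the recall step 1/num_gt
            ap += (tp / rank) * (1.0 / num_gt)
    return ap

print(average_precision([True, True, False, True], num_gt=4))  # 0.6875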
Dealing with Scale
We need to detect objects of many different scales.
How can we improve the scale invariance of the detector?
Dealing with Scale: Image Pyramid
Classic idea: build an image pyramid by resizing the image to different scales, then process each scale independently.

• Problem: Expensive! No computation is shared between scales.
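A minimal sketch of the classic pyramid, using nearest-neighbor subsampling to stay dependency-light (real pipelines use anti-aliased resizing; all names are illustrative):

import numpy as np

def image_pyramid(image, num_levels=4):
    """Progressively halved copies of an H x W x 3 array."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(levels[-1][::2, ::2])  # halve height and width
    return levels

pyramid = image_pyramid(np.zeros((600, 800, 3)))
print([lvl.shape[:2] for lvl in pyramid])  # [(600, 800), (300, 400), (150, 200), (75, 100)]

Each level is then run through the detector independently, which is exactly why this approach shares no computation between scales.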
Dealing with Scale: Multiscale Features [Lin et al., 2017]
CNNs have multiple stages that operate at different resolutions. Attach an independent detector to the features at each level.

• Problem: a detector on early features doesn’t make use of the entire backbone and doesn’t get access to high-level features.
Dealing with Scale: Feature Pyramid Network [Lin et al., 2017]

• Add top-down connections that feed information from high-level features back down to lower-level features.

• Efficient multiscale features where all levels benefit from the whole backbone! Widely used in practice.
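A minimal PyTorch-style sketch of the top-down pathway with 1x1 lateral connections; the three-stage layout, channel counts, and names (c3/c4/c5, FPN) are illustrative assumptions, not the paper’s exact configuration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Illustrative top-down feature pyramid over three backbone stages."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1 lateral convs project each backbone stage to a common width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels
        )
        # 3x3 convs smooth each merged map before it feeds a detector head.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1)
            for _ in in_channels
        )

    def forward(self, c3, c4, c5):
        # Assumes each backbone stage halves the spatial resolution.
        p5 = self.lateral[2](c5)
        # Top-down: upsample, then add the lateral projection of the stage below.
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        # Every output level now sees high-level semantics from the whole backbone.
        return self.smooth[0](p3), self.smooth[1](p4), self.smooth[2](p5)

# Example with dummy feature maps at strides 8, 16, 32 of an 800 x 608 input:
fpn = FPN()
p3, p4, p5 = fpn(torch.zeros(1, 256, 76, 100),
                 torch.zeros(1, 512, 38, 50),
                 torch.zeros(1, 1024, 19, 25))
print(p3.shape, p4.shape, p5.shape)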
Questions?
