Object Detection
Part – 2
Dr. Oybek Eraliev,
Department of Computer Engineering
Inha University In Tashkent.
Email: [email protected]
Dr. Oybek Eraliyev Class: Artificial Intelligence SOC4040 1
Object Detection
What is Object Detection?
Ø Object detection, within computer vision, involves
identifying objects within images or videos.
Ø These algorithms commonly rely on machine learning or deep
learning methods to generate valuable outcomes.
Ø So instead of classifying which type of dog is present in these images, we have to locate the dog in the image.
Ø That is, we have to find out where in the image the dog is.
Ø The next question is: how can we do that?
Ø We can draw a box around the dog in the image and specify the x and y coordinates of this box.
Ø For now, consider that the location of an object in the image can be represented by the coordinates of such a box.
Ø This box around the object is formally known as a bounding box. This becomes an image localization problem: given a set of images, we have to identify where the object is in each image.
Ø Note that here we have a single class. What if we have multiple classes?
Ø In this image, we have to locate the objects, but note that not all the objects are dogs.
Ø Here we have a dog and a car. So we not only have to locate the objects in the image but also classify each located object as a dog or a car.
Ø So this becomes an object detection problem.
Image Classification: Object Classification
Object Detection: Object Classification + Object Localization
Object Localization
What are localization and detection?
Image Classification | Classification with Localization | Detection
Classification with localization: the ConvNet ends in a softmax over 4 classes and also outputs a bounding box $b_x, b_y, b_h, b_w$.
1 – pedestrian
2 – car
3 – motorcycle
4 – background
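Classification with localization
Image coordinates run from (0,0) at the top-left corner to (1,1) at the bottom-right corner.
Example box at $(b_x, b_y)$: $b_x = 0.5$, $b_y = 0.7$, $b_h = 0.3$, $b_w = 0.4$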
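To make the normalized box parameters concrete, here is a minimal sketch that converts them to pixel corner coordinates. It assumes that $(b_x, b_y)$ is the box center and that $b_h$, $b_w$ are fractions of the image height and width; the function name and the 640×480 image size are hypothetical, chosen only for illustration.

```python
def box_to_pixel_corners(bx, by, bh, bw, img_w, img_h):
    """Convert a normalized (center x, center y, height, width) box to pixel corners.

    Assumption: (bx, by) is the box center and all values lie in [0, 1],
    with (0, 0) the top-left and (1, 1) the bottom-right of the image.
    """
    x_min = (bx - bw / 2) * img_w
    y_min = (by - bh / 2) * img_h
    x_max = (bx + bw / 2) * img_w
    y_max = (by + bh / 2) * img_h
    return x_min, y_min, x_max, y_max


# The example values from the slide, on a hypothetical 640x480 image.
corners = box_to_pixel_corners(0.5, 0.7, 0.3, 0.4, img_w=640, img_h=480)
print(corners)
```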
Defining the target label y
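Classes: 1 – pedestrian, 2 – car, 3 – motorcycle, 4 – background
Need to output $b_x, b_y, b_h, b_w$ and the class label (1–4).
Target label: $y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T$
Example, image $X$ containing a car: $y = [1, b_x, b_y, b_h, b_w, 0, 1, 0]^T$
Example, image $X$ containing only background: $y = [0, ?, ?, ?, ?, ?, ?, ?]^T$
Possible per-component losses: logistic regression loss for $P_c$, MSE for $b_x, b_y, b_h, b_w$, softmax for $c_1, c_2, c_3$.
Here, we used squared error:
If $P_c = 1$: $Loss = \sum_{i=1}^{8} (\hat{y}_i - y_i)^2$
If $P_c = 0$: $Loss = (\hat{y}_1 - y_1)^2$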
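The two-case squared-error loss can be sketched as follows. This is a minimal illustration, assuming the 8-component layout $[P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$ and using 0.0 as a placeholder for the "?" entries; the function name is hypothetical.

```python
def localization_loss(y_true, y_pred):
    """Squared-error loss for a target [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].

    If an object is present (P_c = 1) all 8 components contribute;
    otherwise only the P_c component is penalized.
    """
    if y_true[0] == 1:
        return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    return (y_pred[0] - y_true[0]) ** 2


# Object present (a car): every component counts.
car_true = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
car_pred = [0.9, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
print(localization_loss(car_true, car_pred))

# Background: the "?" entries (placeholders here) are ignored.
bg_true = [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
bg_pred = [0.2, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
print(localization_loss(bg_true, bg_pred))
```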
Landmark Detection
Outputs: bounding box $b_x, b_y, b_h, b_w$, or landmark coordinates $l_1, l_2, \dots$
Car detection Example
Training set: examples $X$ with labels $y$ (e.g. $y = 1$).
$X$ → ConvNet → $y$
Sliding Windows Detection
The ConvNet is run on every window position in turn.
The biggest disadvantage is computational cost.
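A minimal sketch of why the cost grows quickly: it enumerates square windows over an image and counts how many crops would each need a separate ConvNet pass. The image size, window size, and stride are hypothetical values chosen for illustration.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (left, top, right, bottom) for every square window position."""
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            yield left, top, left + win, top + win


# Each window would be cropped and classified by the ConvNet separately,
# so the number of forward passes equals the number of windows.
windows = list(sliding_windows(img_w=640, img_h=480, win=64, stride=16))
print(len(windows), "ConvNet passes for a single window size")
```

Repeating this for several window sizes multiplies the count again, which is the motivation for the convolutional implementation of sliding windows.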
ConvNet implementation of Sliding Windows
A weakness of this method is that the bounding-box coordinates are not very accurate.
Intersection over Union (IoU)
$IoU = \frac{\text{size of intersection}}{\text{size of union}}$
"Correct" if $IoU \geq 0.5$
Generally, IoU is a measure of the overlap between two bounding boxes.
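A minimal sketch of the IoU computation for axis-aligned boxes. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max); that representation and the function name are choices made for this example, not part of the slides.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(predicted, ground_truth)
print(score, "correct" if score >= 0.5 else "not correct")
```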
Non-max suppression
Example: candidate boxes with $P_c$ = 0.8, 0.7 and 0.9; after suppression only the box with $P_c$ = 0.9 is kept.
While there are any remaining boxes:
• Pick the box with the largest $P_c$ and output it as a prediction.
• Discard any remaining box with $IoU \geq 0.5$ with the box just output.
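The loop can be sketched as follows. This is a minimal single-class illustration, assuming each detection is a `(P_c, box)` pair with the box in corner format (x_min, y_min, x_max, y_max). Note that a box is discarded when its overlap with the chosen box is high (IoU at or above the threshold), since a high overlap indicates a duplicate of the same object. The example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-P_c box, drop boxes that overlap it heavily, repeat."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # box with the largest P_c
        kept.append(best)                # output it as a prediction
        remaining = [d for d in remaining
                     if iou(best[1], d[1]) < iou_threshold]
    return kept


detections = [
    (0.8, (12, 12, 52, 52)),      # overlaps the 0.9 box heavily
    (0.9, (10, 10, 50, 50)),
    (0.7, (100, 100, 140, 140)),  # a separate object
]
for p_c, box in non_max_suppression(detections):
    print(p_c, box)
```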
Anchor Boxes (YOLO)
Ø Definition: An anchor box is a
predefined rectangle with a
specific size (height, width) and
aspect ratio (ratio of width to
height).
Ø Multiple anchor boxes are
defined for each location in the
image grid.
Ø Why are they used? Real-world objects vary greatly in shape, size, and aspect ratio. Anchor boxes help object detection models predict bounding boxes more effectively by providing a starting point for predictions.
How Anchor Boxes Work in Object
Detection
1. Predefined Boxes:
   • Anchor boxes are designed before training and are not learned during training.
   • Each grid cell in the feature map has multiple anchor boxes associated with it, often with different scales and aspect ratios.
2. Assigning Ground Truth:
During training, the algorithm assigns
each ground truth box to the most
appropriate anchor box based on the
Intersection over Union (IoU) between
them.
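A minimal sketch of the assignment step. It assumes the common simplification of comparing only shapes: the ground truth box and each anchor are treated as if they shared the same center, so IoU depends only on width and height. The anchor sizes and function names are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given as (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(gt_wh, anchors):
    """Return (index, IoU) of the anchor that best matches the ground truth shape."""
    ious = [shape_iou(gt_wh, a) for a in anchors]
    best = max(range(len(anchors)), key=lambda i: ious[i])
    return best, ious[best]


# Three hypothetical anchors (width, height): square, wide, tall.
anchors = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
index, overlap = best_anchor((1.8, 0.9), anchors)
print("assigned to anchor", index + 1, "with IoU", round(overlap, 2))
```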
3. Prediction:
1. The model predicts the offsets
(shifts) and scales (resizing factors)
required to adjust the anchor boxes
to match the ground truth boxes for
the detected objects.
2. Additionally, it predicts a confidence
score and class label for each
anchor box.
Figure legend: ground truth, anchor box 1, anchor box 2, anchor box 3.
4. Post-Processing:
1. After prediction, anchor boxes with
low confidence scores are filtered
out.
2. Non-Maximum Suppression (NMS)
is applied to remove duplicate or
overlapping predictions for the same
object.
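Figure legend: ground truth, anchor box 1, anchor box 2, anchor box 3.
Classes: 1 – pedestrian, 2 – car, 3 – motorcycle, 4 – background
Output with one block of 8 values per anchor box:
$y = [P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3, \; P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3, \; P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3]$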
Example output for the three anchor boxes:
$y = [0.67, t_x, t_y, t_h, t_w, 0, 1, 0, \; 0.73, t_x, t_y, t_h, t_w, 0, 1, 0, \; 0.49, t_x, t_y, t_h, t_w, 0, 1, 0]$
IoU with the ground truth: anchor box 1 = 0.75, anchor box 2 = 0.80, anchor box 3 = 0.45
$y = [0.67, t_x, t_y, t_h, t_w, 0, 1, 0, \; 0.73, t_x, t_y, t_h, t_w, 0, 1, 0, \; 0.79, t_x, t_y, t_h, t_w, 0, 1, 0]$
Anchor box 2 has the highest IoU with the ground truth (IoU = 0.80):
$y = [0.73, t_x, t_y, t_h, t_w, 0, 1, 0]$
$y = [0.73, t_x, t_y, t_h, t_w, 0, 1, 0]$
Calculating bounding box coordinates:
$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_h = p_h e^{t_h}$
$b_w = p_w e^{t_w}$
$c_x, c_y$: top-left corner of the grid cell.
$p_w, p_h$: width and height of the anchor box.
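The decoding step can be sketched directly from these formulas. This is a minimal illustration, assuming all quantities are expressed in grid-cell units; the function name and the example numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(t_x, t_y, t_h, t_w, c_x, c_y, p_h, p_w):
    """Turn predicted offsets into a box, following the formulas above.

    (c_x, c_y) is the top-left corner of the grid cell and
    (p_w, p_h) is the width and height of the anchor box.
    """
    b_x = sigmoid(t_x) + c_x      # sigmoid keeps the center inside the cell
    b_y = sigmoid(t_y) + c_y
    b_h = p_h * math.exp(t_h)     # exp keeps height and width positive
    b_w = p_w * math.exp(t_w)
    return b_x, b_y, b_h, b_w


# Zero offsets give the cell center and the unchanged anchor shape.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=1, c_y=2, p_h=3.0, p_w=2.0))
```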
$\hat{y} = [P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3]$
$y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$
Converting bounding box coordinates (inverting $b_x = \sigma(t_x) + c_x$, $b_h = p_h e^{t_h}$, and so on):
$\sigma(t_x) = b_x - c_x$
$\sigma(t_y) = b_y - c_y$
$t_h = \log(b_h / p_h)$
$t_w = \log(b_w / p_w)$
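A minimal sketch of the conversion from a ground truth box to training offsets. It assumes grid-cell units and returns the center offsets $b_x - c_x$ and $b_y - c_y$ as the values that the sigmoid outputs should match under the decoding formula $b_x = \sigma(t_x) + c_x$; the function name and numbers are hypothetical.

```python
import math


def encode_box(b_x, b_y, b_h, b_w, c_x, c_y, p_h, p_w):
    """Convert a ground truth box into offsets relative to a cell and an anchor.

    The first two values are targets for sigmoid(t_x) and sigmoid(t_y);
    the last two are targets for t_h and t_w.
    """
    offset_x = b_x - c_x
    offset_y = b_y - c_y
    t_h = math.log(b_h / p_h)
    t_w = math.log(b_w / p_w)
    return offset_x, offset_y, t_h, t_w


# A box centered in cell (1, 2) whose shape equals the anchor shape.
print(encode_box(1.5, 2.5, 3.0, 2.0, c_x=1, c_y=2, p_h=3.0, p_w=2.0))
```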
$y = [0.73, t_x, t_y, t_h, t_w, 0, 1, 0]$
Final bounding box for the object.
You Only Look Once (YOLO) Algorithm
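Classes: 1 – pedestrian, 2 – car, 3 – motorcycle
$y$ is $3 \times 3 \times 2 \times 8$:
• $3 \times 3$ is the grid size
• 2 is the number of anchor boxes
• 8 is $P_c$, the 4 box coordinates, and the 3 classes
For each grid cell, $y$ stacks one 8-vector per anchor box:
$y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3, \; P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T$
Cell with no object: $y = [0, ?, ?, ?, ?, ?, ?, ?, \; 0, ?, ?, ?, ?, ?, ?, ?]^T$
Cell with a car matched to anchor box 2: $y = [0, ?, ?, ?, ?, ?, ?, ?, \; 1, b_x, b_y, b_h, b_w, 0, 1, 0]^T$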
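A minimal sketch of how such a target could be built with nested lists. It assumes box coordinates normalized to the whole image (some formulations instead express them relative to the grid cell) and uses 0.0 for the "?" entries; the function names are hypothetical.

```python
def make_target(grid=3, n_anchors=2, n_classes=3):
    """Create an all-zero grid x grid x n_anchors x (5 + n_classes) target."""
    depth = 5 + n_classes
    return [[[[0.0] * depth for _ in range(n_anchors)]
             for _ in range(grid)]
            for _ in range(grid)]


def place_object(y, box, class_index, anchor_index, n_classes=3):
    """Write one object into the cell that contains its midpoint."""
    grid = len(y)
    b_x, b_y, b_h, b_w = box
    col = min(int(b_x * grid), grid - 1)
    row = min(int(b_y * grid), grid - 1)
    one_hot = [1.0 if k == class_index else 0.0 for k in range(n_classes)]
    y[row][col][anchor_index] = [1.0, b_x, b_y, b_h, b_w] + one_hot
    return row, col


y = make_target()
# A car (class index 1) matched to the second anchor box (index 1).
row, col = place_object(y, (0.5, 0.7, 0.3, 0.4), class_index=1, anchor_index=1)
print("cell:", (row, col))
print("anchor 1:", y[row][col][0])
print("anchor 2:", y[row][col][1])
```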
Anchor Boxes
Functions of Anchor Boxes
1.Multi-Scale Object Detection:
• Anchor boxes allow the detection of objects at multiple scales by
associating different sizes and aspect ratios with grid cells in the feature
map.
• This is especially useful for detecting small and large objects in the same
image.
2.Handling Aspect Ratios:
• Objects in an image can have different shapes (e.g., tall, wide, square). By
using anchor boxes with varied aspect ratios, the model can better
accommodate these variations.
3. Prediction Efficiency:
• Instead of predicting bounding boxes from scratch, the model predicts
adjustments to predefined anchor boxes, simplifying the learning
process.
4. Flexibility in Localization:
• Anchor boxes provide a systematic way to divide the search space,
ensuring that each grid cell can potentially detect multiple objects.
Where Are Anchor Boxes Used?
•Single Shot Detector (SSD):
• SSD uses anchor boxes (called default boxes) at different scales for
detecting objects at multiple resolutions.
•Faster R-CNN:
• Faster R-CNN uses anchor boxes in its Region Proposal Network (RPN) to
generate candidate regions of interest.
•YOLO (You Only Look Once):
• YOLOv2 and later versions use anchor boxes to predict bounding boxes
instead of directly regressing box coordinates.
Example
Imagine detecting a dog and a car in an image:
• A grid cell might have three anchor boxes with different aspect ratios (e.g.,
1:1, 2:1, 1:2).
• If the car overlaps with an anchor box of 2:1 ratio, the model adjusts this
anchor box's position and size to better fit the car.
• Similarly, for the dog, the 1:1 ratio anchor box may be adjusted.
By providing these reference boxes, anchor boxes ensure that the model
efficiently learns to detect and localize objects.
Anchor Box Algorithm
Previously: each object in a training image is assigned to the grid cell that contains that object's midpoint.
With two anchor boxes: each object in a training image is assigned to the grid cell that contains that object's midpoint and to the anchor box for that grid cell with the highest IoU.
Anchor Boxes
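Figure: anchor box 1 and anchor box 2, with one block of 8 values per anchor box in the output:
$y = [P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3, \; P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3]^T$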
Anchor Boxes (After Non-max suppression)
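Figure: detections for anchor box 1 and anchor box 2 after non-max suppression, with output
$y = [P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3, \; P_c, t_x, t_y, t_h, t_w, c_1, c_2, c_3]^T$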
Region-based CNN (R-CNN)
Calculating bounding box coordinates:
Prediction of the regression model for each region proposal: $t_x, t_y, t_h, t_w$.
$b_x = t_x w_{proposal} + x_{proposal}$
$b_y = t_y h_{proposal} + y_{proposal}$
$b_w = w_{proposal} e^{t_w}$
$b_h = h_{proposal} e^{t_h}$
$(x_{proposal}, y_{proposal}, w_{proposal}, h_{proposal})$ are the center coordinates, width, and height of the region proposal.
Defining the offsets for training:
$t_x = \frac{b_x - x_{proposal}}{w_{proposal}}$
$t_y = \frac{b_y - y_{proposal}}{h_{proposal}}$
$t_w = \log\left(\frac{b_w}{w_{proposal}}\right)$
$t_h = \log\left(\frac{b_h}{h_{proposal}}\right)$
$(x_{proposal}, y_{proposal}, w_{proposal}, h_{proposal})$ are the center coordinates, width, and height of the region proposal.
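A minimal sketch of the R-CNN style box regression in both directions: encoding a ground truth box as offsets relative to a region proposal, then decoding the offsets back. Boxes are assumed to be (center x, center y, width, height) tuples in pixels; the function names and numbers are hypothetical.

```python
import math


def rcnn_encode(box, proposal):
    """Offsets (t_x, t_y, t_w, t_h) of a ground truth box relative to a proposal."""
    b_x, b_y, b_w, b_h = box
    p_x, p_y, p_w, p_h = proposal
    return ((b_x - p_x) / p_w,
            (b_y - p_y) / p_h,
            math.log(b_w / p_w),
            math.log(b_h / p_h))


def rcnn_decode(offsets, proposal):
    """Apply predicted offsets to a proposal to obtain the final box."""
    t_x, t_y, t_w, t_h = offsets
    p_x, p_y, p_w, p_h = proposal
    return (t_x * p_w + p_x,
            t_y * p_h + p_y,
            p_w * math.exp(t_w),
            p_h * math.exp(t_h))


proposal = (100.0, 80.0, 50.0, 40.0)
ground_truth = (110.0, 90.0, 60.0, 30.0)
offsets = rcnn_encode(ground_truth, proposal)
print("offsets:", offsets)
print("decoded:", rcnn_decode(offsets, proposal))
```

Decoding the encoded offsets recovers the ground truth box up to floating-point rounding, which is a quick check that the two sets of formulas are inverses of each other.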
Faster R-CNN
Open Source Frameworks
Ø Lots of good implementations on GitHub!
Ø TensorFlow Object Detection API:
   https://github.com/tensorflow/models/tree/master/research/object_detection
   Faster R-CNN, SSD, R-FCN, Mask R-CNN, ...
Ø Detectron2 (PyTorch):
   https://github.com/facebookresearch/detectron2
   Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN, ...
Ø Fine-tune on your own dataset with pre-trained models.
Term Project
Ø Make a team (4 to 6 students per team)
Ø Choose a project topic (Week 9): the topic is free, but it must not be the same as another team's
Ø Make a project proposal (Week 10)
Ø Prepare a report of the project, containing:
   • Role and contribution of each team member
   • Objective
   • Definition
   • Block diagrams of the application
   • Demo video (show your project during the presentation)