Object Detection

Part – 2
Dr. Oybek Eraliev,
Department of Computer Engineering
Inha University In Tashkent.
Email: [email protected]

Dr. Oybek Eraliyev Class: Artificial Intelligence SOC4040 1

Object Detection
What is Object Detection?
➢ Object detection is a computer vision task that identifies objects in
images or videos and determines where each one is located.

➢ Detection algorithms typically rely on machine learning or deep
learning methods to make these predictions.

Object Detection
What is Object Detection?
➢ Instead of classifying which type of dog appears in an image, we now have
to locate the dog in the image.

➢ That is, we have to find out where in the image the dog is.

Object Detection
What is Object Detection?
➢ The next question is: how can we do that?
➢ We can draw a box around the dog in the image and specify the
x and y coordinates of this box.

Object Detection
What is Object Detection?
➢ For now, assume that the location of an object in the image can be
represented by the coordinates of such a box.
➢ The box around the object is formally known as a bounding box.
This becomes an image localization problem: given a set of
images, we have to identify where the object is in each image.

Object Detection
What is Object Detection?
➢ Note that so far we have a single class.
What if we have multiple classes?
➢ In this image, we have to locate
the objects, but note that not all
of them are dogs.
➢ Here we have a dog and a car. So we not
only have to locate the objects in the
image but also classify each located object
as a dog or a car.
➢ This is the object detection
problem.

Object Detection
What is Object Detection?

Image Classification        Object Detection

• Object Classification     • Object Classification
                            • Object Localization

Object Detection
Object Localization

What are localization and detection?

[Figure: three panels: Image Classification, Classification with Localization, Detection]

Object Detection
Object Localization
Classification with localization

[Figure: network whose outputs are a softmax over 4 classes and the bounding box values b_x, b_y, b_h, b_w]

Classes:
1 – pedestrian
2 – car
3 – motorcycle
4 – background

Object Detection
Object Localization
Classification with localization
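[Figure: image with top-left corner (0,0) and bottom-right corner (1,1); the bounding box has midpoint (b_x, b_y), height b_h and width b_w]

b_x = 0.5
b_y = 0.7
b_h = 0.3
b_w = 0.4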
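
As a rough illustration of this parameterization, the sketch below converts a normalized box (midpoint b_x, b_y, height b_h, width b_w, all relative to an image spanning (0,0) to (1,1)) into pixel corner coordinates. The function name `box_to_pixel_corners` and the 640×480 image size are assumptions made for the example and do not come from the slides.

```python
def box_to_pixel_corners(b_x, b_y, b_h, b_w, image_width, image_height):
    """Convert a normalized (midpoint, height, width) box to pixel corners."""
    x_min = (b_x - b_w / 2) * image_width
    x_max = (b_x + b_w / 2) * image_width
    y_min = (b_y - b_h / 2) * image_height
    y_max = (b_y + b_h / 2) * image_height
    return x_min, y_min, x_max, y_max


# Values from the slide: b_x = 0.5, b_y = 0.7, b_h = 0.3, b_w = 0.4.
corners = box_to_pixel_corners(0.5, 0.7, 0.3, 0.4, image_width=640, image_height=480)
print(corners)
```

Keeping the box normalized makes the label independent of the image resolution; the conversion to pixels is only needed for drawing or cropping.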

Object Detection
Defining the target label y
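Classes:
1 – pedestrian
2 – car
3 – motorcycle
4 – background

Need to output b_x, b_y, b_h, b_w, and a class label (1 − 4).

y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]^T

Object present (car):  y = [1, b_x, b_y, b_h, b_w, 0, 1, 0]^T
No object:             y = [0, ?, ?, ?, ?, ?, ?, ?]^T   (entries marked ? do not enter the loss)

Loss per component: P_c uses the logistic regression loss, the box
values b_x, b_y, b_h, b_w use MSE, and c_1, c_2, c_3 use softmax.

Here, we used the squared error function:
If P_c = 1, Loss = \sum_{i=1}^{8} (\hat{y}_i - y_i)^2
If P_c = 0, Loss = (\hat{y}_1 - y_1)^2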
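
The two cases of the squared-error loss can be sketched in a few lines. This is a minimal illustration only: the function name `localization_loss` is hypothetical, and the "?" entries of a no-object label are stored as `None` purely for the example.

```python
def localization_loss(y_true, y_pred):
    """Squared-error loss for y = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]."""
    if y_true[0] == 1:
        # Object present: every component contributes.
        return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    # No object: only the P_c component contributes.
    return (y_pred[0] - y_true[0]) ** 2


car_label = [1, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]
no_object_label = [0, None, None, None, None, None, None, None]
prediction = [0.9, 0.5, 0.7, 0.3, 0.4, 0, 1, 0]

loss_with_object = localization_loss(car_label, prediction)
loss_without_object = localization_loss(no_object_label, prediction)
```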

Object Detection
Landmark Detection

[Figure: example outputs: a bounding box b_x, b_y, b_h, b_w, and two sets of landmark points l_1, l_2, … and l_0, l_1, …]

Object Detection
Car detection Example
Training set: pairs (x, y)

[Figure: image x → ConvNet → y; example label shown: 1]

Object Detection
Sliding Windows Detection

[Figure: ConvNet applied to successive windows]

The biggest disadvantage is the computational cost: every window position requires its own ConvNet forward pass.
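
To get a feel for that cost, the sketch below enumerates the window positions for one window size. The image size, window size and stride are arbitrary values assumed for the example, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (x_min, y_min, x_max, y_max) for every square window position."""
    for y in range(0, image_height - window_size + 1, stride):
        for x in range(0, image_width - window_size + 1, stride):
            yield (x, y, x + window_size, y + window_size)


windows = list(sliding_windows(640, 480, window_size=64, stride=16))
# Each window would be cropped and classified separately by the ConvNet.
print(len(windows))
```

Repeating this for several window sizes multiplies the number of forward passes again.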

Object Detection
ConvNet implementation of Sliding Windows

A weakness of this method is that the bounding box coordinates
are not very accurate, since each box is tied to a fixed window
position and size.

Object Detection
Intersection over Union (IoU)

IoU = (size of intersection) / (size of union)

"Correct" if IoU ≥ 0.5

Generally, IoU is a measure of the overlap between two bounding boxes.
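
A minimal sketch of the IoU computation follows. It assumes boxes are given as corner tuples (x_min, y_min, x_max, y_max), which is a representation chosen for the example and not something the slides prescribe.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```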

Object Detection
Non – max suppression

[Figure: candidate boxes with P_c = 0.8, 0.7 and 0.9]

Object Detection
Non – max suppression

[Figure: box with P_c = 0.9]

Object Detection
Non – max suppression

While there are any remaining boxes:

• Pick the box with the largest P_c and
output that as a prediction.

• Discard any remaining box with
IoU ≥ 0.5 with the box output in the previous step.

[Figure: box with P_c = 0.9]
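
The loop above can be sketched as follows. Note that a box is discarded when its overlap with the selected box is high (IoU at or above the threshold), since such a box most likely covers the same object. The corner-tuple box format, the function names and the example detections are assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (p_c, box) pairs. Returns the kept detections."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # box with the largest P_c
        kept.append(best)        # output it as a prediction
        # Keep only boxes that do NOT overlap strongly with the chosen box.
        remaining = [d for d in remaining if iou(best[1], d[1]) < iou_threshold]
    return kept


detections = [
    (0.8, (10, 10, 110, 110)),
    (0.7, (20, 20, 120, 120)),
    (0.9, (15, 15, 115, 115)),
    (0.6, (300, 300, 400, 400)),  # a separate object far away
]
kept = non_max_suppression(detections)
```

Here the 0.8 and 0.7 boxes overlap the 0.9 box heavily and are suppressed, while the distant 0.6 box survives as its own detection.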

Object Detection
Anchor Boxes (YOLO)
➢ Definition: An anchor box is a
predefined rectangle with a
specific size (height, width) and
aspect ratio (ratio of width to
height).

➢ Multiple anchor boxes are
defined for each location in the
image grid.

Object Detection
Anchor Boxes (YOLO)
➢ Why are they used? Real-world objects
vary greatly in shape, size, and
aspect ratio. Anchor boxes help
object detection models predict
bounding boxes more
effectively by providing a starting
point for predictions.

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

1. Predefined Boxes:
   1. Anchor boxes are designed before
      training and are not learned during
      training.
   2. Each grid cell in the feature map has
      multiple anchor boxes associated with
      it, often with different scales and
      aspect ratios.

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

2. Assigning Ground Truth:

During training, the algorithm assigns
each ground truth box to the anchor box
with which it has the highest
Intersection over Union (IoU).
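
One way to sketch this assignment is to compare only the shapes of the boxes, as if the ground truth box and each anchor shared the same center. That simplification, the helper names and the example anchor sizes are assumptions made here for illustration.

```python
def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes with a common center, each given as (width, height)."""
    intersection = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - intersection
    return intersection / union


def best_anchor(box_wh, anchors):
    """Return the index of the anchor with the highest IoU against the box."""
    scores = [shape_iou(box_wh, anchor) for anchor in anchors]
    return max(range(len(anchors)), key=scores.__getitem__)


anchors = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]  # (width, height), hypothetical
wide_box = (1.8, 0.9)                           # a wide ground truth box
assigned = best_anchor(wide_box, anchors)
```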

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

3. Prediction:
   1. The model predicts the offsets
      (shifts) and scales (resizing factors)
      required to adjust the anchor boxes
      to match the ground truth boxes for
      the detected objects.
   2. Additionally, it predicts a confidence
      score and class label for each
      anchor box.

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

[Figure legend: ground truth, anchor box 1, anchor box 2, anchor box 3]

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

4. Post-Processing:
   1. After prediction, anchor boxes with
      low confidence scores are filtered
      out.
   2. Non-Maximum Suppression (NMS)
      is applied to remove duplicate or
      overlapping predictions for the same
      object.

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection
Classes: 1 – pedestrian, 2 – car, 3 – motorcycle, 4 – background

[Figure legend: ground truth, anchor box 1, anchor box 2, anchor box 3]

y = [P_c t_x t_y t_h t_w c_1 c_2 c_3 | P_c t_x t_y t_h t_w c_1 c_2 c_3 | P_c t_x t_y t_h t_w c_1 c_2 c_3]

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection
Classes: 1 – pedestrian, 2 – car, 3 – motorcycle, 4 – background

[Figure legend: ground truth, anchor box 1, anchor box 2, anchor box 3]

y = [0.67 t_x t_y t_h t_w 0 1 0 | 0.73 t_x t_y t_h t_w 0 1 0 | 0.49 t_x t_y t_h t_w 0 1 0]

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

[Figure legend: ground truth; anchor box 1 (IoU = 0.75); anchor box 2 (IoU = 0.80); anchor box 3 (IoU = 0.45)]

y = [0.67 t_x t_y t_h t_w 0 1 0 | 0.73 t_x t_y t_h t_w 0 1 0 | 0.79 t_x t_y t_h t_w 0 1 0]

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection

[Figure legend: ground truth; anchor box 2 (IoU = 0.80)]

y = [0.73 t_x t_y t_h t_w 0 1 0]

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection
y = [0.73 t_x t_y t_h t_w 0 1 0]
Calculating bounding box coordinates:

b_x = \sigma(t_x) + c_x
b_y = \sigma(t_y) + c_y
b_h = p_h e^{t_h}
b_w = p_w e^{t_w}
c_x, c_y: top-left corner of the grid cell.
p_w, p_h: width and height of the anchor box.
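
These four relations translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and coordinates are assumed to be measured in grid-cell units so that the sigmoid keeps the midpoint inside its cell.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_h, t_w, c_x, c_y, p_h, p_w):
    """Turn predicted offsets into a box (b_x, b_y, b_h, b_w)."""
    b_x = sigmoid(t_x) + c_x   # midpoint x: offset within the cell + cell corner
    b_y = sigmoid(t_y) + c_y   # midpoint y
    b_h = p_h * math.exp(t_h)  # anchor height rescaled by e^{t_h}
    b_w = p_w * math.exp(t_w)  # anchor width rescaled by e^{t_w}
    return b_x, b_y, b_h, b_w


# With all offsets at zero the box sits at the cell center and keeps the anchor size.
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=1, c_y=2, p_h=1.5, p_w=3.0)
```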

Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection
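\hat{y} = [P_c t_x t_y t_h t_w c_1 c_2 c_3]
y = [P_c b_x b_y b_h b_w c_1 c_2 c_3]

Converting bounding box coordinates (the inverse of the relations b_x = \sigma(t_x) + c_x, b_h = p_h e^{t_h}):

\sigma(t_x) = b_x - c_x
\sigma(t_y) = b_y - c_y
t_h = \log(b_h / p_h)
t_w = \log(b_w / p_w)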

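The conversion can be sketched as the inverse of b_x = σ(t_x) + c_x and b_h = p_h e^{t_h}. One assumption to note: because the forward relation applies a sigmoid, this sketch treats b_x − c_x as the value of σ(t_x) and applies a logit to recover t_x itself. All function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(s):
    """Inverse of the sigmoid, defined for 0 < s < 1."""
    return math.log(s / (1.0 - s))


def encode_box(b_x, b_y, b_h, b_w, c_x, c_y, p_h, p_w):
    """Turn a ground truth box into training targets (t_x, t_y, t_h, t_w)."""
    t_x = logit(b_x - c_x)  # b_x - c_x is the target value of sigmoid(t_x)
    t_y = logit(b_y - c_y)
    t_h = math.log(b_h / p_h)
    t_w = math.log(b_w / p_w)
    return t_x, t_y, t_h, t_w


def decode_box(t_x, t_y, t_h, t_w, c_x, c_y, p_h, p_w):
    return (sigmoid(t_x) + c_x, sigmoid(t_y) + c_y,
            p_h * math.exp(t_h), p_w * math.exp(t_w))


cell_and_anchor = dict(c_x=1, c_y=2, p_h=1.5, p_w=3.0)
targets = encode_box(1.3, 2.8, 2.0, 2.5, **cell_and_anchor)
recovered = decode_box(*targets, **cell_and_anchor)  # round trip back to the box
```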
Object Detection
Anchor Boxes (YOLO)
How Anchor Boxes Work in Object
Detection
y = [0.73 t_x t_y t_h t_w 0 1 0]
Final bounding box for object.

Object Detection
You Only Look Once (YOLO) Algorithm
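Classes:
1 – pedestrian
2 – car
3 – motorcycle

y is 3×3×2×8

3×3 is the grid size
2 is the number of anchors
8 is P_c, the 4 box coordinates, and the 3 class scores

For each grid cell, y stacks one 8-value block per anchor:
y = [P_c b_x b_y b_h b_w c_1 c_2 c_3 | P_c b_x b_y b_h b_w c_1 c_2 c_3]^T

Cell with no object:                   y = [0 ? ? ? ? ? ? ? | 0 ? ? ? ? ? ? ?]^T
Cell with a car assigned to anchor 2:  y = [0 ? ? ? ? ? ? ? | 1 b_x b_y b_h b_w 0 1 0]^T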
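
The 3×3×2×8 layout can be pictured as a nested list. This is a sketch under assumptions: the "?" entries are stored as 0.0 placeholders, and the chosen cell, anchor index and box values are made up for the example.

```python
GRID, ANCHORS, VALUES = 3, 2, 8  # grid size, anchors per cell, values per anchor


def empty_target():
    """Build y[row][col][anchor] = [P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]."""
    return [[[[0.0] * VALUES for _ in range(ANCHORS)]
             for _ in range(GRID)]
            for _ in range(GRID)]


y = empty_target()
# A car (class 2) whose midpoint falls in row 1, column 2, assigned to anchor 2
# (index 1). The box values below are arbitrary example numbers.
y[1][2][1] = [1.0, 0.6, 0.4, 0.3, 0.9, 0.0, 1.0, 0.0]

total_values = GRID * GRID * ANCHORS * VALUES
```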

Object Detection
Anchor Boxes

Functions of Anchor Boxes

1. Multi-Scale Object Detection:
• Anchor boxes allow the detection of objects at multiple scales by
associating different sizes and aspect ratios with grid cells in the feature
map.
• This is especially useful for detecting small and large objects in the same
image.
2. Handling Aspect Ratios:
• Objects in an image can have different shapes (e.g., tall, wide, square). By
using anchor boxes with varied aspect ratios, the model can better
accommodate these variations.
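
A set of anchors covering several scales and aspect ratios could be generated as below. This is a sketch under assumptions: the parameterization (aspect ratio = width / height, area = scale²) and the example scales are choices made for illustration, and detectors often pick anchor sizes differently, for instance by clustering the training boxes.

```python
import math


def make_anchors(scales, aspect_ratios):
    """Return (width, height) pairs with ratio = width / height and area = scale**2."""
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            anchors.append((width, height))
    return anchors


# Two scales (in pixels) times three shapes: square, wide, tall.
anchors = make_anchors(scales=[32, 64], aspect_ratios=[1.0, 2.0, 0.5])
```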

Object Detection
Anchor Boxes

Functions of Anchor Boxes

3. Prediction Efficiency:
• Instead of predicting bounding boxes from scratch, the model predicts
adjustments to predefined anchor boxes, simplifying the learning
process.

4. Flexibility in Localization:
• Anchor boxes provide a systematic way to divide the search space,
ensuring that each grid cell can potentially detect multiple objects.

Object Detection
Anchor Boxes

Where Are Anchor Boxes Used?

• Single Shot Detector (SSD):
• SSD uses anchor boxes (called default boxes) at different scales for
detecting objects at multiple resolutions.
• Faster R-CNN:
• Faster R-CNN uses anchor boxes in its Region Proposal Network (RPN) to
generate candidate regions of interest.
• YOLO (You Only Look Once):
• YOLOv2 and several later versions (such as YOLOv3) use anchor boxes to
predict bounding boxes instead of directly regressing box coordinates.

Object Detection
Anchor Boxes

Example

Imagine detecting a dog and a car in an image:

• A grid cell might have three anchor boxes with different aspect ratios (e.g.,
1:1, 2:1, 1:2).
• If the car overlaps with an anchor box of 2:1 ratio, the model adjusts this
anchor box's position and size to better fit the car.
• Similarly, for the dog, the 1:1 ratio anchor box may be adjusted.

By providing these reference boxes, anchor boxes ensure that the model
efficiently learns to detect and localize objects.

Object Detection
Anchor Box Algorithm

Previously:
Each object in the training image is assigned to the grid cell that
contains that object's midpoint.

With two anchor boxes:
Each object in the training image is assigned to the grid cell that
contains that object's midpoint and to the anchor box for that grid cell
with the highest IoU.
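
The grid-cell half of this rule can be sketched as below, assuming the midpoint (b_x, b_y) is normalized to the range 0 to 1 over the whole image and the grid is 3×3. The function name `assign_cell` is hypothetical. The anchor half of the rule would then pick, among that cell's anchors, the one with the highest IoU against the object's box.

```python
def assign_cell(b_x, b_y, grid_size=3):
    """Return (row, col) of the grid cell that contains a normalized midpoint."""
    # min(...) keeps a midpoint lying exactly on the right or bottom edge in range.
    col = min(int(b_x * grid_size), grid_size - 1)
    row = min(int(b_y * grid_size), grid_size - 1)
    return row, col


cell = assign_cell(0.5, 0.7)  # midpoint in the middle column, bottom row
```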

Object Detection
Anchor Boxes
y = [P_c t_x t_y t_h t_w c_1 c_2 c_3 | P_c t_x t_y t_h t_w c_1 c_2 c_3]^T

[Figure labels: Anchor Box 1, Anchor Box 2]

Object Detection
Anchor Boxes (After Non-max suppression)
y = [P_c t_x t_y t_h t_w c_1 c_2 c_3 | P_c t_x t_y t_h t_w c_1 c_2 c_3]^T

[Figure labels: Anchor Box 1, Anchor Box 2]

Object Detection
Anchor Boxes
y = [P_c t_x t_y t_h t_w c_1 c_2 c_3 | P_c t_x t_y t_h t_w c_1 c_2 c_3]^T

[Figure labels: Anchor Box 1, Anchor Box 2]