Unit 3
Figure 5-1 Object detection means identifying and localizing an object. In the first image, we classify the object as a vacuum cleaner, while in the second image, we also draw a box around it, which is the localization of the object
To scale the solution, we can have multiple objects in the same image, and even multiple objects of different categories in the same image, and we have to identify all of them and draw bounding boxes around them. An example is a solution trained to detect cars: on a busy road, there will be many cars, and hence the solution should be able to detect each of them and draw a bounding box around each one.
Object detection is surely a fantastic solution. We will now discuss the major object
detection use cases in the next section.
1. Facial recognition
2. Cancer detection
3. Other vehicles
4. Pedestrians
5. Cyclists
6. Traffic signals
7. Lane markings
8. Construction
The following Deep Learning architectures are commonly used for Object Detection:
1. R-CNN: Regions with CNN features. It combines Regional Proposals with CNN.
2. Fast R-CNN: A Fast Region–based Convolutional Neural Network.
3. Faster R-CNN: Object detection networks on Region Proposal algorithms to
hypothesize object locations.
4. Mask R-CNN: This network extends Faster R-CNN by adding the prediction of
segmentation masks on each region of interest.
5. YOLO: You Only Look Once architecture. It proposes a single Neural Network to
predict bounding boxes and class probabilities from an image in a single evaluation.
6. SSD: Single Shot MultiBox Detector. It presents a model to predict objects in images
using a single deep Neural Network.
When we want to detect objects, a very simple approach can be the following: why not divide the image into regions or specific areas and then classify each one of them? This approach to object detection is called the sliding window approach. As the name suggests, a rectangular box of fixed length and width slides through the entire image with a given stride.
Look at the image of the vacuum cleaner in Figure 5-2. We are using a sliding window
at each part of the image. The red box is sliding over the entire image of the vacuum
cleaner. From left to right and then vertically, we can observe that different parts of the
image are becoming the point of observation. Since the window is sliding, it is referred to
as the sliding window approach.
Figure 5-2 The sliding window approach to detect an object and identify it. Notice how the sliding box moves across the entire image; the process is able to detect the object, but it is really time-consuming and computationally expensive too
Then, for each of the regions cropped, we can classify whether the region contains an object that interests us or not, and then we increase the size of the sliding window and continue the process. Sliding window has proven to work, but it is a computationally very expensive technique and is slow, as we are classifying all the regions in an image. Also, to localize the objects precisely, we need a small window size and a small stride. Still, it is a simple approach to understand.
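To make the idea concrete, the following minimal Python sketch generates the crops that a sliding window would produce; the window size, the stride, and the classify_region function in the commented usage are hypothetical placeholders, not part of any specific library.

import numpy as np

def sliding_window(image, window_size=(128, 128), stride=32):
    """Yield (x, y, crop) for every window position over a NumPy image array."""
    win_h, win_w = window_size
    img_h, img_w = image.shape[:2]
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

# Hypothetical usage: classify every crop and keep the positive ones.
# detections = []
# for x, y, crop in sliding_window(image):
#     if classify_region(crop):          # classify_region is a placeholder classifier
#         detections.append((x, y, 128, 128))

Because every crop has to go through the classifier, the cost grows quickly as the stride shrinks, which is exactly the weakness described above.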
The bounding box prediction generates the x coordinate, y coordinate, height, and width of the bounding box, along with the class probability score.
If an object lies over multiple grid cells, then the grid cell that contains the midpoint of that object is responsible for detecting that object.
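As a small illustration of this midpoint rule, the sketch below returns the grid cell responsible for an object; the 7x7 grid size and the normalized center coordinates are assumptions made only for the example.

def responsible_cell(x_center, y_center, grid_size=7):
    """Return (row, col) of the grid cell containing the object's midpoint.
    x_center and y_center are normalized to [0, 1] relative to the image."""
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return row, col

# An object centered at (0.62, 0.40) on a 7x7 grid falls in cell (2, 4).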
Intersection over Union (IoU):
Intersection over Union is a test to ascertain how close our prediction is to the actual ground truth.
It is represented by Equation 5-1 and is shown in Figure 5-4.
IoU = Area of Overlap / Area of Union (Equation 5-1)
Figure 5-4 Intersection over Union is used to measure the performance of detection. The numerator is the
common area, while the denominator is the complete union of the two areas. The higher the value of IoU, the better
it is
Figure 5-5 IoU values for different positions of the overlapping boxes. A value closer to 1.0 means that the detection is more accurate than one with a value of 0.15
As we can see in Figure 5-5, for an IoU of 0.15, there is much less overlap between the two boxes compared to 0.85 or 0.90. It means that the solution with an IoU of 0.85 is better than the one with an IoU of 0.15. Detection solutions can hence be compared directly.
Intersection over Union allows us to measure and compare the performance of various
solutions. It also makes it easier for us to distinguish between useful bounding boxes and
not-so-important ones. Intersection over Union is an important concept with wide usages.
Using it, we can compare and contrast the acceptability of all the possible solutions and
choose the best one from them.
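Since IoU is simply the ratio of the overlapping area to the combined area of the two boxes, it can be computed in a few lines of Python; the corner-format (x_min, y_min, x_max, y_max) boxes below are an assumption made for illustration.

def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Coordinates of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # The intersection area is zero if the boxes do not overlap at all.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: iou((10, 10, 60, 60), (30, 30, 80, 80)) is roughly 0.22.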
Figure 5-8 The process in R-CNN. Here, we extract region proposals from the input image, compute the CNN
features, and then classify the regions. Image source: https://arxiv.org/pdf/1311.2524.pdf and published here with the
permission of the researchers
With reference to Figure 5-8 where we have shown the process, let us understand the
entire process in detail now:
1. The first step is to input an image, represented by step 1 in Figure 5-8.
2. Then we get the regions we are interested in, which is shown in step 2 in Figure 5-8. These are the 2000 proposed regions (a code sketch of this step appears after this discussion). They are detected using the following steps:
a) We create the initial segmentation for the image.
b) Then we generate the various candidate regions for the image.
c) We combine similar regions into larger ones iteratively. A greedy search approach is used for it.
d) Finally, we use the generated regions to output the final region proposals.
3. Then, in the next step, we reshape all the 2000 regions as required by the CNN implementation.
4. We then pass each region through the CNN to get features for each region.
5. The extracted features are now passed through a support vector machine to classify the presence of objects in the proposed region.
6. Finally, we predict the bounding boxes for the objects using bounding box regression. This means that we are making the final prediction about the image. As shown in the last step, we are predicting whether the image contains an airplane, a person, or a TV monitor.
The preceding process is used by R-CNN to detect the objects in an image. It is surely an innovative architecture, and it introduces region proposals as an impactful concept to detect objects.
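To make the region proposal step concrete, here is a small sketch that generates candidate regions with selective search using OpenCV; it assumes the opencv-contrib-python package is installed and is only an illustration of step 2, not the authors' exact implementation.

import cv2

def propose_regions(image_path, max_proposals=2000):
    """Generate candidate regions with selective search (requires opencv-contrib-python)."""
    image = cv2.imread(image_path)
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()   # the faster, coarser search strategy
    rects = ss.process()               # each proposal is (x, y, w, h)
    return rects[:max_proposals]       # keep roughly 2000 proposals, as in R-CNN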
But there are a few challenges with R-CNN, which are:
1. R-CNN implements three algorithms (CNN for extracting the features, SVM for the classification of objects, and bounding box regression for getting the bounding boxes). This makes R-CNN solutions quite slow to train.
2. It extracts features using the CNN for each image region, and the number of regions is 2000. It means that if we have 1000 images, the number of regions from which features have to be extracted is 1000 times 2000, which again makes it slower.
3. Because of these reasons, it takes 40–50 seconds to make a prediction for an image, and hence it becomes a problem for huge datasets.
4. Also, the selective search algorithm is fixed, and not many improvements can be made to it.
As R-CNN is not very fast and is quite difficult to implement for huge datasets, the same authors proposed Fast R-CNN to overcome these issues.
Faster R-CNN
To overcome the slowness of R-CNN and Fast R-CNN, Shaoqing Ren et al. proposed Faster R-CNN. The intuition behind Faster R-CNN is to replace selective search, which is slow and time-consuming. Faster R-CNN instead uses the Region Proposal Network or RPN.
The architecture of Faster R-CNN is shown in Figure 5-10.
Figure 5-10 Faster R-CNN is an improvement over the previous versions. It consists of two modules – one
is a deep convolutional network, and the other is the Fast R-CNN detector
Faster R-CNN is composed of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector that uses the proposed regions. The entire system is a single, unified network for object detection. In other words, Faster R-CNN combines the intelligence of the deep fully convolutional region proposal network with the Fast R-CNN detector that works on the proposed regions, making the entire solution a single and unified solution for object detection.
Though Faster R-CNN is surely an improvement in terms of performance over R-CNN and Fast R-CNN, the algorithm still does not analyze all the parts of the image simultaneously. Instead, each part of the image is analyzed in sequence. Hence, it requires a large number of passes over a single image to recognize all the objects. Moreover, since a lot of systems are working in a sequence, the performance of each one depends on the performance of the preceding steps.
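For readers who want to try Faster R-CNN directly, torchvision ships a pre-trained model; the sketch below is a minimal inference example, and the image file name and the 0.8 score threshold are arbitrary choices made for illustration.

import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = Image.open("street.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    outputs = model([F.to_tensor(image)])

keep = outputs[0]["scores"] > 0.8                 # keep confident detections only
boxes = outputs[0]["boxes"][keep]                 # (x_min, y_min, x_max, y_max)
labels = outputs[0]["labels"][keep]               # COCO class indices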
You Only Look Once (YOLO)
You Only Look Once or YOLO is targeted for real-time object detection. The previous
algorithms we discussed use regions to localize the objects in the image. Those algorithms
look at a part of the image and not the complete image, whereas in YOLO a single CNN
predicts both the bounding boxes and the respective class probabilities. YOLO was
proposed in 2016 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
The actual paper can be accessed at https://arxiv.org/pdf/1506.02640v5.pdf.
To quote from the actual paper, “We reframe object detection as a single regression
problem, straight from image pixels to bounding box coordinates and class probabilities.”
As shown in Figure 5-12, YOLO divides an image into a grid of cells (the grid size is represented by S). Each of the cells predicts bounding boxes (the number of boxes per cell is represented by B). Then YOLO works on each bounding box and generates a confidence score about the goodness of the shape of the box. The class probability for the object is also predicted. Finally, the bounding boxes having class probability scores above a threshold are selected and used to locate the object within the image.
Figure 5-12 The YOLO process is simple; the image has been taken from the original paper
https://arxiv.org/pdf/1506.02640v5.pdf
1. YOLO divides the input image into an SxS grid. To be noted is that each grid cell is responsible for predicting only one object. If the center of an object falls in a grid cell, that grid cell is responsible for detecting that object.
2. For each of the grid cells, it predicts B boundary boxes. Each of the boundary boxes has five attributes – the x coordinate, y coordinate, width, height, and a confidence score. In other words, it has (x, y, w, h) and a score. This confidence score is the confidence of having an object inside the box. It also reflects the accuracy of the boundary box.
3. The width w and height h are normalized to the image's width and height. The x and y coordinates represent the center of the box relative to the bounds of the grid cell.
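To see what this normalization means in practice, the following sketch converts one such (x, y, w, h) prediction back into pixel coordinates; the function name and arguments are illustrative and are not part of the original implementation.

def decode_box(x, y, w, h, row, col, grid_size, img_w, img_h):
    """Convert a YOLO-style box into pixel corner coordinates.
    x, y are the center offsets within grid cell (row, col);
    w, h are relative to the full image width and height."""
    center_x = (col + x) / grid_size * img_w
    center_y = (row + y) / grid_size * img_h
    box_w, box_h = w * img_w, h * img_h
    x_min, y_min = center_x - box_w / 2, center_y - box_h / 2
    return x_min, y_min, x_min + box_w, y_min + box_h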
We will now examine how we calculate the loss function in YOLO. It is important to understand how the loss is calculated before we can study the entire architecture in detail.
$$
\begin{aligned}
\mathcal{L} =\; & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$
(Equation 5-3)
In Equation 5-3, we have the localization loss, the confidence loss, and the classification loss, where 1_i^obj denotes whether an object appears in cell i and 1_ij^obj denotes that the jth bounding box predictor in cell i is “responsible” for that prediction.
Let’s describe the terms in the preceding equation. Here, we have
A. Localization loss measures the errors for the predicted boundary boxes, that is, their location and size errors. In the preceding equation, the first two terms represent the localization loss. 1_ij^obj is 1 if the jth boundary box in cell i is responsible for detecting the object, else its value is 0. λ_coord increases the weight of the loss in the coordinates of the boundary boxes; the default value of λ_coord is 5.
B. Confidence loss is the loss if an object is detected in the box. It is the squared error between the predicted box confidence and the ground truth for the boxes that are responsible for an object.
C. The next term is the confidence loss if an object is not detected. It is weighted down by λ_noobj (default value 0.5), because most boxes contain no object and we do not want this term to overpower the others.
D. The final term is the classification loss. If an object is indeed detected, then for each cell it is the squared error of the class probabilities for each class.
The final loss is the sum total of all these components. As with any Deep Learning solution, the objective is to minimize this loss value.
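The following NumPy sketch shows how these components could be combined. It flattens the grid into a list of boxes and uses an obj_mask array to stand in for 1_ij^obj, so it is a simplified illustration of Equation 5-3 rather than the exact training code.

import numpy as np

def yolo_loss(pred, truth, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Simplified YOLO loss for arrays of shape (num_boxes, 5 + num_classes).
    Each row is (x, y, w, h, confidence, class probabilities...); obj_mask is 1
    where a box is responsible for an object, else 0. Widths and heights are
    assumed to be non-negative so the square roots are valid."""
    noobj_mask = 1.0 - obj_mask
    # Localization loss: center coordinates plus square-rooted width/height.
    loc = np.sum(obj_mask * ((pred[:, 0] - truth[:, 0]) ** 2 +
                             (pred[:, 1] - truth[:, 1]) ** 2 +
                             (np.sqrt(pred[:, 2]) - np.sqrt(truth[:, 2])) ** 2 +
                             (np.sqrt(pred[:, 3]) - np.sqrt(truth[:, 3])) ** 2))
    # Confidence loss, split into object and no-object terms.
    conf_err = (pred[:, 4] - truth[:, 4]) ** 2
    conf = np.sum(obj_mask * conf_err) + lambda_noobj * np.sum(noobj_mask * conf_err)
    # Classification loss over the class probabilities of responsible boxes.
    cls = np.sum(obj_mask * np.sum((pred[:, 5:] - truth[:, 5:]) ** 2, axis=1))
    return lambda_coord * loc + conf + cls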
In the paper, the authors mention that the network was inspired by GoogLeNet. The network has 24 convolutional layers followed by 2 fully connected layers. Instead of the Inception modules used by GoogLeNet, YOLO uses 1x1 reduction layers followed by 3x3 convolutional layers. YOLO might detect duplicates of the same object. To handle this, non-maximal suppression has been implemented, which removes the duplicate detections with lower confidence scores.
In Figure 5-14, we have an image divided into 13x13 grid cells. In total, there are 169 cells, and each cell predicts 5 bounding boxes. Hence, there are a total of 169*5 = 845 bounding boxes. When we apply a threshold of 30% or more on the confidence scores, we get 3 bounding boxes, as shown in Figure 5-14.
Figure 5-14 The YOLO process divides the image into SxS grid cells. Each cell predicts five bounding boxes, and based on the threshold setting, which is 30% here, we get the final three bounding boxes; the image has been taken from the original paper
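A quick way to reproduce the box count and the filtering step from Figure 5-14 is sketched below; the random scores are only placeholders for the network's real confidence outputs.

import numpy as np

scores = np.random.rand(13 * 13 * 5)      # 845 placeholder confidence scores
kept = np.flatnonzero(scores >= 0.30)     # apply the 30% threshold
print(scores.size, "candidate boxes,", kept.size, "survive the threshold")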
So, YOLO looks at the image only once but in a clever manner. It is a very fast algorithm for
real-time processing. To quote from the original paper:
1. YOLO is refreshingly simple.
2. YOLO is extremely fast. Since we frame detection as a regression problem we don’t
need a complex pipeline. We simply run our Neural Network on a new image at test
time to predict detections. Our base network runs at 45 frames per second with no
batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This
means we can process streaming video in real-time with less than 25 milliseconds of
latency. Furthermore, YOLO achieves more than twice the mean average precision of
other real-time systems.
3. YOLO reasons globally about the image when making predictions. Unlike sliding
window and region proposal-based techniques, YOLO sees the entire image during
training and test time so it implicitly encodes contextual information about classes as
well as their appearance.
4. YOLO learns generalizable representations of objects. When trained on natural
images and tested on artwork, YOLO outperforms top detection methods like DPM
and R-CNN by a wide margin. Since YOLO is highly generalizable it is less likely to
break down when applied to new domains or unexpected inputs.
There are a few challenges with YOLO too. It suffers from high localization error. Moreover, since each of the grid cells predicts only two boxes and can have only one class as the output, YOLO can predict only a limited number of nearby objects. It suffers from a problem of low recall too. Hence, in the next versions, YOLOv2 and YOLOv3, these issues were addressed.
YOLO is one of the most widely used object detection solutions. Its uniqueness lies in
its simplicity and speed.
Questions:
Part-A
1. What is the concept of anchor boxes and non-max suppression?
To generate the final object detections, tiled anchor boxes that belong to the background class are removed, and the remaining ones are filtered by their confidence scores. Anchor boxes with the greatest confidence scores are selected using non-maximum suppression (NMS).
2. What is a bounding box?
In the context of digital image processing, the bounding box denotes the border's coordinates on the X and Y axes that enclose the object in an image. Bounding boxes are used to identify a target, serve as a reference for object detection, and generate a collision box for the object.
3. How are R-CNN, Fast R-CNN, and Faster R-CNN different, and what are the improvements?
                                  R-CNN    Fast R-CNN                       Faster R-CNN
mAP on the Pascal VOC 2012        53.3     65.7 (when trained with          67.0 (when trained with VOC 2012 only)
test dataset (%)                           VOC 2012 only)                   70.4 (when trained with VOC 2007
                                           68.4 (when trained with          and 2012 both)
                                           VOC 2007 and 2012 both)          75.9 (when trained with VOC 2007
                                                                            and 2012 and COCO)
4. What is IoU?
IoU calculates the intersection over the union of two bounding boxes: the ground truth bounding box and the predicted bounding box.
5. What is mAP?
mAP (mean Average Precision) is a popular metric for measuring the accuracy of object detectors. Average precision computes the average of the precision values over recall values from 0 to 1.
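As a sketch of how the average precision can be computed, here is the 11-point interpolation variant used by the Pascal VOC benchmark; the recall and precision arrays are assumed to come from an evaluation pass that is not shown here.

import numpy as np

def average_precision(recalls, precisions):
    """11-point interpolated AP; recalls and precisions are NumPy arrays
    sorted by decreasing detection confidence."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        # Highest precision at any recall greater than or equal to r.
        mask = recalls >= r
        p = np.max(precisions[mask]) if np.any(mask) else 0.0
        ap += p / 11.0
    return ap

# mAP is simply the mean of the AP values computed for every object class.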
6. What is NMS?
a) Sort all the predictions by their confidence scores.
b) Start from the top scores, and ignore any current prediction if we find any previous prediction that has the same class and an IoU > threshold (generally we use 0.5) with the current prediction.
c) Repeat the above step until all predictions are checked.
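A greedy implementation of these steps might look like the sketch below; it reuses the iou helper sketched in the IoU section and, for simplicity, assumes all boxes belong to the same class.

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: boxes is a list of (x_min, y_min, x_max, y_max) tuples and
    scores the matching confidence scores. Returns the indices of kept boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the best box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep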
YOLO uses a sum of squared errors between the predictions and the ground truth to calculate the loss. The loss function is composed of:
• The classification loss.
• The localization loss (errors between the predicted boundary box and the ground truth).
• The confidence loss (the objectness of the box).
9. What is FPN?