
UNIT -3

CCS349 / IMAGE AND VIDEO ANALYTICS

UNIT III OBJECT DETECTION USING MACHINE LEARNING


Object detection – Object detection methods – Deep Learning framework for Object detection – Bounding box approach – Intersection over Union (IoU) – Deep Learning Architectures – R-CNN – Faster R-CNN – You Only Look Once (YOLO) – Salient features – Loss Functions – YOLO architectures

What is Object Detection?


Object Detection is a computer vision technique to locate objects in an image or in a
video. Organizations and researchers are spending significant time and resources to build this
capability. When we humans look at a picture, we can quickly identify the objects and their
respective positions in the image. We can quickly categorize whether it is an apple, a car, or a
human being, and we can recognize objects from any angle. The reason is that our minds have been
trained in such a way that they can identify various objects. Even if the size of an object gets
smaller or bigger, we are able to locate and detect it. The goal is to replicate this
decision-making intelligence using Machine Learning and Deep Learning.

What is object classification, object localization, and object detection?


Look at the images in Figure 5-1 of a vacuum cleaner. An image classification solution
classifies such images as "a Vacuum Cleaner" or "not." So we could have easily labeled
the first image as a vacuum cleaner.
On the other hand, localization refers to finding the position of the object in an image.
So when we do image localization, the algorithm has the dual
responsibility of classifying an image as well as drawing a bounding box around the object, which
is depicted in the second image. In the first image of Figure 5-1, we have a vacuum
cleaner, and in the second image, we have localized it.

Figure 5-1 Object detection means identifying and localizing the object. In the first image, we classify
whether it is a vacuum cleaner, while in the second image, we draw a box around it, which is the localization of
the object
To scale the solution, we can have multiple objects in the same image, and even multiple
objects of different categories in the same image, and we have to identify all of them and
draw bounding boxes around them. An example is a solution trained to detect
cars. On a busy road, there will be many cars, and hence the solution should be able to
detect each of them and draw a bounding box around each one.
Object detection is surely a fantastic solution. We will now discuss the major object
detection use cases in the next section.

Defining the terms Object Detection, Classification, and Localization


• Object detection finds objects within an image or video.
• Object classification determines what the objects within an image or video actually are. It
labels these objects.
• Object localization tracks where objects are located in an image or video. It determines
the position of any object within a piece of visual content.

Applications of Object Classification and Detection:

1. Facial recognition
2. Cancer detection
3. Other vehicles
4. Pedestrians
5. Cyclists
6. Traffic signals
7. Lane markings
8. Construction

Use cases of Object Detection


Deep Learning has expanded many capabilities across domains and organizations. Object
detection is a key one and is a very powerful solution that is making a huge impact in our
business and personal worlds. The major use cases of object detection are
1. Object Detection is the key intelligence behind autonomous driving technology. It
allows the users to detect the cars, pedestrians, the background, motorbikes, and so
on to improve road safety.
2. We can detect objects in the hands of people, and the solution can be used for
security and monitoring purposes. Surveillance systems can be made much more
intelligent and accurate. Crowd control systems can be made more sophisticated,
and the reaction time will be reduced.
3. A solution might be used for detecting objects in a shopping basket, and it can be
used by the retailers for the automated transactions. This will speed up the overall
process with less manual intervention.
4. Object Detection is also used in testing of mechanical systems and on manufacturing
lines. We can detect objects present on the products which might be
contaminating the product quality.
5. In the medical world, the identification of diseases by analyzing the images of a
body part will help in faster treatment of the diseases.
There are very few areas where its usage is not envisioned. It is one of the most heavily
researched areas, and every day new progress is made in this domain.
Organizations and researchers across the globe are creating path-breaking solutions
in this area.
Object Detection methods
We can perform object detection using both Machine Learning and Deep Learning.
Here are a few Machine Learning solutions:
1. Image segmentation using simple attributes like shape, size, and color of an object.
2. We can use an aggregated channel feature (ACF), which is a variation of channel
features. ACF does not calculate the rectangular sums at various locations or scales.
Instead, it extracts features directly as pixel values.
3. Viola-Jones algorithm can be used for face detection.
There are other solutions like RANSAC (random sample consensus), Haar feature–
based cascade classifier, SVM classification using HOG features, and so on which can be
used for object detection.
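As a rough illustration of the classical (non-deep-learning) route mentioned above, the sketch below uses OpenCV's pretrained Haar feature-based cascade classifier for Viola-Jones style face detection. The image filename is a placeholder assumption, and the parameter values are illustrative choices, not values prescribed by this unit.

```python
# A minimal Viola-Jones style face detection sketch using OpenCV's
# Haar cascade classifier. Assumes opencv-python is installed and a
# "face.jpg" image exists (both are illustrative assumptions).
import cv2

# Load a pretrained frontal-face cascade shipped with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans the image at multiple scales and returns (x, y, w, h) boxes
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", image)
```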

Deep Learning methods:

The following Deep Learning architectures are commonly being used for Object
Detection:
1. R-CNN: Regions with CNN features. It combines Regional Proposals with CNN.
2. Fast R-CNN: A Fast Region–based Convolutional Neural Network.
3. Faster R-CNN: Object detection networks on Region Proposal algorithms to
hypothesize object locations.
4. Mask R-CNN: This network extends Faster R-CNN by adding the prediction of
segmentation masks on each region of interest.
5. YOLO: You Only Look Once architecture. It proposes a single Neural Network to
predict bounding boxes and class probabilities from an image in a single evaluation.
6. SSD: Single Shot MultiBox Detector. It presents a model to predict objects in images
using a single deep Neural Network.

3.2 Deep Learning frameworks for Object Detection

A few important components of Object Detection are
• Sliding window approach for Object Detection
• Bounding box approach
• Intersection over Union (IoU)
• Non-max suppression
• Anchor boxes concept

3.2.1 Sliding window approach for Object Detection

When we want to detect objects, a very simple approach can be: why not divide the
image into regions or specific areas and then classify each one of them? This approach to
object detection is called the sliding window approach. As the name suggests, a rectangular box
of fixed length and width slides over the entire image with a given stride.
Look at the image of the vacuum cleaner in Figure 5-2. We are using a sliding window
at each part of the image. The red box is sliding over the entire image of the vacuum
cleaner. From left to right and then vertically, we can observe that different parts of the
image are becoming the point of observation. Since the window is sliding, it is referred to
as the sliding window approach.
Figure 5-2 The sliding window approach to detect and identify an object. Notice how the sliding box
moves across the entire image; the process is able to detect objects but is time-consuming and
computationally expensive

Then, for each of these cropped regions, we can classify whether the region contains
an object that interests us or not, increase the size of the sliding window,
and continue the process. The sliding window approach has proven to work, but it is
computationally very expensive and slow to implement, as we are classifying all the
regions in an image.
Also, to localize the objects precisely, we need a small window size and a small stride. Still, it is a
simple approach to understand.
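A rough sketch of the idea is given below. Here `classify` is a placeholder stand-in for any image classifier returning a confidence score; it is an assumption for illustration and is not defined in this unit.

```python
# A minimal sliding window sketch. `image` is assumed to be a (H, W, C) numpy array.
def sliding_window(image, window=(64, 64), stride=32):
    """Yield (x, y, patch) for every window position over the image."""
    h, w = image.shape[:2]
    win_h, win_w = window
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

def detect(image, classify, threshold=0.5):
    """Run the classifier on every window; keep positions scoring above the threshold."""
    detections = []
    for x, y, patch in sliding_window(image):
        score = classify(patch)  # one classifier call per window: the expensive part
        if score >= threshold:
            detections.append((x, y, score))
    return detections
```

The nested loops make the cost of this approach visible: a smaller stride or window gives better localization but multiplies the number of classifier calls.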

3.3 Bounding box approach


The sliding window approach outputs less accurate bounding boxes as it is dependent
on the size of the window. Hence, we have another approach wherein we divide the
entire image into grids (x by x), and then for each grid cell we define our target label. A
bounding box is shown in Figure 5-3.

Figure 5-3 A bounding box generates the x coordinate, y coordinate, height, and width of the bounding box, and the
class probability score

A bounding box can give us the following details:

Pc: Probability of having an object in the grid cell (0: no object, 1: an object).
Bx: If Pc is 1, the x coordinate of the bounding box.
By: If Pc is 1, the y coordinate of the bounding box.
Bh: If Pc is 1, the height of the bounding box.
Bw: If Pc is 1, the width of the bounding box.
C1: The class probability that the object belongs to Class 1.
C2: The class probability that the object belongs to Class 2.

If an object lies over multiple grid cells, then the cell that contains the midpoint of that object
is responsible for detecting that object.
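A minimal sketch of how one grid cell's target label could be encoded for two classes, following the Pc, Bx, By, Bh, Bw, C1, C2 layout described above. The function name and the exact ordering of values are illustrative assumptions.

```python
import numpy as np

def encode_cell_target(has_object, bx=0.0, by=0.0, bh=0.0, bw=0.0,
                       class_id=None, num_classes=2):
    """Build the [Pc, Bx, By, Bh, Bw, C1, ..., Cn] target vector for one grid cell."""
    target = np.zeros(5 + num_classes)
    if has_object:
        target[0] = 1.0                      # Pc: the cell contains an object midpoint
        target[1:5] = [bx, by, bh, bw]       # box geometry
        target[5 + class_id] = 1.0           # one-hot class probability
    return target

# Cell containing the midpoint of a Class 1 object
print(encode_cell_target(True, bx=0.4, by=0.7, bh=0.3, bw=0.2, class_id=0))
# Empty cell: Pc = 0, the remaining values are "don't care"
print(encode_cell_target(False))
```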
Intersection over Union (IoU):

Intersection over Union is a test to ascertain how close our prediction is to the ground
truth.
It is represented by Equation 5-1 and is shown in Figure 5-4.

Figure 5-4 Intersection over Union is used to measure the performance of detection. The numerator is the
common area, while the denominator is the complete union of the two areas. The higher the value of IoU, the better
it is

IoU = Overlapping region/Combined entire region (Equation 5-1)


So, if we get a higher value of Intersection over Union, it means the overlap is better.
Hence, the prediction is more accurate and better. This is depicted in the example in Figure 5-5.

Figure 5-5 IoU values for different positions of the overlapping blocks. If the value is closer to 1.0, it means that
the detection is more accurate as compared to the value of 0.15

As we can see in Figure 5-5, for an IoU of 0.15, there is much less overlap between the two
boxes as compared to 0.85 or 0.90. It means that the detection with 0.85 IoU is better
than the one with 0.15 IoU. Detection solutions can hence be compared directly.
Intersection over Union allows us to measure and compare the performance of various
solutions. It also makes it easier for us to distinguish between useful bounding boxes and
not-so-important ones. Intersection over Union is an important concept with wide usages.
Using it, we can compare and contrast the acceptability of all the possible solutions and
choose the best one from them.
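A minimal sketch of computing Equation 5-1 in code is shown below, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (a common convention chosen here for illustration).

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the overlapping (intersection) rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter          # combined entire region
    return inter / union if union > 0 else 0.0

# Example: a prediction that overlaps the ground truth fairly well
print(iou((10, 10, 50, 50), (15, 15, 55, 55)))  # ~0.62
```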

3.4 Deep Learning architectures


Deep Learning helps in object detection. We can detect objects of interest in an image
or in a video or even in the live video stream. We are going to create a live video stream
solution later in the chapter.
We have seen earlier that there are some problems with the sliding window approach.
Objects can have different locations in an image and can be of different aspect ratios or sizes.
An object might cover the entire region; on the other hand, it might cover only a
small percentage. There might be more than one object present in the image. The objects
can be at various angles or dimensions, or one object can lie in multiple grids. Moreover,
some use cases require real-time predictions. All of this results in a very large number of
regions and hence requires huge computation power and a considerable amount of time.
The traditional approaches to image analysis and detection will not be of much help in such
situations. Hence, we require Deep Learning–based solutions to develop robust solutions
for object detection.
Deep Learning–based solutions allow us to train better and hence get better results.

3.4.1 Region-based CNN (R-CNN)


We understand that having a very large number of regions is a challenge. Ross
Girshick et al. proposed R-CNN to address the problem of selecting a large number of
regions. R-CNN is a Region-based CNN architecture. Instead of classifying a huge number
of regions, the solution suggests using selective search to extract only 2000 regions
from the image.
These are called "Region Proposals."
The architecture for R-CNN is shown in Figure 5-8.

Figure 5-8 The process in R-CNN. Here, we extract region proposals from the input image, compute the CNN
features, and then classify the regions. Image source: https://arxiv.org/pdf/1311.2524.pdf and published here with the
permission of the researchers

With reference to Figure 5-8 where we have shown the process, let us understand the
entire process in detail now:
1. The first step is to input an image, represented by step 1 in Figure 5-8.
2. Then we get the regions we are interested in, shown in step 2 of Figure 5-8. These are the 2000
proposed regions. They are detected using the following steps:
a) We create the initial segmentation for the image.
b) Then we generate the various candidate regions for the image.
c) We combine similar regions into larger ones iteratively. A greedy search approach is used for this.
d) Finally, we use the generated regions to output the final region proposals.
3. In the next step, we reshape all the 2000 regions as required by the CNN implementation.
4. We then pass each region through the CNN to get features for each region.
5. The extracted features are passed through a support vector machine to classify the
presence of objects in the proposed region.
6. Finally, we predict the bounding boxes for the objects using bounding box regression.
This means we make the final prediction about the image, as shown in the last step:
whether it contains an airplane, a person, or a TV monitor.
The preceding process is used by R-CNN to detect the objects in an image. It is
surely an innovative architecture, and it proposes a region of interest as an impactful
concept to detect objects.
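The region proposal step (selective search) can be sketched with OpenCV's contrib module as shown below. This is only an illustration: it assumes opencv-contrib-python is installed, uses a placeholder image path, and indicates the later CNN/SVM/regression stages of R-CNN only through comments.

```python
# A minimal sketch of R-CNN's selective search region proposal step.
# Requires opencv-contrib-python; "street.jpg" is a placeholder image path.
import cv2

image = cv2.imread("street.jpg")

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()      # fast mode of selective search
rects = ss.process()                  # array of (x, y, w, h) candidate regions

proposals = rects[:2000]              # keep roughly 2000 region proposals
for (x, y, w, h) in proposals:
    region = image[y:y + h, x:x + w]
    # 1. warp/resize `region` to the CNN input size
    # 2. extract CNN features for the warped region
    # 3. classify the features with a per-class SVM
    # 4. refine the box with bounding box regression
```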
But there are a few challenges with R-CNN, which are
1. R-CNN implements three algorithms (CNN for extracting the features, SVM for
the classification of objects, and bounding box regression for getting the bounding
boxes). It makes R-CNN solutions quite slow to be trained.
2. It extracts features using CNN for each image region. And the number of regions
is 2000. It means if we have 1000 images, the number of features to be extracted
is 1000 times 2000 which again makes it slower.

3. Because of these reasons, it takes 40–50 seconds to make a prediction for an image,
and hence it becomes a problem for huge datasets.
4. Also, the selective search algorithm is fixed, and not many improvements can be
made to it.
As R-CNN is not very fast and is quite difficult to apply to huge datasets, the
same authors proposed Fast R-CNN to overcome these issues.
Faster R-CNN
To overcome the slowness of R-CNN and Fast R-CNN, Shaoqing Ren et al. proposed
Faster R-CNN. The intuition behind Faster R-CNN is to replace selective search,
which is slow and time-consuming. Faster R-CNN uses the Region Proposal Network
(RPN).
The architecture of Faster R-CNN is shown in Figure 5-10.

Figure 5-10 Faster R-CNN is an improvement over the previous versions. It consists of two modules – one
is a deep convolutional network, and the other is the Fast R-CNN detector

Faster R-CNN is composed of two modules. The first module is a deep fully
convolutional network that proposes regions, and the second module is the Fast R-
CNN detector that uses the proposed regions. The entire system is a single, unified
network for object detection.

The way a Faster R-CNN works is as follows:


1. We take an input image and pass it through a CNN as shown in Figure 5-10.
2. From the feature maps received, we apply Region Proposal Networks (RPNs). The
way an RPN works can be understood by referring to Figure 5-11.
Figure 5-11 Region proposal networks are used in Faster R-CNN. The image has been taken from the
original paper

The sub-steps followed are:

a) The RPN takes the feature maps generated in the last step.
b) The RPN applies a sliding window and generates k anchor boxes. We have discussed
the anchor boxes concept earlier.
c) The anchor boxes generated are of different shapes and sizes.
d) The RPN also predicts whether an anchor is an object or not.
e) It also gives the bounding box regressor to adjust the anchors.
f) Note that the RPN does not suggest the class of the object.
g) We get the object proposals and their respective objectness scores.
3. Apply ROI pooling to make the size of all the proposals the same.
4. And then, finally, we feed them to the fully connected layers with softmax and linear
regression.
5. We will receive the predicted Object Classification and respective bounding boxes.

Faster R-CNN combines a deep fully convolutional network that proposes regions with
the Fast R-CNN detector that uses those proposed regions. The entire solution is a
single, unified network for object detection.
Though Faster R-CNN is surely an improvement in performance over R-CNN
and Fast R-CNN, the algorithm still does not analyze all parts of the image
simultaneously. Instead, each part of the image is analyzed in sequence. Hence,
it requires a large number of passes over a single image to recognize all the objects.
Moreover, since many subsystems work in sequence, the performance of each
depends on the performance of the preceding steps.
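As a usage-level illustration (not part of the original text), the sketch below runs a pretrained Faster R-CNN from torchvision. The model choice, the `weights="DEFAULT"` argument (which assumes torchvision 0.13 or later), and the image path are assumptions.

```python
# A minimal Faster R-CNN inference sketch using torchvision's pretrained model.
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Faster R-CNN with a ResNet-50 FPN backbone, pretrained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("street.jpg").convert("RGB")   # placeholder image path
tensor = F.to_tensor(image)                       # (C, H, W) float tensor in [0, 1]

with torch.no_grad():
    output = model([tensor])[0]                   # one dict per input image

# Keep detections above a confidence threshold
for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
    if score >= 0.5:
        print(int(label), float(score), box.tolist())
```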
You Only Look Once (YOLO)
You Only Look Once or YOLO is targeted for real-time object detection. The previous
algorithms we discussed use regions to localize the objects in the image. Those algorithms
look at a part of the image and not the complete image, whereas in YOLO a single CNN
predicts both the bounding boxes and the respective class probabilities. YOLO was
proposed in 2016 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
The actual paper can be accessed at https://arxiv.org/pdf/1506.02640v5.pdf.
To quote from the actual paper, “We reframe object detection as a single regression
problem, straight from image pixels to bounding box coordinates and class probabilities.”
As shown in Figure 5-12, YOLO divides an image into a grid of cells (represented by
S). Each of the cells predicts bounding boxes (represented by B). Then YOLO works on
each bounding box and generates a confidence score about the goodness of the shape of
the box. The class probability for the object is also predicted. Finally, the bounding boxes
with class probability scores above a threshold are selected, and they are used to locate the
object within that image.

Figure 5-12 The YOLO process is simple; the image has been taken from the original paper
https://arxiv.org/pdf/1506.02640v5.pdf

3.4.2 Salient features of YOLO

1. YOLO divides the input image into an SxS grid. Note that each grid cell is
responsible for predicting only one object. If the center of an object falls in a grid
cell, that grid cell is responsible for detecting that object.
2. For each of the grid cells, it predicts boundary boxes (B). Each of the boundary boxes
has five attributes – the x coordinate, y coordinate, width, height, and a confidence
score. In other words, it has (x, y, w, h) and a score. This confidence score is the
confidence of having an object inside the box. It also reflects the accuracy of the
boundary box.
3. The width w and height h are normalized to the image's width and height. The x and
y coordinates represent the center of the box relative to the bounds of the grid cell.

4. The confidence is defined as Probability(Object) times IoU. If there is no object, the
confidence is zero. Otherwise, the confidence is equal to the IoU between the predicted
box and the ground truth.
5. Each grid cell predicts C conditional class probabilities, Pr(Classi | Object). These
probabilities are conditioned on the grid cell containing an object. We only predict
one set of class probabilities per grid cell, regardless of the number of boxes B.
6. At test time, we multiply the conditional class probabilities by the individual box
confidence predictions. This gives us the class-specific confidence score for each box,
as represented in Equation 5-2:

Pr(Classi | Object) × Pr(Object) × IoU = Pr(Classi) × IoU (Equation 5-2)

We will now examine how the loss function is calculated in YOLO. It is important to
understand the loss function calculation before we can study the entire architecture in
detail.

3.4.3 Loss function in YOLO


We have seen in the last section that YOLO predicts multiple bounding boxes for each
cell. And we choose the bounding box which has the maximum IoU with the ground truth.
To calculate the loss, YOLO optimizes for sum-squared error in the model output,
as sum-squared error is easy to optimize.
The loss function is shown in Equation 5-3 and comprises localization loss, confidence
loss, and classification loss. We first present the complete loss function and then
describe the terms in detail.

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2
+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
\qquad \text{(Equation 5-3)}
$$

In Equation 5-3, we have localization loss, confidence loss, and classification loss,
where $\mathbb{1}_{i}^{obj}$ denotes whether an object appears in cell i and $\mathbb{1}_{ij}^{obj}$ denotes that the jth bounding
box predictor in cell i is "responsible" for that prediction.
Let's describe the terms in the preceding equation. Here, we have
A. Localization loss measures the errors for the predicted boundary boxes, that is, their
location and size errors. In the preceding equation, the first two terms represent the
localization loss. $\mathbb{1}_{ij}^{obj}$ is 1 if the jth boundary box in cell i is responsible for detecting
the object, else the value is 0. $\lambda_{coord}$ increases the weight for the loss in the coordinates
of the boundary boxes. The default value of $\lambda_{coord}$ is 5.
B. Confidence loss is the loss when an object is detected in the box. It is the second loss
component in the equation, $\sum_{i}\sum_{j} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2$, where $\hat{C}_i$ is the predicted box
confidence score in cell i.
C. The next term is the confidence loss when no object is detected,
$\lambda_{noobj} \sum_{i}\sum_{j} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2$, where $\lambda_{noobj}$ weights down the loss from boxes
that contain no object; its default value is 0.5.

D. The final term is the classification loss. If an object is indeed detected, then for
each cell it is the squared error of the class probabilities for each class.
The final loss is the sum total of all these components. As with any Deep Learning
solution, the objective is to minimize this loss value.
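To make the terms concrete, here is a simplified numpy sketch of the loss, assuming one predicted box per grid cell (the real YOLO predicts B boxes per cell and assigns the box with the highest IoU as "responsible"). The array layout and function name are illustrative assumptions, not the paper's implementation.

```python
# A simplified sketch of the YOLO loss with one box per grid cell.
# pred and target have shape (S*S, 5 + C): [x, y, w, h, confidence, class probs...].
# Widths and heights are assumed to be non-negative (normalized to the image size).
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5

def yolo_loss(pred, target, obj_mask):
    """obj_mask[i] is 1 if cell i contains an object's midpoint, else 0."""
    obj = obj_mask.astype(bool)
    noobj = ~obj

    # Localization loss: (x, y) plus square-rooted (w, h), object cells only
    xy_loss = np.sum((pred[obj, 0:2] - target[obj, 0:2]) ** 2)
    wh_loss = np.sum((np.sqrt(pred[obj, 2:4]) - np.sqrt(target[obj, 2:4])) ** 2)
    localization = LAMBDA_COORD * (xy_loss + wh_loss)

    # Confidence loss: object cells at full weight, background cells down-weighted
    conf_obj = np.sum((pred[obj, 4] - target[obj, 4]) ** 2)
    conf_noobj = LAMBDA_NOOBJ * np.sum((pred[noobj, 4] - target[noobj, 4]) ** 2)

    # Classification loss: squared error of class probabilities, object cells only
    classification = np.sum((pred[obj, 5:] - target[obj, 5:]) ** 2)

    return localization + conf_obj + conf_noobj + classification
```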

3.4.4 YOLO architecture


The network design is shown in Figure 5-13 and is taken from the actual paper at
https://arxiv.org/pdf/1506.02640v5.pdf.
Figure 5-13 The complete YOLO architecture; the image has been taken from the original paper at
https://arxiv.org/pdf/1506.02640v5.pdf

In the paper, the authors mention that the network design was inspired by
GoogLeNet. The network has 24 convolutional layers followed by 2 fully connected layers.
Instead of the Inception modules used by GoogLeNet, YOLO uses 1x1 reduction layers
followed by 3x3 convolutional layers. YOLO might detect duplicates of the same
object. For this, non-maximal suppression is implemented, which removes duplicate
detections with lower confidence scores.
In Figure 5-14, we have an image divided into a 13x13 grid. In total, there are 169 grid cells,
and each cell predicts 5 bounding boxes. Hence, there are a total of 169 × 5 = 845
bounding boxes. When we apply a confidence threshold of 30% or more, we get 3 bounding boxes, as
shown in Figure 5-14.

Figure 5-14 The YOLO process divides the region into SxS grids. Each grid predicts five bounding boxes,
and based on the threshold setting which is 30% here, we get the final three bounding boxes; the image has been
taken from the original paper

So, YOLO looks at the image only once but in a clever manner. It is a very fast algorithm for
real-time processing. To quote from the original paper:
1. YOLO is refreshingly simple.
2. YOLO is extremely fast. Since we frame detection as a regression problem we don’t
need a complex pipeline. We simply run our Neural Network on a new image at test
time to predict detections. Our base network runs at 45 frames per second with no
batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This
means we can process streaming video in real-time with less than 25 milliseconds of
latency. Furthermore, YOLO achieves more than twice the mean average precision of
other real-time systems.
3. YOLO reasons globally about the image when making predictions. Unlike sliding
window and region proposal-based techniques, YOLO sees the entire image during
training and test time so it implicitly encodes contextual information about classes as
well as their appearance.
4. YOLO learns generalizable representations of objects. When trained on natural
images and tested on artwork, YOLO outperforms top detection methods like DPM
and R-CNN by a wide margin. Since YOLO is highly generalizable it is less likely to
break down when applied to new domains or unexpected inputs.
There are a few challenges with YOLO too. It suffers from high localization error.
Moreover, since each grid cell predicts only two boxes and can have only one class
as output, YOLO can detect only a limited number of nearby objects. It also suffers from
low recall. Hence, these issues were addressed in the next versions, YOLOv2 and YOLOv3.
YOLO is one of the most widely used object detection solutions. Its uniqueness lies in
its simplicity and speed.

Questions:

Part-A
1. What is the concept of anchor boxes and non-max suppression?
To generate the final object detections, tiled anchor boxes that belong to the
background class are removed, and the remaining ones are filtered by their confidence
score. Anchor boxes with the greatest confidence scores are selected using non-maximum
suppression (NMS).

2. How are bounding boxes important for object detection?

In the context of digital image processing, a bounding box denotes the X- and Y-axis
coordinates of the border that encloses an object in an image. Bounding boxes are used to
identify a target, serve as a reference for object detection, and generate a collision box for
the object.

3. How are R-CNN, Fast R-CNN, and Faster R-CNN different and what are
the improvements?

                               R-CNN        Fast R-CNN                 Faster R-CNN
Region proposal method         Selective    Selective search           Region Proposal Network
                               search
Prediction time per image      40-50 sec    2 sec                      0.2 sec
Computation time               High         High                       Low
mAP (%) on Pascal VOC 2007     58.5         66.9 (VOC 2007 only)       69.9 (VOC 2007 only)
test dataset                                70.0 (VOC 2007 + 2012)
mAP (%) on Pascal VOC 2012     53.3         65.7 (VOC 2012 only)       67.0 (VOC 2012 only)
test dataset                                68.4 (VOC 2007 + 2012)     70.4 (VOC 2007 + 2012)
                                                                       75.9 (VOC 2007 + 2012 + COCO)

4. What is IoU?
IoU calculates intersection over the union of the two bounding boxes, the bounding box
of the ground truth and the predicted bounding box.

5. What are the metrics used for object detection?

mAP (mean Average Precision) is a popular metric for measuring the accuracy of object
detectors. Average precision computes the average precision value across recall values
from 0 to 1.

6. What is NMS?

Non-Max Suppression (NMS) is a technique used in many computer vision object
detection algorithms. It is a class of algorithms to select one bounding box out of many
overlapping bounding boxes for a single class. NMS implementation:

a) Sort the prediction confidence scores in decreasing order.
b) Starting from the top score, ignore the current prediction if any previously kept
prediction of the same class has IoU > threshold (generally 0.5) with it.
c) Repeat the above step until all predictions are checked.
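A minimal sketch of these steps is given below. It assumes boxes are (x1, y1, x2, y2) tuples and reuses an `iou(box_a, box_b)` helper like the one sketched earlier in the IoU section; in practice the function is applied separately to the predictions of each class.

```python
def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after suppressing overlapping duplicates."""
    # a) Sort predictions by confidence score, highest first
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # b) Drop remaining boxes that overlap the kept box above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    # c) The loop repeats until every prediction has been checked
    return keep
```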

7. What is the loss function in YOLO?

YOLO uses a sum of squared errors between the predictions and the ground truth to
calculate the loss. The loss function is composed of:
• The classification loss.
• The localization loss (errors between the predicted boundary
box and the ground truth).
• The confidence loss (the objectness of the box).

8. What is the advantage of two-stage methods?

In two-stage methods like R-CNN, the model first predicts a few candidate
object locations and then uses a convolutional neural network to classify each
of these candidate object locations as one of the classes or as background.

9. What is FPN?

Feature Pyramid Network (FPN) is a feature extractor designed with a
feature pyramid concept to improve accuracy and speed. Images first
pass through the CNN pathway, yielding semantically rich final layers.
Then, to regain better resolution, FPN creates a top-down pathway by
upsampling this feature map. While the top-down pathway helps detect
objects of varying sizes, spatial positions may be skewed. Lateral
connections are therefore added between the original feature maps and the
corresponding reconstructed layers to improve object localization. FPN
currently provides one of the leading ways to detect objects at multiple
scales, and YOLOv3 and Faster R-CNN have been built with this technique.

10. Why do we use data augmentation?

Data augmentation is a technique for synthesizing new data by
modifying existing data in such a way that the target is not changed, or it is
changed in a known way. Data augmentation is important for improving
accuracy. Augmentation techniques include flipping, cropping, adding noise, and
color distortion.
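A minimal augmentation sketch with torchvision transforms is shown below; the specific transforms and parameter values are illustrative choices, not prescribed by this unit.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),        # flipping
    transforms.RandomResizedCrop(224),             # cropping
    transforms.ColorJitter(brightness=0.4,         # color distortion
                           contrast=0.4,
                           saturation=0.4),
    transforms.ToTensor(),
])

# augmented = augment(pil_image)   # apply to a PIL image during training
```

Note that for detection tasks the bounding boxes must be transformed consistently with the image, for example by mirroring the box coordinates when the image is flipped.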

11. What is the advantage of SSD over Faster R-CNN?

SSD speeds up the process by removing the need for the region proposal
network (RPN) used in Faster R-CNN.
Part – B
1. Explain about Object detection and various Object detection methods.
2. Elaborate about Deep Learning framework for Object detection.
3. Explain about bounding box approach.
4. Discuss about Intersection over Union (IoU).
5. Elaborate about Deep Learning Architectures of R-CNN.
6. Discuss about Faster R-CNN.
7. Discuss about You Only Look Once (YOLO), Salient features, Loss Functions.
8. Illustrate and explain about YOLO architectures.
