Yolo & Object Detection - Complete Note + Lab

summary of object detection using YOLO

Uploaded by

bisiriyuopeyemi2020

YOLO & Object Detection — Complete Note + Lab

Goal: A self-contained note that starts from what object detection is, transitions into YOLO intuition and math,
explains how to implement it with CNNs, and finishes with lab-ready TensorFlow/Keras code you can run and
adapt.

1. What is Object Detection?

Object detection is the computer vision task of finding what objects are present in an image and where they
are located. For each detected object we want:

• a class label (what it is), and

• a bounding box (where it is), typically represented as (x_min, y_min, x_max, y_max) or
(x_center, y_center, width, height).

Related tasks

• Image classification — one label for the whole image.

• Object detection — multiple labeled boxes per image.
• Semantic segmentation — per-pixel class labels.
• Instance segmentation — segmentation masks per object.

Important evaluation metrics

• Intersection over Union (IoU) for a predicted box B_pred and a ground-truth box B_gt: IoU = Area(B_pred ∩ B_gt) / Area(B_pred ∪ B_gt)
• Precision / Recall computed over detections.
• mAP (mean Average Precision) across classes at IoU thresholds (e.g., 0.5, or COCO-style multiple
thresholds).
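
As a quick worked example of the IoU metric, here is a minimal sketch in plain Python. It assumes boxes in (x_min, y_min, x_max, y_max) form; the function name `iou_xyxy` and the sample boxes are illustrative only.

```python
def iou_xyxy(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred = (50, 50, 150, 150)    # hypothetical predicted box
gt = (100, 100, 200, 200)    # hypothetical ground-truth box
# Intersection is 50 x 50 = 2500, union is 10000 + 10000 - 2500 = 17500.
print(iou_xyxy(pred, gt))
```

With an IoU threshold of 0.5, this prediction (IoU = 2500 / 17500, about 0.14) would not count as a match for the ground-truth box.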

2. The YOLO Family — Intuition and Evolution

YOLO (You Only Look Once) treats detection as a single regression problem from image pixels to bounding
boxes + class probabilities. Its core ideas:

• Divide the image into a grid of cells.

• Each grid cell is responsible for predicting objects whose center falls in that cell.
• The network predicts, for each cell, one or more bounding boxes and class scores — in one forward
pass.

Why YOLO? It is fast and end-to-end: a single network produces all detections, allowing real-time
performance.

Evolution highlights:

• YOLOv1 (2016) — grid-based prediction, simpler, used MSE loss for everything.
• YOLOv2 — introduced anchor boxes (priors) and dimension clustering (k-means) to find good
anchors.
• YOLOv3/v4 — multi-scale detection and improved backbones.
• Newer YOLOs (v5/…/v8) continue to refine architecture, training, and deployment.

3. Mathematical Formulation (YOLOv1-style, then anchors)

3.1 Grid and Output tensor

Let the image be resized to a fixed size (e.g., 448×448). Divide the image into an S × S grid.

For each cell i (there are S² cells), the network predicts:

• for j = 1 … B bounding boxes: x_ij, y_ij, w_ij, h_ij, C_ij

• x, y are offsets relative to the cell (often normalized between 0 and 1)
• w, h are relative to the image (or predicted as offsets from anchors; see below)
• C_ij is the box confidence, with target Pr(object) × IoU(pred, truth)
• class probabilities for the cell: p_i(c) for c = 1 … C, where C here is the number of classes

So the output per cell has size B × 5 + C. The full tensor shape is (S, S, B × 5 + C).
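
To make the tensor layout concrete, here is a small NumPy sketch. The values S = 7, B = 2, C = 20 are example settings, and the channel order (all box slots first, then class scores) is an assumption for illustration; a real implementation may order channels differently.

```python
import numpy as np

S, B, C = 7, 2, 20      # grid size, boxes per cell, number of classes (example values)
depth = B * 5 + C       # 5 numbers per box (x, y, w, h, confidence) plus C class scores

# Stand-in for a network output; random values only illustrate the layout.
rng = np.random.default_rng(0)
output = rng.random((S, S, depth))

# Assumed channel order: [box_1 (5 values), ..., box_B (5 values), class_1, ..., class_C].
boxes = output[..., :B * 5].reshape(S, S, B, 5)
class_probs = output[..., B * 5:]

print(output.shape)       # (S, S, B*5 + C)
print(boxes.shape)        # (S, S, B, 5)
print(class_probs.shape)  # (S, S, C)
```

With these values each cell carries 2 × 5 + 20 = 30 numbers, so the full output is 7 × 7 × 30.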

3.2 Bounding box parameterization (anchors)

If not using anchors (YOLOv1), the network directly predicts x, y, w, h.

When anchors are used (common in YOLOv2+), each grid cell has pre-defined anchor boxes with sizes
(w_a, h_a). The network predicts offsets t_x, t_y, t_w, t_h and we decode as:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = w_a ⋅ e^(t_w)
b_h = h_a ⋅ e^(t_h)

• (c_x, c_y) is the top-left corner coordinate of the grid cell (integer grid indices).
• σ(⋅) is the sigmoid function producing values in (0,1), ensuring centers remain inside the cell.
• The exponential on t_w, t_h ensures positive widths/heights and models multiplicative offsets.
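
The decoding equations above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: anchor sizes are taken to be in grid-cell units, and the result is divided by the grid size to give coordinates normalized to [0, 1]. The name `decode_box` and its arguments are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t, cell, anchor, grid_size):
    """Decode raw offsets into (x_center, y_center, w, h), normalized to the image.

    t:      (t_x, t_y, t_w, t_h) raw network outputs
    cell:   (c_x, c_y) integer grid indices of the cell
    anchor: (w_a, h_a) anchor size, assumed to be in grid-cell units
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    w_a, h_a = anchor
    b_x = sigmoid(t_x) + c_x      # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = w_a * math.exp(t_w)     # always positive, scales the anchor
    b_h = h_a * math.exp(t_h)
    return (b_x / grid_size, b_y / grid_size, b_w / grid_size, b_h / grid_size)

# All-zero offsets place the center in the middle of the cell and keep the anchor size.
box = decode_box(t=(0.0, 0.0, 0.0, 0.0), cell=(3, 3), anchor=(2.0, 4.0), grid_size=7)
print(box)
```

Here σ(0) = 0.5 puts the center at grid position (3.5, 3.5), which is the middle of a 7 × 7 grid, and e^0 = 1 leaves the anchor's width and height unchanged.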

3.3 Confidence and class probability

• The network also predicts an objectness/confidence score Ĉ per predicted box.

• Conditional class probabilities are predicted per grid cell: p̂(c ∣ object).
• The final per-box class score is often computed as: score(c) = Ĉ × p̂(c ∣ object).

4. Loss Function — Why It’s Special
YOLO's loss combines multiple objectives in one scalar. The canonical formulation (YOLOv1) uses Mean
Squared Error (MSE) with weights:

\[
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_{ij}-\hat{x}_{ij})^2 + (y_{ij}-\hat{y}_{ij})^2 \right] \\
&+ \lambda_{coord} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_{ij}}-\sqrt{\hat{w}_{ij}})^2 + (\sqrt{h_{ij}}-\sqrt{\hat{h}_{ij}})^2 \right] \\
&+ \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} (C_{ij}-\hat{C}_{ij})^2 \\
&+ \lambda_{noobj} \sum_{i=1}^{S^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj} (C_{ij}-\hat{C}_{ij})^2 \\
&+ \sum_{i=1}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c=1}^{C} (p_i(c)-\hat{p}_i(c))^2
\end{aligned}
\]

Meaning of terms:

• 1_ij^obj = 1 if an object is present in cell i and bounding box j is the responsible box (usually the predicted
box with the highest IoU to the ground truth). Otherwise 0. 1_ij^noobj is its complement, and 1_i^obj = 1 if an object's center falls in cell i.
• The square root (√) on widths/heights reduces the relative loss on large boxes, so the same absolute error counts for more on a small box than on a large one (stabilizes regression).
• λ_coord (e.g., 5) increases the weight on localization.
• λ_noobj (e.g., 0.5) down-weights the confidence loss from boxes that contain no object (helps with class
imbalance: few objects vs many background cells).
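
To see how the five terms and the two weights interact, here is a minimal NumPy sketch of the loss for a single image. It assumes the responsibility mask has already been computed and that predicted widths/heights are non-negative; the function name `yolo_v1_loss` and the dict layout are illustrative choices, not the lab's actual interface.

```python
import numpy as np

def yolo_v1_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLOv1-style loss for one image.

    pred, target: dicts with
      'box'  : (S, S, B, 4) as (x, y, w, h), with w, h >= 0 (assumption)
      'conf' : (S, S, B)
      'cls'  : (S, S, C)
    obj_mask: (S, S, B), 1.0 where box j of cell i is responsible for an object.
    """
    noobj_mask = 1.0 - obj_mask
    cell_has_obj = obj_mask.max(axis=-1)   # (S, S): 1 if any box in the cell is responsible

    xy_err = np.sum((target['box'][..., :2] - pred['box'][..., :2]) ** 2, axis=-1)
    wh_err = np.sum((np.sqrt(target['box'][..., 2:]) - np.sqrt(pred['box'][..., 2:])) ** 2, axis=-1)
    conf_err = (target['conf'] - pred['conf']) ** 2
    cls_err = np.sum((target['cls'] - pred['cls']) ** 2, axis=-1)

    loss = lambda_coord * np.sum(obj_mask * (xy_err + wh_err))   # localization
    loss += np.sum(obj_mask * conf_err)                          # confidence, object boxes
    loss += lambda_noobj * np.sum(noobj_mask * conf_err)         # confidence, empty boxes
    loss += np.sum(cell_has_obj * cls_err)                       # classification
    return float(loss)

# Toy example: 2x2 grid, one box per cell, two classes, one object in cell (0, 0).
S, B, C = 2, 1, 2
target = {'box': np.zeros((S, S, B, 4)), 'conf': np.zeros((S, S, B)), 'cls': np.zeros((S, S, C))}
target['box'][0, 0, 0] = [0.5, 0.5, 0.25, 0.25]
target['conf'][0, 0, 0] = 1.0
target['cls'][0, 0] = [1.0, 0.0]
obj_mask = np.zeros((S, S, B))
obj_mask[0, 0, 0] = 1.0

pred = {k: v.copy() for k, v in target.items()}
pred['box'][0, 0, 0, 0] = 0.6   # x is off by 0.1 in the responsible box
pred['conf'][1, 1, 0] = 0.4     # spurious confidence in an empty cell

loss = yolo_v1_loss(pred, target, obj_mask)
print(loss)   # 5 * 0.1**2 + 0.5 * 0.4**2, about 0.13
```

In this toy case the small localization error (0.1) costs 0.05, while the larger confidence error in an empty cell (0.4) costs only 0.08 because of the λ_noobj down-weighting.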

Differences from usual ANN losses

• Classification uses cross-entropy typically, but YOLOv1 used MSE on class probabilities. Later
versions moved toward log-loss (cross-entropy) for classes.
• YOLO mixes regression losses (MSE on continuous coordinates) with classification loss. This union
demands careful weighting.
• YOLO explicitly separates losses for object-present and no-object cells with different weights.

5. Anchor Generation (k-means with IoU)

Good anchor sizes help convergence. The standard approach:

1. Extract all ground-truth boxes from dataset, convert to normalized (w, h) (divided by image width/
height).
2. Run k-means clustering but with distance = 1 − IoU (box, centroid) instead of Euclidean.
3. The cluster centers are the normalized anchors.

This produces anchors tailored to dataset shape distributions (vehicles vs people produce different
anchors).
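
A short sketch of why the distance 1 − IoU is preferred over Euclidean distance for clustering box shapes. It assumes boxes are compared as if they shared the same center, so only (w, h) matters; the helper name `wh_iou` and the sample sizes are illustrative.

```python
import math

def wh_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given only (w, h)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

# Two pairs of normalized (w, h) boxes with the same Euclidean gap of 0.1 per side.
small = ((0.1, 0.1), (0.2, 0.2))
large = ((0.8, 0.8), (0.9, 0.9))

for a, b in (small, large):
    euclid = math.dist(a, b)        # requires Python 3.8+
    iou_dist = 1 - wh_iou(a, b)
    print(a, b, round(euclid, 3), round(iou_dist, 3))
```

Both pairs are the same Euclidean distance apart, but the small pair has 1 − IoU = 0.75 while the large pair has about 0.21. The IoU-based distance treats a 0.1 difference as a big mismatch for small boxes and a minor one for large boxes, which is what matters for matching anchors to objects.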

6. Inference: Decoding Predictions & Non-Max Suppression (NMS)
Steps:

1. Forward pass → get a tensor of shape (S, S, B × 5 + C).

2. For each cell and each anchor (box), decode b_x, b_y, b_w, b_h into absolute image coordinates.
3. Compute class scores: score(c) = conf × p(c).
4. Filter boxes by a confidence threshold (e.g., 0.25).
5. For each class separately, run NMS: keep the highest-score box, remove boxes whose IoU with it exceeds a threshold
(e.g., 0.5), and repeat on the remaining boxes.

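TensorFlow helper: tf.image.non_max_suppression(boxes, scores, max_output_size, iou_threshold). It expects boxes as corner coordinates [y1, x1, y2, x2] and returns the indices of the boxes to keep.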
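
For intuition about what steps 4 and 5 do, here is a minimal greedy NMS for a single class in plain Python. It is a sketch rather than a replacement for the TensorFlow helper: boxes are assumed to be (x_min, y_min, x_max, y_max), and the names `nms` and `iou_xyxy` plus the sample data are illustrative.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.25):
    """Greedy NMS for one class; returns indices of kept boxes, best score first."""
    # Step 4: drop low-confidence boxes, then sort the rest by score (descending).
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Step 5: discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou_xyxy(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))   # [0, 2]
```

Box 1 overlaps box 0 with IoU 8100 / 11900 (about 0.68), so it is suppressed; box 2 is disjoint and survives; box 3 falls below the score threshold before NMS even runs.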

7. Implementation Notes & Practical Tips

• Use sigmoid for tx, ty and objectness to bound outputs.
• Use exponential for width/height offsets when using anchors.
• Normalize bounding boxes to image size for stable training.
• Use multiple scales (detection at different feature map sizes) for better small-object detection (used
by YOLOv3+).
• For loss, start with YOLOv1 weights (λ_coord = 5, λ_noobj = 0.5); tune later.

8. Lab Code — YOLOv1-style skeleton with anchors & loss

(TensorFlow/Keras)
This lab is a working, runnable skeleton. It uses a simplified backbone and a reasonably complete loss +
decode + NMS pipeline. Replace the dataset-loading parts with your own annotated data.

Note: This is pedagogical — real production YOLOs use more sophisticated backbones and
training tricks.

# yolo_lab.py (run with Python 3.8+, TensorFlow 2.x)

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# --------------------------
# Utilities: IoU, decode, NMS
# --------------------------

def box_iou(boxes1, boxes2):
    # boxes: [...,4] as (x_center, y_center, w, h) absolute coords
    # convert to x1y1x2y2
    def to_x1y1x2y2(b):
        x, y, w, h = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
        x1 = x - w / 2
        y1 = y - h / 2
        x2 = x + w / 2
        y2 = y + h / 2
        return tf.stack([x1, y1, x2, y2], axis=-1)

    b1 = to_x1y1x2y2(boxes1)
    b2 = to_x1y1x2y2(boxes2)

    # intersection (clamped at zero when the boxes do not overlap)
    ix1 = tf.maximum(b1[..., 0], b2[..., 0])
    iy1 = tf.maximum(b1[..., 1], b2[..., 1])
    ix2 = tf.minimum(b1[..., 2], b2[..., 2])
    iy2 = tf.minimum(b1[..., 3], b2[..., 3])
    iw = tf.maximum(ix2 - ix1, 0.0)
    ih = tf.maximum(iy2 - iy1, 0.0)
    inter = iw * ih

    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    union = area1 + area2 - inter
    iou = inter / (union + 1e-9)  # epsilon avoids division by zero
    return iou

# --------------------------
# Anchor calculation (k-means IoU)
# --------------------------

def kmeans_anchors(boxes_wh, k=3, max_iter=100):
    # boxes_wh: N x 2 normalized widths/heights
    N = boxes_wh.shape[0]
    np.random.seed(0)
    centroids = boxes_wh[np.random.choice(N, k, replace=False)]

    for _ in range(max_iter):
        # compute 1 - IoU as distance (boxes compared as if sharing a center)
        dists = np.zeros((N, k))
        for i in range(N):
            for j in range(k):
                w1, h1 = boxes_wh[i]
                w2, h2 = centroids[j]
                inter_w = min(w1, w2)
                inter_h = min(h1, h2)
                inter = inter_w * inter_h
                union = w1 * h1 + w2 * h2 - inter
                iou = inter / union
                dists[i, j] = 1 - iou
        nearest = np.argmin(dists, axis=1)
        # keep the old centroid if a cluster ends up empty
        new_centroids = np.array([boxes_wh[nearest == j].mean(axis=0) if
                                  np.any(nearest == j) else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# --------------------------
# Model: simple YOLOv1-style
# --------------------------

S = 7  # grid
B = 2  # boxes per cell
C = 3  # classes (example)
INPUT_SHAPE = (448, 448, 3)

def build_simple_yolo(input_shape=INPUT_SHAPE, S=7, B=2, C=3):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(64, 7, strides=2, padding='same', activation='relu')(inputs)
    x = layers.MaxPool2D(2)(x)
    x = layers.Conv2D(192, 3, padding='same', activation='relu')(x)
    x = layers.MaxPool2D(2)(x)
    x = layers.Conv2D(128, 1, activation='relu')(x)
    x = layers.Conv2D(256, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(256, 1, activation='relu')(x)
    x = layers.Conv2D(512, 3, padding='same', activation='relu')(x)
    x = layers.MaxPool2D(2)(x)
    x = layers.Conv2D(1024, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D
