What Is a Vector in YOLO?
In YOLO, a vector is a fixed-length list of numbers that holds the
prediction for one part (cell) of the image. This vector contains information
about:
1. Bounding box coordinates (x, y, width, height)
2. Objectness score (how likely it is that the box contains an object)
3. Class probabilities (e.g., car, bus, person)
How YOLO Forms Vectors from an Image
Let’s go step-by-step:
1. Image is divided into a grid
For example:
An image of size 640x640 is divided into an S x S grid (e.g., 20x20 or 80x80,
depending on the stride of the detection layer: stride 32 gives 20x20 and
stride 8 gives 80x80).
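The grid sizes above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name grid_sizes is hypothetical, and the strides 8, 16 and 32 are assumed values chosen to match the 80x80 and 20x20 grids mentioned in the text.

```python
def grid_sizes(image_size, strides=(8, 16, 32)):
    """Return the (rows, cols) grid size for each assumed stride."""
    return {s: (image_size // s, image_size // s) for s in strides}


def cell_of(x, y, stride):
    """Return the (row, col) of the grid cell containing pixel (x, y)."""
    return (int(y // stride), int(x // stride))


sizes = grid_sizes(640)
print(sizes)                   # {8: (80, 80), 16: (40, 40), 32: (20, 20)}
print(cell_of(330, 100, 32))   # (3, 10)
```

A smaller stride gives a finer grid, which is why the same 640x640 image can yield either 80x80 or 20x20 cells.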
2. Each grid cell predicts a vector
Each grid cell predicts a vector of length:
$$\text{Vector Length} = B \times 5 + C$$
Where:
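B = number of bounding boxes per cell (usually 3 in YOLOv5, one per anchor)
5 = 4 box coordinates (x, y, w, h) + 1 objectness score
C = number of classes (e.g., 80 for the COCO dataset)
Note: this layout, with one set of class scores shared by the whole cell,
follows the original YOLO. In YOLOv5 each of the B boxes carries its own
class scores, so the per-cell length there is B x (5 + C).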
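As a quick check of the formula, the sketch below computes the vector length for the values used in this document. The function name vector_length is hypothetical and only mirrors the formula as stated here.

```python
def vector_length(num_boxes, num_classes):
    """Length of one cell's vector: B * 5 + C, as defined in the text."""
    return num_boxes * 5 + num_classes


print(vector_length(3, 2))    # 17
print(vector_length(3, 80))   # 95
```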
3. Example Vector
Assume:
B = 3 (3 bounding boxes)
C = 2 (classes: car, bus)
Then each cell predicts:
$$\text{Vector} = [x_1, y_1, w_1, h_1, \text{obj}_1, x_2, y_2, w_2, h_2, \text{obj}_2, x_3, y_3, w_3, h_3, \text{obj}_3, \text{class1\_score}, \text{class2\_score}]$$
[0.5, 0.6, 0.4, 0.3, 0.98, 0.2, 0.1, 0.5, 0.6, 0.12, ... , 0.95, 0.05]
The first 15 values are the three box predictions (5 values each)
The last two values are the class scores for car and bus
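To make the layout concrete, the sketch below splits a 17-value vector into its three boxes and two class scores. The helper split_vector is hypothetical, and the numbers for the third box are made up for illustration because the example above elides them.

```python
CLASS_NAMES = ["car", "bus"]

# Illustrative values only; the third box (third row) is a made-up placeholder.
vector = [
    0.5, 0.6, 0.4, 0.3, 0.98,   # box 1: x, y, w, h, objectness
    0.2, 0.1, 0.5, 0.6, 0.12,   # box 2
    0.7, 0.8, 0.2, 0.2, 0.05,   # box 3 (hypothetical)
    0.95, 0.05,                 # class scores: car, bus
]


def split_vector(vec, num_boxes, num_classes):
    """Split a flat cell vector into per-box lists and class scores."""
    expected = num_boxes * 5 + num_classes
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")
    boxes = [vec[i * 5:(i + 1) * 5] for i in range(num_boxes)]
    class_scores = vec[num_boxes * 5:]
    return boxes, class_scores


boxes, class_scores = split_vector(vector, num_boxes=3, num_classes=2)
best_box = max(boxes, key=lambda b: b[4])            # highest objectness
best_class = CLASS_NAMES[class_scores.index(max(class_scores))]
print(best_box, best_class)
```

Here the first box has the highest objectness (0.98) and the highest class score belongs to car.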
During Training
YOLO compares the predicted vectors to the ground truth vectors (from
labels) using loss functions:
Localization loss: for box coordinates
Confidence loss: for objectness
Classification loss: for class probabilities
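The three loss terms can be sketched for a single box as follows. This is a simplified illustration, not the loss used by any particular YOLO version: it assumes squared error for the box coordinates and binary cross-entropy for objectness and class scores, with no per-term weights. Real implementations typically use an IoU-based box loss and weight each term. The names bce and cell_loss are hypothetical.

```python
import math


def bce(p, t, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and target t."""
    p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def cell_loss(pred_box, true_box, pred_obj, true_obj, pred_cls, true_cls):
    """Return the three loss terms and their unweighted sum for one box."""
    localization = sum((p - t) ** 2 for p, t in zip(pred_box, true_box))
    confidence = bce(pred_obj, true_obj)
    classification = sum(bce(p, t) for p, t in zip(pred_cls, true_cls))
    return {
        "localization": localization,
        "confidence": confidence,
        "classification": classification,
        "total": localization + confidence + classification,
    }


truth_box, truth_cls = [0.5, 0.6, 0.4, 0.3], [1.0, 0.0]   # a "car" label
good = cell_loss([0.5, 0.6, 0.4, 0.3], truth_box, 0.98, 1.0, [0.95, 0.05], truth_cls)
bad = cell_loss([0.1, 0.1, 0.9, 0.9], truth_box, 0.12, 1.0, [0.05, 0.95], truth_cls)
print(good["total"] < bad["total"])   # True
```

A prediction close to the label gives a small total, and training adjusts the network to push the total down.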
During Inference
YOLO generates a large tensor (multi-dimensional array) of these vectors (one
per grid cell).
It filters them using:
o Objectness threshold, to discard low-confidence boxes
o Non-Max Suppression (NMS), to remove overlapping boxes for the same object
The result: final predicted bounding boxes with labels
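The two filtering steps can be sketched in plain Python. This is a minimal illustration, not the code from the repository: the function names are hypothetical, boxes are assumed to be given as corner coordinates (x1, y1, x2, y2), the thresholds 0.25 and 0.45 are assumed defaults, and suppression is applied only between boxes with the same label.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, obj_thres=0.25, iou_thres=0.45):
    """detections: list of (box, score, label). Returns the kept detections."""
    # Step 1: drop boxes below the objectness threshold, best score first.
    candidates = sorted(
        (d for d in detections if d[1] >= obj_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap a kept box of the same label.
    kept = []
    for det in candidates:
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


detections = [
    ((100, 100, 200, 200), 0.90, "car"),
    ((105, 105, 205, 205), 0.80, "car"),   # overlaps the first car box
    ((300, 300, 400, 400), 0.70, "bus"),
    ((10, 10, 50, 50), 0.10, "car"),       # below the objectness threshold
]
final = filter_and_nms(detections)
print([(label, score) for _, score, label in final])   # [('car', 0.9), ('bus', 0.7)]
```

Processing candidates from the highest score down means that, of any group of overlapping boxes, the most confident one survives.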
Inside the Code
These vectors are formed and processed in:
models/yolo.py: defines how predictions are made
utils/general.py: post-processes the outputs (e.g., non-max suppression)
detect.py: uses the filtered predictions to draw boxes
File/Folder                  Description
Dataset/                     May contain instructions, notes, or helper scripts
                             related to downloading, formatting, or labeling
                             datasets.
data/                        Usually contains images or label data, but in this
                             case, it seems an image (bus.jpg) was deleted,
                             maybe previously used for testing.
models/                      Contains the YOLO model architecture files (.py files
                             like yolo.py, common.py, etc.) that define how the
                             neural network is structured.
runs/train/exp12/weights/    Contains trained model weights (like best.pt) after
                             running train.py. These are used in detect.py for
                             inference.
test_images/                 Folder with test images used during the inference
                             phase (detect.py).
utils/                       Helper functions and utility scripts used by YOLO for
                             tasks like image preprocessing, plotting boxes,
                             non-max suppression, etc.
CONTRIBUTING.md              Guidelines for people who want to contribute to the
                             repository.
Dockerfile                   Enables deployment or environment setup using
                             Docker. Helps ensure consistent setups for
                             training/testing across systems.
LICENSE                      License under which the code is distributed (e.g.,
                             MIT, GPL).
README.md                    Main documentation file describing how to use the
                             YOLO project: setup instructions, training,
                             detection, etc.
dataset.yaml                 Configuration file defining paths to
                             training/validation data and class names. Used by
                             train.py.
detect.py                    Main script to run object detection using trained
                             weights (.pt files).
export.py                    Exports the YOLO model to different formats (like
                             ONNX, TorchScript) for deployment on mobile or
                             embedded systems.
hubconf.py                   Makes the model compatible with PyTorch Hub,
                             enabling easy loading with torch.hub.load().
requirements.txt             Lists Python packages required to run the project
                             (e.g., torch, opencv-python, matplotlib). Install using
                             pip install -r requirements.txt.
train.py                     Main training script to train the YOLO model on
                             custom or default datasets.
tutorial.ipynb               A Jupyter Notebook explaining or demonstrating how
                             to train and run detection with YOLO interactively.
val.py                       Used to evaluate/validate a trained YOLO model on
                             a dataset and calculate metrics like mAP (mean
                             Average Precision).