KENDRIYA VIDYALAYA, EMBASSY OF INDIA, KATHMANDU, NEPAL
Computer Vision (2025-26)
REVISION NOTES
Introduction to Computer Vision
Definition: Computer Vision (CV) is a field within Artificial Intelligence (AI) that enables
computers and systems to derive meaningful information from digital images, videos, and
other visual inputs. It allows machines to process and analyze visual data to simulate human
sight. By using algorithms and machine learning models, CV applications can detect objects,
recognize patterns, and make decisions based on the visual input provided.
Example – Emoji Scavenger Hunt: Imagine playing a game where a machine shows you an
emoji and asks you to find a real-life object that matches it. In the “Emoji Scavenger Hunt”
game, the computer uses its “vision” to detect the objects you show in front of your camera
and check if they match the emoji. This simulates how CV enables machines to identify
objects from real-world environments using camera input.
How It Works: Computer Vision uses advanced algorithms to interpret visual data. It breaks
down images into pixels, processes them using machine learning techniques, and identifies
patterns, shapes, or objects by comparing them with its dataset.
Computer Vision vs. Image Processing
Main Goal:
o Computer Vision: extract meaningful information from images/videos and understand them to make predictions.
o Image Processing: process raw input images to enhance them or prepare them for other tasks.
Scope:
o Computer Vision is a superset of Image Processing.
o Image Processing is a subset of Computer Vision.
Examples:
o Computer Vision: object detection, handwriting recognition.
o Image Processing: rescaling images, correcting brightness, changing tones.
Applications of Computer Vision
Over the years, CV has evolved to become a crucial part of various
industries, with applications that have transformed sectors ranging from
retail to healthcare. Here are some real-world applications of CV:
a. Facial Recognition
Definition: Facial recognition systems identify or verify a person’s
identity using their facial features.
Applications:
Smart Homes & Cities: CV plays a critical role in enhancing security. In smart homes, facial recognition technology can be used to control access, allowing only registered individuals inside. Similarly, smart-city cameras can recognize and track people in public spaces for security purposes.
Prepared by: M. S. KumarSwamy, TGT(Maths)
Attendance Systems: Schools and workplaces use facial recognition for automated
attendance marking.
Example: Schools can track student attendance automatically by scanning students’ faces
upon entry.
b. Face Filters in Social Media
Definition: Face filters are used to apply augmented reality
(AR) effects to users’ faces in apps like Instagram and
Snapchat.
How It Works: Computer vision algorithms detect and map
facial features in real-time. Using this data, the system
overlays digital filters that enhance or alter the appearance of
the face.
Example: When you apply a dog filter on Snapchat, CV
algorithms track the eyes, mouth, and nose, allowing the filter
to adjust dynamically as you move.
c. Google’s Search by Image
Definition: Google’s “Search by Image” feature
uses computer vision to allow users to upload an
image instead of typing keywords, and Google
returns relevant search results based on that
image.
How It Works: The CV system analyzes
features like colors, shapes, and patterns of the
uploaded image, compares them to images in its
database, and displays matching results.
Example: If you upload a picture of a landmark, Google will identify it and provide detailed
information about the place, including its history and location.
d. Computer Vision in Retail
Customer Behavior Tracking: Retailers use CV to
track customers’ movements within stores. Cameras
and CV algorithms analyze how people navigate
through aisles, which helps in optimizing store
layouts.
Inventory Management: Cameras monitor stock
levels on shelves, and CV algorithms provide real-
time analysis of which products need restocking.
Example: Amazon Go stores use computer vision to
create a cashier-less shopping experience. Shoppers
can pick items off the shelf, and CV systems
automatically detect what they’ve selected, charge
their account, and let them walk out without checking
out manually.
e. Self-Driving Cars
Definition: Autonomous vehicles rely heavily on
computer vision to interpret the surrounding
environment, helping the car navigate safely
without human intervention.
Key Tasks: CV enables self-driving cars to detect
objects like other cars, pedestrians, road signs, and
obstacles. It also assists in lane detection and route
navigation.
Example: Tesla’s Autopilot uses computer vision to detect nearby vehicles, maintain lane position, adjust speed, and respond to traffic conditions.
f. Medical Imaging
Definition: CV is revolutionizing healthcare by aiding in the analysis of medical images such
as X-rays, MRIs, and CT scans.
How It Works: The technology helps to
identify abnormalities and diseases by
converting 2D scans into detailed 3D
models, offering better insights for
diagnosis.
Example: AI-powered systems can detect
tumors or fractures from medical images
faster and sometimes more accurately than
human radiologists, providing early
diagnosis and better treatment outcomes.
g. Google Translate App (Augmented Reality)
Definition: By using CV combined with augmented reality
(AR), Google Translate allows users to point their phone
cameras at foreign text and receive a real-time translation
overlay.
How It Works: Optical character recognition (OCR) detects the foreign words, a translation engine converts them, and AR overlays the translated text in the user’s preferred language.
Example: If you’re traveling abroad and come across a sign in
a language you don’t understand, pointing your camera at it
will display the translated text almost instantly on your screen.
COMPUTER VISION TASKS
The tasks in computer vision are performed to extract information from an input image.
For Single Objects:
o Classification: Assigning an input image a single label from a fixed set of categories.
o Classification + Localization: Identifying both what object is present and its location
within the image. This is used only for single objects.
For Multiple Objects:
o Object Detection: Finding instances of real-world objects in images or videos, such
as faces or bicycles. It involves both classification and localization for multiple
objects.
o Instance Segmentation: Detecting objects, giving them a category, and then
assigning a label to each pixel based on that category.
BASICS OF IMAGES IN COMPUTER VISION
Pixels: The smallest unit of information that makes up a digital picture. Pixels are arranged in a 2-dimensional grid to represent the image. The more pixels an image has, the more closely it resembles the original scene.
Resolution: The number of pixels in an image. It can be expressed as width by height (e.g.,
1280×1024) or as a single number in megapixels (a megapixel is a million pixels).
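The megapixel conversion is simple arithmetic; as a quick Python check (using the 1280×1024 example above):

```python
# Resolution expressed as width x height, converted to megapixels.
width, height = 1280, 1024

total_pixels = width * height          # total number of pixels in the image
megapixels = total_pixels / 1_000_000  # 1 megapixel = one million pixels

print(total_pixels)           # 1310720
print(round(megapixels, 2))   # 1.31
```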
Pixel Value: A value that describes a pixel's brightness and/or color.
o In a byte image, the value is an 8-bit integer, giving a range of 0 to 255.
o 0 typically represents black (minimum brightness) and 255 white (maximum brightness).
Grayscale Images: Images with shades of gray, from black (0) to white (255). Each pixel is
one byte and the image has a single 2D array of pixels.
RGB Images: Colored images made up of three primary colors: Red, Green, and Blue.
o Each pixel in an RGB image has a set of three values, one for each color channel (R,
G, B), ranging from 0 to 255.
o These three values combine to form the complete color of the pixel.
Image Features & Convolution
Definition: Features are essential visual elements in an image that help in recognizing or
categorizing objects.
Key Features:
Edges: Boundaries between different regions in an image.
Corners: Points where two edges meet.
Blobs: Regions that differ in properties such as color or intensity from surrounding
areas.
Example: In facial recognition, detecting key features like eyes, nose, and mouth edges is
crucial for identification.
Convolution: A mathematical operation that is fundamental to many image-processing operators. At each position, an image region is multiplied element-wise with another array, called the kernel, and the products are summed.
Kernel: A small matrix that is slid across an image and multiplied with the input to enhance the output in a desirable way, such as applying filters. Convolution is used in Convolutional Neural Networks (CNNs) to extract image features.
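As a minimal sketch of the operation (a hand-rolled loop rather than a production routine; note that CNN libraries actually compute cross-correlation, i.e. convolution without flipping the kernel):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel across the image; at each position, multiply
    element-wise and sum the products (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An image with a vertical edge: dark left half, bright right half.
img = np.array([[0, 0, 255, 255]] * 4, dtype=float)

# A simple vertical edge-detection kernel.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

# Every output entry is 765: a strong response along the vertical edge.
print(convolve2d(img, kernel))
```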
Convolutional Neural Networks (CNN)
A Convolutional Neural Network (CNN) is a Deep Learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects or objects in the image, and learns to differentiate one from another.
CNNs are a specialized class of deep neural networks designed to process and analyze visual data.
They are highly effective in tasks such as image classification, object detection, and image
segmentation.
Structure of a CNN:
1. Convolution Layer: The first layer in a CNN where filters (kernels) scan the input
image to extract features like edges, colors, and textures.
Example: If you input a picture of a cat, the convolution layer extracts features
like the shape of the cat’s eyes, ears, and fur pattern.
2. ReLU Layer (Rectified Linear Unit): This activation function removes negative
values from the feature maps, introducing non-linearity.
3. Pooling Layer: Reduces the dimensionality of the feature maps by selecting the most
important information (e.g., through Max Pooling).
Example: Max Pooling selects the brightest or most prominent feature in a
given region, allowing the network to focus on key details.
4. Fully Connected Layer: Flattens the input and uses it for classification. The
flattened vector is used to assign labels to the input image.
Example: After feature extraction, the fully connected layer identifies whether
the input image is a cat or a dog based on the probability distribution across
labels.
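The four layers above can be sketched as a toy forward pass in NumPy (the feature-map values are invented, and a real fully connected layer would also apply learned weights to the flattened vector):

```python
import numpy as np

def relu(x):
    # ReLU layer: replace negative values with zero (non-linearity).
    return np.maximum(x, 0)

def max_pool(fm, size=2):
    # Pooling layer: keep the largest value in each size x size region.
    h, w = fm.shape
    return fm[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

# A 4x4 "feature map", as it might come out of a convolution layer.
feature_map = np.array([[ 1, -2,  3,  0],
                        [-1,  5, -3,  2],
                        [ 2,  0,  1, -4],
                        [ 0,  3, -2,  6]], dtype=float)

activated = relu(feature_map)   # negatives clipped to 0
pooled = max_pool(activated)    # 4x4 -> 2x2, strongest feature per region
flat = pooled.flatten()         # input to the fully connected layer

print(pooled)   # [[5. 3.]
                #  [3. 6.]]
```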
Convolution Operation: Convolution is the core operation in CNNs. A small matrix
(kernel) is slid across the image and multiplies pixel values to detect features such as edges.
The convolution output is a feature map, highlighting specific patterns in the image.
Example: Applying an edge-detection filter on an image will highlight the boundaries of
objects, such as outlining a building’s edges in a photograph.