G.D. GOENKA PUBLIC SCHOOL
Shivpuri Link Road, Gwalior
CLASS – XII Artificial Intelligence
UNIT – 3 MAKING MACHINES SEE
What is Computer Vision?
Computer Vision (CV) is a part of Artificial Intelligence (AI) that helps machines see, understand,
and analyze images and videos—just like humans do.
It allows computers to make decisions or give suggestions based on what they see.
How is it Similar to Human Vision?
Just as humans use their eyes and brain to see and understand, machines use:
o Cameras (like eyes)
o Algorithms & AI models (like the brain)
What Does CV Do?
CV helps machines to:
Detect objects (like cars, faces, or animals)
Classify images (e.g., cat or dog)
Recognize faces
Find defects in products in factories
Monitor roads, buildings, and machines in real time
Why is Computer Vision Useful?
Fast: Much quicker than humans
Accurate: Less chance of error
Works Non-stop: Can run 24/7
Objective: No personal bias
Scalable: Can handle huge amounts of data
Deep Learning in CV:
CV uses deep learning models to become smarter and more accurate.
These models are so advanced that in some tasks (like face recognition), they perform even better than
humans.
Computer Vision is sometimes also called Machine Vision.
WORKING OF COMPUTER VISION
What is Computer Vision?
Computer Vision is a branch of AI that focuses on helping computers understand images and videos.
It processes and analyzes digital images to recognize objects, patterns, or meaning, much as a human
would.
1. Basics of Digital Images
A digital image is a picture stored in a computer using numbers.
It can be created by:
o Drawing in software (like MS Paint or Photoshop)
o Clicking a photo using a digital camera
o Scanning a physical photo
2. Interpretation of Image in Digital Form
When a computer processes an image, it perceives it as a collection of tiny squares known as pixels. Each pixel, short
for "picture element," represents a specific color value. These pixels collectively form the digital image. During the
process of digitization, an image is converted into a grid of pixels. The resolution of the image is determined by the
number of pixels it contains; the higher the resolution, the more detailed the image appears and the more closely it
resembles the original scene.
What Are Pixels?
A pixel (short for "picture element") is the smallest square in a digital image.
Each pixel shows one color.
When combined, thousands or millions of pixels make up the whole image.
How Do Computers Read Images?
Computers don’t “see” images. They read numbers representing each pixel.
The process of turning an image into a grid of pixels is called digitization.
What is Resolution?
Resolution = Number of pixels in an image.
More pixels = clearer and more detailed image.
Example: a 1920 × 1080 image contains 1920 × 1080 = 2,073,600 pixels (about 2 megapixels).
Grayscale (Black & White) Images:
Each pixel has a value from 0 to 255 (see the small example below):
o 0 = Black
o 255 = White
o Numbers in between = Shades of grey
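A minimal sketch of this idea, assuming Python with the NumPy library (an assumption of this example, not part of the notes above), showing a tiny grayscale image stored as a grid of numbers from 0 to 255:

import numpy as np

# A tiny 3 x 3 grayscale "image": each number is one pixel (0 = black, 255 = white)
image = np.array([
    [0,   128, 255],
    [64,  128, 192],
    [255, 128, 0],
], dtype=np.uint8)

print(image.shape)   # (3, 3) -> 3 rows x 3 columns = 9 pixels
print(image[0, 2])   # 255 -> the top-right pixel is pure white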
3.3 COMPUTER VISION – PROCESS
Computer Vision typically follows 5 stages, each explained below:
3.3.1 Image Acquisition
Image acquisition is the first stage of the Computer Vision process, where digital images or videos are
captured.
Images can be taken from:
o Digital cameras
o Scanners
o Design software (e.g., Photoshop)
o Medical equipment like MRI or CT scans
Key Points:
High-resolution devices = Clearer and more detailed images
Lighting and camera angle affect image quality
This stage provides the raw data for the entire Computer Vision system
Examples:
A camera taking a picture of a classroom
An MRI scanner capturing a brain image
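A minimal sketch of image acquisition in code, assuming Python with the OpenCV library (cv2) installed, plus a hypothetical image file and webcam:

import cv2  # OpenCV, assumed to be installed separately

# Acquire an image from a file (hypothetical file name)
image = cv2.imread("classroom.jpg")      # returns a grid of pixel values, or None if the file is missing
if image is not None:
    print(image.shape)                   # (height, width, 3 colour channels)

# Acquire a single frame from the default camera (device 0)
camera = cv2.VideoCapture(0)
ok, frame = camera.read()                # ok is False if no camera is available
camera.release()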
3.3.2 Preprocessing
Preprocessing aims to enhance the quality of the acquired image before it is analyzed by an AI model
(a short code sketch of these steps follows this sub-section).
Common Preprocessing Techniques:
1. Noise Reduction
o Removes unwanted disturbances like blurriness or random spots.
o Example: Cleaning grainy photos taken in the dark
2. Image Normalization
o Standardizes pixel values across images for consistency (e.g., scales 0–255 to 0–1)
o Helps the AI model learn better
3. Resizing/Cropping
o Changes the size or aspect ratio of the image to make it uniform.
o Example: Resize all images to 224×224 pixels.
4. Histogram Equalization
o Adjusts the brightness and contrast of an image.
o Example: Enhances a dark image to show more details
Purpose of Preprocessing:
Clean up images (remove noise)
Highlight important features
Make all images consistent and uniform
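A minimal sketch of these preprocessing steps, assuming Python with OpenCV and NumPy and a hypothetical input file; the exact filter and image sizes are illustrative only:

import cv2
import numpy as np

image = cv2.imread("classroom.jpg")                      # hypothetical input image

# 1. Noise reduction: smooth out random specks with a Gaussian blur
denoised = cv2.GaussianBlur(image, (5, 5), 0)

# 2. Resizing: make every image the same size, e.g. 224 x 224 pixels
resized = cv2.resize(denoised, (224, 224))

# 3. Normalization: rescale pixel values from 0-255 to 0-1 for the AI model
normalized = resized.astype(np.float32) / 255.0

# 4. Histogram equalization: improve brightness and contrast (works on a grayscale copy)
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
equalized = cv2.equalizeHist(gray)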
3.3.3 Feature Extraction
What is it?
Feature Extraction means finding important patterns in an image that help the computer recognize or
understand it.
These features help in identifying objects, textures, colors, etc.
Common Feature Extraction Methods:
o Edge detection identifies the boundaries between different regions in an image where there is a significant
change in intensity.
o Corner detection identifies points where two or more edges meet. These are areas of high curvature in an
image, where sharp changes in image gradients often correspond to corners or junctions of objects.
o Texture analysis extracts features like smoothness, roughness, or repetition in an image
o Colour-based feature extraction quantifies colour distributions within the image, enabling
discrimination between different objects or regions based on their colour characteristics.
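A minimal sketch of classical feature extraction, assuming Python with OpenCV and a hypothetical input file; the thresholds and parameters below are illustrative only:

import cv2
import numpy as np

gray = cv2.imread("classroom.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input, read as grayscale

# Edge detection: find boundaries where pixel intensity changes sharply (Canny method)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# Corner detection: find points of high curvature where edges meet (Harris method)
corners = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)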
In Deep Learning:
Convolutional Neural Networks (CNNs) automatically extract features during training—no need to
manually define them.
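A minimal sketch of such automatic feature extraction, assuming Python with the PyTorch library; the layer sizes are illustrative and the network is untrained:

import torch
import torch.nn as nn

# A tiny CNN feature extractor: the convolution filters are learned during training,
# so features do not have to be designed by hand
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 learned filters over an RGB image
    nn.ReLU(),
    nn.MaxPool2d(2),                              # shrink the feature map
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 32 deeper filters
    nn.ReLU(),
)

dummy_image = torch.randn(1, 3, 224, 224)         # one fake 224 x 224 RGB image
feature_maps = features(dummy_image)
print(feature_maps.shape)                         # torch.Size([1, 32, 112, 112])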
3.3.4 Detection and Segmentation
Detection and segmentation are fundamental tasks in computer vision, focusing on identifying objects or regions
of interest within an image.
Single Object Tasks:
1. Classification:
o Tells what type of object is in the image.
o Example: Recognizing if the image has a cat or a dog.
2. Classification + Localization:
o Tells the object’s class and its location (using bounding boxes).
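A minimal sketch of single-object classification, assuming Python with the torchvision library (version 0.13 or later) and a hypothetical image file; proper input normalization is skipped for brevity, so the prediction quality would be rough:

import torch
from torchvision.models import resnet18
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype, resize

model = resnet18(weights="DEFAULT")          # a pre-trained image classifier
model.eval()

image = convert_image_dtype(read_image("pet.jpg"), torch.float)   # hypothetical image
image = resize(image, [224, 224]).unsqueeze(0)                    # a batch containing one image

with torch.no_grad():
    scores = model(image)                    # one score per known class
predicted_class = scores.argmax(dim=1).item()
print(predicted_class)                       # index of the most likely class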
Multiple Object Tasks:
1. Object Detection:
o Finds multiple objects in an image.
o Draws bounding boxes around each one and labels them.
o Popular Algorithms (a short detection sketch in code follows this section):
R-CNN
YOLO (You Only Look Once)
SSD (Single Shot Detector)
2. Image Segmentation:
o Divides an image into regions by classifying each pixel.
o More detailed than object detection.
Types of Segmentation:
Semantic Segmentation:
o Labels all objects of the same type together.
o Example: All dogs in an image are labelled together as one class "dog", without telling one dog from another.
Instance Segmentation:
o Labels each object separately, even if they are of the same type.
o Example: Two dogs in the same image will be identified as Dog 1 and Dog 2.
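A minimal sketch of object detection with a pre-trained detector from the R-CNN family mentioned above, assuming Python with torchvision (version 0.13 or later) and a hypothetical image file:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")   # pre-trained Faster R-CNN detector
model.eval()

image = convert_image_dtype(read_image("street.jpg"), torch.float)   # hypothetical image

with torch.no_grad():
    predictions = model([image])[0]          # the detector takes a list of images

# Each prediction has bounding boxes, class labels, and confidence scores
for box, label, score in zip(predictions["boxes"], predictions["labels"], predictions["scores"]):
    if score > 0.8:                          # keep only confident detections
        print(label.item(), box.tolist(), round(score.item(), 2))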
3.3.5 High-Level Processing
Purpose:
This is the final stage where the computer understands and makes decisions based on the objects it
detected.
What It Does:
Recognizes objects and scenes
Understands relationships between objects
Analyzes context (e.g., a doctor is in an operating room)
Helps in decision-making for real-life uses like:
o Autonomous vehicles
o Medical diagnostics
o Smart surveillance
Summary: 5 Stages of Computer Vision Process
1. Image Acquisition – Capturing the image
2. Preprocessing – Cleaning and preparing the image
3. Feature Extraction – Identifying patterns and features
4. Detection/Segmentation – Finding and separating objects
5. High-Level Processing – Understanding and decision-making
3.4 Applications of Computer Vision
Computer Vision is already part of many everyday tools. Some key applications include:
1. Facial Recognition
o Used by apps like Facebook to detect and tag faces in photos.
2. Healthcare
o Detects diseases, tumours, or irregularities in medical images (like MRI scans).
3. Self-Driving Cars
o Helps cars understand surroundings, detect traffic signs, people, and other vehicles.
4. OCR (Optical Character Recognition)
o Converts images of text (printed or handwritten) into editable digital text.
5. Machine Inspection
o Detects faults or defects in manufactured products during quality checks.
6. 3D Model Building
o Builds 3D models from real-world objects; used in robotics, gaming, AR/VR.
7. Surveillance
o CCTV cameras analyze videos to spot suspicious behavior and ensure safety.
8. Fingerprint & Biometric Recognition
o Verifies user identity using fingerprint scans or facial features.
3.5 Challenges of Computer Vision
Even though CV is powerful, it faces several difficulties:
1. Reasoning and Interpretation
o CV must not just see but understand images, which requires complex logic and reasoning.
2. Image Acquisition Issues
o Factors like poor lighting, different camera angles, and crowded scenes make image capture
difficult.
3. Privacy and Security Concerns
o CV systems (like face recognition) can raise privacy issues and are often debated.
4. False or Duplicate Content
o Fake images/videos or data breaches can fool CV systems, leading to misinformation or security
risks.
3.6 The Future of Computer Vision
CV has grown from simple tasks to advanced systems that mimic human-level understanding.
Deep learning and large datasets have made this possible.
In the future, we may see:
o Smart healthcare tools that detect diseases early
o Immersive AR/VR experiences
o More intelligent, safe, and helpful AI tools
Vision Ahead:
If used ethically and with innovation, Computer Vision will positively transform industries and lives worldwide.