AL701 – Computer Vision COMPLETE NOTES (All Units + Images Included)
UNIT I – INTRODUCTION TO COMPUTER VISION
Computer Vision is a field of Artificial Intelligence that deals with the extraction, analysis,
and understanding of useful information from images and videos. It enables machines to
interpret visual data like humans. The main goals include object detection, classification,
segmentation, and scene understanding.
Diagram: Image as f(x,y) – 2D Intensity Function
Images are represented as a 2D function f(x,y), where each pixel at coordinates (x,y) holds
an intensity value. Types include binary, grayscale, and color (for example, RGB) images.
Image processing involves modifying images, while Computer Vision focuses on understanding
them. Basic image operations such as resizing, cropping, rotating, contrast enhancement,
and bitwise operations help prepare images for analysis.
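To make the f(x,y) view concrete, here is a minimal pure-Python sketch of a grayscale image stored as rows of intensities, with a crop and a bitwise AND against a mask. The names f, crop, and bitwise_and are illustrative only; a real pipeline would normally use an array library.

```python
# A tiny 4x4 grayscale image stored as rows of 8-bit intensities (0-255).
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

def f(x, y):
    """Intensity at column x, row y (origin at the top-left corner)."""
    return image[y][x]

def crop(img, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def bitwise_and(img, mask):
    """Pixel-wise AND; a mask value of 255 keeps the pixel, 0 clears it."""
    return [[p & m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(img, mask)]

patch = crop(image, top=1, left=1, height=2, width=2)
mask = [[255, 0], [0, 255]]
masked = bitwise_and(patch, mask)
```

Note the index order: f(x, y) reads image[y][x], because rows are stored first.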
UNIT II – BINARY IMAGE PROCESSING
Binary image processing converts grayscale images into two-level binary images using
thresholding. Techniques include global thresholding, Otsu’s optimal thresholding, and
adaptive thresholding. Morphological operations such as erosion, dilation, opening, and
closing help refine shapes, remove noise, and extract meaningful structures.
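Otsu's method picks the threshold that best separates the histogram into two classes. The sketch below is one common formulation (maximizing between-class variance) written in plain Python; the function name otsu_threshold and the sample pixel list are assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the background class
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, so the variance is undefined here
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        # Proportional to the between-class variance (constant factor dropped).
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

pixels = [10, 12, 11, 200, 205, 198]  # flattened grayscale image (made-up values)
t = otsu_threshold(pixels)
binary = [255 if p > t else 0 for p in pixels]
```

Global thresholding is the same final line with a hand-picked t; adaptive thresholding would instead compute a separate t for each local neighborhood.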
Diagram: Morphological Operations – Erosion & Dilation
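As a rough illustration of erosion and dilation, the following sketch applies a 3x3 square structuring element to a 0/1 binary image. Treating pixels outside the image as 0 is an assumption of this sketch, and the helper names are hypothetical.

```python
def _window(img, r, c):
    """Values in the 3x3 window centered at (r, c); outside pixels count as 0."""
    rows, cols = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]

def erode(img):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[min(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[max(_window(img, r, c)) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):
    """Erosion followed by dilation: removes small specks."""
    return dilate(erode(img))

def closing(img):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img))

noisy = [
    [1, 0, 0, 0, 0],  # the 1 in the corner is an isolated noise pixel
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(noisy)
opened = opening(noisy)
```

In this example, erosion shrinks the 3x3 square to its center pixel and removes the noise pixel; the dilation step of the opening then restores the square without the noise.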
Connected Component Analysis (CCA) labels distinct objects using 4-connectivity or
8-connectivity. Contour analysis extracts shape boundaries, useful for measuring area,
perimeter, and shape classification.
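The effect of the connectivity choice can be seen with a small flood-fill labeling sketch. This is one simple way to label components (breadth-first search), not necessarily the algorithm a given library uses; label_components is a hypothetical name.

```python
from collections import deque

NEIGHBORS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def label_components(binary, connectivity=4):
    """Label foreground pixels (non-zero) and return (label image, component count)."""
    offsets = NEIGHBORS_4 if connectivity == 4 else NEIGHBORS_8
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or labels[r][c] != 0:
                continue  # background, or already part of a labeled component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and labels[nr][nc] == 0):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count

diagonal = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
labels4, count4 = label_components(diagonal, connectivity=4)
labels8, count8 = label_components(diagonal, connectivity=8)
area_first = sum(row.count(1) for row in labels4)  # pixel area of component 1
```

Pixels that touch only at corners form separate objects under 4-connectivity but a single object under 8-connectivity.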
UNIT III – COLOR SPACES & IMAGE ENHANCEMENT
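Color spaces provide different ways to represent color data. RGB is device-dependent and
mixes brightness with color, whereas HSV, LAB, and YCbCr separate brightness (V, L, Y) from
color information, which often makes color-based segmentation more robust to illumination changes.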
Histogram Equalization enhances contrast by redistributing intensity values.
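Histogram equalization can be sketched with a cumulative histogram (CDF) used as a lookup table. The version below follows the commonly taught formula that maps the lowest occupied level to 0 and the highest to 255; the function name and the sample values are assumptions for illustration.

```python
def equalize_histogram(pixels, levels=256):
    """Remap intensities so that their cumulative distribution is roughly linear."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = min(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    if total == cdf_min:
        return list(pixels)  # constant image: nothing to stretch

    lut = [round(max(c - cdf_min, 0) * (levels - 1) / (total - cdf_min))
           for c in cdf]
    return [lut[p] for p in pixels]

low_contrast = [100, 101, 102, 103]  # made-up, narrow intensity range
equalized = equalize_histogram(low_contrast)
```

Here the four nearly identical gray levels are spread across the full 0 to 255 range.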
Diagram: RGB to HSV Conversion Flow
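The conversion can be tried with Python's standard colorsys module, which works on floats in the range 0 to 1. The wrapper rgb8_to_hsv below is a hypothetical helper that accepts 8-bit channels and reports hue in degrees.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v

bright_red = rgb8_to_hsv(255, 0, 0)
dark_red = rgb8_to_hsv(128, 0, 0)   # same color under dimmer lighting
green = rgb8_to_hsv(0, 255, 0)
```

The bright and dark reds share the same hue and saturation and differ only in value, which is why thresholding on hue tends to be less sensitive to lighting than thresholding on raw RGB.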
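CLAHE (Contrast Limited Adaptive Histogram Equalization) improves local contrast by
equalizing small tiles of the image, and clips each tile's histogram to limit noise
amplification. Filtering with box and Gaussian kernels smooths images and reduces noise;
the median filter, which is nonlinear, is particularly effective against salt-and-pepper
noise. Convolution is the core mathematical operation behind the linear (box and Gaussian)
filters.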
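A small sketch can show how kernel filtering works and how a median filter differs from it. The code uses a "valid" convolution (the kernel never leaves the image) and a 3x3 window; both choices, and the function names, are assumptions of this example.

```python
import statistics

def convolve2d(img, kernel):
    """'Valid' 2D convolution: the kernel is flipped, then slid over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(len(img) - kh + 1):
        out_row = []
        for c in range(len(img[0]) - kw + 1):
            out_row.append(sum(flipped[i][j] * img[r + i][c + j]
                               for i in range(kh) for j in range(kw)))
        out.append(out_row)
    return out

def median_filter3(img):
    """'Valid' 3x3 median filter: each output is the median of its window."""
    return [[statistics.median(img[r + i][c + j]
                               for i in range(3) for j in range(3))
             for c in range(len(img[0]) - 2)]
            for r in range(len(img) - 2)]

BOX_3X3 = [[1 / 9] * 3 for _ in range(3)]  # averaging (box) kernel

spike = [
    [0, 0, 0],
    [0, 9, 0],  # a single noisy pixel
    [0, 0, 0],
]
box_result = convolve2d(spike, BOX_3X3)
median_result = median_filter3(spike)
```

The box filter spreads the spike into its neighborhood (the average is about 1), while the median filter discards it entirely (the median is 0).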
UNIT IV – GRADIENTS, EDGE DETECTION, SEGMENTATION, RECOGNITION
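Image gradients represent intensity changes. First-order derivative filters like Sobel,
Prewitt, and Roberts detect edges, while the Laplacian is a second-order operator that
responds to edges in all directions but is more sensitive to noise. The Canny Edge Detector
is a widely used multi-stage detector: Gaussian smoothing, gradient computation,
non-maximum suppression, and hysteresis thresholding.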
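As an illustration of a first-order gradient, the sketch below evaluates the Sobel kernels at a single interior pixel. The kernels are applied as a sliding dot product without flipping, a common convention in image libraries; sobel_at is a hypothetical helper name.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical change

def sobel_at(img, r, c):
    """Return (gx, gy, magnitude) at interior pixel (r, c)."""
    gx = sum(SOBEL_X[i][j] * img[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    gy = sum(SOBEL_Y[i][j] * img[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    return gx, gy, math.hypot(gx, gy)

# Dark region on the left, brighter region on the right: a vertical edge.
step = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
gx, gy, magnitude = sobel_at(step, 1, 1)
```

For this vertical edge the horizontal gradient gx is large and gy is zero; an edge map would follow from thresholding the magnitude at every pixel.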
Diagram: Canny Edge Detection Pipeline
Segmentation techniques divide an image into meaningful regions. Major approaches
include thresholding, region growing, K-means clustering, watershed algorithm, and deep
learning-based segmentation. Image classification uses CNN architectures such as VGG,
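ResNet, and MobileNet. Object detection uses single-stage detectors such as YOLO and SSD,
which target real-time speed, and two-stage detectors such as Faster R-CNN, which
typically trade speed for accuracy.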
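Of the segmentation approaches listed, K-means is easy to sketch on grayscale intensities alone. The version below is a simplified 1D variant with a deterministic initialization (centers spread evenly between the minimum and maximum), which is an assumption of this sketch; practical implementations usually cluster color vectors and initialize randomly.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Cluster scalar intensities into k groups; returns (centers, labels). Assumes k >= 2."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        # Index of the center closest to intensity v.
        return min(range(k), key=lambda i: abs(v - centers[i]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]

    return centers, [nearest(v) for v in values]

pixels = [10, 12, 11, 200, 205, 198]  # flattened grayscale image (made-up values)
centers, labels = kmeans_1d(pixels, k=2)
```

Each pixel's label is its region index, so reshaping labels back to the image dimensions gives the segmentation map.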
UNIT V – COMPUTER VISION APPLICATIONS
Computer Vision applications include gesture recognition, motion estimation, object
tracking, face detection, and deep learning-based perception. Motion estimation uses
optical flow, block matching, and feature tracking, while object tracking uses algorithms
like KCF, CamShift, DeepSORT, and the Kalman filter. Face detection uses Haar cascades
and deep learning models.
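Block matching, one of the motion estimation methods mentioned above, can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the previous frame and candidate blocks in the current frame. The function names, frame contents, and search range below are made up for illustration.

```python
def sad(prev, curr, top_p, left_p, top_c, left_c, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(prev[top_p + i][left_p + j] - curr[top_c + i][left_c + j])
               for i in range(size) for j in range(size))

def match_block(prev, curr, top, left, size, search):
    """Return the displacement (dy, dx) of the block at (top, left) in prev."""
    rows, cols = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue  # candidate block would fall outside the frame
            cost = sad(prev, curr, top, left, t, l, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def frame_with_block(top, left):
    """5x5 black frame containing a small 2x2 textured object."""
    frame = [[0] * 5 for _ in range(5)]
    frame[top][left], frame[top][left + 1] = 9, 8
    frame[top + 1][left], frame[top + 1][left + 1] = 7, 6
    return frame

prev_frame = frame_with_block(1, 1)
curr_frame = frame_with_block(2, 3)   # object moved 1 row down, 2 columns right
motion = match_block(prev_frame, curr_frame, top=1, left=1, size=2, search=2)
```

The returned (dy, dx) pair is the motion vector for that block; repeating the search for every block yields a coarse motion field.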
Diagram: Face Detection Pipeline
The OpenCV DNN module runs inference on pretrained deep learning models such as YOLO, SSD,
and MobileNet for real-time computer vision tasks. These applications are widely used in
autonomous driving, robotics, augmented reality, and surveillance systems.