Experiment 1: Implementation of
Computer Vision Techniques
Title:
Implementation of basic computer vision techniques – loading, displaying, resizing,
cropping, and rotating images.
Objective:
1. To understand the basic image handling functions in computer vision.
2. To implement operations like loading, displaying, resizing, cropping, and rotating
images using Python and OpenCV.
3. To get familiar with image representation as a matrix of pixels.
Theory:
Computer Vision enables machines to interpret and process images.
• Images are stored as matrices of pixels. Each pixel contains intensity values
(0–255 for grayscale, three channels for RGB).
• OpenCV provides functions to manipulate these pixels easily.
Operations Explained:
• Loading & Displaying: Using cv2.imread() and cv2.imshow().
• Resizing: Reducing or enlarging dimensions using cv2.resize().
• Cropping: Selecting a region of interest (ROI) by slicing pixel values.
• Rotation: Rotating about the center using transformation matrices (see the sketch below).
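For reference, the 2×3 matrix returned by cv2.getRotationMatrix2D can be reproduced by hand from its documented formula; a minimal sketch (the helper name build_rotation_matrix is ours):
import numpy as np

def build_rotation_matrix(center, angle_deg, scale=1.0):
    # Same formula OpenCV documents for getRotationMatrix2D:
    # rotate about `center` by angle_deg (counter-clockwise),
    # then translate so that `center` maps to itself.
    a = np.deg2rad(angle_deg)
    alpha = scale * np.cos(a)
    beta = scale * np.sin(a)
    cx, cy = center
    return np.array([
        [alpha,  beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])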
Block Diagram:
┌─────────────────┐
│ Input Image File│
└───────┬─────────┘
│
┌───────────▼───────────┐
│ Load Image (cv2) │
└───────────┬───────────┘
│
┌─────────────┼────────────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌─────▼─────┐
│ Resize │ │ Cropping│ │ Rotation │
└────┬────┘ └────┬────┘ └─────┬─────┘
│ │ │
└─────────────┼────────────────┘
│
┌───────▼─────────┐
│ Display Results │
└─────────────────┘
Algorithm:
1. Start the program.
2. Import libraries (cv2, matplotlib, numpy).
3. Load image using cv2.imread().
4. Display original image.
5. Apply resize operation.
6. Apply crop operation.
7. Apply rotate operation.
8. Display final results.
9. End.
Flowchart:
┌───────────┐
│   Start   │
└─────┬─────┘
      │
┌─────▼─────┐
│ Load Image│
└─────┬─────┘
      │
┌─────▼────────────┐
│ Display Original │
└─────┬────────────┘
      │
┌─────▼─────┐
│  Resize   │
└─────┬─────┘
      │
┌─────▼─────┐
│   Crop    │
└─────┬─────┘
      │
┌─────▼─────┐
│  Rotate   │
└─────┬─────┘
      │
┌─────▼─────┐
│  Display  │
└─────┬─────┘
      │
  ┌───▼───┐
  │  End  │
  └───────┘
Code (Python with OpenCV):
import cv2
import matplotlib.pyplot as plt

# Step 1: Load the image
image = cv2.imread('sample.jpg')

# Convert BGR (OpenCV format) to RGB for matplotlib
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Step 2: Display Original Image
plt.subplot(2, 2, 1)
plt.imshow(image_rgb)
plt.title("Original Image")
plt.axis('off')

# Step 3: Resize Image
resized = cv2.resize(image_rgb, (200, 200))
plt.subplot(2, 2, 2)
plt.imshow(resized)
plt.title("Resized Image")
plt.axis('off')

# Step 4: Crop Image (ROI via slicing: rows 50-200, columns 100-300)
cropped = image_rgb[50:200, 100:300]
plt.subplot(2, 2, 3)
plt.imshow(cropped)
plt.title("Cropped Image")
plt.axis('off')

# Step 5: Rotate Image about its center by 45 degrees
(h, w) = image_rgb.shape[:2]
center = (w // 2, h // 2)
rotation_matrix = cv2.getRotationMatrix2D(center, 45, 1.0)
rotated = cv2.warpAffine(image_rgb, rotation_matrix, (w, h))
plt.subplot(2, 2, 4)
plt.imshow(rotated)
plt.title("Rotated Image (45°)")
plt.axis('off')

plt.show()
Sample Output (Illustration):
1. Original Image → Shows input file.
2. Resized Image → 200x200 scaled version.
3. Cropped Image → Selected region only.
4. Rotated Image → Rotated at 45°.
(In lab record, paste screenshots of each output here.)
Conclusion:
• Implemented basic computer vision techniques successfully.
• Learned how to represent an image as a pixel matrix.
• Applied resizing, cropping, and rotation using OpenCV.
• This forms the base for advanced vision tasks like object detection, classification,
and recognition.
Experiment 2: Implementation of Image
Arithmetic Operations
Objective:
To implement arithmetic operations such as addition, subtraction, bitwise AND, OR, and
NOT on images using Python.
Theory:
Image arithmetic is used in many applications of computer vision:
• Addition: Blend two images or increase brightness.
• Subtraction: Detect differences between two images (motion detection).
• Bitwise Operations: Perform logical operations (masking, ROI extraction); see the masking sketch below.
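As a quick illustration of masking with bitwise operations, the following minimal sketch keeps only a circular region of an image (the circle and the file name image1.jpg are our assumptions):
import cv2
import numpy as np

img = cv2.imread("image1.jpg")
mask = np.zeros(img.shape[:2], dtype=np.uint8)      # single-channel mask
cv2.circle(mask, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)  # filled white circle
roi = cv2.bitwise_and(img, img, mask=mask)          # keep only pixels where mask == 255
cv2.imwrite("masked.jpg", roi)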
Diagram (description):
Image1 + Image2 → Addition
Image1 - Image2 → Subtraction
Image1 & Image2 → AND
Image1 | Image2 → OR
NOT Image1 → NOT
Algorithm:
1. Import OpenCV.
2. Load two images of the same size.
3. Perform addition using cv2.add().
4. Perform subtraction using cv2.subtract().
5. Perform bitwise operations using cv2.bitwise_and, cv2.bitwise_or,
cv2.bitwise_not.
6. Display results.
Code:
import cv2
import matplotlib.pyplot as plt

# Load two images and resize them to the same dimensions
img1 = cv2.imread("image1.jpg")
img2 = cv2.imread("image2.jpg")
img1 = cv2.resize(img1, (300, 300))
img2 = cv2.resize(img2, (300, 300))

# Arithmetic operations (cv2.add and cv2.subtract saturate at 0 and 255)
add = cv2.add(img1, img2)
subtract = cv2.subtract(img1, img2)

# Bitwise (logical) operations
bitwise_and = cv2.bitwise_and(img1, img2)
bitwise_or = cv2.bitwise_or(img1, img2)
bitwise_not = cv2.bitwise_not(img1)

# Display
titles = ["Image1", "Image2", "Addition", "Subtraction", "AND", "OR", "NOT"]
images = [img1, img2, add, subtract, bitwise_and, bitwise_or, bitwise_not]
for i in range(7):
    plt.subplot(2, 4, i + 1)
    plt.imshow(cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB))
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Output:
• Displays the results of arithmetic and bitwise operations.
Conclusion:
We implemented addition, subtraction, and logical operations on images, which form the base
of image blending, masking, and motion detection.
Experiment 3: Implementation of Image
Enhancement Techniques
Objective:
To implement image enhancement techniques such as logarithmic transformation, image
negation, histogram equalization, and contrast stretching.
Theory:
Image enhancement improves visual appearance or highlights important features.
• Log Transformation: Enhances darker regions.
  s = c log(1 + r)
  where c is a scaling constant, r is the input intensity value, and s is the output intensity value.
  It maps a narrow range of low input intensities to a wide range of output levels, and the opposite holds for high input levels: the values of dark pixels are expanded while higher-level values are compressed. It therefore compresses the dynamic range of images with large variations in pixel values and reduces the contrast of brighter regions.
• Negative Transformation: A point-processing technique that replaces each pixel value with its negative, inverting the brightness levels of the image (useful for medical images).
  For a grayscale image with pixel values in the range [0, 255]:
  s = L - 1 - r
  where s is the output pixel value, r is the original pixel value, and L is the total number of possible intensity levels. For 8-bit images L = 256, so s = 255 - r.
• Histogram Equalization: Improves contrast by redistributing intensity. A histogram represents the relative frequency of occurrence of the various grey levels in an image; it provides a global description of the image's appearance and is a spatial-domain method. Equalization enhances the image by normalizing the histogram toward a flat profile (see the NumPy sketch after this list).
• Contrast Stretching: Expands the range of intensity values. Contrast is the difference between the highest and lowest grey levels of an image; low contrast can result from poor illumination, lack of dynamic range in the imaging sensor, or a wrong lens-aperture setting during image acquisition. Stretching expands the intensity levels so that they span the full range of the display device, making darker portions darker and brighter portions brighter.
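For intuition, histogram equalization can also be written out directly with NumPy; a minimal sketch of the standard CDF-mapping formulation (not OpenCV's internal code; assumes a non-constant 8-bit image):
import numpy as np

def equalize_hist_manual(img):
    # img: 2-D uint8 array. Map each grey level through the
    # normalized cumulative histogram (CDF), scaled to [0, 255].
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_norm = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = (cdf_norm * 255).astype(np.uint8)   # lookup table: old level -> new level
    return lut[img]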
Diagram (description):
Input Image → Enhancement (Log / Negative / Histogram / Contrast) → Enhanced Output
Algorithm:
1. Load input grayscale image.
2. Apply log transformation: s = c * log(1+r)
3. Apply negative transformation: s = 255 – r.
4. Apply histogram equalization using cv2.equalizeHist().
5. Apply contrast stretching manually.
6. Display all results.
Code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load grayscale image
img = cv2.imread("sample.jpg", 0)

# Log transformation: s = c * log(1 + r)
# (cast to float first so 1 + img cannot overflow uint8)
c = 255 / np.log(1 + float(np.max(img)))
log_trans = c * np.log(1 + img.astype(np.float64))
log_trans = np.array(log_trans, dtype=np.uint8)

# Negative transformation: s = 255 - r
negative = 255 - img

# Histogram Equalization
hist_eq = cv2.equalizeHist(img)

# Contrast Stretching: map [min, max] to [0, 255]
min_val = np.min(img)
max_val = np.max(img)
contrast_stretch = ((img - min_val) / (max_val - min_val)) * 255
contrast_stretch = contrast_stretch.astype(np.uint8)

# Display
titles = ["Original", "Log", "Negative", "Histogram Eq.", "Contrast Stretch"]
images = [img, log_trans, negative, hist_eq, contrast_stretch]
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(images[i], cmap="gray")
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Output:
• Original and enhanced images with log, negative, histogram equalization, and contrast
stretching.
Conclusion:
We successfully applied various image enhancement techniques to improve brightness,
contrast, and visibility of image details.
Experiment 4: Implementation of Filtering
Techniques
Objective:
To implement filtering techniques such as image blurring and sharpening.
Theory:
Filtering is used to highlight or suppress certain features of an image.
• Blurring (Smoothing): Removes noise by averaging neighboring pixels (averaging, Gaussian, median). Filtering modifies an image by applying a small matrix called a filter kernel, which moves across the image pixel by pixel. At each position, the kernel's values are multiplied with the corresponding neighboring pixel values and summed, and the central pixel is replaced with this computed value, a process known as convolution (see the sketch after this list). Smoothing filters, such as averaging or Gaussian filters, blur the image to reduce noise, while sharpening filters, such as the Laplacian, highlight edges and fine details. Filtering is essential for improving image quality in applications from object detection to medical imaging, and tools like OpenCV and MATLAB provide efficient, easy-to-use implementations.
• Sharpening: Enhances edges using Laplacian or high-pass filters. Sharpening increases the contrast between neighboring pixels where intensity changes rapidly, which typically corresponds to edges or object boundaries; this makes details stand out more clearly and the image appear crisper. It is widely applied in photography, medical imaging, and computer vision to improve the visibility of important structures and details.
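To make the kernel-sliding step concrete, here is a minimal NumPy sketch of the operation described above (border pixels are skipped for brevity; strictly speaking this is correlation, which is also what cv2.filter2D computes, and for symmetric kernels it equals convolution):
import numpy as np

def apply_kernel_naive(img, kernel):
    # img: 2-D float array; kernel: small 2-D array (e.g. 3x3).
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    out = np.zeros_like(img)
    for y in range(ph, img.shape[0] - ph):
        for x in range(pw, img.shape[1] - pw):
            region = img[y - ph:y + ph + 1, x - pw:x + pw + 1]
            out[y, x] = np.sum(region * kernel)   # multiply-and-sum at each position
    return out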
Diagram (description):
Input Image → Blurring / Sharpening → Filtered Image
Algorithm:
1. Load image.
2. Apply average blurring with cv2.blur().
3. Apply Gaussian blur with cv2.GaussianBlur().
4. Apply median blur with cv2.medianBlur().
5. Apply sharpening using a custom kernel with cv2.filter2D().
6. Display results.
Code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load image
img = cv2.imread("sample.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Blurring
average = cv2.blur(img_rgb, (5, 5))
gaussian = cv2.GaussianBlur(img_rgb, (5, 5), 0)
median = cv2.medianBlur(img_rgb, 5)

# Sharpening: centre weight 5, neighbours -1 (weights sum to 1, preserving brightness)
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])
sharpened = cv2.filter2D(img_rgb, -1, kernel)

# Display
titles = ["Original", "Average Blur", "Gaussian Blur", "Median Blur", "Sharpened"]
images = [img_rgb, average, gaussian, median, sharpened]
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(images[i])
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Output:
• Displays blurred and sharpened images.
Conclusion:
We successfully applied different filters to remove noise and enhance details in an image.
Experiment 5: Implementation of Edge
Detection Techniques
Objective:
To implement edge detection techniques such as Sobel, Prewitt, Laplacian, and Canny edge
detectors.
Theory:
• Edges are points where image intensity changes sharply. Edge detection significantly reduces the amount of data and filters out useless information while preserving the important structural properties of an image.
• Edges are boundaries between different textures; they can also be defined as discontinuities in image intensity from one pixel to the next.
• Edges are important image characteristics and indicate high-frequency content.
• Detecting edges helps with image segmentation and data compression, and aids matching tasks such as image reconstruction.
• Edge detection is difficult in noisy images, since both the noise and the edges contain high-frequency content.
• Sobel Operator: Calculates the gradient in the x and y directions. The Sobel operator is an edge detection method that uses two 3×3 filters to find horizontal and vertical edges by calculating image intensity gradients (see the kernel sketch after this list). It is simple, suppresses some noise, and highlights sharp changes, making edges more visible.
• Prewitt Operator: Similar to Sobel but with simpler masks. The Prewitt operator is
an edge detection technique that uses two 3x3 filters to detect horizontal and vertical
edges by calculating intensity gradients. Similar to the Sobel operator but with
simpler kernels, it highlights regions with sudden brightness changes, helping identify
edges in an image. It's easy to implement and useful in basic image processing tasks.
• Laplacian Operator: Uses second derivative for edge detection. The Laplacian
operator is an edge detection technique that uses a single filter to detect edges by
measuring the second derivative of image intensity. Unlike Sobel or Prewitt, which
detect gradient direction, Laplacian highlights areas where intensity changes rapidly
in all directions. It’s sensitive to noise but very effective for finding fine edges and
details in an image.
• Canny Edge Detector: Multi-stage, gives best results with noise removal +
thresholding. The Canny edge detector is a multi-step edge detection technique
known for its accuracy and noise reduction. It involves smoothing the image with a
Gaussian filter, finding intensity gradients, applying non-maximum suppression to
thin the edges, and using double thresholding with edge tracking to keep only strong,
meaningful edges. It’s widely used due to its ability to detect clear, continuous edges
while minimizing noise.
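For reference, the standard 3×3 Sobel kernels and the gradient quantities they feed can be written explicitly; a minimal sketch (sample.jpg is a placeholder path):
import cv2
import numpy as np

img = cv2.imread("sample.jpg", 0).astype(np.float64)

# Standard 3x3 Sobel kernels (Gx responds to vertical edges, Gy to horizontal)
Gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
Gy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

gx = cv2.filter2D(img, -1, Gx)
gy = cv2.filter2D(img, -1, Gy)
magnitude = np.sqrt(gx**2 + gy**2)   # edge strength
direction = np.arctan2(gy, gx)       # edge orientation (radians)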
Diagram (description):
Input Image → Edge Detection (Sobel / Prewitt / Laplacian / Canny) → Edge
Map
Algorithm:
1. Read input grayscale image.
2. Apply Sobel operator in X and Y.
3. Apply Prewitt operator manually with kernels.
4. Apply Laplacian operator.
5. Apply Canny detector using cv2.Canny().
6. Display results.
Code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("sample.jpg", 0)

# Sobel: first-derivative gradients in x and y, combined as magnitude
sobelx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
sobely = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
sobel = cv2.magnitude(sobelx, sobely)

# Prewitt: same idea with simpler kernels, applied via filter2D
kernelx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
kernely = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])
prewittx = cv2.filter2D(img, -1, kernelx)
prewitty = cv2.filter2D(img, -1, kernely)
prewitt = cv2.add(prewittx, prewitty)  # saturating add avoids uint8 overflow

# Laplacian: second derivative
laplacian = cv2.Laplacian(img, cv2.CV_64F)

# Canny: smoothing + gradients + non-maximum suppression + hysteresis thresholding
canny = cv2.Canny(img, 100, 200)

# Display
titles = ["Original", "Sobel", "Prewitt", "Laplacian", "Canny"]
images = [img, sobel, prewitt, laplacian, canny]
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(images[i], cmap="gray")
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Conclusion:
We successfully implemented Sobel, Prewitt, Laplacian, and Canny methods for detecting
edges in images.
Experiment 6: Implementation of
Morphological Operations
Objective:
To implement basic morphological operations: erosion, dilation, opening, and closing.
Theory:
Morphological operations are used in binary image processing.
• Erosion: Shrinks white regions.
• Dilation: Expands white regions.
• Opening: Erosion followed by dilation (removes noise); see the identity check below.
• Closing: Dilation followed by erosion (fills gaps).
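Since opening and closing are compositions of erosion and dilation, this can be checked directly; a minimal sketch on a synthetic binary image:
import cv2
import numpy as np

binary = np.zeros((100, 100), np.uint8)
binary[30:70, 30:70] = 255                    # a white square
binary[10, 10] = 255                          # a speck of noise
kernel = np.ones((5, 5), np.uint8)

opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
manual = cv2.dilate(cv2.erode(binary, kernel), kernel)   # erosion, then dilation
print((opened == manual).all())               # True: opening == erode -> dilate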
Diagram:
Input Binary Image → Morphological Operation → Processed Image
Algorithm:
1. Read grayscale image and convert to binary.
2. Define structuring element (kernel).
3. Apply erosion and dilation.
4. Apply opening and closing.
5. Display results.
Code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read grayscale image and convert to binary
img = cv2.imread("sample.jpg", 0)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# 5x5 square structuring element
kernel = np.ones((5, 5), np.uint8)

erosion = cv2.erode(binary, kernel, iterations=1)
dilation = cv2.dilate(binary, kernel, iterations=1)
opening = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

titles = ["Original Binary", "Erosion", "Dilation", "Opening", "Closing"]
images = [binary, erosion, dilation, opening, closing]
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(images[i], cmap="gray")
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Conclusion:
Morphological operations were successfully applied to remove noise and refine shapes in
binary images.
Experiment 7: Implementation of Image
Segmentation (Thresholding)
Objective:
To implement image segmentation using global thresholding, adaptive thresholding, and
Otsu’s method.
Theory:
• Segmentation partitions an image into meaningful regions.
• Global Thresholding: A single threshold value for the entire image.
• Adaptive Thresholding: Different thresholds for different regions.
• Otsu’s Method: Automatic thresholding by minimizing within-class variance (see the sketch below).
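Otsu's criterion can be written out directly; a minimal NumPy sketch that searches for the threshold maximizing between-class variance (equivalent to minimizing within-class variance; cv2 computes this internally):
import numpy as np

def otsu_threshold(img):
    # Exhaustive search over all 8-bit thresholds.
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()          # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0       # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t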
Diagram:
Input Image → Thresholding (Global / Adaptive / Otsu) → Segmented Image
Algorithm:
1. Read grayscale image.
2. Apply global thresholding with cv2.threshold().
3. Apply adaptive thresholding with cv2.adaptiveThreshold().
4. Apply Otsu’s method.
5. Display results.
Code:
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("sample.jpg", 0)

# Global Thresholding: one fixed threshold (127) for the whole image
_, th1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Adaptive Thresholding: per-neighborhood threshold (11x11 window, offset 2)
th2 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 11, 2)
th3 = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 11, 2)

# Otsu’s Thresholding: threshold chosen automatically from the histogram
_, th4 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

titles = ["Original", "Global Thresh", "Adaptive Mean",
          "Adaptive Gaussian", "Otsu"]
images = [img, th1, th2, th3, th4]
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(images[i], cmap="gray")
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Conclusion:
We implemented segmentation techniques which effectively divided images into meaningful
foreground and background regions.
Experiment 8: Implementation of Image
Compression using DCT
Objective:
To implement image compression using the Discrete Cosine Transform (DCT).
Theory:
• DCT transforms the image into the frequency domain.
• High-frequency components can be removed for compression (quantified in the sketch below).
• JPEG uses DCT-based compression.
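Because the DCT concentrates most of the signal energy in the low-frequency (top-left) coefficients, the information kept by the masking used below can be quantified; a small sketch (sample.jpg is a placeholder; cv2.dct requires even image dimensions):
import cv2
import numpy as np

img = np.float32(cv2.imread("sample.jpg", 0)) / 255.0
dct = cv2.dct(img)
kept = dct.copy()
kept[20:, 20:] = 0   # same masking as the experiment code below
print(f"Energy retained: {100 * np.sum(kept**2) / np.sum(dct**2):.1f}%")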
Diagram:
Input Image → DCT → Coefficient Quantization → Compressed Image
Algorithm:
1. Read grayscale image.
2. Apply DCT using cv2.dct().
3. Quantize high frequency components by zeroing small coefficients.
4. Apply inverse DCT with cv2.idct().
5. Display original and compressed image.
Code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load grayscale image (cv2.dct requires even dimensions and float input)
img = cv2.imread("sample.jpg", 0)
img = np.float32(img) / 255.0

# Apply DCT
dct = cv2.dct(img)

# Zero out high-frequency coefficients (keeps the first 20 rows and columns)
dct_compressed = np.copy(dct)
dct_compressed[20:, 20:] = 0

# Apply Inverse DCT to reconstruct the compressed image
compressed = cv2.idct(dct_compressed)

# Display
titles = ["Original", "Compressed"]
images = [img, compressed]
for i in range(2):
    plt.subplot(1, 2, i + 1)
    plt.imshow(images[i], cmap="gray")
    plt.title(titles[i])
    plt.axis("off")
plt.show()
Conclusion:
Image compression was successfully achieved using DCT by removing higher frequency
components while retaining major visual features.
Experiment 9: Extracting Histogram of
Oriented Gradients (HOG) Features
Objective
Extract HOG features for object detection and visualize the HOG image and descriptor.
Theory
HOG (Histogram of Oriented Gradients) captures local shape/edge information by:
1) computing image gradients, 2) creating orientation histograms within small cells,
3) normalizing across blocks for illumination invariance, and 4) concatenating into a
feature vector. It is widely used in pedestrian detection and classical object detectors
(SVM + HOG). A worked check on the descriptor size follows.
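As a worked check on the descriptor size for the parameters used below (128×64 input, 8×8 cells, 9 bins, 2×2-cell blocks with a one-cell stride):
# 128x64 pixels / 8x8 cells -> 16x8 cells
# 2x2-cell blocks sliding by one cell -> 15x7 block positions
# each block contributes 2 * 2 * 9 = 36 values
print(15 * 7 * 36)   # -> 3780 values in the HOG descriptor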
Diagram (one figure)
Image → Gradients → Cell Histograms → Block Normalization → HOG Feature
Vector
Algorithm
1. Read and convert image to grayscale.
2. Resize to a fixed size (e.g., 128×64) commonly used in HOG literature.
3. Compute gradients and cell histograms (e.g., 8×8 cells, 9 bins).
4. Normalize per block (e.g., 2×2 cells).
5. Visualize HOG image and print descriptor length.
Python Code
import cv2
import matplotlib.pyplot as plt
from skimage import color
from skimage.transform import resize
from skimage.feature import hog

# 1) Load & preprocess
img = cv2.imread("person.jpg")  # supply your image path
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_gray = color.rgb2gray(img_rgb)
img_resized = resize(img_gray, (128, 64), anti_aliasing=True)

# 2) HOG extraction
features, hog_image = hog(
    img_resized,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
    visualize=True
)
print("HOG descriptor length:", features.shape[0])

# 3) Visualize
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.imshow(img_resized, cmap='gray')
plt.title('Resized Input')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(hog_image, cmap='gray')
plt.title('HOG Visualization')
plt.axis('off')
plt.show()
Output (what you should see)
• Left: normalized/resized grayscale image.
• Right: HOG visualization highlighting edge orientations.
• Terminal prints the descriptor length (often a few thousand numbers for 128×64).
Conclusion
HOG encodes shape via local gradient orientation statistics and is robust to illumination
changes. It is a strong classical baseline and still useful for feature engineering or lightweight
detectors.
Experiment 10: Image Classification using a
Convolutional Neural Network (CNN)
Objective
Build, train, and evaluate a CNN for image classification (example: CIFAR-10).
Theory
CNNs learn hierarchical features: early layers capture edges/texture; deeper layers capture
parts/objects. Core building blocks: Conv → ReLU → Pool → (Dropout) → Dense →
Softmax. Use cross-entropy loss and SGD/Adam to optimize.
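For reference, with a one-hot label y and softmax output p, the categorical cross-entropy used below is L = -Σ y_k log(p_k); a two-line numeric check (the probabilities are made up):
import numpy as np

y = np.array([0, 0, 1])           # one-hot: true class is index 2
p = np.array([0.1, 0.2, 0.7])     # hypothetical softmax output
print(-np.sum(y * np.log(p)))     # 0.357 = -log(0.7)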
Diagram (one figure)
Input (32×32×3)
→ [Conv 3×3 + ReLU] ×2 → MaxPool
→ [Conv 3×3 + ReLU] ×2 → MaxPool
→ Flatten → Dense → Softmax(10 classes)
Algorithm
1. Load dataset (CIFAR-10), split into train/test.
2. Normalize images to [0,1].
3. Define CNN architecture.
4. Compile with optimizer (Adam), loss (categorical crossentropy), metrics
(accuracy).
5. Train for N epochs; validate on test set.
6. Evaluate and visualize accuracy/loss curves; show sample predictions.
Python Code
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt

# 1) Load data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# 2) Preprocess: scale to [0,1] and one-hot encode labels
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# 3) Build CNN
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                  input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

# 4) Compile
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

# 5) Train
history = model.fit(x_train, y_train, validation_split=0.1, epochs=10,
                    batch_size=64, verbose=1)

# 6) Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.3f}")

# 7) Plot training curves
plt.figure()
plt.plot(history.history['accuracy'], label='train_acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.legend()
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Acc')
plt.show()
Output
• Model summary in console; training/validation curves; final test accuracy (typically
~70% with this simple model/epochs).
• You can also add a confusion matrix and sample predictions for richer analysis.
Conclusion
A simple CNN successfully learns image features and classifies CIFAR-10 images. Accuracy
improves with data augmentation, deeper networks, longer training, and regularization.
Experiment 11: Mini Project — Real-Time
Lane Detection (Classical CV Pipeline)
Objective
Detect road lane lines from a forward-facing driving video using classical CV: color/gradient
thresholding, Canny edges, ROI masking, Hough transform.
Theory
Lane detection can be approached by:
• Preprocessing: undistort/denoise, convert to grayscale/HSV.
• Edge & Thresholding: emphasize lane markings (white/yellow); a color-mask sketch follows this list.
• ROI Mask: focus on trapezoid covering road ahead.
• Hough Transform: find line segments; split into left/right lanes based on slope.
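The pipeline below relies on Canny edges alone; the white/yellow emphasis mentioned above can be added with an HSV color mask, e.g. this sketch (the threshold values are illustrative and need tuning per footage):
import cv2
import numpy as np

def lane_color_mask(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    white = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))     # low saturation, high value
    yellow = cv2.inRange(hsv, (15, 80, 120), (35, 255, 255))  # yellow hue band
    return cv2.bitwise_or(white, yellow)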
Diagram (one figure)
Frame → Undistort/Filter → Canny Edges → ROI Mask → Hough Lines → Overlay
on Frame
Algorithm
1. Read video frames.
2. Convert to grayscale; apply Gaussian blur.
3. Detect edges with Canny.
4. Define polygonal ROI mask and apply.
5. Hough transform to detect line segments.
6. Separate left/right using slope; average/extrapolate; overlay on frame.
7. Display real-time result; write output video (optional).
Python Code
import cv2
import numpy as np

def region_of_interest(img, vertices):
    mask = np.zeros_like(img)
    cv2.fillPoly(mask, [vertices], 255)
    return cv2.bitwise_and(img, mask)

def average_lane(points):
    # Fit a line y = m*x + b through all endpoints of one lane's segments
    if len(points) == 0:
        return None
    xs, ys = [], []
    for x1, y1, x2, y2 in points:
        xs += [x1, x2]
        ys += [y1, y2]
    if len(xs) < 2:
        return None
    return np.polyfit(xs, ys, 1)

def make_points(img, poly):
    # Convert a fitted line into two drawable endpoints
    if poly is None:
        return None
    m, b = poly
    if m == 0:
        return None
    y1 = img.shape[0]        # bottom of the frame
    y2 = int(y1 * 0.6)       # top of the ROI
    x1 = int((y1 - b) / m)
    x2 = int((y2 - b) / m)
    return x1, y1, x2, y2

def draw_lines(img, lines, color=(0, 255, 0), thickness=6):
    if lines is None:
        return img
    line_img = np.zeros_like(img)
    left, right = [], []
    for l in lines:
        x1, y1, x2, y2 = l[0]
        if x2 == x1:
            continue  # skip vertical segments
        slope = (y2 - y1) / (x2 - x1)
        if slope < -0.5:
            left.append((x1, y1, x2, y2))
        elif slope > 0.5:
            right.append((x1, y1, x2, y2))
    left_lane = average_lane(left)
    right_lane = average_lane(right)
    for lane in [left_lane, right_lane]:
        pts = make_points(img, lane)
        if pts:
            x1, y1, x2, y2 = pts
            cv2.line(line_img, (x1, y1), (x2, y2), color, thickness)
    return cv2.addWeighted(img, 1.0, line_img, 0.8, 0)

cap = cv2.VideoCapture("drive.mp4")  # supply your input video
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)

    h, w = edges.shape
    roi_vertices = np.array([
        (int(0.1 * w), h),
        (int(0.45 * w), int(0.6 * h)),
        (int(0.55 * w), int(0.6 * h)),
        (int(0.9 * w), h)
    ], dtype=np.int32)
    masked = region_of_interest(edges, roi_vertices)

    lines = cv2.HoughLinesP(masked, 1, np.pi / 180, threshold=40,
                            minLineLength=30, maxLineGap=150)
    out = draw_lines(frame, lines)

    cv2.imshow("Lane Detection", out)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC to stop
        break

cap.release()
cv2.destroyAllWindows()
Output
• Real-time video with green left/right lane lines overlaid. Works best on clear
daylight highway footage.
Conclusion
Classical CV pipeline provides a fast and understandable lane detector. Robustness can be
improved with color threshold tuning, perspective transform (“bird’s-eye view”), and
temporal smoothing; modern approaches use deep segmentation networks.
Experiment 12: Study of Image Processing
Libraries
Objective
Study and compare commonly used Python libraries for image processing and computer
vision: OpenCV, scikit-image, Pillow (PIL), Matplotlib, TensorFlow/Keras, and PyTorch.
Theory / Comparative Study
• OpenCV (cv2)
  o Strengths: Fast C++ core; huge toolbox (I/O, filtering, features, geometry, DNN).
  o Use cases: Real-time CV, video, feature detection, classical pipelines.
  o Install: pip install opencv-python (or opencv-contrib-python for extra modules).
• scikit-image (skimage)
  o Strengths: Research/education friendly; many algorithms (HOG, segmentation, morphology) with clean APIs.
  o Use cases: Prototyping, scientific computing with NumPy.
  o Install: pip install scikit-image.
• Pillow (PIL fork)
  o Strengths: Lightweight image I/O and basic transforms; integrates well with the Python ecosystem.
  o Use cases: Simple preprocessing, thumbnails, format conversion.
  o Install: pip install pillow.
• Matplotlib
  o Strengths: Visualization and plotting; image display with color maps; debugging.
  o Use cases: Comparing outputs, drawing overlays, report plots.
  o Install: pip install matplotlib.
• TensorFlow / Keras
  o Strengths: End-to-end deep learning (training, deployment); high-level Keras API for CNNs.
  o Use cases: Image classification, detection, segmentation; model training.
  o Install: pip install tensorflow (or GPU variant).
• PyTorch
  o Strengths: Dynamic graphs, pythonic API, research-friendly; strong ecosystem (torchvision).
  o Use cases: Custom models, fast prototyping, SOTA research.
  o Install: pip install torch torchvision torchaudio (GPU options vary by CUDA).
Diagram (one figure)
I/O & Classical CV → OpenCV / scikit-image / Pillow
Visualization → Matplotlib
Deep Learning → TensorFlow-Keras / PyTorch
Sample Snippets (Quick Reference)
# OpenCV: read/resize/save
import cv2
img = cv2.imread("img.jpg")
img2 = cv2.resize(img, (256, 256))
cv2.imwrite("out.jpg", img2)

# scikit-image: HOG (convert BGR -> RGB first; rgb2gray expects RGB order)
from skimage.feature import hog
from skimage.color import rgb2gray
desc, hog_img = hog(rgb2gray(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)), visualize=True)

# Pillow: open/rotate
from PIL import Image
im = Image.open("img.jpg").rotate(45)
im.save("rotated.jpg")

# Matplotlib: show
import matplotlib.pyplot as plt
plt.imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
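For completeness, minimal deep-learning counterparts of the snippets above (random tensors, just to show the two APIs side by side):
# TensorFlow/Keras: one conv layer over a random batch
import tensorflow as tf
x = tf.random.uniform((1, 32, 32, 3))
y = tf.keras.layers.Conv2D(8, 3, activation="relu")(x)
print(y.shape)   # (1, 30, 30, 8)

# PyTorch: the same idea (channels-first layout)
import torch
import torch.nn as nn
xt = torch.rand(1, 3, 32, 32)
yt = nn.Conv2d(3, 8, 3)(xt)
print(yt.shape)  # torch.Size([1, 8, 30, 30])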
Conclusion
Each library serves a distinct role: OpenCV for fast classical CV, skimage for clean
algorithmic tools, Pillow for simple I/O/manipulations, Matplotlib for visualization, and
TF/PyTorch for deep learning. In real projects, they are often combined.