
Assignment 3: Convolutional Neural Networks - Solutions

Problem 1: Basic Convolution Operation

Given:
Image I (4×4): [[1,0,1,2], [0,1,0,1], [2,1,2,0], [1,0,1,1]]
Kernel K (3×3): [[1,0,-1], [1,0,-1], [1,0,-1]]

(a) Complete Convolution I * K


Output element (0,0):

Region: [[1,0,1], [0,1,0], [2,1,2]]


Calculation: (1×1 + 0×0 + 1×(-1)) + (0×1 + 1×0 + 0×(-1)) + (2×1 + 1×0 + 2×(-1))
= (1 + 0 - 1) + (0 + 0 + 0) + (2 + 0 - 2) = 0 + 0 + 0 = 0

Output element (0,1):

Region: [[0,1,2], [1,0,1], [1,2,0]]


Calculation: (0×1 + 1×0 + 2×(-1)) + (1×1 + 0×0 + 1×(-1)) + (1×1 + 2×0 + 0×(-1))
= (0 + 0 - 2) + (1 + 0 - 1) + (1 + 0 + 0) = -2 + 0 + 1 = -1

Output element (1,0):

Region: [[0,1,0], [2,1,2], [1,0,1]]


Calculation: (0×1 + 1×0 + 0×(-1)) + (2×1 + 1×0 + 2×(-1)) + (1×1 + 0×0 + 1×(-1))
= (0 + 0 + 0) + (2 + 0 - 2) + (1 + 0 - 1) = 0 + 0 + 0 = 0

Output element (1,1):

Region: [[1,0,1], [1,2,0], [0,1,1]]


Calculation: (1×1 + 0×0 + 1×(-1)) + (1×1 + 2×0 + 0×(-1)) + (0×1 + 1×0 + 1×(-1))
= (1 + 0 - 1) + (1 + 0 + 0) + (0 + 0 - 1) = 0 + 1 - 1 = 0

Complete Output Feature Map:

[[0, -1],
[0, 0]]
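
The hand computation above can be cross-checked numerically. Below is a minimal NumPy/SciPy sketch that treats I * K as cross-correlation (no kernel flip), which is the convention used in CNNs and in the calculations above:

```python
import numpy as np
from scipy.signal import correlate2d

# Image and kernel from Problem 1
I = np.array([[1, 0, 1, 2],
              [0, 1, 0, 1],
              [2, 1, 2, 0],
              [1, 0, 1, 1]])
K = np.array([[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]])

# CNN-style "convolution" = cross-correlation, valid mode (no padding)
out = correlate2d(I, K, mode='valid')
print(out)        # [[ 0 -1]
                  #  [ 0  0]]
print(out.shape)  # (2, 2), matching (4 - 3 + 1) x (4 - 3 + 1)
```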
(b) Output Size
Formula: Output size = (Input size - Kernel size + 1)
Output size = (4 - 3 + 1) × (4 - 3 + 1) = 2 × 2

(c) Edge Detection Type


This kernel performs vertical edge detection. The pattern [1,0,-1] in each row detects changes from left
to right, highlighting vertical edges where pixel intensity changes horizontally.

Problem 2: Edge Detection

Given:
Image I (6×6): [[3,0,1,2,7,4], [1,5,8,2,3,0], [2,7,2,5,1,3], [0,1,3,1,7,8], [4,2,1,6,2,8], [2,4,5,2,3,9]]

(a) Vertical Edge Detection - First Row


Kernel Kv = [[1,0,-1], [1,0,-1], [1,0,-1]]

Position (0,0):

Region: [[3,0,1], [1,5,8], [2,7,2]]


Result: (3×1 + 0×0 + 1×(-1)) + (1×1 + 5×0 + 8×(-1)) + (2×1 + 7×0 + 2×(-1))
= (3 + 0 - 1) + (1 + 0 - 8) + (2 + 0 - 2) = 2 + (-7) + 0 = -5

Position (0,1):

Region: [[0,1,2], [5,8,2], [7,2,5]]


Result: (0×1 + 1×0 + 2×(-1)) + (5×1 + 8×0 + 2×(-1)) + (7×1 + 2×0 + 5×(-1))
= (0 + 0 - 2) + (5 + 0 - 2) + (7 + 0 - 5) = -2 + 3 + 2 = 3

Position (0,2):

Region: [[1,2,7], [8,2,3], [2,5,1]]


Result: (1×1 + 2×0 + 7×(-1)) + (8×1 + 2×0 + 3×(-1)) + (2×1 + 5×0 + 1×(-1))
= (1 + 0 - 7) + (8 + 0 - 3) + (2 + 0 - 1) = -6 + 5 + 1 = 0

Position (0,3):
Region: [[2,7,4], [2,3,0], [5,1,3]]
Result: (2×1 + 7×0 + 4×(-1)) + (2×1 + 3×0 + 0×(-1)) + (5×1 + 1×0 + 3×(-1))
= (2 + 0 - 4) + (2 + 0 + 0) + (5 + 0 - 3) = -2 + 2 + 2 = 2

First row of vertical edge detection: [-5, 3, 0, 2]

(b) Horizontal Edge Detection - First Row


Kernel Kh = [[1,1,1], [0,0,0], [-1,-1,-1]]

Position (0,0):

Region: [[3,0,1], [1,5,8], [2,7,2]]


Result: (3×1 + 0×1 + 1×1) + (1×0 + 5×0 + 8×0) + (2×(-1) + 7×(-1) + 2×(-1))
= (3 + 0 + 1) + (0 + 0 + 0) + (-2 - 7 - 2) = 4 + 0 - 11 = -7

Position (0,1):

Region: [[0,1,2], [5,8,2], [7,2,5]]


Result: (0×1 + 1×1 + 2×1) + (5×0 + 8×0 + 2×0) + (7×(-1) + 2×(-1) + 5×(-1))
= (0 + 1 + 2) + (0 + 0 + 0) + (-7 - 2 - 5) = 3 + 0 - 14 = -11

Position (0,2):

Region: [[1,2,7], [8,2,3], [2,5,1]]


Result: (1×1 + 2×1 + 7×1) + (8×0 + 2×0 + 3×0) + (2×(-1) + 5×(-1) + 1×(-1))
= (1 + 2 + 7) + (0 + 0 + 0) + (-2 - 5 - 1) = 10 + 0 - 8 = 2

Position (0,3):

Region: [[2,7,4], [2,3,0], [5,1,3]]


Result: (2×1 + 7×1 + 4×1) + (2×0 + 3×0 + 0×0) + (5×(-1) + 1×(-1) + 3×(-1))
= (2 + 7 + 4) + (0 + 0 + 0) + (-5 - 1 - 3) = 13 + 0 - 9 = 4

First row of horizontal edge detection: [-7, -11, 2, 4]

(c) Comparison
Vertical kernel: Detects horizontal intensity changes (vertical edges)

Horizontal kernel: Detects vertical intensity changes (horizontal edges)


Different response patterns indicate different edge orientations in the image
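
Both first rows can be reproduced with the same cross-correlation approach; a short sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.signal import correlate2d

I = np.array([[3, 0, 1, 2, 7, 4],
              [1, 5, 8, 2, 3, 0],
              [2, 7, 2, 5, 1, 3],
              [0, 1, 3, 1, 7, 8],
              [4, 2, 1, 6, 2, 8],
              [2, 4, 5, 2, 3, 9]])

Kv = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # vertical edge detector
Kh = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])   # horizontal edge detector

print(correlate2d(I, Kv, mode='valid')[0])  # [-5  3  0  2]
print(correlate2d(I, Kh, mode='valid')[0])  # [ -7 -11   2   4]
```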

Problem 3: Padding Calculations

Formula: Output size = ⌊(Input + 2×Padding - Kernel)/Stride⌋ + 1

(a) Input: 32×32, Kernel: 5×5, Padding: Valid (p=0)

Output = ⌊(32 + 2×0 - 5)/1⌋ + 1 = ⌊27⌋ + 1 = 28×28

(b) Input: 32×32, Kernel: 5×5, Padding: Same


For same padding: p = ⌊(k-1)/2⌋ = ⌊(5-1)/2⌋ = 2

Output = ⌊(32 + 2×2 - 5)/1⌋ + 1 = ⌊31⌋ + 1 = 32×32

(c) Input: 64×64, Kernel: 7×7, Padding: p=2

Output = ⌊(64 + 2×2 - 7)/1⌋ + 1 = ⌊61⌋ + 1 = 62×62

(d) Padding for 32×32 input with 9×9 kernel to maintain same size
For output = input: 32 = ⌊(32 + 2×p - 9)/1⌋ + 1

31 = 32 + 2p - 9
31 = 23 + 2p
2p = 8
p=4
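
All four cases follow from the same formula; a small helper function (a sketch, the name conv_out is illustrative) reproduces them:

```python
def conv_out(n, k, p=0, s=1):
    """Output size for an n x n input, k x k kernel, padding p, stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_out(32, 5, p=0))   # 28  -> (a) valid padding
print(conv_out(32, 5, p=2))   # 32  -> (b) same padding, p = (5 - 1) // 2
print(conv_out(64, 7, p=2))   # 62  -> (c)
# (d): smallest padding keeping a 32x32 output with a 9x9 kernel
print(next(p for p in range(10) if conv_out(32, 9, p=p) == 32))  # 4
```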

Problem 4: Strided Convolutions

Formula: Output size = ⌊(Input + 2×Padding - Kernel)/Stride⌋ + 1

(a) Input: 39×39, Kernel: 3×3, Stride: 2, Padding: Valid (p=0)

Output = ⌊(39 + 2×0 - 3)/2⌋ + 1 = ⌊36/2⌋ + 1 = 18 + 1 = 19×19

(b) Input: 64×64, Kernel: 5×5, Stride: 3, Padding: p=2

Output = ⌊(64 + 2×2 - 5)/3⌋ + 1 = ⌊63/3⌋ + 1 = 21 + 1 = 22×22


(c) Stride for 28×28 → ~7×7 with 5×5 kernel, valid padding

Target: 7 = ⌊(28 + 0 - 5)/s⌋ + 1

6 = ⌊23/s⌋, which requires 23/7 < s ≤ 23/6 (about 3.29 < s ≤ 3.83), so no integer stride gives exactly 7×7.
For s = 3: ⌊23/3⌋ = 7, so output = 8×8
For s = 4: ⌊23/4⌋ = 5, so output = 6×6
Both miss the 7×7 target by one; taking the larger stride gives the 6×6 output adopted here.

Answer: Stride = 4 (output 6×6)
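
Reusing the same output-size formula, a quick search over candidate strides makes the trade-off explicit (a short sketch; conv_out repeats the helper from the previous sketch):

```python
def conv_out(n, k, p=0, s=1):
    return (n + 2 * p - k) // s + 1

for s in range(2, 6):
    print(s, conv_out(28, 5, p=0, s=s))
# 2 12
# 3 8
# 4 6
# 5 5
# No stride yields exactly 7x7; s = 4 (6x6) is taken as the answer.
```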

Problem 5: Multi-Channel Convolution

Given: RGB image 32×32×3, 64 filters of 5×5×3, stride 1, same padding

(a) Output Volume Dimensions

Height/Width: (32 + 2×2 - 5)/1 + 1 = 32


Depth: Number of filters = 64
Output: 32×32×64

(b) Total Parameters

Parameters per filter: 5×5×3 + 1 (bias) = 75 + 1 = 76


Total parameters: 76 × 64 = 4,864 parameters

(c) Multiply-Accumulate Operations

Operations per filter per output position: 5×5×3 = 75 MAC ops


Output positions per filter: 32×32 = 1,024
Total operations per filter: 75 × 1,024 = 76,800
Total for all filters: 76,800 × 64 = 4,915,200 MAC operations
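
The parameter and MAC counts are simple products of the quantities above; a short sketch of the arithmetic:

```python
H = W = 32            # output spatial size (same padding, stride 1)
k, c_in, n_f = 5, 3, 64

params_per_filter = k * k * c_in + 1          # +1 for the bias
total_params = params_per_filter * n_f
macs = (k * k * c_in) * (H * W) * n_f         # bias add not counted as a MAC

print(params_per_filter, total_params)        # 76 4864
print(macs)                                   # 4915200
```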

Problem 6: CNN Architecture Design

Given Architecture:
Input: 64×64×3
Layer 1: 16 filters, 7×7, stride 2, same padding

Layer 2: 32 filters, 5×5, stride 1, valid padding


Layer 3: 64 filters, 3×3, stride 2, same padding

(a) Output Dimensions After Each Layer


Layer 1:

Padding for same: p = ⌊(7-1)/2⌋ = 3


Output = ⌊(64 + 2×3 - 7)/2⌋ + 1 = ⌊63/2⌋ + 1 = 32×32×16

Layer 2:

Output = ⌊(32 + 0 - 5)/1⌋ + 1 = 28×28×32

Layer 3:

Padding for same: p = ⌊(3-1)/2⌋ = 1


Output = ⌊(28 + 2×1 - 3)/2⌋ + 1 = ⌊27/2⌋ + 1 = 14×14×64

(b) Parameters in Each Layer


Layer 1: (7×7×3 + 1) × 16 = 148 × 16 = 2,368 parameters
Layer 2: (5×5×16 + 1) × 32 = 401 × 32 = 12,832 parameters
Layer 3: (3×3×32 + 1) × 64 = 289 × 64 = 18,496 parameters

(c) Total Parameters

Total = 2,368 + 12,832 + 18,496 = 33,696 parameters

(d) Summary Table


Layer | Input Size | Filters | Kernel | Stride | Padding | Output Size | Parameters
Input | 64×64×3 | - | - | - | - | 64×64×3 | 0
Conv1 | 64×64×3 | 16 | 7×7 | 2 | 3 | 32×32×16 | 2,368
Conv2 | 32×32×16 | 32 | 5×5 | 1 | 0 | 28×28×32 | 12,832
Conv3 | 28×28×32 | 64 | 3×3 | 2 | 1 | 14×14×64 | 18,496
Total | | | | | | | 33,696
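
The table can be regenerated programmatically; a minimal sketch that loops over the layer specifications (variable names are illustrative):

```python
def conv_out(n, k, p, s):
    return (n + 2 * p - k) // s + 1

layers = [  # (name, filters, kernel, stride, padding)
    ("Conv1", 16, 7, 2, 3),
    ("Conv2", 32, 5, 1, 0),
    ("Conv3", 64, 3, 2, 1),
]

size, depth, total = 64, 3, 0
for name, f, k, s, p in layers:
    size = conv_out(size, k, p, s)
    params = (k * k * depth + 1) * f
    total += params
    print(f"{name}: {size}x{size}x{f}, {params} parameters")
    depth = f
print("Total:", total)   # 33696
```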

Problem 7: Pooling Operations


Given: Feature map 8×8×64

(a) Max pooling with 2×2 window, stride 2

Output = ⌊(8 - 2)/2⌋ + 1 = 4×4×64

(b) Average pooling with 3×3 window, stride 1

Output = ⌊(8 - 3)/1⌋ + 1 = 6×6×64

(c) Max pooling with 4×4 window, stride 4

Output = ⌊(8 - 4)/4⌋ + 1 = 2×2×64

Additional Calculation for (a):


Given 4×4 input: [[1,3,2,4], [2,1,4,3], [0,2,1,5], [1,0,3,2]]

Pool 1 (top-left 2×2): [[1,3], [2,1]] → max = 3
Pool 2 (top-right 2×2): [[2,4], [4,3]] → max = 4
Pool 3 (bottom-left 2×2): [[0,2], [1,0]] → max = 2
Pool 4 (bottom-right 2×2): [[1,5], [3,2]] → max = 5

Max pooled output: [[3,4], [2,5]]
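
The 2×2, stride-2 max pooling on this 4×4 example can be checked with a NumPy reshape trick (a sketch, assuming NumPy):

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [2, 1, 4, 3],
              [0, 2, 1, 5],
              [1, 0, 3, 2]])

# Split into non-overlapping 2x2 blocks, then take the max of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[3 4]
                #  [2 5]]
```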

Problem 8: Complete CNN Analysis

Architecture for CIFAR-10:


Input: 32×32×3

Conv1: 32 filters, 3×3, stride 1, same padding, ReLU

MaxPool1: 2×2, stride 2

Conv2: 64 filters, 3×3, stride 1, same padding, ReLU

MaxPool2: 2×2, stride 2

Conv3: 128 filters, 3×3, stride 1, same padding, ReLU


MaxPool3: 2×2, stride 2

Flatten + FC to 10 outputs

(a) Dimensions After Each Layer


Layer | Input Size | Output Size | Calculation
Input | 32×32×3 | 32×32×3 | -
Conv1 | 32×32×3 | 32×32×32 | Same padding
MaxPool1 | 32×32×32 | 16×16×32 | (32-2)/2 + 1 = 16
Conv2 | 16×16×32 | 16×16×64 | Same padding
MaxPool2 | 16×16×64 | 8×8×64 | (16-2)/2 + 1 = 8
Conv3 | 8×8×64 | 8×8×128 | Same padding
MaxPool3 | 8×8×128 | 4×4×128 | (8-2)/2 + 1 = 4
Flatten | 4×4×128 | 2048 | 4×4×128 = 2048
FC | 2048 | 10 | Fully connected

(b) Parameters in Convolutional Layers


Conv1: (3×3×3 + 1) × 32 = 28 × 32 = 896 parameters
Conv2: (3×3×32 + 1) × 64 = 289 × 64 = 18,496 parameters
Conv3: (3×3×64 + 1) × 128 = 577 × 128 = 73,856 parameters

(c) Parameters in Fully Connected Layer

FC parameters: 2048 × 10 + 10 = 20,490 parameters

(d) Total Parameters

Total = 896 + 18,496 + 73,856 + 20,490 = 113,738 parameters
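
The same total can be confirmed in a framework. Below is a minimal PyTorch sketch of the architecture as listed above (assuming PyTorch is available; the layer comments restate the dimensions from the table):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=1, padding=1),   nn.ReLU(),  # Conv1 -> 32x32x32
    nn.MaxPool2d(2, 2),                                      # -> 16x16x32
    nn.Conv2d(32, 64, 3, stride=1, padding=1),  nn.ReLU(),  # Conv2 -> 16x16x64
    nn.MaxPool2d(2, 2),                                      # -> 8x8x64
    nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.ReLU(),  # Conv3 -> 8x8x128
    nn.MaxPool2d(2, 2),                                      # -> 4x4x128
    nn.Flatten(),                                            # -> 2048
    nn.Linear(4 * 4 * 128, 10),                              # FC to 10 classes
)

print(sum(p.numel() for p in model.parameters()))   # 113738
```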

(e) Memory Requirements for Activations (32-bit floats = 4 bytes)


Layer | Size | Memory (bytes) | Memory (MB)
Input | 32×32×3 | 3,072 × 4 = 12,288 | 0.012
Conv1 | 32×32×32 | 32,768 × 4 = 131,072 | 0.131
MaxPool1 | 16×16×32 | 8,192 × 4 = 32,768 | 0.033
Conv2 | 16×16×64 | 16,384 × 4 = 65,536 | 0.066
MaxPool2 | 8×8×64 | 4,096 × 4 = 16,384 | 0.016
Conv3 | 8×8×128 | 8,192 × 4 = 32,768 | 0.033
MaxPool3 | 4×4×128 | 2,048 × 4 = 8,192 | 0.008
Total | | 299,008 | ~0.30
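
The total can be summed in a few lines (32-bit float = 4 bytes per value); a short sketch:

```python
shapes = {
    "Input":    (32, 32, 3),
    "Conv1":    (32, 32, 32),
    "MaxPool1": (16, 16, 32),
    "Conv2":    (16, 16, 64),
    "MaxPool2": (8, 8, 64),
    "Conv3":    (8, 8, 128),
    "MaxPool3": (4, 4, 128),
}

total_bytes = 0
for name, (h, w, c) in shapes.items():
    b = h * w * c * 4                # 4 bytes per 32-bit float
    total_bytes += b
    print(f"{name}: {b} bytes")
print(total_bytes, total_bytes / 1e6)   # 299008 bytes, ~0.30 MB
```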
