Assignment 3: Convolutional Neural Networks - Solutions
Problem 1: Basic Convolution Operation
Given:
Image I (4×4): [[1,0,1,2], [0,1,0,1], [2,1,2,0], [1,0,1,1]]
Kernel K (3×3): [[1,0,-1], [1,0,-1], [1,0,-1]]
(a) Complete Convolution I * K (deep-learning convention: kernel applied without flipping, i.e. cross-correlation)
Output element (0,0):
Region: [[1,0,1], [0,1,0], [2,1,2]]
Calculation: (1×1 + 0×0 + 1×(-1)) + (0×1 + 1×0 + 0×(-1)) + (2×1 + 1×0 + 2×(-1))
= (1 + 0 - 1) + (0 + 0 + 0) + (2 + 0 - 2) = 0 + 0 + 0 = 0
Output element (0,1):
Region: [[0,1,2], [1,0,1], [1,2,0]]
Calculation: (0×1 + 1×0 + 2×(-1)) + (1×1 + 0×0 + 1×(-1)) + (1×1 + 2×0 + 0×(-1))
= (0 + 0 - 2) + (1 + 0 - 1) + (1 + 0 + 0) = -2 + 0 + 1 = -1
Output element (1,0):
Region: [[0,1,0], [2,1,2], [1,0,1]]
Calculation: (0×1 + 1×0 + 0×(-1)) + (2×1 + 1×0 + 2×(-1)) + (1×1 + 0×0 + 1×(-1))
= (0 + 0 + 0) + (2 + 0 - 2) + (1 + 0 - 1) = 0 + 0 + 0 = 0
Output element (1,1):
Region: [[1,0,1], [1,2,0], [0,1,1]]
Calculation: (1×1 + 0×0 + 1×(-1)) + (1×1 + 2×0 + 0×(-1)) + (0×1 + 1×0 + 1×(-1))
= (1 + 0 - 1) + (1 + 0 + 0) + (0 + 0 - 1) = 0 + 1 - 1 = 0
Complete Output Feature Map:
[[0, -1],
[0, 0]]
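The 2×2 feature map can be sanity-checked with a short NumPy sketch; the helper name `cross_correlate` is illustrative (not from the assignment) and implements the deep-learning convolution, i.e. no kernel flip:

```python
import numpy as np

def cross_correlate(image, kernel):
    # Valid "convolution" as used in deep learning: slide the kernel over
    # every position where it fully fits and take the elementwise dot product.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

I = np.array([[1,0,1,2], [0,1,0,1], [2,1,2,0], [1,0,1,1]])
K = np.array([[1,0,-1], [1,0,-1], [1,0,-1]])
print(cross_correlate(I, K))  # [[ 0. -1.], [ 0.  0.]]
```

The nested loops mirror the element-by-element expansion above, so each printed entry corresponds to one of the four hand calculations.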
(b) Output Size
Formula: Output size = (Input size - Kernel size + 1)
Output size = (4 - 3 + 1) × (4 - 3 + 1) = 2 × 2
(c) Edge Detection Type
This kernel performs vertical edge detection. The pattern [1,0,-1] in each row detects changes from left
to right, highlighting vertical edges where pixel intensity changes horizontally.
Problem 2: Edge Detection
Given:
Image I (6×6): [[3,0,1,2,7,4], [1,5,8,2,3,0], [2,7,2,5,1,3], [0,1,3,1,7,8], [4,2,1,6,2,8], [2,4,5,2,3,9]]
(a) Vertical Edge Detection - First Row
Kernel Kv = [[1,0,-1], [1,0,-1], [1,0,-1]]
Position (0,0):
Region: [[3,0,1], [1,5,8], [2,7,2]]
Result: (3×1 + 0×0 + 1×(-1)) + (1×1 + 5×0 + 8×(-1)) + (2×1 + 7×0 + 2×(-1))
= (3 + 0 - 1) + (1 + 0 - 8) + (2 + 0 - 2) = 2 + (-7) + 0 = -5
Position (0,1):
Region: [[0,1,2], [5,8,2], [7,2,5]]
Result: (0×1 + 1×0 + 2×(-1)) + (5×1 + 8×0 + 2×(-1)) + (7×1 + 2×0 + 5×(-1))
= (0 + 0 - 2) + (5 + 0 - 2) + (7 + 0 - 5) = -2 + 3 + 2 = 3
Position (0,2):
Region: [[1,2,7], [8,2,3], [2,5,1]]
Result: (1×1 + 2×0 + 7×(-1)) + (8×1 + 2×0 + 3×(-1)) + (2×1 + 5×0 + 1×(-1))
= (1 + 0 - 7) + (8 + 0 - 3) + (2 + 0 - 1) = -6 + 5 + 1 = 0
Position (0,3):
Region: [[2,7,4], [2,3,0], [5,1,3]]
Result: (2×1 + 7×0 + 4×(-1)) + (2×1 + 3×0 + 0×(-1)) + (5×1 + 1×0 + 3×(-1))
= (2 + 0 - 4) + (2 + 0 + 0) + (5 + 0 - 3) = -2 + 2 + 2 = 2
First row of vertical edge detection: [-5, 3, 0, 2]
(b) Horizontal Edge Detection - First Row
Kernel Kh = [[1,1,1], [0,0,0], [-1,-1,-1]]
Position (0,0):
Region: [[3,0,1], [1,5,8], [2,7,2]]
Result: (3×1 + 0×1 + 1×1) + (1×0 + 5×0 + 8×0) + (2×(-1) + 7×(-1) + 2×(-1))
= (3 + 0 + 1) + (0 + 0 + 0) + (-2 - 7 - 2) = 4 + 0 - 11 = -7
Position (0,1):
Region: [[0,1,2], [5,8,2], [7,2,5]]
Result: (0×1 + 1×1 + 2×1) + (5×0 + 8×0 + 2×0) + (7×(-1) + 2×(-1) + 5×(-1))
= (0 + 1 + 2) + (0 + 0 + 0) + (-7 - 2 - 5) = 3 + 0 - 14 = -11
Position (0,2):
Region: [[1,2,7], [8,2,3], [2,5,1]]
Result: (1×1 + 2×1 + 7×1) + (8×0 + 2×0 + 3×0) + (2×(-1) + 5×(-1) + 1×(-1))
= (1 + 2 + 7) + (0 + 0 + 0) + (-2 - 5 - 1) = 10 + 0 - 8 = 2
Position (0,3):
Region: [[2,7,4], [2,3,0], [5,1,3]]
Result: (2×1 + 7×1 + 4×1) + (2×0 + 3×0 + 0×0) + (5×(-1) + 1×(-1) + 3×(-1))
= (2 + 7 + 4) + (0 + 0 + 0) + (-5 - 1 - 3) = 13 + 0 - 9 = 4
First row of horizontal edge detection: [-7, -11, 2, 4]
(c) Comparison
Vertical kernel: Detects horizontal intensity changes (vertical edges)
Horizontal kernel: Detects vertical intensity changes (horizontal edges)
Different response patterns indicate different edge orientations in the image
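Both first rows can be reproduced with the same sliding-window sketch (helper name `correlate_valid` is illustrative); note that the horizontal kernel is just the transpose of the vertical one:

```python
import numpy as np

def correlate_valid(img, k):
    # Slide the kernel over every valid position; dot product at each stop.
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * k) for j in range(ow)]
                     for i in range(oh)])

I = np.array([[3,0,1,2,7,4],
              [1,5,8,2,3,0],
              [2,7,2,5,1,3],
              [0,1,3,1,7,8],
              [4,2,1,6,2,8],
              [2,4,5,2,3,9]])
Kv = np.array([[1,0,-1], [1,0,-1], [1,0,-1]])  # vertical-edge kernel
Kh = Kv.T                                      # horizontal-edge kernel

print(correlate_valid(I, Kv)[0])  # [-5  3  0  2]
print(correlate_valid(I, Kh)[0])  # [-7 -11  2  4]
```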
Problem 3: Padding Calculations
Formula: Output size = ⌊(Input + 2×Padding - Kernel)/Stride⌋ + 1
(a) Input: 32×32, Kernel: 5×5, Padding: Valid (p=0)
Output = ⌊(32 + 2×0 - 5)/1⌋ + 1 = ⌊27⌋ + 1 = 28×28
(b) Input: 32×32, Kernel: 5×5, Padding: Same
For same padding: p = ⌊(k-1)/2⌋ = ⌊(5-1)/2⌋ = 2
Output = ⌊(32 + 2×2 - 5)/1⌋ + 1 = ⌊31⌋ + 1 = 32×32
(c) Input: 64×64, Kernel: 7×7, Padding: p=2
Output = ⌊(64 + 2×2 - 7)/1⌋ + 1 = ⌊61⌋ + 1 = 62×62
(d) Padding for 32×32 input with 9×9 kernel to maintain same size
For output = input: 32 = ⌊(32 + 2×p - 9)/1⌋ + 1
31 = 32 + 2p - 9
31 = 23 + 2p
2p = 8
p=4
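All four parts follow from the one formula, so they can be confirmed with a minimal helper (the name `conv_out` is an assumption for illustration):

```python
def conv_out(n, k, p=0, s=1):
    # Output size = floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

print(conv_out(32, 5, p=0))  # (a) valid padding: 28
print(conv_out(32, 5, p=2))  # (b) same padding:  32
print(conv_out(64, 7, p=2))  # (c)                62
print(conv_out(32, 9, p=4))  # (d) p=4 preserves  32
```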
Problem 4: Strided Convolutions
Formula: Output size = ⌊(Input + 2×Padding - Kernel)/Stride⌋ + 1
(a) Input: 39×39, Kernel: 3×3, Stride: 2, Padding: Valid (p=0)
Output = ⌊(39 + 2×0 - 3)/2⌋ + 1 = ⌊36/2⌋ + 1 = 18 + 1 = 19×19
(b) Input: 64×64, Kernel: 5×5, Stride: 3, Padding: p=2
Output = ⌊(64 + 2×2 - 5)/3⌋ + 1 = ⌊63/3⌋ + 1 = 21 + 1 = 22×22
(c) Stride for 28×28 → ~7×7 with 5×5 kernel, valid padding
Target: 7 = ⌊(28 + 0 - 5)/s⌋ + 1, i.e. ⌊23/s⌋ = 6
No integer stride satisfies this exactly:
For s = 3: ⌊23/3⌋ = 7, so output = 8×8
For s = 4: ⌊23/4⌋ = 5, so output = 6×6
Both candidates miss 7×7 by one; taking the larger stride, which does not exceed the target resolution, gives 6×6.
Answer: Stride = 4
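The same floor-division helper (illustrative name `conv_out`) checks parts (a) and (b) and scans the stride candidates for part (c):

```python
def conv_out(n, k, p=0, s=1):
    # Output size = floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

assert conv_out(39, 3, s=2) == 19       # (a)
assert conv_out(64, 5, p=2, s=3) == 22  # (b)

# (c): no integer stride yields exactly 7 from a 28x28 input with a 5x5 kernel
for s in (3, 4):
    print(s, conv_out(28, 5, s=s))  # 3 -> 8, 4 -> 6
```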
Problem 5: Multi-Channel Convolution
Given: RGB image 32×32×3, 64 filters of 5×5×3, stride 1, same padding
(a) Output Volume Dimensions
Height/Width: same padding gives p = ⌊(5-1)/2⌋ = 2, so (32 + 2×2 - 5)/1 + 1 = 32
Depth: Number of filters = 64
Output: 32×32×64
(b) Total Parameters
Parameters per filter: 5×5×3 + 1 (bias) = 75 + 1 = 76
Total parameters: 76 × 64 = 4,864 parameters
(c) Multiply-Accumulate Operations
Operations per filter per output position: 5×5×3 = 75 MAC ops
Output positions per filter: 32×32 = 1,024
Total operations per filter: 75 × 1,024 = 76,800
Total for all filters: 76,800 × 64 = 4,915,200 MAC operations
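The parameter and MAC counts are a few multiplications; the variable names below are illustrative:

```python
# Conv layer: 32x32x3 input, 64 filters of 5x5x3, stride 1, same padding
h = w = 32                     # same padding preserves the 32x32 spatial size
k, c_in, n_filters = 5, 3, 64

params_per_filter = k * k * c_in + 1          # weights plus 1 bias
total_params = params_per_filter * n_filters  # 76 x 64
macs = k * k * c_in * h * w * n_filters       # bias adds are not MACs

print(total_params)  # 4864
print(macs)          # 4915200
```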
Problem 6: CNN Architecture Design
Given Architecture:
Input: 64×64×3
Layer 1: 16 filters, 7×7, stride 2, same padding
Layer 2: 32 filters, 5×5, stride 1, valid padding
Layer 3: 64 filters, 3×3, stride 2, same padding
(a) Output Dimensions After Each Layer
Layer 1:
Padding for same: p = ⌊(7-1)/2⌋ = 3
Output = ⌊(64 + 2×3 - 7)/2⌋ + 1 = ⌊63/2⌋ + 1 = 32×32×16
Layer 2:
Output = ⌊(32 + 0 - 5)/1⌋ + 1 = 28×28×32
Layer 3:
Padding for same: p = ⌊(3-1)/2⌋ = 1
Output = ⌊(28 + 2×1 - 3)/2⌋ + 1 = ⌊27/2⌋ + 1 = 14×14×64
(b) Parameters in Each Layer
Layer 1: (7×7×3 + 1) × 16 = 148 × 16 = 2,368 parameters
Layer 2: (5×5×16 + 1) × 32 = 401 × 32 = 12,832 parameters
Layer 3: (3×3×32 + 1) × 64 = 289 × 64 = 18,496 parameters
(c) Total Parameters
Total = 2,368 + 12,832 + 18,496 = 33,696 parameters
(d) Summary Table
Layer Input Size Filters Kernel Stride Padding Output Size Parameters
Input 64×64×3 - - - - 64×64×3 0
Conv1 64×64×3 16 7×7 2 3 32×32×16 2,368
Conv2 32×32×16 32 5×5 1 0 28×28×32 12,832
Conv3 28×28×32 64 3×3 2 1 14×14×64 18,496
Total 33,696
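The whole table can be regenerated by iterating over the layer specs; this is a sketch assuming the floor-division size formula used throughout the assignment:

```python
def conv_out(n, k, p, s):
    # Output size = floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

# (filters, kernel, stride, padding) per layer; "same" padding uses p = (k-1)//2
layers = [(16, 7, 2, 3), (32, 5, 1, 0), (64, 3, 2, 1)]
size, depth, total = 64, 3, 0
for f, k, s, p in layers:
    size = conv_out(size, k, p, s)
    params = (k * k * depth + 1) * f   # weights over the input depth, plus biases
    total += params
    print(f"{size}x{size}x{f}: {params} params")
    depth = f
print("total:", total)  # 33696
```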
Problem 7: Pooling Operations
Given: Feature map 8×8×64
(a) Max pooling with 2×2 window, stride 2
Output = ⌊(8 - 2)/2⌋ + 1 = 4×4×64
(b) Average pooling with 3×3 window, stride 1
Output = ⌊(8 - 3)/1⌋ + 1 = 6×6×64
(c) Max pooling with 4×4 window, stride 4
Output = ⌊(8 - 4)/4⌋ + 1 = 2×2×64
Additional Calculation for (a):
Given 4×4 input: [[1,3,2,4], [2,1,4,3], [0,2,1,5], [1,0,3,2]]
Pool 1 (top-left 2×2): [[1,3], [2,1]] → max = 3
Pool 2 (top-right 2×2): [[2,4], [4,3]] → max = 4
Pool 3 (bottom-left 2×2): [[0,2], [1,0]] → max = 2
Pool 4 (bottom-right 2×2): [[1,5], [3,2]] → max = 5
Max pooled output: [[3,4], [2,5]]
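The worked 2×2 max pooling can be checked with a short NumPy sketch (the helper name `max_pool` is illustrative):

```python
import numpy as np

def max_pool(x, win, stride):
    # Take the max over each win x win window, stepping by the stride.
    oh = (x.shape[0] - win) // stride + 1
    ow = (x.shape[1] - win) // stride + 1
    return np.array([[x[i*stride:i*stride+win, j*stride:j*stride+win].max()
                      for j in range(ow)] for i in range(oh)])

x = np.array([[1,3,2,4], [2,1,4,3], [0,2,1,5], [1,0,3,2]])
print(max_pool(x, 2, 2))  # [[3 4], [2 5]]
```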
Problem 8: Complete CNN Analysis
Architecture for CIFAR-10:
Input: 32×32×3
Conv1: 32 filters, 3×3, stride 1, same padding, ReLU
MaxPool1: 2×2, stride 2
Conv2: 64 filters, 3×3, stride 1, same padding, ReLU
MaxPool2: 2×2, stride 2
Conv3: 128 filters, 3×3, stride 1, same padding, ReLU
MaxPool3: 2×2, stride 2
Flatten + FC to 10 outputs
(a) Dimensions After Each Layer
Layer Input Size Output Size Calculation
Input 32×32×3 32×32×3 -
Conv1 32×32×3 32×32×32 Same padding
MaxPool1 32×32×32 16×16×32 (32-2)/2 + 1 = 16
Conv2 16×16×32 16×16×64 Same padding
MaxPool2 16×16×64 8×8×64 (16-2)/2 + 1 = 8
Conv3 8×8×64 8×8×128 Same padding
MaxPool3 8×8×128 4×4×128 (8-2)/2 + 1 = 4
Flatten 4×4×128 2048 4×4×128 = 2048
FC 2048 10 Fully connected
(b) Parameters in Convolutional Layers
Conv1: (3×3×3 + 1) × 32 = 28 × 32 = 896 parameters
Conv2: (3×3×32 + 1) × 64 = 289 × 64 = 18,496 parameters
Conv3: (3×3×64 + 1) × 128 = 577 × 128 = 73,856 parameters
(c) Parameters in Fully Connected Layer
FC parameters: 2048 × 10 + 10 = 20,490 parameters
(d) Total Parameters
Total = 896 + 18,496 + 73,856 + 20,490 = 113,738 parameters
(e) Memory Requirements for Activations (32-bit floats = 4 bytes)
Layer Size Memory (bytes) Memory (MB)
Input 32×32×3 3,072 × 4 = 12,288 0.012
Conv1 32×32×32 32,768 × 4 = 131,072 0.131
MaxPool1 16×16×32 8,192 × 4 = 32,768 0.033
Conv2 16×16×64 16,384 × 4 = 65,536 0.066
MaxPool2 8×8×64 4,096 × 4 = 16,384 0.016
Conv3 8×8×128 8,192 × 4 = 32,768 0.033
MaxPool3 4×4×128 2,048 × 4 = 8,192 0.008
Total 299,008 bytes ~0.30 MB
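Tallying the activation table programmatically keeps the byte arithmetic honest; a minimal sketch, assuming float32 (4 bytes per value) and decimal megabytes as in the table:

```python
# (h, w, c) of each activation map in the CIFAR-10 network above
acts = {
    "input":    (32, 32, 3),
    "conv1":    (32, 32, 32),
    "maxpool1": (16, 16, 32),
    "conv2":    (16, 16, 64),
    "maxpool2": (8, 8, 64),
    "conv3":    (8, 8, 128),
    "maxpool3": (4, 4, 128),
}
total = 0
for name, (h, w, c) in acts.items():
    nbytes = h * w * c * 4   # 4 bytes per float32 value
    total += nbytes
    print(f"{name}: {nbytes} bytes")
print("total:", total, "bytes,", round(total / 1e6, 3), "MB")  # 299008 bytes, 0.299 MB
```

Note that these are per-example forward-pass activations only; training additionally stores gradients and scales with the batch size.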