CS490 – Advanced Topics in Computing
(Deep Learning)
Lecture 16: Convolutional Neural Networks (CNNs)
Dr. Muhammad Shahzad
[email protected]
Department of Computing (DOC),
School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST)
12/04/2021
Fully Connected Layer
Motivation: Deep Learning on Images
How many entries does the weight matrix 𝑤[1] have, assuming that the first hidden layer has 1000 units?
▪ A 64 x 64 x 3 image gives a 12288-dimensional input vector
▪ A 1000 x 1000 x 3 image gives a 3 million-dimensional input vector
► The shape of 𝑤[1] is then 1000 x 3M, i.e., 3 billion entries! Adding 1000 biases, we need to train more than 3 billion parameters
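A quick way to check this arithmetic is a few lines of plain Python (a throwaway sketch; the function name is just illustrative):

```python
def fc_layer_params(input_dim, units):
    """Weights (units x input_dim) plus one bias per unit."""
    return units * input_dim + units

# 64 x 64 x 3 image flattened into a 12288-dimensional vector
print(fc_layer_params(64 * 64 * 3, 1000))       # 12,289,000 (~12 million)

# 1000 x 1000 x 3 image flattened into a 3,000,000-dimensional vector
print(fc_layer_params(1000 * 1000 * 3, 1000))   # 3,000,001,000 (~3 billion)
```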
Convolutional Neural Networks
▪ Similar to regular Neural Networks except that they make the
explicit assumption that the inputs are images, which allows us to
encode certain properties into the architecture
▪ These assumptions make the forward function more efficient to implement and vastly reduce the number of parameters in the network, e.g., by using local receptive fields and a parameter-sharing scheme
A ConvNet is made up of Layers.
Every Layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
Layers used to build ConvNets
▪ A ConvNet architecture is in the simplest case a list of Layers that
transform the image volume into an output volume (e.g. holding
the class scores)
▪ Three main types of layers are stacked to build ConvNet architectures:
► Convolutional Layer
► Pooling Layer
► Fully-Connected Layer (exactly as seen in regular Neural
Networks)
▪ Each Layer accepts an input 3D volume and transforms it to an
output 3D volume through a differentiable function
▪ Each Layer may or may not have parameters (e.g. CONV/FC do,
RELU/POOL don’t)
▪ Each Layer may or may not have additional hyperparameters (e.g.
CONV/FC/POOL do, RELU doesn’t)
How does Convolution work?
Edge Detection Via Convolution Operation
3x1 + 1x1 + 2x1 + 0x0 + 5x0 + 7x0 + 1x(-1) + 8x(-1) + 2x(-1) = -5
Filter (vertical edge detector):
1 0 -1
1 0 -1
1 0 -1
First output entry: -5
0x1 + 5x1 + 7x1 + 1x0 + 8x0 + 2x0 + 2x(-1) + 9x(-1) + 5x(-1) = -4
Output so far: -5 -4
1x1 + 8x1 + 2x1 + 2x0 + 9x0 + 5x0 + 7x(-1) + 3x(-1) + 1x(-1) = 0
Output so far: -5 -4 0
1x1 + 6x1 + 2x1 + 7x0 + 2x0 + 3x0 + 8x(-1) + 8x(-1) + 9x(-1) = -16
Complete 4x4 output:
 -5  -4   0   8
-10  -2   2   3
  0  -2  -4  -7
 -3  -2  -3 -16
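The sliding-window computation above is easy to express in code. Below is a minimal NumPy sketch of a valid 2D convolution (strictly, cross-correlation, which is what deep learning frameworks compute); the 6x6 input here is random and purely illustrative, only the 1/0/-1 filter comes from the slides:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid cross-correlation: slide the kernel over the image with stride 1."""
    H, W = image.shape
    f, _ = kernel.shape
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

vertical_edge_filter = np.array([[1, 0, -1],
                                 [1, 0, -1],
                                 [1, 0, -1]])

image = np.random.randint(0, 10, size=(6, 6))      # illustrative 6x6 input
print(conv2d_valid(image, vertical_edge_filter))   # 4x4 output map
```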
How does Convolution work?
▪ Convolution of the image with a filter (also called kernel, window, mask, or template) with different coefficient values results in a new filtered output image, e.g.,
► An image convolved with a filter with positive and equal coefficients results in a smoothed output image
► Similarly, we can compute image derivatives to detect edges in the input image
Any idea what could be the filter coefficients?
Edge Detection Via Convolution Operation
The natural derivative operator can be defined as the
difference between the intensity of neighbouring pixels
∂f/∂x = f(x + 1) − f(x)

Mask coefficients:
z1 z2 z3
z4 z5 z6
z7 z8 z9

with z5 = -1, z6 = -1, z8 = 1, z9 = 1 (the remaining coefficients are 0)
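As a tiny illustration of this discrete derivative (a sketch with made-up pixel values), the difference f(x + 1) − f(x) along a row of intensities responds exactly at the jump:

```python
import numpy as np

row = np.array([10, 10, 10, 0, 0, 0])   # illustrative 1D intensity profile
derivative = row[1:] - row[:-1]          # f(x+1) - f(x) at each position
print(derivative)                        # [0 0 -10 0 0] -> edge at the jump
```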
Edge Detection Via Convolution Operation
[Figure: filters for detecting vertical edges and horizontal edges]
Edge Detection Via Convolution Operation
10x1 + 10x1 + 10x1 + 0x0 + 0x0 + 0x0 + 0x(-1) + 0x(-1) + 0x(-1) = 30
Learning To Detect Edges
Prewitt:         Sobel:           Scharr:
1 0 -1           1 0 -1            3  0  -3
1 0 -1           2 0 -2           10  0 -10
1 0 -1           1 0 -1            3  0  -3
With the rise of deep learning, these filter coefficients can be learned automatically, and more robustly, via backpropagation for a specific task, e.g., edge detection
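As a minimal sketch of that idea (assuming PyTorch; the training data is synthetic and the "task" is simply to reproduce the Sobel response), a randomly initialised 3x3 filter can recover edge-detecting coefficients purely via backpropagation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sobel = torch.tensor([[1., 0., -1.],
                      [2., 0., -2.],
                      [1., 0., -1.]]).view(1, 1, 3, 3)

# A single 3x3 conv filter with randomly initialised, learnable coefficients
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.01)

for step in range(2000):
    images = torch.rand(8, 1, 10, 10)                  # random synthetic images
    targets = nn.functional.conv2d(images, sobel)      # "ground-truth" edge maps
    loss = ((conv(images) - targets) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(conv.weight.data.view(3, 3))   # should end up close to the Sobel coefficients
```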
Spatial Dimensions: A Closer Look
7x7 input (spatially); assume a 3x3 filter applied with stride 1
Output dimension? 5x5 output
7x7 input (spatially); assume a 3x3 filter applied with stride 2
Output dimension? 3x3 output
7x7 input (spatially); assume a 3x3 filter applied with stride 3
Doesn't fit! Cannot apply a 3x3 filter on a 7x7 input with stride 3
Spatial Dimensions: A Closer Look
Output size?
(N - F) / stride + 1
E.g., with N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33
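This formula (and the padded version introduced below) is easy to wrap in a small helper; a minimal Python sketch, with an illustrative function name:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size of a convolution: (N + 2P - F) / stride + 1."""
    size = (n + 2 * pad - f) / stride + 1
    if not size.is_integer():
        raise ValueError(f"Filter does not fit: ({n} + 2*{pad} - {f})/{stride} + 1 = {size}")
    return int(size)

print(conv_output_size(7, 3, stride=1))   # 5
print(conv_output_size(7, 3, stride=2))   # 3
# conv_output_size(7, 3, stride=3) -> ValueError (2.33 is not an integer)
```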
Common Practice: Zero Padding At Borders
(N+2P-F)/stride + 1
Valid vs Same Convolutions
(N+2P-F)/stride + 1
▪ Valid convolution: The spatial dimensions of the resulting image
after convolution shrinks
▪ Same convolution: The spatial dimensions of the resulting image
after the convolution stays the same
► Achieved via zero-padding:
(N+2P-F)/S + 1 = N
For S=1,
N+2P-F + 1 = N
=> P = (F-1)/2
Convolution Layer
Convolution Over Volumes
A 6x6x3 input convolved with a 3x3x3 filter gives a 4x4 output
Note that we now have 27 learnable coefficients
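A minimal NumPy sketch of convolution over a volume (the input values are random and illustrative; the point is that the filter spans all three channels, so it has 3 x 3 x 3 = 27 coefficients and still produces a single 2D output map):

```python
import numpy as np

def conv_volume(volume, filt):
    """Valid convolution of an H x W x C volume with an f x f x C filter -> 2D map."""
    H, W, C = volume.shape
    f = filt.shape[0]
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(volume[i:i + f, j:j + f, :] * filt)
    return out

volume = np.random.rand(6, 6, 3)        # e.g., an RGB image patch
filt = np.random.rand(3, 3, 3)          # 27 learnable coefficients
print(conv_volume(volume, filt).shape)  # (4, 4)
```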
Convolutional Layer: Neuron View
Receptive Field
Single Convolutional Layer
with 6 5x5x3 filters
𝑤[1] (75 x 6 entries), applied to the input volume 𝑎[0] to produce the output volume 𝑎[1]:
𝑧[1] = 𝑤[1] 𝑎[0] + 𝑏[1]
𝑎[1] = 𝑔(𝑧[1])
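A minimal sketch of this single convolutional layer (assuming PyTorch, with ReLU standing in for the generic activation 𝑔); note the parameter count of 6 x (75 + 1) = 456:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)  # 6 filters of 5x5x3
a0 = torch.rand(1, 3, 32, 32)    # input volume a[0] (batch of one 32x32x3 image)
z1 = conv(a0)                    # z[1] = w[1] * a[0] + b[1]
a1 = torch.relu(z1)              # a[1] = g(z[1])

print(a1.shape)                                    # torch.Size([1, 6, 28, 28])
print(sum(p.numel() for p in conv.parameters()))   # 456 parameters
```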
ConvNets
Flatten the last volume, e.g., a 24 x 24 x 10 volume, into a 5760-dimensional vector of neurons and feed it to a Fully Connected (FC) layer followed by a softmax unit for prediction
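A minimal sketch of this final stage (assuming PyTorch; the 24 x 24 x 10 volume and the 5 output classes are illustrative):

```python
import torch
import torch.nn as nn

volume = torch.rand(1, 10, 24, 24)       # last conv/pool volume: 24 x 24 x 10
flat = volume.flatten(start_dim=1)       # -> shape (1, 5760)

fc = nn.Linear(5760, 5)                  # fully connected layer, 5 classes (illustrative)
scores = fc(flat)
probs = torch.softmax(scores, dim=1)     # softmax unit for prediction
print(probs.shape, probs.sum().item())   # torch.Size([1, 5]), ~1.0
```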
Example
Input volume: 32x32x3
10 5x5x3 filters with stride 1, pad 2
Output volume size?
(32+2*2-5)/1+1 = 32 spatially, so
32x32x10
Number of parameters in this layer?
each filter has 5*5*3 + 1 = 76 params (+1 for bias)
=> 76*10 = 760
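Both results are quick to double-check in Python (a throwaway sketch of the arithmetic above):

```python
n, f, stride, pad, num_filters = 32, 5, 1, 2, 10

out = (n + 2 * pad - f) // stride + 1
print(out, out, num_filters)             # 32 32 10 -> output volume is 32x32x10

params_per_filter = 5 * 5 * 3 + 1        # 75 weights + 1 bias = 76
print(params_per_filter * num_filters)   # 760
```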
ConvNet Dimensions
Common settings:
- K (number of filters) = powers of 2, e.g. 32, 64, 128, 512
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0
1x1 convolution
1x1 convolution layer
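A brief sketch of the idea (assuming PyTorch; the 28 x 28 x 192 volume and the 32 output channels are illustrative): a 1x1 filter mixes information across channels at each spatial position, so it changes the depth of a volume without changing its spatial size:

```python
import torch
import torch.nn as nn

volume = torch.rand(1, 192, 28, 28)           # e.g., a 28 x 28 x 192 volume
conv1x1 = nn.Conv2d(192, 32, kernel_size=1)   # 32 filters of size 1 x 1 x 192
out = torch.relu(conv1x1(volume))
print(out.shape)                              # torch.Size([1, 32, 28, 28])
```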
Pooling Layer
▪ Makes the representations smaller and more manageable
▪ Operates over each activation map independently
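A minimal NumPy sketch of max pooling over a single activation map (illustrative helper; replacing np.max with np.mean gives average pooling):

```python
import numpy as np

def max_pool(activation_map, f=2, stride=2):
    """Max pooling over a single 2D activation map."""
    H, W = activation_map.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = activation_map[i * stride:i * stride + f,
                                    j * stride:j * stride + f]
            out[i, j] = np.max(window)
    return out

a = np.array([[1., 3., 2., 1.],
              [2., 9., 1., 1.],
              [1., 3., 2., 3.],
              [5., 6., 1., 2.]])
print(max_pool(a))   # [[9. 2.], [6. 3.]]
```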
MAX Pooling
Average-Pooling
Example 2x2 average-pooling output:
3.75 1.25
4    2
MAX-Pooling
What would be the result of applying Max-Pool using F = 3 and S = 1?
Result (3x3 output):
9 9 5
9 9 5
8 6 9
Pooling Dimensions
Common settings:
F = 2, S = 2
F = 3, S = 2
Example: ConvNets
Summary of Typical ConvNet Design
▪ ConvNets stack CONV, POOL, and FC layers
▪ Trend towards smaller filters and deeper architectures
▪ Trend towards getting rid of POOL/FC layers (just CONV)
▪ Historically, architectures looked like
[(CONV-RELU)*N-POOL?]*M - (FC-RELU)*K, SOFTMAX
where N is usually up to ~5, M is large, 0 <= K <= 2 (see the sketch after this list)
▪ However, recent advances such as ResNet/GoogLeNet have
challenged this paradigm
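A minimal sketch of one instance of this historical pattern (assuming PyTorch; the channel counts and the 10-class output are illustrative), here with N = 2, M = 2, K = 1:

```python
import torch
import torch.nn as nn

# [(CONV-RELU)*2 - POOL]*2 - (FC-RELU)*1 - FC - SOFTMAX, for 32x32x3 inputs
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),                           # 32x32 -> 16x16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),                           # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
    nn.Linear(128, 10),                           # class scores; softmax applied below
)

x = torch.rand(1, 3, 32, 32)
probs = torch.softmax(model(x), dim=1)
print(probs.shape)   # torch.Size([1, 10])
```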
CNNs vs FC Neural Networks
Two major advantages of CNNs over FC neural networks
▪ Parameter sharing
► A feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image (translational invariance)
A 32 x 32 x 3 input (= 3072 values) convolved with 6 filters of size 5 x 5 x 3 results in a 28 x 28 x 6 volume (= 4704 values). For a regular neural network with dense connections between these two volumes, this means you would have 3072 x 4704 ≈ 14 million weights.
How many parameters do we need for the Conv layer?
Answer: each filter has 5 x 5 x 3 = 75 weights plus 1 bias, so the Conv layer needs only (75 + 1) x 6 = 456 parameters
▪ Sparsity of connections (i.e., Local receptive field)
► In each layer, each output value depends only on a small
number of inputs
Acknowledgements
Various contents in this presentation have been taken from
different books, lecture notes (particularly CS231n Stanford, MIT
6.S191, deeplearning.ai & neuralnetworksanddeeplearning.com),
and the web. These solely belong to their owners and are here used
only for clarifying various educational concepts. Any copyright
infringement is not intended.