Convolutional Neural Networks

The document provides an overview of Convolutional Neural Networks (CNNs), detailing their structure and functionality in image classification. It discusses key components such as convolution layers, non-linearity layers, and pooling layers, emphasizing local connectivity and weight sharing. The document also highlights the training and testing phases of CNNs, including feature extraction and the minimization of error through back-propagation.


Convolutional Neural Networks

Source: Computer Vision Course by Dr. Shiv Ram Dubey


https://sites.google.com/site/iiitscv/spring2018
Previous class
• Gradient Descent
– Back prop
– Chain rule

• Perceptron/Neuron
– A non-linear function

• Multilayer Neural Networks


– Hidden layers
– Deep networks
Assignment 3
Results: a table of the number of neurons vs. training and testing accuracy, reported for both SSD and cross-entropy loss functions.
Today’s class
• Overview of image classification using hand-crafted features

• Convolutional Neural Network (CNN)


– Convolution Layer
– Non-linearity Layer
– Pooling Layer
Image Categorization: Training phase

Training images + training labels → image features → classifier training → trained classifier

Ex: Assignment 3, where features are obtained using a CNN model
Image Categorization: Testing phase

Training: training images + training labels → image features → classifier training → trained classifier
Testing: test image → image features → trained classifier → prediction (e.g. "Outdoor")
Features are the Keys
• SIFT [Lowe IJCV 04]
• LBP [Ojala et al. PAMI 02]
• HOG [Dalal and Triggs CVPR 05]
• SPM [Lazebnik et al. CVPR 06]
• Color Descriptor [Van De Sande et al. PAMI 10]

Neural Networks

Source: http://cs231n.github.io
Multi-layer Neural Network
• A non-linear classifier
• Training: find network weights w to minimize the error between true and estimated outputs of training examples:

  E(w) = Σᵢ (yᵢ − f_w(xᵢ))²

• Minimization can be done by gradient descent, provided f_w is differentiable
• This training method is called back-propagation
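As a toy illustration of the training idea above (not code from the lecture), the following sketch runs gradient descent on the squared error of a single sigmoid neuron; the two-point dataset and the learning rate are invented for the example, and the gradient is obtained with the chain rule exactly as in back-propagation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=1.0, steps=5000):
    """Minimize E(w) = sum_i (y_i - sigmoid(w . x_i))^2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        pred = sigmoid(X @ w)
        # Chain rule: dE/dw = -2 * sum_i (y_i - p_i) * p_i * (1 - p_i) * x_i
        grad = -2.0 * ((y - pred) * pred * (1.0 - pred)) @ X
        w -= lr * grad
    return w

# Tiny made-up example; the bias is folded in as a constant feature of 1.
X = np.array([[0., 1.], [1., 1.]])
y = np.array([0., 1.])
w = train(X, y)
err = np.sum((y - sigmoid(X @ w)) ** 2)   # training error after descent
```

The same update rule generalizes to multiple layers; back-propagation is just the chain rule applied layer by layer to obtain the gradient.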
Deep Learning:
Learning a Hierarchy of Feature Extractors

• Each layer of the hierarchy extracts low-level to high-level features progressively.
• All the way from pixels → classifier

Image/video pixels → Layer 1 → Layer 2 → Layer 3 → image/video labels
Multi-layer Neural Network & Image

Stretch pixels into a single column vector.

Problems:
• High dimensionality (200×200×3 = 120,000)
• Local relationships between pixels are lost

Solution:
• Convolutional Neural Network
Convolutional Neural Networks

Source: cs231n, Stanford University


Convolutional Neural Networks (CNN)
• Also known as ConvNet, DCNN, DNN

• CNN = a multi-layer neural network with
1. Local connectivity
2. Weight sharing
CNN: Local Connectivity

Global connectivity: each hidden unit is connected to every input unit.
Local connectivity: each hidden unit is connected only to a small local window of input units (here, 3).

• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
– Global connectivity: 3 x 7 = 21
– Local connectivity: 3 x 3 = 9

https://stats.stackexchange.com/questions/159588/how-does-local-connection-implied-in-the-cnn-algorithm
CNN: Weight Sharing

Without weight sharing: every local connection has its own weight (w1 … w9).
With weight sharing: all hidden units reuse the same filter weights (w1, w2, w3).

• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
– Without weight sharing: 3 x 3 = 9
– With weight sharing: 3 x 1 = 3
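The parameter counts on the last two slides can be recomputed directly; this snippet only restates the 7-input / 3-hidden example (weights only, biases ignored, matching the slides' numbers):

```python
# Parameter counts for the 7-input / 3-hidden-unit example.
n_in, n_hidden, receptive_field = 7, 3, 3

fully_connected = n_hidden * n_in             # global connectivity: 3 x 7 = 21
local_only      = n_hidden * receptive_field  # local connectivity:  3 x 3 = 9
local_shared    = receptive_field             # one shared 3-tap filter:   3
```

Local connectivity cuts parameters from 21 to 9, and weight sharing cuts them further to 3: a single filter slid across the input.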
Layers used to build ConvNets
• Input Layer (input image)
• Convolutional Layer (today's discussion)
• Non-linearity Layer (such as Sigmoid, Tanh, ReLU, etc.)
• Pooling Layer (today's discussion)
• Fully-Connected Layer (exactly as seen in Artificial Neural Networks (ANN))
• Classification Layer (Softmax, SVM loss, etc.)

Convolutional Layer

32×32×3 image: width 32, height 32, depth 3
Convolutional Layer

32×32×3 image, 5×5×3 filter.
Filters always extend the full depth of the input volume.

Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products".
Convolutional Layer

32×32×3 image, 5×5×3 filter (weight mask).

A single output value is the result of taking a dot product between the filter and a small 5×5×3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product, plus a bias):

wᵀx + b
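A minimal numpy sketch of this single filter response (random image and filter values, invented here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # 32x32x3 input volume
filt  = rng.standard_normal((5, 5, 3))     # 5x5x3 filter, full input depth
bias  = 0.1

patch = image[0:5, 0:5, :]                 # one 5x5x3 chunk of the image
value = np.sum(patch * filt) + bias        # w^T x + b: a single number
# Equivalently, a 75-dimensional dot product between flattened vectors:
value_flat = filt.reshape(-1) @ patch.reshape(-1) + bias
```

Sliding the patch window across all valid spatial positions repeats this dot product and fills in the activation map described on the next slide.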
Convolutional Layer

Convolve (slide) the 5×5×3 filter over all spatial locations of the 32×32×3 image to produce a 28×28×1 activation map.
Convolutional Layer: handling multiple output maps

Each filter produces its own 28×28×1 activation map: a second filter gives a second map, a third filter a third, and so on. With a total of 96 filters, the depth of the output volume is 96.
Convolutional Layer

32×32×3 image → CONV with 96 filters of size 5×5×3 → 28×28×96 activation maps.
A filter applied to this output must then be 5×5×96, again producing one number per spatial location.
Convolutional Layer

28×28×96 activation maps → CONV with 128 filters of size 5×5×96 (convolved/slid over all spatial locations) → 24×24×128 deeper activation maps.
Multilayer Convolution

32×32×3 → CONV, e.g. 96 filters of 5×5×3 → 28×28×96 → CONV, e.g. 128 filters of 5×5×96 → 24×24×128 → ...
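The shape bookkeeping for this stack can be traced in a few lines (a sketch assuming stride 1 and no padding, as in the figure):

```python
def conv_out(n, f):
    """Spatial size after an FxF filter, stride 1, no padding."""
    return n - f + 1

n, depth = 32, 3                    # 32x32x3 input image
n, depth = conv_out(n, 5), 96       # 96 filters of 5x5x3   -> 28x28x96
n, depth = conv_out(n, 5), 128      # 128 filters of 5x5x96 -> 24x24x128
```

Each layer's filter depth must equal the previous layer's output depth, while the number of filters sets the new depth.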
Any Convolution Layer
• Local connectivity
• Weight sharing

# input channels → # output (activation) maps. Image credit: A. Karpathy


A closer look at spatial dimensions
7×7 input (spatially), assume 3×3 filter → 5×5 output

Source: cs231n, Stanford University
A closer look at spatial dimensions
7×7 input (spatially), 3×3 filter applied with stride 2 → 3×3 output

Source: cs231n, Stanford University
A closer look at spatial dimensions
7×7 input (spatially), 3×3 filter applied with stride 3 → doesn't fit! A 3×3 filter cannot be applied to a 7×7 input with stride 3.

Source: cs231n, Stanford University

A closer look at spatial dimensions

Output size: (N − F) / stride + 1

e.g. N = 7, F = 3:
stride 1 → (7 − 3)/1 + 1 = 5
stride 2 → (7 − 3)/2 + 1 = 3
stride 3 → (7 − 3)/3 + 1 = 2.33 (doesn't fit)

Source: cs231n, Stanford University
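The output-size formula translates directly into a small helper (the function name is my own; it simply reproduces the three worked cases above):

```python
def conv_output_size(n, f, stride):
    """Spatial output size (N - F) / stride + 1; None when the filter doesn't fit."""
    if (n - f) % stride != 0:
        return None   # e.g. a 3x3 filter on a 7x7 input with stride 3
    return (n - f) // stride + 1

# The N = 7, F = 3 examples from the slide:
# conv_output_size(7, 3, 1) -> 5
# conv_output_size(7, 3, 2) -> 3
# conv_output_size(7, 3, 3) -> None (doesn't fit)
```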


In practice: common to zero pad
e.g. input 7×7 (spatially), 3×3 filter applied with stride 1, padded with a 1-pixel border of zeros → 7×7 output

Source: cs231n, Stanford University


In practice: common to zero pad

In general, it is common to see CONV layers with stride 1, filters of size F×F, and zero-padding of (F − 1)/2, which preserves the spatial size.

e.g.
F = 3 → zero pad with 1
F = 5 → zero pad with 2
F = 7 → zero pad with 3

Source: cs231n, Stanford University
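Extending the output-size formula with padding confirms that pad = (F − 1)/2 at stride 1 preserves the spatial size (helper name is my own):

```python
def padded_output_size(n, f, stride=1, pad=0):
    """Spatial output size with zero padding: (N + 2P - F) / stride + 1."""
    return (n + 2 * pad - f) // stride + 1

# "Same" padding at stride 1: pad = (F - 1) / 2 keeps a 7x7 input 7x7.
sizes = [padded_output_size(7, f, stride=1, pad=(f - 1) // 2) for f in (3, 5, 7)]
```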


A closer look at spatial dimensions

32×32×3 → CONV, e.g. 96 filters of 5×5×3 → 28×28×96 → CONV, e.g. 128 filters of 5×5×96 → 24×24×128 → ...

E.g. a 32×32 input convolved repeatedly with 5×5 filters shrinks volumes spatially (32 → 28 → 24 ...). Shrinking too fast is not good.

Source: cs231n, Stanford University
Example
Input volume: 32×32×3
10 filters of dimension 5×5 with stride 1, pad 2

Output volume size: (32 + 2*2 − 5)/1 + 1 = 32 spatially, so 32×32×10

Source: cs231n, Stanford University


Example
Input volume: 32×32×3
10 5×5 filters with stride 1, pad 2

Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias), so 76 * 10 = 760 in total.

Source: cs231n, Stanford University
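Both answers from this worked example follow from the formulas already given; the snippet below just redoes the arithmetic:

```python
n, c = 32, 3                    # input volume: 32x32x3
f, k, stride, pad = 5, 10, 1, 2 # 10 filters of 5x5, stride 1, pad 2

spatial = (n + 2 * pad - f) // stride + 1   # (32 + 4 - 5)/1 + 1 = 32
output_shape = (spatial, spatial, k)        # 32 x 32 x 10
params_per_filter = f * f * c + 1           # 75 weights + 1 bias = 76
total_params = params_per_filter * k        # 76 * 10 = 760
```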


Pooling Layer
• Makes the representations smaller and more manageable
• Operates over each activation map independently

Source: cs231n, Stanford University

Max Pooling

Source: cs231n, Stanford University
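A short sketch of max pooling on one activation map (the 2×2 window with stride 2 and the input values are assumed here for illustration; the slide itself only shows an image):

```python
import numpy as np

def max_pool_2x2(a):
    """2x2 max pooling with stride 2 on a single activation map."""
    h, w = a.shape
    # Group into non-overlapping 2x2 blocks, then take the max of each block.
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1., 1., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
pooled = max_pool_2x2(a)   # 4x4 map -> 2x2 map
```

Pooling keeps the depth unchanged and is applied to each activation map independently, halving each spatial dimension here.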


Pooling Layer

Source: cs231n, Stanford University


Convolutional Neural Networks

Input image → convolution (learned) → non-linearity → spatial pooling → feature maps, with these stages stacked repeatedly.

slide credit: S. Lazebnik


LeNet
• Neural network with specialized
connectivity structure
• Stack multiple stages of feature
extractors
• Higher stages compute more global,
more invariant features
• Classification layer at the end

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner,


Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
AlexNet
• Similar framework to LeCun '98 but:
• Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
• More data (10^6 vs. 10^3 images)
• GPU implementation (50x speedup over CPU)
• Trained on two GPUs (3 GB each) for a week

A. Krizhevsky, I. Sutskever, and G. Hinton,


ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Gradient-Based Learning Applied to Document Recognition, LeCun, Bottou, Bengio, and Haffner, Proc. of the IEEE, 1998

ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, and Hinton, NIPS 2012

Slide Credit: L. Zitnick
Resources
• http://deeplearning.net/
– Hub to many other deep learning resources

• https://github.com/ChristosChristofidis/awesome-deep-learning
– A resource collection for deep learning

• https://github.com/kjw0612/awesome-deep-vision
– A resource collection for deep learning applied to computer vision

• http://cs231n.stanford.edu/syllabus.html
– Nice course on CNN for visual recognition
Things to remember
• Overview
– multi-layer neural networks
• Convolutional neural network (CNN)
– Convolution,
– nonlinearity,
– max pooling.
