Module 3 A

This Session

• Neural Network and Image


– Dimensionality
– Local relationship

• Convolutional Neural Network (CNN)


– Convolution Layer
– Non-linearity Layer
– Pooling Layer
– Fully Connected Layer
– Classification Layer

• ImageNet Challenge
– Progress
– Human Level Performance
Neural Networks

Source: [Link]
Multi-layer Neural Network & Image

How to apply a NN over an image?
Stretch the pixels into a single column vector.

Problems:                 Solution:
High dimensionality       Convolutional Neural Network
Local relationship
Convolutional Neural Networks
• Also known as CNN, ConvNet, or DCN

• CNN = a multi-layer neural network with


1. Local connectivity
2. Weight sharing
CNN: Local Connectivity

Global connectivity vs. local connectivity (input layer -> hidden layer)

• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
– Global connectivity: 3 x 7 = 21
– Local connectivity: 3 x 3 = 9
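The two counts can be checked with a quick sketch (toy sizes from the slide; bias terms are ignored, matching the slide's arithmetic):

```python
# Parameter counts for the 7-input / 3-hidden example (no biases,
# matching the slide's arithmetic).
n_inputs, n_hidden, window = 7, 3, 3

global_params = n_hidden * n_inputs   # every hidden unit sees all 7 inputs
local_params = n_hidden * window      # every hidden unit sees a 3-input window

print(global_params, local_params)    # 21 9
```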
CNN: Weight Sharing

Without weight sharing: each connection has its own weight (w1 … w9)
With weight sharing: the same weights (w1, w2, w3) are reused at every position

• # input units (neurons): 7
• # hidden units: 3
• Number of parameters
– Without weight sharing: 3 x 3 = 9
– With weight sharing: 3 x 1 = 3
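Weight sharing turns the locally connected layer into a 1-D convolution: the same three weights slide across the input. A minimal sketch (the input and weight values are made up; stride 2 is assumed so that 7 inputs yield 3 hidden units as in the diagram):

```python
def conv1d(x, w, stride=1):
    """Valid 1-D convolution (cross-correlation) with the shared weights w."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

x = [1, 2, 3, 4, 5, 6, 7]     # 7 input units
w = [1, 0, -1]                # the 3 shared weights: the only parameters
h = conv1d(x, w, stride=2)    # 3 hidden units, all reusing w
print(h)                      # [-2, -2, -2]
```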
Convolutional Neural Networks

Source: cs231n, Stanford University


Layers used to build ConvNets
Input Layer (input image)

Convolutional Layer

Non-linearity Layer (such as Sigmoid, Tanh, ReLU, PReLU, ELU, Swish, etc.)

Pooling Layer (such as Max Pooling, Average Pooling, etc.)

Fully-Connected Layer

Classification Layer (Softmax, etc.)


Convolutional Layer

32×32×3 image (width 32, height 32, depth 3) -> preserve spatial structure

5×5×3 filter (weight mask)

Convolve the filter with the image, i.e. "slide over the image
spatially, computing dot products".

Handling multiple input channels: filters always extend the full
depth of the input volume.

Each position yields a single value: the result of taking a dot
product between the filter and a small 5×5×3 chunk of the image
(i.e. a 5*5*3 = 75-dimensional dot product + bias):
wT.x + b
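That single value can be written out directly; a toy sketch (the patch and filter values are random, purely for illustration):

```python
import random
random.seed(0)

F, D = 5, 3                  # filter size 5x5, depth 3
n = F * F * D                # 75-dimensional dot product
chunk = [random.random() for _ in range(n)]   # flattened 5x5x3 image patch
filt = [random.random() for _ in range(n)]    # flattened 5x5x3 filter w
bias = 0.1

value = sum(w * x for w, x in zip(filt, chunk)) + bias   # wT.x + b
print(n, value)              # 75 and one scalar output
```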
Convolutional Layer
Activation map

Convolve (slide) the 5×5×3 filter over all spatial locations of the
32×32×3 image -> one 28×28×1 activation map.

Handling multiple output maps: each additional filter (second,
third, …) produces its own 28×28×1 activation map. With 96 filters
in total, the depth of the output volume is 96.

Image Source: cs231n, Stanford University
Convolutional Layer
Preview: a ConvNet is a sequence of convolution layers, interspersed
with activation functions.

32×32×3 image
-> CONV, e.g. 96 filters of 5×5×3 -> 28×28×96 activation maps
-> CONV, e.g. 128 filters of 5×5×96 -> 24×24×128 deeper activation maps
-> ...

(Each filter is convolved (slid) over all spatial locations; each
yields one number per location.)
Any Convolution Layer
• Local connectivity
• Weight sharing
• Handling multiple input channels (# input channels)
• Handling multiple output maps (# output/activation maps)

Image credit: A. Karpathy
A closer look at spatial dimensions

7×7 input (spatially), assume a 3×3 filter:
– stride 1 -> 5×5 output
– stride 2 -> 3×3 output
– stride 3 -> doesn’t fit! Cannot apply a 3×3 filter on a 7×7
input with stride 3.
A closer look at spatial dimensions

Output size: (N - F) / stride + 1

e.g. N = 7, F = 3
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 (invalid)
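The formula as a small helper (a sketch; raising `ValueError` for non-integer sizes is my own convention, not from the slides):

```python
def output_size(N, F, stride):
    """Spatial output size of a valid convolution: (N - F) / stride + 1."""
    if (N - F) % stride != 0:
        raise ValueError("filter does not fit: (N - F) not divisible by stride")
    return (N - F) // stride + 1

print(output_size(7, 3, 1))   # 5
print(output_size(7, 3, 2))   # 3
# output_size(7, 3, 3) raises ValueError: 2.33 is not a valid size
```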
A closer look at spatial dimensions

E.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks
volumes spatially (32 -> 28 -> 24 ...). Shrinking too fast is not
good; it doesn’t work well.

Source: cs231n, Stanford University
In practice: common to zero pad

e.g. input 7×7 (spatially), 3×3 filter applied with stride 1,
padded with a 1-pixel border of zeros -> 7×7 output

In general, it is common to see CONV layers with stride 1, filters
of size F×F, and zero-padding with (F-1)/2, which preserves the
spatial size:
F = 3 => zero pad with 1
F = 5 => zero pad with 2
F = 7 => zero pad with 3
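Extending the size formula with padding shows why (F-1)/2 preserves the input size at stride 1 (a sketch):

```python
def output_size_padded(N, F, stride, pad):
    """Spatial output size with zero padding: (N + 2*pad - F) / stride + 1."""
    return (N + 2 * pad - F) // stride + 1

# stride 1 with pad = (F - 1) // 2 keeps a 7x7 input at 7x7:
for F in (3, 5, 7):
    print(F, output_size_padded(7, F, 1, (F - 1) // 2))   # always 7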
Example
Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Output volume size:
(32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias)
=> 76 * 10 = 760
Source: cs231n, Stanford University
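Both answers from the example fall out of one small helper (a sketch, with one bias per filter as on the slide):

```python
def conv_layer_stats(in_size, in_depth, F, n_filters, stride, pad):
    """Output spatial size and parameter count of a conv layer."""
    out = (in_size + 2 * pad - F) // stride + 1
    params = (F * F * in_depth + 1) * n_filters   # +1 bias per filter
    return out, params

out, params = conv_layer_stats(32, 3, 5, 10, 1, 2)
print(out, params)   # 32 760 -> output volume 32x32x10, 760 parameters
```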
Convolution as feature extraction

Source: cs231n, Stanford University


Non-linearity Layer

Source: cs231n, Stanford University
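For instance, the widely used ReLU simply zeroes out negative activations; a minimal sketch (toy values):

```python
def relu(activation_map):
    """Elementwise max(0, x) over a 2-D activation map."""
    return [[max(0.0, v) for v in row] for row in activation_map]

print(relu([[-1.0, 2.0], [0.5, -3.0]]))   # [[0.0, 2.0], [0.5, 0.0]]
```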


Pooling Layer
- makes the representations smaller and more manageable
- operates over each activation map independently:

Source: cs231n, Stanford University


Max Pooling

Source: cs231n, Stanford University
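A sketch of 2×2 max pooling with stride 2 on a single activation map (toy values, not from the slides):

```python
def max_pool_2x2(a):
    """2x2 max pooling, stride 2, on a square activation map."""
    n = len(a)
    return [[max(a[i][j], a[i][j + 1], a[i + 1][j], a[i + 1][j + 1])
             for j in range(0, n, 2)]
            for i in range(0, n, 2)]

a = [[1, 1, 2, 4],
     [5, 6, 7, 8],
     [3, 2, 1, 0],
     [1, 2, 3, 4]]
print(max_pool_2x2(a))   # [[6, 8], [3, 4]] -- the 4x4 map shrinks to 2x2
```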


Pooling Layer

Source: cs231n, Stanford University


Fully Connected Layer
• Connect every neuron in one layer to every neuron in another
layer

• Same as the traditional multi-layer perceptron neural network

• No. of neurons in the last FC layer = no. of classes

Image Source: [Link]
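A fully connected layer is just a matrix-vector product plus a bias; a minimal sketch with made-up weights:

```python
def fully_connected(x, W, b):
    """Every output neuron is a weighted sum over all inputs, plus a bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

x = [1.0, 2.0, 3.0]        # input vector (e.g. flattened activations)
W = [[0.1, 0.2, 0.3],      # 2 output neurons x 3 inputs
     [0.0, -0.1, 0.1]]
b = [0.5, 0.0]
y = fully_connected(x, W, b)
print(y)                   # approximately [1.9, 0.1]
```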

Loss/Classification Layer
• SVM classifier (SVM loss / hinge loss / max-margin loss)

• Softmax classifier (softmax loss / cross-entropy loss)
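The softmax classifier can be sketched in a few lines (toy class scores; subtracting the max before exponentiating is a standard numerical-stability trick):

```python
import math

def softmax(scores):
    """Turn unnormalized class scores into probabilities."""
    m = max(scores)                        # stability: shift before exp
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

scores = [3.2, 5.1, -1.7]                  # toy unnormalized class scores
probs = softmax(scores)
loss = -math.log(probs[0])                 # cross-entropy, true class = 0
print(probs, loss)
```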
A typical CNN structure

Image Source: [Link]
ImageNet Challenge

• ~14 million labeled images, 20k classes

• Images gathered from the Internet

• Human labels via Amazon MTurk

• Challenge: 1.2 million training images, 1000 classes

[Link]/challenges/LSVRC/
Progress on ImageNet Challenge

[Chart: ImageNet image classification top-5 error (%), falling year
over year: 16.4 -> 11.7 -> 7.3 -> 6.7 -> 3.57 -> 3.06 -> 2.251]

Best non-ConvNet in 2012: 26.2%


Things to remember
• Neural network and Image
– Neuroscience, Perceptron, Problems due to High
Dimensionality and Local Relationship
• Convolutional neural network (CNN)
– Convolution Layer,
– Nonlinearity Layer,
– Pooling Layer,
– Fully Connected Layer,
– Loss/Classification Layer
• Progress on ImageNet challenge
– Latest winner (2017): SENet
Acknowledgements
• Thanks to the following researchers for making their teaching/research
material online
– Forsyth
– Steve Seitz
– Noah Snavely
– J.B. Huang
– Derek Hoiem
– D. Lowe
– A. Bobick
– S. Lazebnik
– K. Grauman
– R. Szeliski
– Antonio Torralba
– Rob Fergus
– Leibe
– And many more ………..
