CONVOLUTIONAL NEURAL NETWORKS
By: Ibrahim Isleem
Supervisor: Dr. Akram Abu Garad.
OVERVIEW
Theory of CNN
Feed-forward details
Back-propagation details
INTRODUCTION
Very popular:
Toolboxes: TensorFlow, cuda-convnet and Caffe (increasingly user-friendly)
A high-performance multi-class classifier
Successful in object recognition, handwritten optical character recognition (OCR), image noise removal, etc.
Easy to implement
Slow in learning, fast in classification
PART 1
THEORY OF CNN
Convolutional Neural Networks
THE BASIC IDEA OF CONVOLUTIONAL NEURAL NETWORKS (CNN)
SAME IDEA AS BACK-PROPAGATION NEURAL NETWORKS (BPNN), BUT A DIFFERENT IMPLEMENTATION
After vectorization (vec), the 2D-arranged inputs become 1D vectors; then the network is just like a BPNN (back-propagation neural network).
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
BASIC STRUCTURE OF CNN
THE BASIC STRUCTURE
The convolution layer: see the following slides for how convolution is used as a feature identifier.
Input → conv. → subs. → conv. → subs. → fully → fully → output
• Alternating convolution (conv) and subsampling (subs) layers
• Subsampling allows the features to be flexibly positioned
CONVOLUTION (CONV) LAYER:
EXAMPLE: FROM THE INPUT LAYER TO THE FIRST HIDDEN LAYER
The first hidden layer represents the filter outputs of a certain feature.
So, what is a feature? The answer is on the next slide.
CONVOLUTION (CONV) LAYER
IDEA OF A FEATURE IDENTIFIER
We would like to extract a curve (feature) from the image.
DISCRETE CONVOLUTION:
CORRELATION IS MORE INTUITIVE
Correlation is more intuitive, so we use correlation of the flipped version of h to implement convolution [1].
Example: I = [1 4 1; 2 5 3], h = [1 1; 1 -1], find I * h.
Convolution: C(m,n) = Σ_j Σ_k h(m-j, n-k) I(j,k)
           = Σ_j Σ_k h_flip(j-m, k-n) I(j,k), i.e. the correlation of the flipped h with I.
CORRELATION IS MORE INTUITIVE, SO WE USE CORRELATION TO IMPLEMENT CONVOLUTION
C(m,n) = Σ_j Σ_k h(m-j, n-k) I(j,k)
I = [1 4 1; 2 5 3], with j = 0, 1, 2 along the horizontal axis and k = 0, 1 along the vertical axis (k = 0 is the bottom row); h = [1 1; 1 -1].
Flip h: at (m=0, n=0) the flipped kernel is h_flip = [-1 1; 1 1], i.e. -1 at the top-left and 1 elsewhere.
Discrete convolution I*h: flip h, shift h, and correlate with I [1].
DISCRETE CONVOLUTION I*h: FLIP h, SHIFT h AND CORRELATE WITH I [1]
C(m,n) = Σ_j Σ_k h(m-j, n-k) I(j,k)
I = [1 4 1; 2 5 3], h = [1 1; 1 -1].
Flip h with no shift (m=0, n=0): h_flip = [-1 1; 1 1].
The trick: I(j=0, k=0) multiplies h_flip(-m+0, -n+0). Since m=1, n=0, we shift the h_flip pattern one position to the right, so we just multiply the overlapped elements of I and h_flip and add. Similarly, we do the same for all m, n values.
Shift the flipped h to m=1, n=0 and repeat.
FIND C(m,n)
C(m,n) = Σ_j Σ_k h(m-j, n-k) I(j,k)
Shift the flipped h to m=1, n=0 and lay it over I = [1 4 1; 2 5 3]; only two elements of h_flip = [-1 1; 1 1] overlap the image.
Multiply the overlapped elements and add (see next slide); hence C(m=1, n=0) = (-1)(2) + (1)(5) = 3.
FIND C(m,n)
C(m,n) = Σ_j Σ_k h(m-j, n-k) I(j,k)
Shift the flipped h to m=1, n=0 over I = [1 4 1; 2 5 3], with h_flip = [-1 1; 1 1].
Multiply the overlapped elements and add:
C(m=1, n=0) = (-1)(2) + (1)(5) = 3
STEPS TO FIND C(m,n)
With I = [1 4 1; 2 5 3] and flipped kernel h_flip = [-1 1; 1 1]:
Step 1: C(0,0) = 1x2 = 2
Step 2: C(1,0) = -1x2 + 1x5 = 3
Step 3: C(2,0) = -1x5 + 1x3 = -2
Step 4: C(3,0) = -1x3 = -3
So the bottom row of C(m,n) is:
C(0,0)=2, C(1,0)=3, C(2,0)=-2, C(3,0)=-3
STEPS CONTINUED
Step 5: C(0,1) = 1x1 + 1x2 = 3
Step 6: C(1,1) = -1x1 + 1x4 + 1x2 + 1x5 = 10
Step 7: C(2,1) = -1x4 + 1x1 + 1x5 + 1x3 = 5
Step 8: C(3,1) = -1x1 + 1x3 = 2
So far C(m,n) contains:
C(0,2)    C(1,2)    C(2,2)    C(3,2)
C(0,1)=3  C(1,1)=10 C(2,1)=5  C(3,1)=2
C(0,0)=2  C(1,0)=3  C(2,0)=-2 C(3,0)=-3
FIND ALL ELEMENTS IN C FOR ALL POSSIBLE m,n
C(m=0,n=0) = 2, C(m=1,n=0) = -1x2 + 1x5 = 3, C(m=1,n=1) = 10, ..., etc.
The complete result (m = 0..3 from left to right, n = 0..2 from bottom to top):
I * h = C(m,n) =
  1   5   5   1
  3  10   5   2
  2   3  -2  -3
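As a quick sanity check (not part of the original slides), the same result can be reproduced with the built-in conv2 in MATLAB/Octave, assuming I and h are entered in the same top-to-bottom orientation as drawn above:

% Worked example: 2D discrete convolution I*h
I = [1 4 1;
     2 5 3];          % input image (top row first)
h = [1 1;
     1 -1];           % kernel
C = conv2(I, h)       % full 2D convolution
% C =
%      1     5     5     1
%      3    10     5     2
%      2     3    -2    -3
% The bottom row of C corresponds to n = 0 in the slides: [2 3 -2 -3].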
CONVOLUTION (CONV) LAYER
THE CURVE FEATURE IN AN IMAGE
So for this part of the image, there is a curve feature to be found.
CONVOLUTION (CONV) LAYER: WHAT DOES IT DO?
CONVOLUTION IS IMPLEMENTED BY CORRELATION (SEE APPENDIX)
Input image X vs. a curve feature B:
Correlation(X, B) = multiply-and-sum of image X and B = 0, so image X does not contain curve feature B.
Input image A vs. the same curve feature B:
Correlation(A, B) = multiply-and-sum = 3x(30x50) + (20x30) + (50x30) = 6600, which is large. That means image A contains curve feature B.
TO COMPLETE THE CONVOLUTION LAYER
After convolution (multiplication and summation), the output is passed to a non-linear activation function (sigmoid, tanh or ReLU), the same as in a back-propagation NN:
y = f(u), with u = Σ_{i=1..I} w(i) x(i) + b
where b = bias, x = input, w = weight, u = internal signal.
Typically f() is an activation function, e.g. the logistic (sigmoid):
f(u) = 1 / (1 + e^(-u)) (assuming a slope parameter of 1 for simplicity), therefore
y = f(u) = 1 / (1 + e^(-(Σ_{i=1..I} w(i) x(i) + b)))
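A minimal numeric sketch of this neuron computation in MATLAB/Octave (the input, weight and bias values are our own stand-ins, not from the slides):

% One neuron: weighted sum plus bias, passed through a sigmoid
x = [0.5; -1.2; 2.0];           % example inputs (assumed)
w = [0.3;  0.8; -0.5];          % example weights (assumed)
b = 0.1;                        % bias (assumed)
u = w' * x + b;                 % internal signal u = sum_i w(i)x(i) + b
y = 1 ./ (1 + exp(-u));         % sigmoid activation, y = f(u)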
ACTIVATION FUNCTION CHOICES
https://imiloainf.wordpress.com/2013/11/06/rectifier-nonlinearities/
https://www.simonwenkel.com/2018/05/15/activation-functions-for-neural-networks.html#softplus
Sigmoid: g(x) = 1 / (1 + e^(-x)), g'(x) = e^(-x) / (1 + e^(-x))^2 = g(x)(1 - g(x))
Tanh: g(x) = sinh(x) / cosh(x), g'(x) = 4 / (e^x + e^(-x))^2
Rectifier (hard ReLU): g(x) = max(0, x), g'(x) = 1 if x > 0, 0 if x < 0. ReLU is now very popular and has been shown to work better than other methods.
Softplus: g(x) = ln(1 + e^x), g'(x) = 1 / (1 + e^(-x))
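A small reference sketch of these activations and their derivatives as anonymous MATLAB/Octave functions (the names are our own, not from any toolbox):

sigm    = @(x) 1 ./ (1 + exp(-x));          % sigmoid
dsigm   = @(x) sigm(x) .* (1 - sigm(x));    % its derivative g(x)(1 - g(x))
tanh_g  = @(x) sinh(x) ./ cosh(x);          % tanh
dtanh_g = @(x) 4 ./ (exp(x) + exp(-x)).^2;  % derivative of tanh
relu    = @(x) max(0, x);                   % rectifier (ReLU)
drelu   = @(x) double(x > 0);               % derivative of ReLU (0 for x <= 0)
softpl  = @(x) log(1 + exp(x));             % softplus
dsoftpl = @(x) 1 ./ (1 + exp(-x));          % derivative of softplus (= sigmoid)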
EXAMPLE (LENET)
An implementation example: http://yann.lecun.com/exdb/lenet/
Input → C1: conv. → S2: subs. → C3: conv. → S4: subs. → C5: fully → F6: fully → output
• Each feature filter uses one kernel (e.g. 5x5) to generate a feature map
• Each feature map represents the output of a particular feature filter
• Alternating convolution (conv) and subsampling (subs) layers
• Subsampling allows the features to be flexibly positioned
INPUT → C1: INPUT TO CONVOLUTION (an array of feature maps)
Input: a 32x32 image. Convolution layer output: 6 feature maps, each 28x28 (6x28x28), using 5x5 kernels.
For each 5x5 kernel you need 5x5 weights (per convolution layer). Unlike a fully connected NN, the weights are shared: when you convolve the kernel with the input, only one set of 5x5 weights is used across the whole image.
For one feature map, you only need 5x5 weights and one bias.
For 6 feature maps you need 6x(5x5) weights. (very efficient)
If this were implemented by a fully connected layer without convolution, you would need 32x32x28x28 weights; for 6 feature maps, 6x32x32x28x28 weights.
Training by backpropagation is the same as for a common back-propagation neural network (BPNN). The kernel acts like a feature extractor.
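A quick back-of-the-envelope comparison of the two parameter counts described above (a sketch; counting one bias per kernel is our assumption):

% Shared-weight convolution layer vs. fully connected layer (LeNet-style C1)
conv_params = 6 * (5*5 + 1)           % 6 kernels of 5x5 weights + 1 bias each = 156
fc_params   = 6 * 32*32 * 28*28       % 6 maps, each 28x28 output fully connected to the 32x32 input
% conv_params = 156
% fc_params   = 4816896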
C1 → S2: FROM CONVOLUTION MAPS (C1) TO SUBSAMPLING MAPS (S2)
No weights involved, just calculation.
Sample a 2x2 block [a b; c d] to a single value s.
Each map connects only to the corresponding map in the next layer; there are no cross connections.
It can be achieved by one of two methods:
Take average: s = (a+b+c+d)/4, or
Max pooling: s = max(a,b,c,d)
S2 → C3: FROM SUBSAMPLING MAPS (S2) TO THE NEXT CONVOLUTION LAYER (C3)
Each element in a C3 (feature) map is connected to all 6 feature maps of S2.
You need 5x5 weights for the input from each S2 map, so altogether you need 6x(5x5) weights to generate one C3 feature map from all 6 S2 feature maps.
How the inputs are added up to become the output depends on your design (shown later).
For all 16 feature maps of C3, you need 16x(6x(5x5)) weights.
C3 → S4: FROM CONVOLUTION MAPS (C3) TO THE NEXT SUBSAMPLING LAYER (S4)
No weights involved, just calculation.
Sample a 2x2 block [a b; c d] to a single value s.
Each map connects only to the corresponding map in the next layer; there are no cross connections.
It can be achieved by two methods:
Take average: s = (a+b+c+d)/4, or
Max pooling: s = max(a,b,c,d)
S4 → C5: FROM THE SUBSAMPLING LAYER (S4) TO A FULLY CONNECTED VECTOR (C5), AND SO ON
S4 → C5 → F6 → output (C5 and F6 are fully connected vectors)
Each element in the fully connected vector C5 connects to all elements in S4 (S4 has 16x5x5 neurons). Therefore, we need 120x(16x5x5) weights.
Likewise, from C5 to F6, you need 120x84 weights.
From F6 to the output, you need 84x10 weights.
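The weight counts above multiply out as follows (a simple arithmetic check, biases not counted):

% Fully connected weight counts in the LeNet-style example
w_S4_C5  = 120 * (16*5*5)    % = 48000
w_C5_F6  = 120 * 84          % = 10080
w_F6_out = 84 * 10           % = 840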
DESCRIPTION OF THE LAYERS
Subsampling
Layer-to-layer connections
SUBSAMPLING (SUBS)
Subsampling allows features to be flexibly positioned around a specific area. Example:
Subsample an output (a 2x2 matrix [a b; c d]) to a single value s.
It can be achieved by two methods:
Take average: s = (a+b+c+d)/4, or
Max pooling: s = max(a,b,c,d)
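A minimal sketch of both pooling methods on a single 2x2 block (the variable names and example values are our own):

% Subsample one 2x2 block by averaging or by max pooling
block  = [1 3;
          2 4];                 % [a b; c d], example values (assumed)
s_mean = mean(block(:));        % average pooling: (a+b+c+d)/4 = 2.5
s_max  = max(block(:));         % max pooling: max(a,b,c,d) = 4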
HOW TO FEED ONE FEATURE LAYER TO MULTIPLE FEATURE LAYERS
(Figure: 6 feature maps propagated through Layer 1 to Layer 6.)
You can combine multiple feature maps of one layer into one feature map in the next layer.
See the next slide for details.
(Figure: details of how feature maps are combined.)
EXAMPLE
Using a program
EXAMPLE: OVERVIEW OF TEST_EXAMPLE_CNN.M
Read the database
Part 1: cnnsetup.m
– Layer 1: input layer (do nothing)
– Layer 2: convolution (conv.) layer, output maps = 6, kernel size = 5x5
– Layer 3: sub-sample (subs.) layer, scale = 2
– Layer 4: conv. layer, output maps = 12, kernel size = 5x5
– Layer 5: subs. layer (output layer), scale = 2
Part 2:
cnntrain.m % train weights using 60,000 samples
– cnnff( ) % CNN feed forward
– cnnbp( ) % CNN feed back (back propagation) to train the weights in the kernels
– cnnapplygrads( ) % update weights
cnntest.m % test the system using 10,000 samples and show the error rate
ARCHITECTURE EXAMPLE
Layer 1 (input, I): one input image, 1x28x28.
Layer 1→2: 6 conv. maps (C); layer 2 (hidden) is 6x24x24; conv. kernel = 5x5; OutputMaps = 6; Fan_in = 5^2 = 25; Fan_out = 6x5^2 = 150.
Layer 2→3: 6 sub-sample maps (S); layer 3 (subsample) is 6x12x12; InputMaps = 6; subs 2x2.
Layer 3→4: 12 conv. maps (C); layer 4 (hidden) is 12x8x8; conv. kernel = 5x5; InputMaps = 6, OutputMaps = 12; Fan_in = 6x5^2 = 150; Fan_out = 12x5^2 = 300.
Layer 4→5: 12 sub-sample maps (S); layer 5 (subsample) is 12x4x4; InputMaps = 12, OutputMaps = 12; subs 2x2.
Output: 10 outputs; each output neuron corresponds to a character (0, 1, 2, ..., 9, etc.).
I = input, C = Conv. = convolution, S = Subs = sub-sampling (mean or max pooling).
DATA USED IN TRAINING A NEURAL NETWORK
Training set
Around 60-70% of the total data
Used to train the system
Validation set (optional)
Around 10-20% of the total data
Used to tune the parameters of the model of the system
Test set
Around 10-20% of the total data
Used to test the system
Data in the above sets must not overlap; the exact percentages depend on the application and your choice.
WARNING: HOW TO TRAIN A NEURAL NETWORK TO AVOID DATA OVER-FITTING
Over-fitting: the system works well on the training data but not on the testing data, so extensive training may not help.
What should we do: use the validation data to tune the system and stop training early (early stopping), so that the test error is taken near its minimum.
(Figure: error from the loss function vs. training cycles (epochs); the training-error curve keeps falling, while the test-error curve reaches a minimum at the early-stopping point.)
https://stats.stackexchange.com/questions/131233/neural-network-over-fitting
SAME IDEA FROM THE VIEWPOINT OF ACCURACY
https://www.researchgate.net/publication/313508637_Detection_and_characterization_of_Coordinate_Measuring_Machine_CMM_probes_using_deep_networks_for_improved_quality_assurance_of_machine_parts/figures?lo=1
By https://www.researchgate.net/profile/Binu_Nair
PART 2
FEED-FORWARD DETAILS
Matlab example: http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
Feed-forward part: cnnff( )
CNNFF.M
CONVOLUTIONAL NEURAL NETWORK FEED FORWARD
This is the feed-forward part.
Assuming all the weights have been initialized or calculated, we show how to get the output from the inputs.
Ref: CNN Matlab example
http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
LAYER 1→2 (INPUT TO HIDDEN)
Convolve layer 1 with different kernels (map_index = 1, 2, ..., 6) and produce 6 output maps.
• Inputs:
  Input layer 1: a 28x28 image
  6 different kernels k(1), ..., k(6), each 5x5; the kernels are the dendrites (weights) of the neurons
• Output: 6 output maps, each 24x24 (layer 2 (C): 6x24x24)
• Algorithm:
for map_index = 1:6
    layer_2(map_index) = convn(I, k(map_index), 'valid');
end
• Discussion:
"Valid" means only fully overlapped areas are considered, so if layer 1 is 28x28 and each kernel is 5x5, each output map is 24x24.
In Matlab, use convn(I, k, 'valid').
Example:
I = rand(28,28)
k = rand(5,5)
size(convn(I, k, 'valid'))
ans = 24 24
I = input, C = Conv. = convolution, S = Subs = sub-sampling
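Putting the pieces together for one layer-1→2 feature map (the bias and sigmoid are added here by analogy with the layer-3→4 slide; this is a sketch, not the toolbox's exact code):

% One layer-1 -> layer-2 feature map: 5x5 convolution, bias, sigmoid
I = rand(28, 28);                    % input image (stand-in values)
k = rand(5, 5) - 0.5;                % one 5x5 kernel (stand-in weights)
b = 0;                               % bias for this map (assumed)
z = convn(I, k, 'valid');            % 24x24 'valid' convolution
map = 1 ./ (1 + exp(-(z + b)));      % 24x24 feature map after sigmoid
size(map)                            % ans = 24 24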
LAYER 2→3 (HIDDEN TO SUBSAMPLE)
• Sub-sample layer 2 to layer 3
• Inputs: 6 maps of layer 2, each 24x24
• Output: 6 maps of layer 3, each 12x12
• Algorithm:
for map_index = 1:6
    For each input map, calculate the average of each 2x2 pixel block and save the result in the output map.
end
• Hence the resolution is reduced from 24x24 to 12x12.
(Layer 2 (C): 6x24x24 → Layer 3 (S): 6x12x12, subs 2x2)
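A minimal sketch of this 2x2 average pooling for one map, written with base MATLAB functions (our own helper; the toolbox's actual implementation may differ in details):

% 2x2 average pooling of one 24x24 feature map down to 12x12
a = rand(24, 24);                      % one layer-2 feature map (random stand-in)
z = conv2(a, ones(2)/4, 'valid');      % average over every 2x2 neighbourhood
s = z(1:2:end, 1:2:end);               % keep one value per non-overlapping 2x2 block
size(s)                                % ans = 12 12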
LAYER 3→4 (SUBSAMPLE TO HIDDEN)
• Convolve layer 3 with kernels to produce layer 4
• Inputs:
  6 maps of layer 3 (L3{i=1:6}), each 12x12
  Kernel set: in total 6x12 kernels, each 5x5, i.e. K{i=1:6}{j=1:12}, where each K{i}{j} is 5x5
  12 biases bias{j=1:12} in this layer, each a scalar
• Output: 12 maps of layer 4 (L4{j=1:12}), each 8x8 (in the toolbox code this is net.layers{l}.a{j})
• Algorithm:
for j = 1:12
    z = 0;
    for i = 1:6
        z = z + convn(L3{i}, K{i}{j}, 'valid');   % z is 8x8
    end
    L4{j} = sigm(z + bias{j});                    % L4{j} is 8x8
end
function X = sigm(P)
    X = 1./(1+exp(-P));
end
• Feature maps in the previous layer can be combined to become feature maps in the next layer.
LAYER 4→5 (HIDDEN TO SUBSAMPLE)
• Subsample layer 4 to layer 5
• Inputs: 12 maps of layer 4 (L4{i=1:12}), each 8x8
• Output: 12 maps of layer 5 (L5{j=1:12}), each 4x4
• Algorithm: sub-sample each 2x2 pixel window in L4 to one pixel in L5
(Layer 4: 12x8x8 → Layer 5: 12x4x4, subs 2x2)
LAYER 5 → OUTPUT (SUBSAMPLE TO OUTPUT)
• Inputs: 12 maps of layer 5 (L5{i=1:12}), each 4x4, so L5 has 12x4x4 = 192 pixels in total
• Output layer weights: net.ffW{m=1:10}{p=1:192}; each output neuron uses 192 weights, one for each L5 pixel
• Output: 10 output neurons (net.o{m=1:10}); each output neuron corresponds to a character (0, 1, 2, ..., 9, etc.)
• Algorithm:
for m = 1:10                          % each output neuron
    net.fv = sum of net.ffW{m}(all 192 weights) .* L5(all corresponding 192 pixels)
    net.o{m} = sigm(net.fv + bias)
end
• Discussion: the same is done for each output neuron.
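The same computation can be written in matrix form for all 10 output neurons at once; a minimal sketch with stand-in values (the shapes follow the slide's 192-pixel feature vector, not the toolbox's exact code):

% Fully connected output layer: 192-element feature vector -> 10 outputs
fv  = rand(192, 1);                        % concatenated pixels of the 12 4x4 maps of L5 (stand-in)
ffW = rand(10, 192);                       % output weights, one row per output neuron (stand-in)
ffb = rand(10, 1);                         % output biases (stand-in)
o   = 1 ./ (1 + exp(-(ffW*fv + ffb)));     % sigmoid of weighted sum, one value per class (0..9)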
PART 3
BACK-PROPAGATION DETAILS
Back-propagation part: cnnbp( ), cnnapplygrads( )
CNNBP( )
OVERVIEW (OUTPUT BACK TO LAYER 5)
For an output neuron y with target t, inputs x_i and weights w_i:
∂E/∂w_i = (y - t) y (1 - y) x_i
In cnnbp.m: net.o = y and net.e = (y - t).
∂E/∂x_i = (y - t) y (1 - y) w_i
net.od = net.e .* (net.o .* (1 - net.o)), which is the output delta (y - t) y (1 - y), i.e. ∂E/∂x_i without the w_i factor.
∂E/∂x_i = net.od * w_i = net.e .* (net.o .* (1 - net.o)) * w_i
So in the code of cnnbp.m, ∂E/∂x_i is computed as:
net.fvd = (net.ffW' * net.od)
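A tiny numeric sketch of these delta computations with stand-in values (sizes chosen to match the slide's 192-dimensional feature vector and 10 outputs; a batch of 2 samples is our assumption):

% Output-layer deltas for a batch of 2 samples and 10 output neurons
y   = rand(10, 2);                   % network outputs net.o (stand-in)
t   = eye(10, 2);                    % one-hot targets, one column per sample (stand-in)
ffW = rand(10, 192);                 % output-layer weights (stand-in)
e   = y - t;                         % net.e, the output error
od  = e .* (y .* (1 - y));           % net.od = net.e .* (net.o .* (1 - net.o))
fvd = ffW' * od;                     % net.fvd: error propagated back to the 192-d feature vector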
CALCULATE GRADIENT
From layer 2 to layer 3
From layer 3 to layer 4
net.ffW and net.ffb found
The method is similar to that of a typical back-propagation neural network (BPNN).
DETAILS OF CALC GRADIENTS
% part: reshape feature vector deltas into output-map style
L4 (c): run expand only
L3 (s): run conv (rot180, 'full'), to find d
L2 (c): run expand only
% part: calc gradients
L2 (c): run conv ('valid'), to find dk and db
L3 (s): not run here
L4 (c): run conv ('valid'), to find dk and db
Done. For the output layer L5 we have found:
net.dffW = net.od * (net.fv)' / size(net.od, 2);
net.dffb = mean(net.od, 2);
CNNAPPLYGRADS(NET, OPTS)
For the convolution layers L2 and L4:
From k and dk, find the new k (weights)
From b and db, find the new b (bias)
For the output layer L5:
net.ffW = net.ffW - opts.alpha * net.dffW;
net.ffb = net.ffb - opts.alpha * net.dffb;
opts.alpha is the learning rate.
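The kernel and bias update for the convolution layers follows the same gradient-descent rule; a minimal sketch (stand-in values; the exact indexing in the toolbox may differ):

% Gradient-descent update for one convolution kernel and its bias
alpha = 0.1;                          % learning rate (opts.alpha, assumed value)
k  = rand(5, 5);  dk = rand(5, 5);    % kernel and its gradient (stand-ins)
b  = 0.0;         db = 0.05;          % bias and its gradient (stand-ins)
k  = k - alpha * dk;                  % new kernel weights
b  = b - alpha * db;                  % new bias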
SUMMARY
We studied the basic operation of convolutional neural networks (CNNs).
We demonstrated how a simple CNN can be implemented.
REFERENCES
Wiki
http://en.wikipedia.org/wiki/Convolutional_neural_network
http://en.wikipedia.org/wiki/Backpropagation
Matlab programs
Neural Network for pattern recognition - Tutorial
http://www.mathworks.com/matlabcentral/fileexchange/19997-neural-network-for-pattern-recognition-tutorial
CNN Matlab example
http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox
CNN tutorial
http://cogprints.org/5869/1/cnn_tutorial.pdf