
Convolutional Neural Networks
INTELLIGENT SYSTEMS FOR PATTERN RECOGNITION (ISPR)

DAVIDE BACCIU – DIPARTIMENTO DI INFORMATICA - UNIVERSITA’ DI PISA

[email protected]
Lecture Outline
○ Introduction and historical perspective
○ Dissecting the components of a CNN
● Convolution, stride, pooling
○ CNN architectures for machine vision
● Putting components back together
● From LeNet to ResNet
○ Advanced topics (split in two lectures)
● Interpreting convolutions
● Advanced models and applications



CNN Lecture – Part I
Introduction
Convolutional Neural Networks



Introduction
Convolutional Neural Networks

Destroying Machine Vision research since 2012


Neocognitron
○ Hubel-Wiesel ('59) model of brain visual processing
● Simple cells responding to localized features
● Complex cells pooling responses of simple cells for invariance
○ Fukushima ('80) built the first hierarchical image processing architecture exploiting this model
● Trained by unsupervised learning
CNN for Sequences
○ Apply a bank of 16 convolution kernels to sequences (windows of 15 elements)
○ Trained by backpropagation with parameter sharing
○ Guess who introduced it? …yeah, HIM!

Time delay neural network (Waibel & Hinton, 1987)
CNN for Images

First convolutional neural network for images dates back to 1989 (LeCun)



Dense Vector Multiplication
Processing images: the dense way
○ Take a 32x32x3 image and reshape it into a vector 𝒙 of 3072 elements
○ Use an input-sized weight vector for each hidden neuron, i.e. a weight matrix 𝑾 of size 100x3072 for a layer of 100 neurons
○ Each element of the product 𝑾𝒙 contains the activation of one neuron (100 activations in total)
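To make the cost of the dense approach concrete, here is a minimal sketch (assuming Keras, with the layer sizes from the slide): flattening the 32x32x3 image and attaching 100 hidden neurons already requires 100x3072 weights.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Dense processing of a 32x32x3 image: flatten to a 3072-vector, then 100 hidden neurons
dense_model = Sequential([
    Flatten(input_shape=(32, 32, 3)),  # 32*32*3 = 3072 inputs
    Dense(100, activation='relu'),     # 100 x 3072 weights + 100 biases
])
dense_model.summary()                  # ~307,300 parameters in this single layer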


About invariances
MLPs are positional

We (most likely) need translation invariance!

• If we unfold the two images into two vectors, the features identifying the cat will be in different positions
• But this still remains a picture of a cat, which we would like to classify as such irrespective of its position in the image



An inductive bias to keep in mind
Nearby pixels are more correlated than far away ones

The input representation should not destroy pixel relationships (like vectorization does)
Convolution (Refresher)
○ A 5x5 filter is applied to a 32x32 matrix input, preserving spatial structure
○ Each output value is the sum of 25 multiplications, plus a bias



Adaptive Convolution
A convolutional filter (kernel) with (adaptive) weights 𝑤𝑖:

𝑤1 𝑤2 𝑤3
𝑤4 𝑤5 𝑤6
𝑤7 𝑤8 𝑤9

Applied to the image patch centered at (2,2), with values
1 0 1
2 3 4
1 0 1
it yields 𝑐1 = 𝒘𝑇𝒙2,2 = 𝑤1 + 𝑤3 + 2𝑤4 + 3𝑤5 + 4𝑤6 + 𝑤7 + 𝑤9

Applied to the patch centered at (9,7), with values
1 0 1
0 2 0
1 0 1
it yields 𝑐2 = 𝒘𝑇𝒙9,7 = 𝑤1 + 𝑤3 + 2𝑤5 + 𝑤7 + 𝑤9
Convolutional Features

Slide the filter over the 32x32 image, computing elementwise products and summing up: the result is a 28x28 map of convolution features



Multi-Channel Convolution
A 5x5x3 filter applied to a 32x32x3 image: the convolution filter has a number of slices equal to the number of image channels



Multi-Channel Convolution

The result is a single 28x28 convolution map: all channels are typically convolved together
o They are summed up in the convolution
o The convolution map stays two-dimensional
Stride
○ Basic convolution slides the filter
on the image one pixel at a time
● Stride = 1





Stride
○ Basic convolution slides the filter on the image one pixel at a time
● Stride = 1
○ Can define a different stride
● Hyperparameter

stride = 2
Works in both directions!



Stride
○ Basic convolution slides the filter on the image one pixel at a time
● Stride = 1
○ Can define a different stride
● Hyperparameter
○ Stride reduces the number of multiplications
● Subsamples the image

stride = 3



Activation Map Size
What is the size of the image after application of a filter with a given size and stride?

Take a 7x7 image (W=7, H=7) and a 3x3 filter with stride 1 (K=3, S=1).

Output image is: 5x5



Activation Map Size
What is the size of the image after application of a filter with a given size and stride?

Take a 7x7 image (W=7, H=7) and a 3x3 filter with stride 2 (K=3, S=2).

Output image is: 3x3



Activation Map Size
What is the size of the image after application of a filter with a given size and stride?

General rule, for a W x H image, filter size K and stride S:

𝑊′ = (𝑊 − 𝐾)/𝑆 + 1
𝐻′ = (𝐻 − 𝐾)/𝑆 + 1



Activation Map Size
What is the size of the image after application of a filter with a given size and stride?

Take a 7x7 image (W=7, H=7) and a 3x3 filter with stride 3 (K=3, S=3).

Output image is: not really an image! (the filter does not fit the input evenly)



Zero Padding
Add columns and rows of zeros to the border of the image
(Figure: a 7x7 image, W=7, H=7, with a border of zeros added around it)



Zero Padding
Add columns and rows of zeros to the border of the image
With a 7x7 image (W=7, H=7), padding P=1, and a 3x3 filter with stride 1 (K=3, S=1): what is the output size?

𝑊′ = (𝑊 − 𝐾 + 2𝑃)/𝑆 + 1

Output image is: 7x7
Zero Padding
Add columns and rows of zeros to the border of the image
Zero padding serves to retain the original size of the image:

𝑃 = (𝐾 − 1)/2   (for stride S = 1)

Pad as necessary to perform convolutions with a given stride S
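The sizing rules above are easy to check in code; a minimal helper of mine (plain Python, not part of any library) implementing 𝑊′ = (𝑊 − 𝐾 + 2𝑃)/𝑆 + 1:

def conv_output_size(w, k, s=1, p=0):
    # Output width of a convolution: W' = (W - K + 2P) / S + 1
    span = w - k + 2 * p
    if span % s != 0:
        raise ValueError("the filter does not tile the (padded) input with this stride")
    return span // s + 1

print(conv_output_size(7, 3, s=1))       # 5
print(conv_output_size(7, 3, s=2))       # 3
print(conv_output_size(7, 3, s=1, p=1))  # 7 (padding retains the original size)
# conv_output_size(7, 3, s=3) raises: "not really an image"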
Feature Map Transformation
A 32x32x3 input is mapped to a 32x32 feature map 𝒘𝑇𝒙𝑖,𝑗 + 𝑏, and then to a 32x32 transformed map 𝒎𝒂𝒙(𝟎, 𝒘𝑇𝒙𝑖,𝑗 + 𝑏)

○ Convolution is a linear operator
○ Apply an element-wise nonlinearity to obtain a transformed feature map



Pooling
○ Operates on the feature map to make the representation
● Smaller (subsampling)
● Robust to (some) transformations
Example: a 4x4 feature map (W=4, H=4)

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max-pooled with 2x2 filters and stride = 2 gives the 2x2 pooled map (W'=2, H'=2)

6 8
3 4



Pooling Facts
○ Max pooling is the most frequently used, but other forms are possible
● Average pooling
● L2-norm pooling
● Random pooling
○ It is uncommon to use zero padding with pooling

𝑊′ = (𝑊 − 𝐾)/𝑆 + 1



The Convolutional Architecture
A convolutional layer stacks, from the input to the next layer: convolutional filters (strided adaptive convolution), a nonlinearity (ReLU), and pooling (max).

○ An architecture made by a hierarchical composition of the basic elements
○ The convolution layer is an abstraction for the composition of the 3 basic operations
○ Network parameters are in the convolutional component



A Bigger Picture
From the input to the output: CL 1, CL 2, CL 3, CL 4 (sparse connectivity), followed by FCL 1, FCL 2 (dense connectivity).

CL -> Convolutional Layer (contains several convolutional filters with different size and stride)
FCL -> Fully Connected Layer
Convolutional Filter Banks
Applying 𝐷𝐾 convolutional filters of size 𝐾 × 𝐾 to an 𝐻 × 𝑊 × 𝐷𝐼 input produces a feature map (plus nonlinearity) of size 𝐻′ × 𝑊′ × 𝐷𝐾, which pooling reduces to 𝐻′′ × 𝑊′′ × 𝐷𝐾.

○ The number of model parameters due to this convolution element is 𝐾 × 𝐾 × 𝐷𝐼 × 𝐷𝐾 (add 𝐷𝐾 bias terms)
○ Pooling is often (not always) applied independently on the 𝐷𝐾 convolutions

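As a quick sanity check of the parameter count (the numbers are my own, chosen to match the first Conv2D of the Keras example on the next slide):

# Parameters of a bank of D_K filters of size K x K applied to D_I input channels:
# K * K * D_I * D_K weights, plus D_K bias terms.
K, D_I, D_K = 5, 3, 32
weights = K * K * D_I * D_K   # 5*5*3*32 = 2400
biases = D_K                  # 32
print(weights + biases)       # 2432 parameters for this convolutional element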


Specifying CNN in Code (Keras)
The first argument of Conv2D is the number of convolution filters 𝐷𝐾; the input size is defined only for the first hidden layer.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                 activation='relu',
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, (5, 5)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

Keras does all the calculations for you to determine the final size fed to the dense layer

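For instance, assuming input_shape = (32, 32, 3) and num_classes = 10 (values of my choice, not from the slide), model.summary() reports the output shape and parameter count of every layer, which makes it easy to verify the size reaching the Flatten and Dense layers.

input_shape, num_classes = (32, 32, 3), 10  # hypothetical values, for illustration only
# ... build the model exactly as above, then:
model.summary()  # prints per-layer output shapes and parameter counts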


A (Final?) Note on Convolution
○ We know that discrete convolution between an image 𝐼 and a filter/kernel 𝐾 is
(𝐼 ∗ 𝐾)(𝑖, 𝑗) = Σ𝑚 Σ𝑛 𝐼(𝑖 − 𝑚, 𝑗 − 𝑛) 𝐾(𝑚, 𝑛)
and it is commutative.
○ In practice, the convolution implemented in DL libraries does not flip the kernel:
(𝐼 ∗ 𝐾)(𝑖, 𝑗) = Σ𝑚 Σ𝑛 𝐼(𝑖 + 𝑚, 𝑗 + 𝑛) 𝐾(𝑚, 𝑛)
which is cross-correlation, and it is not commutative.

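A quick way to see the difference (my own toy example, using SciPy rather than a DL library): true convolution flips the kernel, cross-correlation does not, and the two coincide only if you flip the kernel yourself.

import numpy as np
from scipy.signal import convolve2d, correlate2d

I = np.arange(16, dtype=float).reshape(4, 4)  # toy "image"
K = np.array([[1., 0.], [0., -1.]])           # asymmetric kernel

conv = convolve2d(I, K, mode='valid')         # flips the kernel
xcorr = correlate2d(I, K, mode='valid')       # does not flip the kernel
print(np.allclose(conv, xcorr))                                       # False
print(np.allclose(conv, correlate2d(I, K[::-1, ::-1], mode='valid'))) # True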


CNN as a Sparse Neural Network
Let us take a 1-D input (sequence) to ease graphics
(Figure: each output of the convolution connects only to a window of the input, and the connection weights a, b, c are shared across positions)

Convolution amounts to sparse connectivity (reduces parameters) with parameter sharing (enforces invariance)
Dense Network
The dense counterpart would look like this



Strided Convolution
Make connectivity sparser



Max-Pooling and Spatial Invariance
A feature is detected even if it is spatially translated

(Figure: pooling applied to two feature maps whose detected feature is shifted by one position produces nearly the same pooled responses)



Cross Channel Pooling and Spatial Invariance

(Figure: pooling across feature maps, e.g. maps 1 and 3, computed on two transformed versions of the input)
Hierarchical Feature Organization
The deeper the layer, the larger the receptive field of a unit



Zero-Padding Effect
Assuming
no pooling



CNN Lecture – Part II
CNN Training
Variants of the standard backpropagation that account for the fact that connections share weights (convolution parameters)

The gradient ∆𝑤𝑖 is obtained by summing the contributions from all connections sharing the weight 𝑤𝑖

Backpropagating gradients from convolutional layer N to N-1 is not as simple as transposing the weight matrix (it needs a deconvolution with zero padding)



Backpropagating on Convolution
A convolution with K=3, S=1: the input is a 4x4 image, the output is a 2x2 image. The backpropagation step requires going back from the 2x2 to the 4x4 representation.

The convolution can be written as a dense multiplication with shared weights (a 4x16 matrix applied to the flattened input). Backpropagation is performed by multiplying the 4x1 output representation by the transpose of this matrix.
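A minimal NumPy sketch of this view (my own construction for the 4x4 input, 3x3 kernel, stride-1 case on the slide): build the 4x16 matrix whose rows share the same kernel weights, use it for the forward pass, and use its transpose to push a 2x2 gradient back to the 4x4 input.

import numpy as np

K = 3                    # kernel size
H = W = 4                # input size
OH = OW = H - K + 1      # output size: 2

kernel = np.arange(1., 10.).reshape(K, K)  # w1..w9 as a toy kernel
x = np.random.randn(H, W)

# One row per output position, with the shared kernel weights placed
# at the input positions covered by the filter.
C = np.zeros((OH * OW, H * W))
for i in range(OH):
    for j in range(OW):
        patch = np.zeros((H, W))
        patch[i:i + K, j:j + K] = kernel
        C[i * OW + j] = patch.ravel()

y = C @ x.ravel()                  # forward pass: the 2x2 output, flattened
grad_y = np.random.randn(OH * OW)  # gradient arriving from the layer above
grad_x = C.T @ grad_y              # backprop to the input via the transpose
print(y.reshape(OH, OW).shape, grad_x.reshape(H, W).shape)  # (2, 2) (4, 4)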
Deconvolution (Transposed Convolution)
We can obtain the transposed convolution using the same logic as the forward convolution (K=3, S=1, P=0).

If you had no padding in the forward convolution, you need to pad heavily (a border of K−1 zeros) when performing the transposed convolution



Deconvolution (Transposed Convolution)
If you have striding, you need to fill in the convolution map with zeroes to
obtain a correctly sized deconvolution
K=3, S=2, P=1

https://github.com/vdumoulin/conv_arithmetic

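In Keras the operation is available directly as a layer; a small sketch (the shapes are my own illustration) upsampling a feature map to twice its spatial size:

import tensorflow as tf
from tensorflow.keras.layers import Conv2DTranspose

x = tf.random.normal((1, 14, 14, 64))  # a batch with one 14x14x64 feature map
up = Conv2DTranspose(32, kernel_size=3, strides=2, padding='same')(x)
print(up.shape)                        # (1, 28, 28, 32)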


LeNet-5 (1998)

○ Grayscale images
○ Filters are 5x5 with stride 1 (sigmoid nonlinearity)
○ Pooling is 2x2 with stride 2
○ No zero padding



AlexNet (2012) - Architecture
ImageNet Top-5 error: 15.4%

○ RGB images 227x227x3


○ 5 convolutional layers + 3 fully connected layers
○ Split into two parts (top/bottom) each on 1 GPU



Data Augmentation
Key intuition - If I have an image with a given label, I can transform it (by flipping, rotation, etc.) and the resulting image will still have the same label

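A minimal Keras sketch of this idea (the transformation parameters are my own choice): each epoch sees randomly transformed copies of the training images, all keeping their original labels.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random, label-preserving transformations applied on the fly
augmenter = ImageDataGenerator(
    rotation_range=15,       # random rotations up to +/- 15 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random left-right flips
)
# Assuming x_train (images) and y_train (labels) are available:
# model.fit(augmenter.flow(x_train, y_train, batch_size=64), epochs=10)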


AlexNet - Innovations

○ Use heavy data augmentation (rotations, random crops, etc.)
○ Introduced the use of ReLU
○ Dense layers regularized by dropout



ReLU Nonlinearity

(Figure annotations: the sigmoid suffers from saturation and non zero-centered outputs; ReLU can produce dead units)

○ ReLU helps counteract vanishing gradients
● The sigmoid first derivative vanishes as we increase or decrease z
● The ReLU first derivative is 1 when the unit is active and 0 elsewhere
● The ReLU second derivative is 0 (no second order effects)
○ Easy to compute (zero thresholding)
○ Favors sparsity
AlexNet - Parameters

○ 62.3 million parameters (6% in convolutions)
○ 5-6 days to train on two GTX 580 GPUs (95% of the training time spent in convolutions)



VGGNet – VGG16 (2014)
ImageNet Top-5 error: 7.3%

○ Standardized convolutional layer
● 3x3 convolutions with stride 1
● 2x2 max pooling with stride 2 (not after every convolution)
○ Various configurations analysed, but the best has
● 16 Convolutional + 3 Fully Connected layers
● About 140 million parameters (85% in the FC layers)
GoogLeNet (2015)
ImageNet Top-5 error: 6.7%

Inception Module:
• Kernels of different size to capture details at varied scale (why 1x1 convolutions?)
• Aggregated before sending to the next layer
• Average pooling
• No fully connected layers
1x1 Convolutions are Helpful
Take 5 kernels of size 1x1x64: they map a 56x56x64 input to a 56x56x5 output.
By placing 1x1 convolutions before larger kernels in the Inception module, the number of input channels is reduced, saving computations and parameters

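A back-of-the-envelope comparison (the numbers are mine) of applying 32 5x5 filters directly to a 56x56x64 map versus after a 1x1 bottleneck down to 16 channels:

H = W = 56  # spatial size of the feature map

direct = H * W * 32 * (5 * 5 * 64)             # about 160.6M multiplications
bottleneck = (H * W * 16 * (1 * 1 * 64)        # 1x1 reduction to 16 channels
              + H * W * 32 * (5 * 5 * 16))     # 5x5 on the reduced map, ~43.4M in total
print(direct, bottleneck, round(direct / bottleneck, 1))  # roughly a 3.7x saving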


Back on GoogLeNet

Auxiliary outputs are used to inject gradients at deeper layers
○ Only 5 million parameters
○ 12x fewer parameters than AlexNet
○ Followed by v2, v3 and v4 of the Inception module
● More filter factorization
● Heavy use of Batch Normalization introduced



Batch Normalization
○ Very deep neural networks are subject to internal covariate shift
● The distribution of inputs to a layer N might vary (shift) with different minibatches (due to adjustments of layer N-1)
● Layer N can get confused by this
● The solution is to normalize for mean and variance in each minibatch (a bit more articulated than this, actually)

𝜇𝑏 = (1/𝑁𝑏) Σ𝑖=1..𝑁𝑏 𝑥𝑖
𝜎𝑏² = (1/𝑁𝑏) Σ𝑖=1..𝑁𝑏 (𝑥𝑖 − 𝜇𝑏)²
Normalization: 𝑥̂𝑖 = (𝑥𝑖 − 𝜇𝑏) / √(𝜎𝑏² + 𝜖)   (need to backpropagate through this!)
Scale and shift: 𝑦𝑖 = 𝛾𝑥̂𝑖 + 𝛽   (trainable linear transform potentially allowing to cancel unwanted zero-centering effects, e.g. with sigmoid)

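A minimal NumPy sketch of the forward pass above (my own toy code; it uses training-time minibatch statistics only and ignores the running averages used at inference):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x has shape (N_b, D): normalize each feature over the minibatch, then scale and shift
    mu = x.mean(axis=0)                    # minibatch mean
    var = x.var(axis=0)                    # minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalization
    return gamma * x_hat + beta            # trainable scale and shift

x = np.random.randn(64, 100)               # 64 examples, 100 features
y = batchnorm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # approximately zero means and unit stds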


ResNet (2015)
Beginning of the Ultra-Deep Network Era (152 layers). ImageNet Top-5 error: 3.57%

Why wasn't this working before?

The gradient vanishes when backpropagating too deep!


ResNet Trick

Residual block: the input 𝑋 goes through two 3x3 convolutions (with ReLU) producing the residual 𝐹(𝑋); the output of the block is 𝐹(𝑋) + 𝑋.

The input to the block 𝑋 bypasses the convolutions and is then combined with its residual 𝐹(𝑋) resulting from the convolutions.

When backpropagating, the gradient flows in full through these bypass connections.
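A minimal Keras sketch of such a block (my own simplification: identity shortcut only, so the input must already have the same number of channels as the convolutions):

from tensorflow.keras import layers, Input

def residual_block(x, filters=64):
    # F(X) + X with two 3x3 convolutions; assumes x already has `filters` channels
    fx = layers.Conv2D(filters, 3, padding='same')(x)
    fx = layers.ReLU()(fx)
    fx = layers.Conv2D(filters, 3, padding='same')(fx)
    out = layers.Add()([fx, x])   # the bypass (shortcut) connection
    return layers.ReLU()(out)

inputs = Input(shape=(32, 32, 64))
outputs = residual_block(inputs)  # same spatial size and channels as the input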
ResNet & Batch Norm

When connecting several Residual Blocks in series, one needs to be careful about the amplification/compounding of variance due to the residual connectivity
• Batch norm can alleviate this effect



MobileNets
Making CNNs efficient to run on mobile devices by depthwise separable convolutions

Basically, run channel-independent convolutions followed by 1x1 convolutions for cross-channel mixing

arxiv.org/pdf/1704.04861.pdf

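A quick Keras comparison (my own example sizes) of a standard convolution against its depthwise separable counterpart:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, SeparableConv2D

x = Input(shape=(56, 56, 64))
standard = Model(x, Conv2D(128, 3, padding='same')(x))
separable = Model(x, SeparableConv2D(128, 3, padding='same')(x))
print(standard.count_params())   # 3*3*64*128 + 128 biases = 73,856
print(separable.count_params())  # 3*3*64 depthwise + 64*128 pointwise + 128 biases = 8,896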


CNN Architecture Evolution



Transfer learning

Use (part of) a model trained (pretrained) by someone on a large dataset as a “feature-extractor” on problems with fewer data, fine tuning only the predictor part
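A minimal Keras sketch of this recipe (my own choices: MobileNetV2 as the pretrained backbone and a 10-class target problem):

from tensorflow.keras import Input, Model, layers
from tensorflow.keras.applications import MobileNetV2

# Pretrained backbone used as a frozen feature extractor
backbone = MobileNetV2(include_top=False, weights='imagenet',
                       input_shape=(224, 224, 3), pooling='avg')
backbone.trainable = False                 # freeze the convolutional part

inputs = Input(shape=(224, 224, 3))
features = backbone(inputs, training=False)
outputs = layers.Dense(10, activation='softmax')(features)  # new predictor head
model = Model(inputs, outputs)
# model.compile(...) and model.fit(...) then train only the dense head on the smaller dataset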
Understanding CNN Embedding

tSNE projection of AlexNet's last hidden dense layer

https://cs.stanford.edu/people/karpathy/cnnembed/



Interpreting Intermediate Levels
○ What about the information captured in convolutional layers?
○ Visualize kernel weights (filters)
● Naïve approach
● Works only for early convolutional layers
○ Map the activation of the convolutional kernel back into pixel space
● Requires reversing the convolution
● Deconvolution

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Deconvolutional Network (DeConvNet)

○ Attach a DeConvNet to a target layer
○ Plug in an input and forward propagate activations up to that layer
○ Zero all activations except those of the target neuron
○ Backpropagate on the DeConvNet and see what parts of the reconstructed image are affected
Inspect Deconvolution Layers
(Figure: the deconvolution stages, Deconv 14x14, pooling, Deconv 28x28, …)


Filters & Patches – Layer 1

Reconstructed filters in pixel space and the corresponding top-9 image patches
Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Filters & Patches – Layer 2

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Filters & Patches – Layer 3

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Filters & Patches – Layer 4

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Filters & Patches – Layer 5

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Occlusions
o Measure what happens to feature maps and object classification if we occlude part of the image
o Slide a grey mask over the image and project back the response of the best filters using deconvolution



Occlusions

Zeiler&Fergus, Visualizing and Understanding Convolutional Networks, ICML 2013



Dense CNN

Dense block layers: batch normalization + ReLU + 3x3 convolution
Transition layers: batch normalization + 1x1 convolution + 2x2 average pooling

o Gradient flows well in the bypass connections
o Each layer in the dense block has access to all information from previous layers
Huang et al, Densely Connected Convolutional Networks, CVPR 2017



Causal Convolutions
Prevent a convolution from seeing into the future…

The problem is that the context size grows slowly with depth



Causal & Dilated Convolutions
(𝐼 ∗ 𝐾)(𝑖, 𝑗) = Σ𝑚 Σ𝑛 𝐼(𝑖 − 𝑙𝑚, 𝑗 − 𝑙𝑛) 𝐾(𝑚, 𝑛)   with dilation factor 𝑙

Similar to striding, but the size is preserved


Oord et al, WaveNet: A Generative Model for Raw Audio, ICLR 2016

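A minimal Keras sketch of a WaveNet-style stack (toy sizes of my own): causal padding keeps each output from seeing future steps, and doubling dilation rates makes the context grow exponentially with depth while the sequence length is preserved.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D

model = Sequential()
for i, rate in enumerate((1, 2, 4, 8)):    # dilation doubles at each layer
    kwargs = {'input_shape': (128, 1)} if i == 0 else {}
    model.add(Conv1D(16, kernel_size=2, padding='causal',
                     dilation_rate=rate, activation='relu', **kwargs))
model.summary()  # every layer keeps the 128-step length; the receptive field is 16 steps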


Semantic Segmentation

A traditional CNN cannot be used for this task due to the downsampling caused by the striding and pooling operations



Fully Convolutional Networks (FCN)

○ Convolutional part to extract interesting features at various scales
○ Fuse information from feature maps of different scales
○ Learn an upsampling function of the fused map to generate the semantic segmentation map


Shelhamer et at, Fully Convolutional Networks for Semantic Segmentation, PAMI 2016



Deconvolution Architecture

Max-pooling indices are transferred to the decoder to improve the segmentation resolution.

Badrinarayanan et al, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, PAMI 2017



SegNet Segmentation

Demo here: http://mi.eng.cam.ac.uk/projects/segnet/



U-Nets (Big on Biomedical Images)
○ Few convolutional layers at different resolutions, with pooling layers extracting high level visual features
○ Upconvolution (deconvolution) back towards the original resolution
○ Low level information transfer by concatenation of early feature maps
○ Pixel mask in output (a bit smaller than the original image)
Use Dilated Convolutions
Always perform 3x3 convolutions, with no pooling, at each level (Level 1, Level 2, Level 3, with increasing dilation)

Context increases without
o Pooling (which changes the map size)
o Increasing computational complexity
Yu et al, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016



Segmentation by Dilated CNN
(Figure: segmentation results, Dilated CNN output vs. ground truth (GT))

Yu et al, Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016



Object Detection
Object Detection: Faster R-CNN

○ Any CNN of your choice that can produce a feature map
○ Generate bounding box proposals
• x,y position
• size
• confidence
○ Crop, fuse and polish the bounding box proposals

Source: S. Yeung, BIODS 220


Software
○ CNNs are supported by any deep learning framework (Keras-TF, PyTorch, MS Cognitive TK, Intel OpenVino, …)
○ Caffe was one of the initiators and was basically built around CNNs
● Introduced the protobuffer network specification
● ModelZoo of pretrained models (LeNet, AlexNet, …)
● Support for GPU
● The project has now converged into PyTorch



Caffe Protobuffer
name: "LeNet"
layer {
name: "data"
type: "Input"

input_param { shape: { dim: 64 dim: 1 dim: 28 dim: 28 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"

convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}



Other Software
○ MATLAB distributes its Neural Network Toolbox, which allows importing pretrained models from Keras-TF
○ Want to have a CNN in your browser?
● Try ConvNetJS (https://cs.stanford.edu/people/karpathy/convnetjs/)



GUIs
Major hardware producers have GUIs and toolkits wrapping Caffe and Keras-TF to play with CNNs:
• Intel OpenVino
• NVIDIA Digits
• Barista
• Plus others…



Take Home Messages
o Key things
• Convolutions in place of dense multiplications allow sparse connectivity and weight sharing
• Pooling enforces invariance and allows changing resolution, but shrinks data size
• Full connectivity compresses information from all convolutions, but accounts for 90% of model complexity
o Lessons learned
• ReLUs are efficient and counteract vanishing gradients
• 1x1 convolutions are useful
• Batch normalization is needed
• Bypass connections allow going deeper
o Dilated (à trous) convolutions
o You can use CNNs outside of machine vision



Next Lecture
Gated Recurrent Networks
PART I
○ Learning with sequential data
○ Gradient issues
○ Gated RNNs
● Long Short-Term Memories (LSTM)
● Gated Recurrent Units (GRU)
PART II
○ Advanced topics
● Understanding and exploiting memory encoding
● Applications
