DeepLearning Unit-II

The document provides an overview of Convolutional Neural Networks (CNNs), detailing their structure, operations, and advantages over fully connected networks. It explains the convolution process, including the use of filters, padding, and pooling layers, as well as the significance of hyperparameters. Additionally, it includes practical examples and case studies, such as the application of CNNs on the MNIST dataset.

Uploaded by

srinujpt

Syllabus

• CNN: Introduction
• Striding and Padding
• Pooling layers
• Structure
• Operations and prediction of CNN with layers
• CNN - Case study with MNIST
• CNN vs Fully Connected
Convolutional Neural Networks (Conv Net)
• Convolutional Neural Network (CNN) is a type of Deep Learning neural network
architecture commonly used in Computer Vision. Computer vision is a field of Artificial
Intelligence that enables a computer to understand and interpret the image or visual data.
• The role of ConvNet is to transform the images into a form that is easier to process, without
losing features that are critical for prediction.
• It uses a special technique called Convolution.
Convolutional Neural Networks (Conv Net)
In CNN, every image is represented in the form of an array of pixel values.

• Computers cannot see things as we do; to a computer, an image is nothing but a matrix of pixel values
Convolutional Neural Network(CNN)

Order to be followed
1. Convolutional Layer

2. Pooling Layer ( optional )

3. Flattening(unrolling)

4. Fully connected layer

Note 1: The first layer must be convolutional layer

Note 2: At the end we can use any no. of fully connected layers
Convolutional Neural Network (Structure)
Why Convolutions?
• Convolutions reduce the number of parameters and speed up the training of the
model significantly
• For example 14 million parameters in a fully connected layer can be reduced
to just 156 parameters in case of convolutional layer
Advantages of convolutional layers over fully connected layers:
1. Parameter sharing: In convolutions, a single filter is convolved over the entire
input, so the same filter weights are reused at every position of the input
2. Sparsity of connections: In each layer, each output value depends on only a small
number of inputs instead of all the inputs, i.e. the weights of most of
the connections are zero
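As a rough check of the figures quoted above, the sketch below counts the weights of one fully connected layer and one convolutional layer. The layer sizes (a 32x32x3 input, a 28x28x6 output and six 5x5 filters) are assumptions chosen because they give numbers of this magnitude; they are not stated in these notes. The 156 figure counts 5x5 weights plus one bias per filter and ignores the input depth; counting all three input channels gives 456.

```python
# Assumed sizes (not given in the notes): 32x32x3 input, 28x28x6 output, six 5x5 filters.

def fc_params(n_inputs, n_outputs):
    """Weights of a fully connected layer: one per input-output pair (biases ignored)."""
    return n_inputs * n_outputs


def conv_params(f, n_filters, in_channels=1):
    """Weights of a conv layer: f*f*in_channels per filter, plus one bias per filter."""
    return (f * f * in_channels + 1) * n_filters


n_in = 32 * 32 * 3    # 3072 input values
n_out = 28 * 28 * 6   # 4704 output values

fc_total = fc_params(n_in, n_out)               # about 14 million
conv_single_channel = conv_params(5, 6)         # depth of the input ignored
conv_three_channels = conv_params(5, 6, 3)      # depth of the input counted

print(fc_total, conv_single_channel, conv_three_channels)
```

The convolutional count does not change if the image gets larger, because it depends only on the filter size and the number of filters.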
ANN (input is vector)
In ANN, all the layers are fully connected layers, due to this there will be millions of
parameters/weights (on each connection separate weight) which increases
computational complexity

CNN (input is image)
In CNN, the number of parameters/weights is independent of the size of the image. It depends on
the filter size.
Advantages of CNN:
1. Sparsity of connections
2. Parameter sharing
(Bolded connections are with 0 weights)
1. Convolution Layer
Convolution operation:
• Filter/Kernel : When this Kernel
(K) is convolved with the input
image F(x,y), it creates a new
convolved image amplifying the
edges. Also known as feature map.
• Other filters can be applied to
detect different types of features.
For example, some filters detect
horizontal edges, others detect
vertical edges, some other filters
detect more complex shapes like
corners and so on.
Convolution Layer
Convolution operation: hyper parameters
1. Filter/Kernel : It is a weight matrix

2. Stride: It is the number of steps (pixels) the filter is to be moved horizontally or vertically over the input
matrix. When stride is 1, then we move the filter to 1 step (pixel) at a time and when stride is 2, then we
move the filter to 2 steps (pixels) at a time etc
3. Padding: It is the process of adding borders with zeros to input matrix (default padding is 0)
4. No. of filters: for 3D input
• For 1-Dimensional (signal) : Parameters 1, 2 are used
• For 2-Dimensional (gray image) : Parameters 1, 2, 3 are used
• For 3-Dimensional (color image) : Parameters 1, 2, 3, 4 are used
2-Dimensional Convolution
Gray Scale Image (1 Channel)

3 0 1 2 7 4
1 5 8 9 3 1
2 7 23 50 250 3
5 55 34 3 1 89
67 45 4 56 34 23
17 13 17 20 23 16

6X6 Matrix

Each Pixel value is in the range of 0-255

• Conversion of gray scale image into a 2-dimensional matrix of pixels


2-Dimensional Convolution
Convolution Operation : Gray Scale image (2D)

Convolutional operation   Gray scale image                          Example
input size (2D)           nx x ny                                   6 x 6
filter size               f x f                                     3 x 3
stride                    s = 1, 2, 3, ...                          s = 1 (default)
padding                   p = 0, 1, 2, ...                          p = 0 (default)
output size (2D)          ((nx+2p-f)/s + 1) x ((ny+2p-f)/s + 1)     ((6+2*0-3)/1 + 1) x ((6+2*0-3)/1 + 1)
o/p matrix                                                          4 x 4
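The output-size formula above can be checked with a small helper. This is a minimal sketch; the function name is hypothetical, and integer (floor) division is assumed for strides that do not divide evenly, which is the usual convention.

```python
def conv_output_size(n, f, s=1, p=0):
    """Output length along one dimension: (n + 2p - f) / s + 1, rounded down."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


size_default = conv_output_size(6, 3)             # 6x6 input, 3x3 filter, s=1, p=0
size_padded = conv_output_size(6, 3, s=1, p=1)    # one pixel of padding keeps the size
size_stride2 = conv_output_size(6, 3, s=2)        # larger stride shrinks the output

print(size_default, size_padded, size_stride2)
```

The same function is applied once for the height and once for the width when the input is not square.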
Convolution Operation - Gray Scale Image

Gray scale image (6x6):
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9

Filter (3x3), applied with the convolution operator *:
1 0 -1
1 0 -1
1 0 -1

Output (4x4):
-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16

So, we take the first 3 X 3 matrix from the 6 X 6 image and multiply it element-wise with the filter. The first element of the
4 X 4 output is the sum of the element-wise products of these values, i.e. 3*1 + 0*0 + 1*(-1) + 1*1 + 5*0 + 8*(-1) +
2*1 + 7*0 + 2*(-1) = -5. To calculate the second element, we shift the filter one step towards the right and
again take the sum of the element-wise products
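The sliding-window computation described above can be sketched in plain Python. The function name is hypothetical; the image and filter values are the ones from the worked example. As in the slides (and in deep learning libraries generally), the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image and sum the element-wise products."""
    f = len(kernel)
    out_rows = (len(image) - f) // stride + 1
    out_cols = (len(image[0]) - f) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]
vertical_edge_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d(image, vertical_edge_filter)
for row in feature_map:
    print(row)
```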
3-Dimensional convolution
Colour Image (3 Channels)

Each Pixel is in the range of 0-255


Convolution operation - Color image
input : 6 X 6 X 3 (3D)
filter : 3 X 3 X 3 (3D)
output : 4 X 4 (2D)

Keep in mind that the number of channels in the input and filter should be same
After convolution, the output shape is a 4 X 4 matrix
First element of the output is the sum of the element-wise product of the 27 values from the input ( 9
values from each channel ) and the 27 values from the filter
After that we convolve over the entire image
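A minimal sketch of the multi-channel case follows. The function name and the channels-first list layout (image[channel][row][column]) are assumptions made for illustration. With an all-ones input and an all-ones filter, every output value is the sum of 27 products, which makes the "27 values" statement easy to verify.

```python
def conv_multichannel(image, kernel):
    """Convolve a multi-channel image with one filter of the same depth (stride 1, no padding)."""
    if len(image) != len(kernel):
        raise ValueError("input and filter must have the same number of channels")
    channels = len(image)
    f = len(kernel[0])
    n_rows, n_cols = len(image[0]), len(image[0][0])
    output = []
    for i in range(n_rows - f + 1):
        row = []
        for j in range(n_cols - f + 1):
            total = 0
            # sum over every channel and every position inside the window
            for c in range(channels):
                for a in range(f):
                    for b in range(f):
                        total += image[c][i + a][j + b] * kernel[c][a][b]
            row.append(total)
        output.append(row)
    return output


# 6 x 6 x 3 input and 3 x 3 x 3 filter, all values 1 (illustrative data only)
image = [[[1] * 6 for _ in range(6)] for _ in range(3)]
kernel = [[[1] * 3 for _ in range(3)] for _ in range(3)]

feature_map = conv_multichannel(image, kernel)
print(len(feature_map), "x", len(feature_map[0]), "first value:", feature_map[0][0])
```

One filter always gives one 2D feature map; using several filters stacks their feature maps to form the depth of the next layer.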
Convolution operation
• Each convolutional layer contains 1 or more convolutional filters. The number of filters in
each CONV layers determines the depth of the next layer because each filter produces its own
feature map
• The CONV layers are the hidden layers. And to increase the number of neurons in hidden
layers, we increase the number of kernels in CONV layers.
• Each Kernel unit is considered a neuron.
• Kernel_size is one of the hyperparameters that you will be setting when building a
convolutional layer.
CONV layers in Keras
• model.add(Conv2D(filters=16, kernel_size=3, strides=1, padding='same', activation='relu'))
Padding
• Convolving a 6x6 input with a 3x3 filter results in 4x4 output i.e the input size is not
retained after convolution.
Disadvantages:
1. The size of the image shrinks after every convolution operation
2. Pixels in the corners and along the edges of the image are used far fewer times than the
central pixels, so information near the borders can be lost.
To overcome these issues, we use padding, i.e. we add a border of zero-valued pixels all
around the edges of the image (one pixel wide when p = 1).
There are two common choices for padding:
• valid: It means no padding i.e p=0
• same: we apply padding so that the output size is same as the input size,
i.e., for stride 1, n+2p-f+1 = n, so p = ( f-1 )/2
Example: a 3x3 input padded with p = 1
0 0 0 0 0
0 0 1 2 0
0 3 4 5 0
0 6 7 8 0
0 0 0 0 0
Padding
For the input matrix, we add one pixel all around the edges.
This means that the input will be an 8 x 8 matrix (instead of a 6x6 matrix )
Applying convolution of 3 x 3 on 8 x 8 will result in a 6 x 6 matrix which is the original shape of the
image
Input: n x n - 6 x 6
Filter size: f X f - 3 x 3
Padding: p = (f-1)/2 = (3-1)/2 = 1
Output: ( n+2p-f+1 ) x ( n+2p-f+1 ) = ( 6+2-3+1 ) x ( 6+2-3+1 ) = 6 x 6

Padded input (8x8):
0 0 0 0 0 0 0 0
0 3 0 1 2 7 4 0
0 1 5 8 9 3 1 0
0 2 7 2 5 1 3 0
0 0 1 3 1 7 8 0
0 4 2 1 6 2 8 0
0 2 4 5 2 3 9 0
0 0 0 0 0 0 0 0

Filter (3x3):
1 0 -1
1 0 -1
1 0 -1

Output (6x6):
-5 -5 -6 -1 6 10
-12 -5 -4 0 8 11
-13 -10 -2 2 3 11
-10 0 -2 -4 -7 10
-7 -3 -2 -3 -16 12
-6 0 -2 1 -9 5
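The padding example above can be reproduced with a short sketch. Both helper names are hypothetical; the image and filter are the ones used in the example, and stride 1 is assumed.

```python
def zero_pad(image, p):
    """Add a border of p zero-valued pixels on every side of a 2D image."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def conv2d(image, kernel):
    """Stride-1 convolution without padding (sum of element-wise products)."""
    f = len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(f) for b in range(f))
            for j in range(len(image[0]) - f + 1)
        ]
        for i in range(len(image) - f + 1)
    ]


image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

p = (len(kernel) - 1) // 2                       # 'same' padding for a 3x3 filter: p = 1
same_output = conv2d(zero_pad(image, p), kernel)
padded_small = zero_pad([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 1)

for row in same_output:
    print(row)
```

With p = 0 the same call corresponds to 'valid' padding and the output shrinks to 4 x 4.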
2. Pooling layer (Pooling operation)
Pooling operation: It is responsible for reducing the spatial size of the Convolved Feature.
This is to decrease the computational power required to process the data by reducing
the dimensions.
Types of Pooling:
1. Max pooling: Maximum element in the pooling window is selected
2. Min pooling: Minimum element in the pooling window is selected
3. Avg pooling: Average of all the elements of the pooling window is selected.
Generally we use max pooling or avg pooling
Hyperparameters for pooling operation:
1. Filter size (only the window size is needed; the pooling filter has no weights)
2. Stride
3. Max pooling or average pooling
2. Pooling layer
• With a 2x2 window and a stride of 2, the pooling layer reduces the height and
width of each feature map by a factor of 2.
Ex 1: If size of input is 4x4 then after applying
pooling operation, size of output is 2x2
Ex 2: If size of input is 6x6 then after applying
pooling operation, size of output is 3x3
Hyperparameters:
1. Filter size -2x2
2. Stride - 2
3. Max pooling or average pooling
Note: Max/Avg pooling preserves the important features
2. Pooling Layer
Feature map (6x6):
-5 -5 -6 -1 6 10
-12 -5 -4 0 8 11
-13 -10 -2 2 3 11
-10 0 -2 -4 -7 10
-7 -3 -2 -3 -16 12
-6 0 -2 1 -9 5

Max pooling output (3x3):
-5 0 11
0 2 11
0 1 12
POOL layer has the following attributes that we need to configure:

model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
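A minimal sketch of the pooling operation, using the 6x6 feature map from the max pooling example. The function name and the mode argument are hypothetical; the window size of 2 and stride of 2 match the hyperparameters listed above.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window of a 2D feature map to a single value."""
    reducers = {
        "max": max,
        "min": min,
        "avg": lambda window: sum(window) / len(window),
    }
    reduce_window = reducers[mode]
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_window(window))
        output.append(row)
    return output


feature_map = [
    [-5, -5, -6, -1, 6, 10],
    [-12, -5, -4, 0, 8, 11],
    [-13, -10, -2, 2, 3, 11],
    [-10, 0, -2, -4, -7, 10],
    [-7, -3, -2, -3, -16, 12],
    [-6, 0, -2, 1, -9, 5],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
for row in max_pooled:
    print(row)
```

Pooling has no weights to learn; only the window size, the stride and the type of pooling are chosen.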


3. Flattening Layer
Flattening layer: Flattening layer is used to convert the resultant 2-
Dimensional arrays from pooled feature maps into a single continuous linear
vector.
3. Flattening Layer
Flattening (Unrolling): Conversion of a matrix into a vector

Ex:

Output of the pooling layer:
-5 0 11
0 2 11
0 1 12

After unrolling (input to the Fully Connected layer):
-5, 0, 11, 0, 2, 11, 0, 1, 12

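Unrolling can be sketched in one line of Python. The function name is hypothetical, and row-by-row order is assumed because it matches the example above.

```python
def flatten(matrix):
    """Unroll a 2D matrix row by row into a single vector."""
    return [value for row in matrix for value in row]


pooled = [
    [-5, 0, 11],
    [0, 2, 11],
    [0, 1, 12],
]
vector = flatten(pooled)
print(vector)
```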


4. Fully Connected Layer
[Figure: the flattened output of the pooling layer (-5, ..., 12) forms the input layer, which is followed by the hidden layers and the output layer]
4. Fully Connected Layer
Fully-Connected Layer (FC-Layer): The flattened matrix is fed as input to
the fully connected layer to classify the image. CNN use fully-connected
layers in which each pixel is considered as a separate neuron just like a
regular neural network. The last fully-connected layer will contain as many
neurons as the number of classes to be predicted.
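The sketch below shows how a fully connected output layer could turn the flattened vector into class probabilities. The weights, the biases and the choice of two classes are made-up values for illustration only; in a real network they are learned during training.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


flat = [-5, 0, 11, 0, 2, 11, 0, 1, 12]       # flattened output of the pooling layer

# Hypothetical weights: one row of 9 weights per output neuron (2 classes assumed)
weights = [[0.1] * 9, [-0.1] * 9]
biases = [0.0, 0.0]

scores = dense(flat, weights, biases)
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The number of rows in the weight matrix equals the number of classes, which is why the last fully connected layer has one neuron per class.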
Applications:
• Image Classification
• Image Recognition
• Document Classification
• Medical Image Analysis
• Automatic image captioning
Examples: Digit Classification, Brain Tumor Classification (Y / N)
CNN Architecture
IMAGE -> CONV -> NON LINEARITY (ReLU) -> POOLING -> CONV -> NON LINEARITY (ReLU) -> POOLING -> FULLY CONNECTED LAYER -> SOFTMAX -> CLASS

*ReLU = Rectified Linear Unit



Steps to be followed to train the CNN
1. Provide the input image into convolution layer.
2. Convolve it with kernels / filters to extract feature maps.
3. Apply pooling layer to reduce the dimensions.
4. Add these layers multiple times.
5. Flatten the output and feed into a fully connected layer.
6. Now train the model with backpropagation, using a logistic (sigmoid or softmax) output layer for classification.
Some well known convolution networks
• LeNet - Developed by Yann LeCun to recognize handwritten digits, it is the pioneering CNN.
• AlexNet - Developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, it won the 2012
ImageNet challenge. It was among the first CNNs to stack convolutional layers directly on top of one another.
• GoogLeNet - Developed by Google, it won the 2014 ImageNet competition. The main
advantage of this network over the other networks was that it required far fewer
parameters to train, making it faster and less prone to overfitting.
• VGGNet - This is another popular network, with its most popular version being VGG16.
VGG16 has 16 weight layers (13 convolutional and 3 fully connected).
• ResNet - Developed by Kaiming He and colleagues, this network won the 2015 ImageNet competition.
The two most popular variants of ResNet are ResNet50 and ResNet34. Another, more complex
variation of ResNet is the ResNeXt architecture.
Projects on CNN

1. Hand Written Digit Recognition - MNIST dataset (2D)

2. Object Recognition in Photographs - CIFAR 10/100 dataset (3D)

3. Predict Sentiment from Movie Reviews - IMDB dataset (1D)


1. Hand Written Digit Recognition - MNIST dataset
( Multi-class classification , gray scale images, 2D convolution)
• MNIST dataset: (Modified National Institute of Standards and Technology database) is a
large database of handwritten digits that is used for training various image processing
systems
• The database is also widely used for training and testing in the field of machine learning
and deep learning.

• This dataset consists of 70000 images where each image is a handwritten digit,
out of which 60000 are for training and 10000 are for testing.
• 70000 handwritten digits are divided into 10 classes. Classes include digits such as
0,1,2,3,4,5,6,7,8,9
• Size of each image is a 28x28 pixel square(784 pixels total)
MNIST(Sample dataset)
2. Object Recognition in Photographs - CIFAR 10 dataset
( multi-class classification, color images, 3D convolution )
• CIFAR dataset: (Canadian Institute For Advanced Research) is a large database of
photographs of objects that is used for training various image processing systems
• The CIFAR-10 consists of tiny colour images
• The database is also widely used for training and testing in the field of
deep learning.
• This dataset consists of 60000 images where each image is a photograph of an object, out of
which 50000 are for training and 10000 are for testing.
• 60000 photographs are divided into 10 classes: airplane, automobile, bird, cat,
deer, dog, frog, horse, ship and truck
• Size of each image is 32x32 pixels with 3 colour channels (1024 pixels per channel, 3072 values total)
CIFAR10 ( Sample Dataset )
3. Predict Sentiment from Movie Reviews-IMDB dataset
( Binary classification, 1D convolution)

• The Internet Movie Database (IMDB) is a huge repository for image and text
data.
• The database is an excellent source for data analytics and deep learning
practice and research
• The IMDB dataset for binary sentiment classification consists of
50,000 highly polar movie reviews.
• 50000 movie reviews are divided into 2 classes. Classes include positive
reviews and negative reviews
• The dataset provides 25,000 reviews for training and
25,000 for testing.
IMDB ( Sample Dataset )
S.No. / Review / Sentiment

1. Sentiment: negative
   Review: This movie features Charlie Spradling dancing in a strip club. Beyond that, it features a truly bad script with dull, unrealistic dialogue. That it got as many positive votes suggests some people may be joking.

2. Sentiment: positive
   Review: If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it. Great Camp!!!

3. Sentiment: negative
   Review: It's terrific when a funny movie doesn't make smile you. What a pity!! This film is very boring and so long. It's simply painfull. The story is staggering without goal and no fun. You feel better when it's finished.

4. Sentiment: positive
   Review: I have seen this film at least 100 times and I am still excited by it, the acting is perfect and the romance between Joe and Jean keeps me on the edge of my seat, plus I still think Bryan Brown is the tops. Brilliant Film.
Datasets - MNIST / CIFAR10 / IMDB

S.No.  Dataset   Type of dataset                          No. of classes  Class labels                                                           Classification  Convolution
1      MNIST     Handwritten digits (gray scale images)   10              0,1,2,3,4,5,6,7,8,9                                                    multiclass      2D
2      CIFAR10   Images of objects (color images)         10              airplane, automobile, dog, deer, cat, truck, ship, horse, bird, frog  multiclass      3D
3      IMDB      Movie reviews                            2               positive, negative                                                     binary          1D


Steps to develop Convolutional Neural Networks(CNN) with
Keras

Step 1: Load Dataset

Step 2: Define Model

Step 3: Compile Model

Step 4: Fit Model

Step 5: Evaluate Model


Hand Written Digit Recognition
Build the model architecture
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# build the model object
model = Sequential()
# CONV_1: add CONV layer with RELU activation and depth = 32 kernels
model.add(Conv2D(32, kernel_size=(3, 3), strides=1, padding='same', activation='relu', input_shape=(28,28,1)))

# POOL_1: downsample the image to choose the best features
model.add(MaxPooling2D(pool_size=(2, 2)))
# CONV_2: here we increase the depth to 64
model.add(Conv2D(64, (3, 3), strides=1, padding='same', activation='relu'))

# POOL_2: more downsampling
model.add(MaxPooling2D(pool_size=(2, 2)))
Build the model architecture
# flatten the feature maps into a vector, since the dense layers expect 1D input
model.add(Flatten())

# FC_1: fully connected to get all relevant data
model.add(Dense(64, activation='relu'))

# FC_2: output a softmax to squash the scores into output probabilities for the 10 classes
model.add(Dense(10, activation='softmax'))

# print model architecture summary
model.summary()
Model summary

number of params (CONV layer) = no. of filters x kernel height x kernel width x depth of the previous layer + no. of filters (for biases)
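Applying this formula to the digit-recognition model defined above gives the counts below. This is a worked sketch, not the printed output of model.summary(): the helper names are hypothetical, and the 7x7x64 shape before flattening assumes 'same' padding in both CONV layers and two 2x2 poolings (28 -> 14 -> 7).

```python
def conv_params(n_filters, kernel_h, kernel_w, in_depth):
    """filters x kernel height x kernel width x depth of previous layer, plus one bias per filter."""
    return n_filters * kernel_h * kernel_w * in_depth + n_filters


def dense_params(n_inputs, n_units):
    """One weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


conv1 = conv_params(32, 3, 3, 1)     # CONV_1 on the 28x28x1 input
conv2 = conv_params(64, 3, 3, 32)    # CONV_2 on the 14x14x32 pooled maps
flat_size = 7 * 7 * 64               # size of the vector after POOL_2 and Flatten
fc1 = dense_params(flat_size, 64)    # FC_1
fc2 = dense_params(64, 10)           # FC_2 (10 classes)

total = conv1 + conv2 + fc1 + fc2    # pooling and flatten layers add no parameters
print(conv1, conv2, fc1, fc2, total)
```

Most of the parameters sit in the first fully connected layer, not in the convolutional layers.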
CNN vs Fully Connected
• The basic difference between the two types of layers is the density of the
connections. The FC layers are densely connected, meaning that every neuron
in the output is connected to every input neuron. On the other hand, in a Conv
layer, the neurons are not densely connected but are connected only to
neighboring neurons within the width of the convolutional kernel.

• A second main difference between them is weight sharing. In an FC layer,
every output neuron is connected to every input neuron through a different
weight. However, in a Conv layer, the weights are shared among different
neurons. This is another characteristic that enables Conv layers to be used in
the case of a large number of neurons.
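Both differences can be seen by writing a small convolution as a weight matrix. The sketch below uses a 1D input of length 6 and a kernel of width 3; the function name and the kernel values are hypothetical and chosen only for illustration.

```python
def conv1d_as_matrix(n, kernel):
    """Write a stride-1, unpadded 1D convolution over n inputs as an (n - f + 1) x n weight matrix."""
    f = len(kernel)
    rows = []
    for i in range(n - f + 1):
        row = [0] * n
        row[i:i + f] = kernel        # the same kernel weights are reused in every row
        rows.append(row)
    return rows


kernel = [1, 2, 3]                   # hypothetical weights
matrix = conv1d_as_matrix(6, kernel)

fc_weights = len(matrix) * len(matrix[0])                          # an FC layer needs one weight per entry
nonzero_connections = sum(1 for row in matrix for w in row if w != 0)
distinct_weights = len({w for row in matrix for w in row if w != 0})

for row in matrix:
    print(row)
print(fc_weights, nonzero_connections, distinct_weights)
```

An FC layer of the same shape would need a separate weight for every entry of the matrix. In the Conv layer, most entries are zero (sparse connections) and the non-zero entries repeat the same few kernel weights (weight sharing).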
