UNIT 4 CONVNETS 9 hours
Basic concepts of Convolutional Neural Networks starting from filtering. Convolution
and pooling operations and their arithmetic. Discussions on famous ConvNet
architectures - AlexNet, ZFNet, VGG, GoogLeNet, ResNet, MobileNet-v1
REGULARIZATION, BATCHNORM
Discussion on regularization, Dropout, Batch norm; discussion on detection as
classification, region proposals, and R-CNN architectures.
Introduction to Convolution Neural Network
A Convolutional Neural Network (CNN) is a type of Deep Learning
neural network architecture commonly used in Computer Vision.
Computer vision is a field of Artificial Intelligence that enables a
computer to understand and interpret the image or visual data.
Artificial Neural Networks perform remarkably well in Machine Learning and are
applied to many kinds of data, such as images, audio, and text. Different types of
Neural Networks are used for different purposes: for predicting a sequence of words
we use Recurrent Neural Networks (more precisely, an LSTM), while for image
classification we use Convolutional Neural Networks. Here we are going to build up
the basic building blocks of a CNN.
In a regular Neural Network there are three types of layers:
1. Input Layers: It’s the layer in which we give input to our
model. The number of neurons in this layer is equal to the total
number of features in our data (number of pixels in the case of
an image).
2. Hidden Layer: The input from the Input layer is then fed into the
hidden layer. There can be many hidden layers depending on our
model and data size. Each hidden layer can have different
numbers of neurons which are generally greater than the number
of features. The output of each hidden layer is computed by multiplying the
previous layer's output with that layer's learnable weights, adding the learnable
biases, and then applying an activation function, which makes the network
nonlinear.
3. Output Layer: The output from the hidden layers is then fed into a
function such as sigmoid or softmax, which converts the output for each
class into a probability score.
Feeding the data into the model and obtaining the output of each layer as
described above is called the feedforward pass. We then calculate the error using
a loss function; common loss functions are cross-entropy, squared error, etc. The
loss function measures how well the network is performing. After that, we
propagate backwards through the model by calculating derivatives of the loss with
respect to the weights. This step, called backpropagation, is used to minimize the loss.
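A minimal sketch of this feedforward/backpropagation loop, assuming a small fully connected network and randomly generated toy data purely for illustration:
Python3
import tensorflow as tf

# Hypothetical toy data: 64 samples, 20 features, 3 classes
x = tf.random.normal([64, 20])
y = tf.random.uniform([64], maxval=3, dtype=tf.int32)

# A small fully connected network: one hidden layer plus a softmax output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(5):
    with tf.GradientTape() as tape:
        probs = model(x)           # feedforward pass
        loss = loss_fn(y, probs)   # error function (cross-entropy)
    # backpropagation: derivatives of the loss w.r.t. the trainable weights
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(step, float(loss))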
Convolutional Neural Network:
A Convolutional Neural Network (CNN) is an extended version of the artificial
neural network (ANN), used predominantly to extract features from grid-like data,
for example visual datasets such as images or videos, where spatial patterns play
an important role.
CNN architecture
Convolutional Neural Network consists of multiple layers like the input
layer, Convolutional layer, Pooling layer, and fully connected layers.
The Convolutional layer applies filters to the input image to extract
features, the Pooling layer down samples the image to reduce
computation, and the fully connected layer makes the final prediction.
The network learns the optimal filters through backpropagation and
gradient descent.
How Convolutional Layers work:
Convolutional Neural Networks (convnets) are neural networks that share their
parameters. Imagine you have an image. It can be represented as a cuboid having
a length and width (the spatial dimensions of the image) and a height (the
channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural
network, called a filter or kernel, on it, with say K outputs, and representing
them vertically. Now slide that small network across the whole image; as a result,
we get another image with a different width, height, and depth. Instead of just
the R, G, and B channels, we now have more channels but a smaller width and
height. This operation is called convolution. If the patch size were the same as
that of the image, it would be a regular neural network. Because of this small
patch, we have fewer weights.
Now let’s talk about a bit of mathematics that is involved in the whole
convolution process.
Convolution layers consist of a set of learnable filters (or kernels)
having small widths and heights and the same depth as that of the
input volume (3 if the input is an RGB image).
For example, if we have to run a convolution on an image of
dimensions 34x34x3, the possible filter sizes are a x a x 3, where
'a' can be 3, 5, or 7, but small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input
volume step by step, where each step is called the stride (which can
have a value of 2, 3, or even 4 for high-dimensional images), and
compute the dot product between the kernel weights and the
corresponding patch of the input volume.
As we slide each filter we get a 2-D output per filter; stacking these
together gives an output volume with a depth equal to the number of
filters. The network learns all of these filters.
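A minimal sketch of this arithmetic, assuming a random 34x34x3 input and 10 filters of size 5x5 purely for illustration; the output depth equals the number of filters, and the stride controls how much the spatial dimensions shrink:
Python3
import tensorflow as tf

# Hypothetical input: one 34x34 RGB image (batch, height, width, channels)
x = tf.random.normal([1, 34, 34, 3])

# 10 learnable 5x5 filters, stride 1, no padding ('valid')
conv = tf.keras.layers.Conv2D(filters=10, kernel_size=5, strides=1, padding='valid')
print(conv(x).shape)     # (1, 30, 30, 10): depth equals the number of filters

# With stride 2 the spatial dimensions shrink roughly by half
conv_s2 = tf.keras.layers.Conv2D(filters=10, kernel_size=5, strides=2, padding='valid')
print(conv_s2(x).shape)  # (1, 15, 15, 10)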
Layers used to build ConvNets
A complete Convolutional Neural Network architecture is also known as
a convnet. A convnet is a sequence of layers, and every layer transforms
one volume into another through a differentiable function.
Types of layers:
Let's take an example by running a convnet on an image of dimensions
32 x 32 x 3.
Input Layer: This is the layer through which we give input to our
model. In a CNN, the input will generally be an image or a
sequence of images. This layer holds the raw image input with
width 32, height 32, and depth 3.
Convolutional Layer: This layer is used to extract features from the
input dataset. It applies a set of learnable filters, known as kernels,
to the input images. The filters/kernels are small matrices, usually of
2x2, 3x3, or 5x5 shape. Each kernel slides over the input image data
and computes the dot product between the kernel weights and the
corresponding input image patch. The output of this layer is referred
to as feature maps. Suppose we use a total of 12 filters for this layer
(with padding that preserves the spatial size); we'll then get an
output volume of dimensions 32 x 32 x 12.
Activation Layer: By adding an activation function to the output of
the preceding layer, activation layers add nonlinearity to the
network. An element-wise activation function is applied to the
output of the convolution layer. Some common activation functions
are ReLU: max(0, x), tanh, Leaky ReLU, etc. The volume size
remains unchanged, hence the output volume will have dimensions
32 x 32 x 12.
Pooling Layer: This layer is periodically inserted into the convnet;
its main function is to reduce the size of the volume, which speeds
up computation, reduces memory usage, and also helps prevent
overfitting. Two common types of pooling layers are max pooling
and average pooling. If we use a max pool with 2 x 2 filters and
stride 2, the resultant volume will be of dimensions 16 x 16 x 12.
Flattening: After the convolution and pooling layers, the resulting
feature maps are flattened into a one-dimensional vector so they
can be passed into a fully connected layer for classification or
regression.
Fully Connected Layers: These take the input from the previous
layer and compute the final classification or regression output.
Output Layer: For classification tasks, the output from the fully
connected layers is fed into a function such as sigmoid or softmax,
which converts the output for each class into a probability score.
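A minimal sketch that stacks these layers for the 32 x 32 x 3 example above, assuming 10 output classes purely for illustration; model.summary() prints the 32x32x12, 16x16x12, and flattened shapes described in the list:
Python3
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                      # input: 32 x 32 x 3
    tf.keras.layers.Conv2D(12, 3, padding='same'),          # convolution: 32 x 32 x 12
    tf.keras.layers.Activation('relu'),                     # activation: 32 x 32 x 12
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),   # pooling: 16 x 16 x 12
    tf.keras.layers.Flatten(),                              # flatten: 3072 values
    tf.keras.layers.Dense(10, activation='softmax'),        # output: 10 class probabilities
])
model.summary()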
Example:
Let's consider an image and apply the convolution layer, activation
layer, and pooling layer operations to extract features from it.
Input image:
Input image
Steps:
Import the necessary libraries
Set the parameters
Define the kernel
Load the image and plot it
Reformat the image
Apply the convolution layer operation and plot the output image
Apply the activation layer operation and plot the output image
Apply the pooling layer operation and plot the output image
Python3
# import the necessary libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# set the plotting parameters
plt.rc('figure', autolayout=True)
plt.rc('image', cmap='magma')

# define the kernel (a 3x3 edge-detection filter)
kernel = tf.constant([[-1, -1, -1],
                      [-1, 8, -1],
                      [-1, -1, -1]])

# load the image as a single-channel (grayscale) tensor and resize it
image = tf.io.read_file('Ganesh.jpg')
image = tf.io.decode_jpeg(image, channels=1)
image = tf.image.resize(image, size=[300, 300])

# plot the original image
img = tf.squeeze(image).numpy()
plt.figure(figsize=(5, 5))
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.title('Original Gray Scale Image')
plt.show()

# reformat: add a batch dimension and reshape the kernel
# to [height, width, in_channels, out_channels]
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)

# convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))

# plot the convolved image
plt.subplot(1, 3, 1)
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.title('Convolution')

# activation layer (element-wise ReLU)
relu_fn = tf.nn.relu
image_detect = relu_fn(image_filter)

plt.subplot(1, 3, 2)
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.title('Activation')

# pooling layer (2x2 max pooling with stride 2)
pool = tf.nn.pool
image_condense = pool(input=image_detect,
                      window_shape=(2, 2),
                      pooling_type='MAX',
                      strides=(2, 2),
                      padding='SAME')

plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
Output:
[Figure: the original grayscale image, followed by the convolution, activation, and pooling outputs]
Advantages of Convolutional Neural Networks (CNNs):
1. Good at detecting patterns and features in images, videos, and
audio signals.
2. Robust to translation, with some invariance to rotation and scaling.
3. End-to-end training, no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Disadvantages of Convolutional Neural Networks (CNNs):
1. Computationally expensive to train and require a lot of memory.
2. Can be prone to overfitting if not enough data or proper
regularization is used.
3. Requires large amounts of labeled data.
4. Interpretability is limited; it is hard to understand what the
network has learned.
Discussions on famous ConvNet architectures - AlexNet, ZFNet, VGG, GoogLeNet,
ResNet, MobileNet-v1
AlexNet
When? 2012:
The Alan Turing Year
The Year of Sustainable Energy for All
The London Olympics
Why? AlexNet was born out of the need to improve the results of
the ImageNet challenge. It was one of the first deep convolutional
networks to achieve considerable accuracy on the ImageNet
LSVRC-2012 challenge, with a top-5 accuracy of 84.7% compared to
73.8% for the second-best entry. The idea of spatial correlation in an
image frame was explored using convolutional layers and receptive fields.
What? The network consists of 5 Convolutional (CONV) layers and
3 Fully Connected (FC) layers. The activation used is the Rectified
Linear Unit (ReLU). The structural details of each layer in the
network are shown in the block diagram below.
[Figure: AlexNet block diagram]
The network has a total of 62 million trainable parameters.
How? The input to the network is a batch of RGB images of size
227x227x3, and the output is a 1000x1 probability vector, with one
entry corresponding to each class.
Data augmentation is carried out to reduce over-fitting. This Data
augmentation includes mirroring and cropping the images to
increase the variation in the training data-set. The network uses an
overlapped max-pooling layer after the first, second, and fifth
CONV layers. Overlapped maxpool layers are simply maxpool
layers with strides less than the window size. A 3x3 maxpool layer is
used with a stride of 2, hence creating overlapping receptive fields.
This overlapping improved the top-1 and top-5 errors by 0.4% and
0.3%, respectively.
Before AlexNet, the most commonly used activation functions
were sigmoid and tanh. Due to the saturated nature of these
functions, they suffer from the Vanishing Gradient (VG) problem
and make it difficult for the network to train. AlexNet uses
the ReLU activation function which doesn’t suffer from the VG
problem. The original paper showed that the network with ReLU
reached a 25% training error rate about six times faster than the
same network with tanh non-linearity.
Although ReLU helps with the vanishing gradient problem, due to
its unbounded nature, the learned variables can become
unnecessarily high. To prevent this, AlexNet introduced Local
Response Normalization (LRN). The idea behind LRN is to carry
out a normalization in a neighborhood of pixels amplifying the
excited neuron while dampening the surrounding neurons at the
same time.
AlexNet also addresses the over-fitting problem by using drop-out
layers where a connection is dropped during training with a
probability of p=0.5. Although this prevents the network from over-
fitting by helping it escape bad local minima, the number of
iterations required for convergence roughly doubles.
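A minimal sketch of these AlexNet ingredients (ReLU, overlapping 3x3/stride-2 max pooling, local response normalization, and dropout), assuming the layer sizes below purely for illustration rather than reproducing the full 8-layer network:
Python3
import tensorflow as tf

inputs = tf.keras.Input(shape=(227, 227, 3))
# First CONV layer with ReLU, roughly following AlexNet's 11x11, stride-4 filters
x = tf.keras.layers.Conv2D(96, 11, strides=4, activation='relu')(inputs)
# Local Response Normalization across neighbouring channels (k=2, n=5, alpha=1e-4, beta=0.75)
x = tf.keras.layers.Lambda(
    lambda t: tf.nn.local_response_normalization(t, depth_radius=5,
                                                 bias=2.0, alpha=1e-4, beta=0.75))(x)
# Overlapping max pooling: 3x3 window with stride 2 (stride < window size)
x = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2)(x)
x = tf.keras.layers.Flatten()(x)
# Dropout with p=0.5, as used in AlexNet's fully connected layers
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1000, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.summary()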
VGGNet:
When? 2014:
The International Year of Family Farming and Crystallography
The first robotic landing on a comet
The year of Robin Williams' death
Why? VGGNet was born out of the need to reduce the number of
parameters in the CONV layers and to improve training time.
What? There are multiple variants of VGGNet (VGG16, VGG19,
etc.) which differ only in the total number of layers in the network.
The structural details of a VGG16 network have been shown below.
[Figure: VGG16 block diagram (source: neurohive.io)]
VGG16 has a total of 138 million parameters. The important point
to note here is that all the conv kernels are of size 3x3 and
maxpool kernels are of size 2x2 with a stride of two.
How? The idea behind having fixed size kernels is that all the
variable size convolutional kernels used in Alexnet (11x11, 5x5,
3x3) can be replicated by making use of multiple 3x3 kernels as
building blocks. The replication is in terms of the receptive field
covered by the kernels.
Let’s consider the following example. Say we have an input layer of
size 5x5x1. Implementing a conv layer with a kernel size of 5x5 and
stride one will result in an output feature map of 1x1. The same
output feature map can be obtained by implementing two 3x3 conv
layers with a stride of 1 as shown below
Now let’s look at the number of variables needed to be trained. For
a 5x5 conv layer filter, the number of variables is 25. On the other
hand, two conv layers of kernel size 3x3 have a total of 3x3x2=18
variables (a reduction of 28%).
Similarly, the effect of one 7x7 (11x11) conv layer can be achieved
by implementing three (five) 3x3 conv layers with a stride of one.
This reduces the number of trainable variables by 44.9% (62.8%).
A reduced number of trainable variables means faster learning and
more robustness to over-fitting.
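A minimal sketch of this equivalence, assuming a random 5x5x1 input purely for illustration; one 5x5 convolution and two stacked 3x3 convolutions both reduce it to a 1x1 feature map, while the stacked version uses 18 weights instead of 25:
Python3
import tensorflow as tf

x = tf.random.normal([1, 5, 5, 1])   # a 5x5x1 input

# One 5x5 conv layer, stride 1, no padding: 25 weights (bias omitted)
conv5 = tf.keras.layers.Conv2D(1, 5, use_bias=False)
print(conv5(x).shape, conv5.count_params())                      # (1, 1, 1, 1) 25

# Two stacked 3x3 conv layers cover the same 5x5 receptive field: 9 + 9 = 18 weights
conv3a = tf.keras.layers.Conv2D(1, 3, use_bias=False)
conv3b = tf.keras.layers.Conv2D(1, 3, use_bias=False)
y = conv3b(conv3a(x))
print(y.shape, conv3a.count_params() + conv3b.count_params())    # (1, 1, 1, 1) 18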
ResNet
When? 2015:
The discovery of gravitational waves
The International Year of Soil and of Light-based Technologies
The Martian movie
Why? Neural Networks are notorious for not being able to find a
simpler mapping when it exists.
For example, say we have a fully connected multi-layer perceptron
network and we want to train it on a data-set where the input
equals the output. The simplest solution to this problem is having
identity weight matrices and zero biases for all the hidden layers.
But when such a network is trained using back-propagation, a rather
complex mapping is learned where the weights and biases have a
wide range of values.
Another example is adding more layers to an existing neural
network. Say we have a network f(x) that has achieved an
accuracy of n% on a data-set. Now adding more layers to this
network g(f(x)) should have at least an accuracy of n%, i.e. in the
worst case g(.) should be an identity mapping, yielding the same
accuracy as that of f(x) if not more. But unfortunately, that is not
the case. Experiments have shown that the accuracy decreases by
adding more layers to the network.
The issues mentioned above happen because of the vanishing
gradient problem. As we make the CNN deeper, the derivatives
back-propagated to the initial layers become almost insignificant
in value.
ResNet addresses this problem by introducing two types of
'shortcut connections': the identity shortcut and the projection shortcut.
What? There are multiple versions of ResNetXX architectures
where ‘XX’ denotes the number of layers. The most commonly used
ones are ResNet50 and ResNet101. Since the vanishing gradient
problem was taken care of (more about it in the How part), CNN
started to get deeper and deeper. Below we present the structural
details of ResNet18.
ResNet18 has around 11 million trainable parameters. It consists of
CONV layers with filters of size 3x3 (just like VGGNet). Only two
pooling layers are used throughout the network, one at the
beginning and the other at the end. Identity connections sit between
every pair of CONV layers. The solid arrows show identity shortcuts,
where the dimensions of the input and output are the same, while
the dotted ones represent projection connections, where the
dimensions differ.
How? As mentioned earlier, ResNet architecture makes use of
shortcut connections to solve the vanishing gradient problem. The
basic building block of ResNet is a Residual block that is repeated
throughout the network.
[Figure: Residual block, taken from the original paper]
Instead of learning the mapping x → F(x), the network learns the
mapping x → F(x) + G(x). When the dimensions of the input x and
the output F(x) are the same, the function G(x) = x is an identity
function and the shortcut connection is called an identity connection.
An identity mapping can then be learned by zeroing out the weights
of the intermediate layers during training, since it is easier to push
the residual F(x) to zero than to make a stack of layers fit the identity.
For the case when the dimensions of F(x) differ from x (due to
stride length>1 in the CONV layers in between), the Projection
connection is implemented rather than the Identity connection.
The function G(x) changes the dimensions of input x to that of
output F(x). Two kinds of mapping were considered in the original
paper.
Non-trainable Mapping (Padding): The input x is simply padded
with zeros to make its dimensions match those of F(x).
Trainable Mapping (Conv Layer): A 1x1 conv layer is used to map
x to G(x). Across the network the spatial dimensions are either kept
the same or halved, and the depth is either kept the same or
doubled, so the product of width and depth after each conv stage
remains the same (3584). The 1x1 conv layers halve the spatial
dimensions by using a stride of 2 and double the depth by using the
appropriate number of filters; the number of 1x1 filters is equal to
the depth of F(x).
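A minimal sketch of a residual block with both shortcut types, assuming the filter counts and input size below purely for illustration; the projection path uses a 1x1 convolution with stride 2 to match the changed dimensions:
Python3
import tensorflow as tf

def residual_block(x, filters, downsample=False):
    """Two 3x3 conv layers plus a shortcut; a 1x1 projection is used when dimensions change."""
    stride = 2 if downsample else 1
    y = tf.keras.layers.Conv2D(filters, 3, strides=stride, padding='same')(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same')(y)
    y = tf.keras.layers.BatchNormalization()(y)
    if downsample or x.shape[-1] != filters:
        # projection shortcut: 1x1 conv matches the spatial size and depth of F(x)
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
    else:
        # identity shortcut: dimensions already match
        shortcut = x
    out = tf.keras.layers.Add()([y, shortcut])   # F(x) + G(x)
    return tf.keras.layers.ReLU()(out)

inputs = tf.keras.Input(shape=(56, 56, 64))
x = residual_block(inputs, 64)                   # identity shortcut: 56x56x64
x = residual_block(x, 128, downsample=True)      # projection shortcut: 28x28x128
model = tf.keras.Model(inputs, x)
model.summary()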
Inception:
When? 2014:
The International Year of Family Farming and Crystallography
The first robotic landing on a comet
The year of Robin Williams' death
Why? In an image classification task, the size of the salient feature
can vary considerably within the image frame. Hence, deciding on
a fixed kernel size is rather difficult. Larger kernels are preferred
for more global features that are distributed over a large area of
the image; on the other hand, smaller kernels give good results in
detecting local, area-specific features distributed across the image
frame. Effective recognition of such variable-sized features requires
kernels of different sizes. That is what Inception does: instead of
simply going deeper in terms of the number of layers, it goes wider.
Multiple kernels of different sizes are implemented within the same
layer.
What? The Inception network architecture consists of several
inception modules of the following structure
[Figure: Inception module (source: original paper)]
Each inception module consists of four operations in parallel
1x1 conv layer
3x3 conv layer
5x5 conv layer
max pooling
The 1x1 conv blocks shown in yellow are used for depth reduction.
The results of the four parallel operations are then concatenated
depth-wise to form the Filter Concatenation block (shown in green).
There are multiple versions of Inception, the simplest one being
GoogLeNet.
How? Inception increases the space of networks from which the best
network can be chosen via training. Each inception module can
capture salient features at different scales. Global features are
captured by the 5x5 conv layer, while the 3x3 conv layer is prone
to capturing more distributed features. The max-pooling operation is
responsible for capturing low-level features that stand out in a
neighborhood. At a given level, all of these features are extracted
and concatenated before being fed to the next layer. We leave it to
the network/training to decide which features hold the most value
and to weight them accordingly. Say the images in the data-set are
rich in global features without too many low-level features; the
trained Inception network will then have very small weights for the
3x3 conv kernels compared to the 5x5 conv kernels.
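A minimal sketch of one inception module, with branch filter counts borrowed from the inception (3a) block of GoogLeNet for illustration; 1x1 convolutions reduce depth before the larger kernels, and the four branches are concatenated depth-wise:
Python3
import tensorflow as tf

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Four parallel branches (1x1, 3x3, 5x5, pooling) concatenated along the channel axis."""
    b1 = tf.keras.layers.Conv2D(f1, 1, padding='same', activation='relu')(x)

    b2 = tf.keras.layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)  # depth reduction
    b2 = tf.keras.layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)

    b3 = tf.keras.layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)  # depth reduction
    b3 = tf.keras.layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)

    b4 = tf.keras.layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = tf.keras.layers.Conv2D(pool_proj, 1, padding='same', activation='relu')(b4)

    return tf.keras.layers.Concatenate()([b1, b2, b3, b4])   # filter concatenation

inputs = tf.keras.Input(shape=(28, 28, 192))
x = inception_module(inputs, 64, 96, 128, 16, 32, 32)
print(tf.keras.Model(inputs, x).output_shape)   # (None, 28, 28, 256)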
Summary
In the table below, these four CNNs are sorted with respect to their
top-5 accuracy on the ImageNet dataset. The number of trainable
parameters and the floating point operations (FLOPs) required for
a forward pass can also be seen.
Several comparisons can be drawn:
AlexNet and ResNet-152, both have about 60M parameters but
there is about a 10% difference in their top-5 accuracy. But
training a ResNet-152 requires far more computation (about 10
times more than AlexNet), which means more training time and
energy.
VGGNet not only has more parameters and FLOPs than ResNet-152
but also lower accuracy: it takes more time to train a VGGNet and
yields reduced accuracy.
Training an AlexNet takes about the same time as training
Inception, but Inception's memory requirements are about 10 times
lower and its accuracy is higher (by about 9%).
A Convolutional Neural Network (CNN, or ConvNet) is a
special kind of multi-layer neural network designed to recognize
visual patterns directly from pixel images with minimal
preprocessing. The ImageNet project is a large visual database
designed for use in visual object recognition research. It runs an
annual software contest, the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC), where programs compete to
correctly classify and detect objects and scenes. Below, we walk
through the CNN architectures of the top ILSVRC competitors.
LeNet-5 (1998)
LeNet-5, a pioneering 7-level convolutional network by LeCun et
al. (1998) that classifies digits, was applied by several banks to
recognise hand-written numbers on checks (cheques) digitized
into 32x32 pixel greyscale input images. Processing higher-resolution
images requires larger and more numerous convolutional layers, so
the technique was constrained by the available computing resources.
AlexNet (2012)
In 2012, AlexNet significantly outperformed all the prior
competitors and won the challenge by reducing the top-5 error
from 26% to 15.3%. The second-place entry, which was not a CNN
variant, had a top-5 error rate of around 26.2%.
The network had a very similar architecture to LeNet by Yann
LeCun et al. but was deeper, with more filters per layer and with
stacked convolutional layers. It consisted of 11x11, 5x5, and 3x3
convolutions, max pooling, dropout, data augmentation, ReLU
activations, and SGD with momentum. It attached ReLU activations
after every convolutional and fully connected layer. AlexNet was
trained for six days simultaneously on two Nvidia GeForce GTX
580 GPUs, which is why the network is split into two pipelines.
AlexNet was designed by the SuperVision group, consisting of
Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever.
ZFNet (2013)
Not surprisingly, the ILSVRC 2013 winner was also a CNN
which became known as ZFNet. It achieved a top-5 error rate of
14.8%, nearly half of the previously mentioned non-neural error
rate. This was achieved mostly by tweaking the hyper-parameters
of AlexNet while maintaining the same overall structure.
GoogLeNet/Inception (2014)
The winner of the ILSVRC 2014 competition was
GoogLeNet(a.k.a. Inception V1) from Google. It achieved a top-5
error rate of 6.67%! This was very close to human-level
performance, which the organisers of the challenge were now
forced to evaluate. As it turns out, this was actually rather hard
to do and required some human training in order to beat
GoogLeNet's accuracy. After a few days of training, the human
expert (Andrej Karpathy) was able to achieve a top-5 error rate
of 5.1% (single model) and 3.6% (ensemble). The network used a
CNN inspired by LeNet but implemented a novel element dubbed
the inception module. It used batch normalization, image
distortions, and RMSprop. The inception module is based on
several very small convolutions in order to drastically reduce the
number of parameters. The architecture consists of a 22-layer
deep CNN but reduces the number of parameters from 60
million (AlexNet) to 4 million.
VGGNet (2014)
The runner-up at the ILSVRC 2014 competition is dubbed
VGGNet by the community and was developed by Simonyan and
Zisserman. VGGNet consists of 16 weight layers and is
very appealing because of its very uniform architecture. Similar
to AlexNet, it uses only 3x3 convolutions, but with many more
filters. It was trained on 4 GPUs for 2 to 3 weeks. It is currently
the most preferred choice in
the community for extracting features from images. The weight
configuration of the VGGNet is publicly available and has been
used in many other applications and challenges as a baseline
feature extractor. However, VGGNet consists of 138 million
parameters, which can be a bit challenging to handle.
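A minimal sketch of using the publicly available VGG16 weights as a baseline feature extractor, assuming a random image batch purely for illustration (the pretrained ImageNet weights are downloaded on first use):
Python3
import tensorflow as tf

# Load VGG16 with ImageNet weights, dropping the fully connected classifier head
backbone = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                       input_shape=(224, 224, 3))
backbone.trainable = False   # use it as a frozen feature extractor

# A hypothetical batch of two images (pixel values in [0, 255])
images = tf.random.uniform([2, 224, 224, 3], maxval=255)
images = tf.keras.applications.vgg16.preprocess_input(images)

features = backbone(images)
print(features.shape)   # (2, 7, 7, 512): feature maps from the last conv block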
ResNet (2015)
At last, at the ILSVRC 2015, the so-called Residual Neural
Network (ResNet) by Kaiming He et al. introduced a novel
architecture with "skip connections" and heavy use of batch
normalization. Such skip connections are related to the gated
units (such as gated recurrent units) that have recently been
applied successfully in RNNs. Thanks to this
technique they were able to train a NN with 152 layers while
still having lower complexity than VGGNet. It achieves a top-5
error rate of 3.57% which beats human-level performance on
this dataset.
In summary: AlexNet has two parallel CNN pipelines trained on two
GPUs with cross-connections, GoogLeNet has inception modules,
and ResNet has residual connections.