Unit 2 Part 03

Datatypes

• For CNNs, the data used may have one or several channels (single-channel or
multichannel) and different dimensionalities (1-D, 2-D, or 3-D).

• Each channel represents an observation of a different quantity at some point
in space or time.

• The following table shows examples of data types with different
dimensionalities and numbers of channels.
• 1-D, single channel: Audio waveform. The axis we convolve over corresponds to
time.

• 1-D, multichannel: Skeleton animation data, which represents the orientation
of a character's skeleton over time.

• 2-D, single channel: Audio data that has been preprocessed with a Fourier
transform. The audio waveform becomes a 2-D tensor, with different rows
corresponding to different frequencies and different columns corresponding to
different points in time.

• 2-D, multichannel: Color image data. One channel contains the red pixels, one
the green pixels, and one the blue pixels. The convolution kernel moves over
both the horizontal and the vertical axes of the image, conferring translation
equivariance in both directions.

• 3-D, single channel: Volumetric data. A common source of this kind of data is
medical imaging technology, such as CT scans.

• 3-D, multichannel: Color video data. One axis corresponds to time, one to the
height of the video frame, and one to the width of the video frame.
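• As a concrete illustration of these data types, the sketch below shows typical
tensor shapes in a channels-first layout (PyTorch is assumed here, since the
slides do not name a framework; the specific sizes are illustrative only).

import torch

audio_waveform = torch.randn(1, 16000)         # 1-D, single channel: (channels, time)
skeleton_anim  = torch.randn(30, 16000)        # 1-D, multichannel: (joint angles, time)
spectrogram    = torch.randn(1, 128, 400)      # 2-D, single channel: (channels, frequency, time)
color_image    = torch.randn(3, 224, 224)      # 2-D, multichannel: (RGB, height, width)
ct_volume      = torch.randn(1, 64, 256, 256)  # 3-D, single channel: (channels, depth, height, width)
color_video    = torch.randn(3, 16, 224, 224)  # 3-D, multichannel: (RGB, time, height, width)

print(color_image.shape)                       # torch.Size([3, 224, 224])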

• One advantage of convolutional networks is that they can also process inputs
with varying spatial extents.
Datatypes
• These kinds of input simply cannot be represented by traditional matrix
multiplication-based neural networks.

• This provides a compelling reason to use convolutional networks.

• For example, consider a collection of images in which each image has a
different width and height.

• It is unclear how to model such inputs with a weight matrix of fixed size.
Datatypes
• Convolution is straightforward to apply; the kernel is simply applied a
different number of times depending on the size of the input, and the output of
the convolution operation scales accordingly.

• Hence, sometimes the output of the network, as well as the input, is allowed
to have variable size.
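• A minimal sketch of this behavior (assuming PyTorch; the slides do not specify
a framework): the same 3x3 kernel is applied to inputs of different spatial
sizes, and the output size scales with the input.

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

small = torch.randn(1, 3, 32, 32)   # a 32x32 RGB image
large = torch.randn(1, 3, 64, 48)   # a 64x48 RGB image

print(conv(small).shape)            # torch.Size([1, 8, 32, 32])
print(conv(large).shape)            # torch.Size([1, 8, 64, 48])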
Efficient Convolution Algorithms
• The Convolution Theorem states that convolution in the spatial domain is
equivalent to element-wise multiplication in the frequency domain.
• Mathematically, if f(x) and g(x) are two functions with Fourier transforms
F(u) and G(u) respectively, then the convolution (f∗g)(x) in the spatial domain
corresponds to the element-wise multiplication F(u)⋅G(u) in the frequency
domain.
• In other words:
• F{f∗g} = F(u) ⋅ G(u)
Efficient Convolution Algorithms
• Where:

F - denotes the Fourier transform.

∗ - denotes convolution.

⋅ - denotes element-wise multiplication.

u - represents the frequency domain variable.

• This property is extensively used in signal processing and image processing,
as it provides an efficient way to perform convolution: transform the
signals/images into the frequency domain, perform element-wise multiplication,
and then transform back to the spatial domain using the inverse Fourier
transform, as given below.

• (f∗g)(x) = F⁻¹{ F(u) ⋅ G(u) }

Efficient Convolution Algorithms

• By leveraging this property, certain convolution operations, especially those
involving large filters or multiple convolutions, can be accelerated using
techniques such as the Fast Fourier Transform (FFT).
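• A short numerical check of the convolution theorem (a sketch assuming NumPy
and SciPy are available; they are not mentioned in the slides): FFT-based
convolution matches direct convolution up to floating-point error.

import numpy as np
from scipy.signal import fftconvolve

f = np.random.randn(1024)                 # a 1-D signal
g = np.random.randn(31)                   # a convolution kernel (filter)

direct  = np.convolve(f, g, mode="full")  # direct convolution, O(N*K)
via_fft = fftconvolve(f, g, mode="full")  # FFT-based convolution, O(N log N)

print(np.allclose(direct, via_fft))       # True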
Convolution Networks and the History of
Deep Learning
• In the deep learning field, the Convolutional Neural Network (CNN) is a class
of artificial neural network most commonly used for image analysis.

• Since their inception, CNN architectures have gone through rapid evolution,
and in recent years they have achieved results that were previously considered
possible only via human intervention.

• Depending on the task at hand and the corresponding constraints, a wide
variety of architectures are available today.
Convolution Networks and the History of
Deep Learning
1. Neocognitron (1980)

• The Neocognitron is a type of artificial neural network designed for pattern
recognition, specifically for visual pattern recognition tasks such as image
classification.

• It was proposed by Kunihiko Fukushima in 1980 and is considered one of the
pioneering models in the field of convolutional neural networks (CNNs).

• The Neocognitron consists of multiple layers organized in a hierarchical
manner, as shown in the Figure below.
Convolution Networks and the History of
Deep Learning

Figure : The architecture of Neocognitron


Convolution Networks and the History of
Deep Learning
• Each layer is composed of units called "cells" arranged in a grid-like
topology, similar to the organization of neurons in the visual cortex of the
human brain.

• The Neocognitron uses two types of layers: S (simple) layers and C (complex)
layers.

• In the S layers, each cell is connected to local receptive fields in the input
and computes simple feature maps using linear filters (templates).

• The C layers are composed of cells with larger receptive fields, which combine
and pool the outputs of neighboring S layer cells to create more complex
feature maps.
Convolution Networks and the History of
Deep Learning
• During this process, local features extracted in lower stages using S layers
are gradually integrated into more global features using C layers.

• The Neocognitron was used for handwritten (Japanese) character recognition and
other pattern recognition tasks, and paved the way for convolutional neural
networks.
Convolution Networks and the History of
Deep Learning
2. LeNet-5 (1989–1998)

• In the 1990s, Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner
proposed the LeNet-5 neural network design for character recognition in both
handwriting and machine printing.

• The architecture is straightforward and simple to understand, and it is often
used as a first step in learning Convolutional Neural Networks.

• LeNet-5 is a multi-layer convolutional neural network for image
classification, as shown in the Figure below.
Convolution Networks and the History of
Deep Learning

Figure : The architecture of LeNet-5

• The network has 5 layers with learnable parameters, hence the name LeNet-5. It
has three sets of convolution layers combined with average pooling.


Convolution Networks and the History of
Deep Learning
• After the convolution and average pooling layers, we have two fully connected
layers.

• At last, a softmax classifier at the output classifies the images into their
respective classes.

• Step 1: The input to this model is a 32x32 grayscale image, hence the number
of channels is one.

• We then apply the first convolution operation with a filter size of 5x5, and
we have 6 such filters.

• As a result, we get a feature map of size 28x28x6.


Convolution Networks and the History of
Deep Learning
• Step 2: After the first convolution operation, we apply average pooling, which
reduces the spatial size of the feature map by half, i.e. 14x14x6.
• Note that the number of channels stays the same.
• Step 3: Next, we have a convolution layer with sixteen filters of size 5x5.
• The feature map now becomes 10x10x16.
• Step 4: After this, we again apply an average pooling (subsampling) layer,
which again reduces the spatial size of the feature map by half, i.e. 5x5x16.
Convolution Networks and the History of
Deep Learning
• Step 5: Then we have a final convolution layer of size 5x5 with 120 filters,
leaving a feature map of size 1x1x120. Flattening this result gives 120 values.

• Step 6: After these convolution layers, we have a fully connected layer with
84 neurons.

• Step 7: At last, we have an output layer with 10 neurons, since the data has
10 classes.
Convolution Networks and the History of
Deep Learning
• The softmax gives the probability that a data point belongs to a particular
class; the class with the highest value is predicted.

• The number of trainable parameters of this architecture is around 60,000.
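
• A minimal PyTorch sketch of the LeNet-5 layer sequence described in Steps 1-7
above (the framework choice and the tanh activations are assumptions; the
slides do not specify them).

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32x1  -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(2),                   # 28x28x6  -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),   # 14x14x6  -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2),                   # 10x10x16 -> 5x5x16
            nn.Conv2d(16, 120, kernel_size=5), # 5x5x16   -> 1x1x120
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 1x1x120 -> 120 values
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),        # softmax is applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.randn(1, 1, 32, 32)      # one 32x32 grayscale image
print(LeNet5()(x).shape)           # torch.Size([1, 10])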

3. AlexNet (2012)

• AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in
2012 with a significantly lower error rate than the previous state-of-the-art
model.
Convolution Networks and the History of Deep
Learning
• It was designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the
University of Toronto.

• In this model, the depth of the network was increased in comparison to
LeNet-5.

• AlexNet has 8 layers with learnable parameters.

• The model consists of 5 convolution layers combined with max pooling, followed
by 3 fully connected layers, and it uses ReLU activation in each of these layers
except at the output layer.

• The Figure below shows the architecture of AlexNet.


Convolution Networks and the History of
Deep Learning

• Figure : The architecture of AlexNet


Convolution Networks and the History of
Deep Learning
• Using ReLU as the activation function accelerated the training process by
almost six times.

• AlexNet also used dropout layers, which prevented the model from overfitting.

• Further, the model was trained on the ImageNet (ILSVRC) dataset, which covers
1000 classes and provides over a million training images.

• The AlexNet architecture has a total of about 62.3 million learnable
parameters.
• The Table below shows the stepwise operation in each layer of AlexNet.

Layer            Filters/Kernels   Filter size   Stride   Padding   Feature map size   Activation
Input            -                 -             -        -         227x227x3          -
Conv1            96                11x11         4        -         55x55x96           ReLU
MaxPool1         -                 3x3           2        -         27x27x96           -
Conv2            256               5x5           1        2         27x27x256          ReLU
MaxPool2         -                 3x3           2        -         13x13x256          -
Conv3            384               3x3           1        1         13x13x384          ReLU
Conv4            384               3x3           1        1         13x13x384          ReLU
Conv5            256               3x3           1        1         13x13x256          ReLU
MaxPool3         -                 3x3           2        -         6x6x256            -
Dropout1         rate = 0.5        -             -        -         6x6x256            -
FullyConnected1  -                 -             -        -         9216 -> 4096       ReLU
Dropout2         rate = 0.5        -             -        -         4096               -
FullyConnected2  -                 -             -        -         4096               ReLU
FullyConnected3  -                 -             -        -         1000               Softmax
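
• The feature map sizes in the table can be reproduced with the standard
output-size formula out = floor((in + 2*padding - filter) / stride) + 1, as in
the small sketch below (an illustrative addition, not part of the original
slides).

def out_size(in_size, kernel, stride, pad=0):
    # Standard convolution/pooling output-size formula
    return (in_size + 2 * pad - kernel) // stride + 1

size = 227                                   # input is 227x227x3
for name, k, s, p in [("Conv1", 11, 4, 0), ("MaxPool1", 3, 2, 0),
                      ("Conv2", 5, 1, 2),   ("MaxPool2", 3, 2, 0),
                      ("Conv3", 3, 1, 1),   ("Conv4", 3, 1, 1),
                      ("Conv5", 3, 1, 1),   ("MaxPool3", 3, 2, 0)]:
    size = out_size(size, k, s, p)
    print(f"{name}: {size}x{size}")          # 55, 27, 27, 13, 13, 13, 13, 6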


Convolution Networks and the History of
Deep Learning
4. VGGNet (2014)

• VGG stands for Visual Geometry Group; it is a standard deep Convolutional
Neural Network (CNN) architecture with multiple layers.

• The "deep" refers to the number of layers, with VGG-16 and VGG-19 consisting
of 16 and 19 weight layers, respectively.

• The VGG architecture is the basis for many object recognition models and was
developed as a deep neural network.
Convolution Networks and the History of
Deep Learning
• VGGNet is also a baseline for many tasks and datasets and one of the most
popular image recognition architectures. The Figure below shows the basic
architecture of VGGNet-16.

Convolution Networks and the History of
Deep Learning
• VGGNet-16 is a convolutional neural network model proposed by K. Simonyan and
A. Zisserman from the University of Oxford.

• They argued that by making a CNN deeper, one can solve problems better and get
a lower error rate on the ImageNet classification challenge.

• Hence, multiple architectures of different depths were tried.

• The input to a VGG-based ConvNet is a 224x224 RGB image.
Convolution Networks and the History of
Deep Learning
• The training images are passed through a stack of convolution layers. There
are a total of 13 convolutional layers and 3 fully connected layers in the
VGG-16 architecture.

• VGG uses smaller filters (3x3) with more depth instead of large filter sizes.

• Another variation of VGGNet has 19 weight layers, consisting of 16
convolutional layers and 3 fully connected layers, with the same 5 pooling
layers.
Convolution Networks and the History of
Deep Learning
• Both variations of VGGNet have two fully connected layers with 4096 channels
each, followed by another fully connected layer with 1000 channels to predict
the 1000 labels.

• Lastly, the final fully connected layer is followed by a softmax layer for
classification.

• The Figure below shows the architecture layers of VGGNet-16 and VGGNet-19.

• Notably, VGG-16 consists of about 138M parameters and has a significant memory
overhead of 48.6 MB, compared to AlexNet's 1.9 MB.
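
• A brief usage sketch (assuming the torchvision package, which is not part of
the slides) that builds the VGG-16 architecture and confirms the parameter
count quoted above.

import torch
from torchvision import models

# weights=None builds the architecture without pretrained weights
# (older torchvision versions use pretrained=False instead)
vgg16 = models.vgg16(weights=None)
n_params = sum(p.numel() for p in vgg16.parameters())
print(f"{n_params / 1e6:.1f}M parameters")    # about 138M, matching the text

x = torch.randn(1, 3, 224, 224)               # one 224x224 RGB image
print(vgg16(x).shape)                         # torch.Size([1, 1000])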
Convolution Networks and the History of
Deep Learning
5. GoogLeNet/Inception (2014)

• The 2014 paper "Going Deeper with Convolutions" from Google introduced the
Inception module architecture, which has come to be known as Inception-v1 or
GoogLeNet.

• GoogLeNet also focused on deeper networks, but with the objective of greater
efficiency: reducing parameter count, memory usage, and computation.

• GoogLeNet is a 22-layer deep convolutional neural network, a variant of the
Inception Network.
Convolution Networks and the History of
Deep Learning
• The GoogLeNet architecture, presented at the ImageNet Large-Scale Visual
Recognition Challenge 2014 (ILSVRC14), solved computer vision tasks such as
image classification and object detection.

• Researchers discovered that an increase in layers and units within a network
led to a significant performance gain.

• But increasing the number of layers creates larger networks and increases the
computational cost.
Convolution Networks and the History of
Deep Learning
• Large networks are prone to overfitting and suffer from either the exploding
or the vanishing gradient problem.

• The GoogLeNet architecture solved most of the problems that large networks
faced, mainly through the use of the Inception module.

• The Inception module leverages feature detection at different scales through
convolutions with different filter sizes, and reduces the computational cost of
training an extensive network through dimensionality reduction (1x1
convolutions); a sketch of such a module is shown after the GoogLeNet figure
below.
Convolution Networks and the History of
Deep Learning
• An Inception network is a deep neural network with an architectural design
that consists of repeating components referred to as Inception modules.

• Inception-v1 uses a combination of:

1. repeated Inception modules that use convolution filters of different sizes in
parallel to attend to multiple scales at the same time;

2. global average pooling to replace the fully connected layers at the end of
typical CNNs;
Convolution Networks and the History of
Deep Learning
3. auxiliary classifiers (used during training only) to combat the vanishing
gradient problem and help with regularization; these are additional
classification layers added to intermediate layers of the deep network.

Figure : The architecture of GoogLeNet
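
• A minimal sketch of a single Inception module with parallel branches and 1x1
reduction convolutions (a PyTorch illustration; the channel counts used in the
example call are illustrative values, not a claim about the exact GoogLeNet
configuration).

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3_red, ch3x3, ch5x5_red, ch5x5, pool_proj):
        super().__init__()
        # Parallel branches operating at different scales
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_red, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(ch3x3_red, ch3x3, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_red, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(ch5x5_red, ch5x5, kernel_size=5, padding=2),
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the branch outputs along the channel dimension
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
m = InceptionModule(192, 64, 96, 128, 16, 32, 32)   # output: 64+128+32+32 = 256 channels
print(m(x).shape)                                    # torch.Size([1, 256, 28, 28])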


Convolution Networks and the History of
Deep Learning
6. ResNet (2015)

• ResNet stands for Residual Neural Network and is a type of convolutional
neural network (CNN).

• It was designed to tackle the issue of vanishing gradients in deep networks,
which was a major problem in developing deep neural networks.

• The ResNet architecture enables the network to learn multiple layers of
features without getting stuck in local minima, a common issue with deep
networks.


Convolution Networks and the History of
Deep Learning
• The key features of the ResNet (Residual Network) architecture are:

• 1. Residual Connections: ResNet incorporates residual connections, which allow
for training very deep neural networks and solve the vanishing gradient problem.

• 2. Identity Mapping: ResNet uses identity shortcuts, which make the training
process easier because the layers learn the residual mapping rather than the
full underlying mapping.
Convolution Networks and the History of
Deep Learning
• 3. Depth: ResNet enables the creation of very deep neural networks, which can
improve performance on image recognition tasks.

• 4. Fewer Parameters: ResNet achieves better results with fewer parameters,
making it computationally more efficient.

• 5. State-of-the-art Results: ResNet has achieved state-of-the-art results on
various image recognition tasks and has become a widely used benchmark.
Convolution Networks and the History of
Deep Learning
• Problems of plain networks

• Conventional deep learning networks, like AlexNet, ZFNet, and VGGNet, usually
stack convolution layers followed by fully connected (FC) layers for the
classification task, without any skip/shortcut connections.

• We call them plain networks; when a plain network is made deeper (the number
of layers is increased), the problem of vanishing/exploding gradients occurs.
Convolution Networks and the History of
Deep Learning
• During backpropagation, the error gradient is a product of many per-layer
factors. When the network is deep, multiplying n of these small numbers drives
the gradient toward zero (it vanishes).

• When the network is deep, multiplying n of these large numbers makes the
gradient too large (it explodes).

• Solution to the above problems

• To solve the problem of vanishing/exploding gradients, a skip/shortcut
connection is added, which adds the input x to the output after a few weight
layers, as below:
Convolution Networks and the History of
Deep Learning

• Hence, the output is H(x) = F(x) + x. The weight layers actually learn a kind
of residual mapping: F(x) = H(x) - x.
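
• A minimal sketch of a residual block with an identity shortcut, illustrating
H(x) = F(x) + x (PyTorch is assumed; the use of BatchNorm and the channel count
are illustrative choices, not the exact ResNet configuration).

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolution (weight) layers
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)    # H(x) = F(x) + x via the identity shortcut

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)          # torch.Size([1, 64, 56, 56])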
Convolution Networks and the History of
Deep Learning
• For ResNet, there are 3 types of skip/shortcut connections for the case when
the input dimensions are smaller than the output dimensions. The Figure below
shows the architecture of ResNet15.
APPLICATIONS OF CNN
• Some real-life applications of CNNs include:

1. Image Recognition: CNNs can be used to identify objects, faces, people, and
other features in images. For example, they can be used in face recognition
systems to identify individuals in digital images or videos.

2. Video Classification: CNNs are very effective in video classification
because they are designed to recognize and detect patterns in images, which can
be applied to video frames.
APPLICATIONS OF CNN
• CNNs are able to learn from videos by extracting frames as individual images
and then analyzing them individually.

• They also have the ability to recognize and classify objects in the scene.

• CNNs can classify videos better than other methods, such as traditional
machine-learning techniques.

3. Self-driving Cars: CNNs are used in self-driving cars to identify objects in
images or videos.
APPLICATIONS OF CNN
• CNNs can help self-driving cars recognize pedestrians, cyclists, and other
vehicles on the road.

• CNNs can also help the car detect obstacles such as potholes, cracks in the
road, or even fallen tree branches.

• CNNs can also be used to detect lane lines and traffic lights. This helps the
car stay in its lane and obey traffic rules.

4. Natural Language Processing: CNNs can be used for natural language processing
tasks, such as sentiment analysis and document classification.
APPLICATIONS OF CNN
• NLP is a subfield of AI that focuses on understanding and generating natural
language. By training CNNs on large datasets of text, researchers are able to
develop systems that can understand and generate human language.

• NLP applications include chatbots, question answering, and machine
translation.

5. Medical Imaging: CNNs can be used to diagnose diseases by recognizing
patterns in medical images, such as X-rays, CT scans, and MRI scans.
APPLICATIONS OF CNN
• By training CNNs on large datasets of medical images, researchers are able to
develop systems that can accurately diagnose diseases.

• CNNs have also been used in the field of drug discovery. By training CNNs on
large datasets of molecular images, researchers are able to identify potential
drug targets for various diseases and conditions.

6. Analyzing Documents: Document analysis using Convolutional Neural Networks
(CNNs) involves applying CNN architectures to extract features and make
predictions from documents or text data.
APPLICATIONS OF CNN
• CNNs, which are primarily used in computer vision tasks,
have been adapted and extended to handle document
analysis tasks such as text classification, document
classification, and optical character recognition (OCR).

7. Understanding Climate: Understanding climate using Convolutional Neural
Networks (CNNs) involves leveraging CNN architectures to analyze and extract
features from climate-related data, such as satellite imagery, weather maps,
climate model outputs, and other geospatial datasets.
APPLICATIONS OF CNN
• CNNs can be applied to various tasks related to climate
analysis, including weather prediction, extreme event
detection, climate pattern recognition, and climate
change analysis.

• CNNs offer a versatile and powerful tool for climate research, enabling
scientists to analyze large-scale climate data, extract meaningful features,
make predictions, and gain insights.
APPLICATIONS OF CNN
8. Advertising: Using Convolutional Neural Networks (CNNs) for advertising
involves leveraging image recognition and analysis capabilities to enhance ad
targeting, content creation, and performance optimization.

• CNNs can be applied to various aspects of advertising, including image
classification, object detection, content generation, and ad performance
analysis.
