Deep Learning - Notes
COMPUTER SCIENCE AND ENGINEERING
DIGITAL NOTES
ON
DEEP LEARNING
(22IT4177)
SYLLABUS
L/T/P/C: 3/-/-/3
(R20A6610) DEEP LEARNING
"UNIT I
Introduction- Historical Trends in Deep Learning, McCulloch Pitts Neuron, Thresholding Logic,
Perceptron, Perceptron Learning Algorithm. Representation Power of MLPs, Sigmoid Neurons,
Gradient Descent, Feed forward Neural Networks, Representation Power of Feed forward
Neural Networks."
"UNIT II
Feed Forward Neural Networks: - Back propagation, Gradient Descent (GD), Momentum Based
GD, Nesterov Accelerated GD, Stochastic GD, AdaGrad, RMS Prop, Adam, Eigenvalues and
Eigenvectors, Eigenvalue Decomposition, Basis Principal Component Analysis and its
interpretations, Singular Value Decomposition."
UNIT III
Auto encoders:- Relation to PCA, Regularization in auto encoders, Denoising auto encoders,
Sparse auto encoders, Contractive auto encoders, Regularization: Bias Variance Tradeoff, L2
regularization, early stopping, Dataset augmentation, Parameter sharing and tying, Injecting
noise at input, Ensemble methods, Dropout, Greedy Layer wise Pre-training, Better activation
functions, better weight initialization methods, Batch Normalization."
"UNIT IV
Convolutional Neural Network: - The Convolution Operation, Motivation, Pooling,Convolution
and Pooling as an Innitely Strong Prior, Variants of the Basic Convolution Function, Structured
Outputs, Data Types, LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet,ResNet, Visualizing
Convolutional Neural Networks, Guided Back propagation, Deep Dream, Deep Art, Fooling
Convolutional Neural Networks.
"UNIT V
Recurrent Neural Networks-Back propagation through time (BPTT), Vanishing and Exploding
Gradients, Truncated BPTT, GRU, LSTMs, Encoder Decoder Models, Attention Mechanism,
Attention over images.
INTRODUCTION TO DEEP LEARNING:
Deep learning is a branch of machine learning which is based on artificial neural
networks. It is capable of learning complex patterns and relationships within data, and we
do not need to explicitly program everything. It has become increasingly popular in recent
years due to advances in processing power and the availability of large datasets. Deep
learning is built on artificial neural networks (ANNs), also known as deep neural networks
(DNNs). These neural networks are inspired by the structure and function of the human
brain's biological neurons, and they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of neural
networks to model and solve complex problems. Neural networks are modeled
after the structure and function of the human brain and consist of layers of
interconnected nodes that process and transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks, which
have multiple layers of interconnected nodes. These networks can learn complex
representations of data by discovering hierarchical patterns and features in the
data. Deep Learning algorithms can automatically learn and improve from data
without the need for manual feature engineering.
3. Deep Learning has achieved significant success in various fields, including image
recognition, natural language processing, speech recognition, and
recommendation systems. Some of the popular Deep Learning architectures
include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and
computational resources. However, the availability of cloud computing and the
development of specialized hardware, such as Graphics Processing Units (GPUs),
has made it easier to train deep neural networks.
In summary, Deep Learning is a subfield of Machine Learning that involves the use of
deep neural networks to model and solve complex problems. Deep Learning has achieved
significant success in various fields, and its use is expected to continue to grow as more
data and more powerful computing resources become available.
What is Deep Learning?
Deep learning is the branch of "Machine Learning" which is based on artificial neural
network architecture. An artificial neural network (ANN) uses layers of interconnected
nodes called neurons that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer and one or more
hidden layers connected one after the other. Each neuron receives input from the previous
layer neurons or from the input layer. The output of one neuron becomes the input to other
neurons in the next layer of the network, and this process continues until the final layer
produces the output of the network. The layers of the neural network transform the input
data through a series of nonlinear transformations, allowing the network to learn complex
representations of the input data.
Today, deep learning has become one of the most popular and visible areas of
machine learning, due to its success in a variety of applications, such as computer vision,
natural language processing, and reinforcement learning.
Deep learning can be used for supervised, unsupervised as well as reinforcement
machine learning, and it uses a variety of ways to process these.
Supervised Machine Learning: Supervised machine learning is the machine
learning technique in which the neural network learns to make predictions or
classify data based on labeled datasets. Here we provide both the input features
and the target variables. The neural network learns to make predictions based on
the cost or error that comes from the difference between the predicted and the
actual target; this process is known as backpropagation. Deep learning algorithms
like convolutional neural networks and recurrent neural networks are used for
many supervised tasks like image classification and recognition, sentiment
analysis, language translation, etc.
Unsupervised Machine Learning: Unsupervised machine learning is the machine
learning technique in which the neural network learns to discover patterns or to
cluster the dataset based on unlabeled datasets. Here there are no target variables,
and the machine has to determine the hidden patterns or relationships within the
datasets by itself. Deep learning algorithms like autoencoders and generative
models are used for unsupervised tasks like clustering, dimensionality reduction,
and anomaly detection.
Reinforcement Machine Learning: Reinforcement machine learning is the
machine learning technique in which an agent learns to make decisions in an
environment to maximize a reward signal. The agent interacts with the
environment by taking actions and observing the resulting rewards. Deep learning
can be used to learn policies, or sets of actions, that maximize the cumulative
reward. Deep learning algorithms like Deep Q-Networks and Deep Deterministic
Policy Gradient (DDPG) are used for reinforcement learning tasks like robotics
and game playing.
Artificial Neural Networks:
"Artificial neural networks" are built on the principles of the structure and
operation of human neurons. They are also known as neural networks or neural nets. An
artificial neural network's input layer, which is the first layer, receives input from external
sources and passes it on to the hidden layer, which is the second layer. Each neuron in the
hidden layer gets information from the neurons in the previous layer, computes the
weighted total, and then transfers it to the neurons in the next layer. These connections are
weighted, which means that the impacts of the inputs from the preceding layer are more or
less optimized by giving each input a distinct weight. These weights are then adjusted
during the training process to enhance the performance of the model.
Fully Connected Artificial Neural Network
Artificial neurons, also known as units, are found in artificial neural networks. The
whole Artificial Neural Network is composed of these artificial neurons, which are arranged
in a series of layers. The complexity of a neural network depends on the complexity of the
underlying patterns in the dataset, whether a layer has a dozen units or millions of units.
Commonly, an Artificial Neural Network has an input layer, an output layer as well as
hidden layers. The input layer receives data from the outside world which the neural
network needs to analyze or learn about.
In a fully connected artificial neural network, there is an input layer and one or more
hidden layers connected one after the other. Each neuron receives input from the previous
layer neurons or the input layer. The output of one neuron becomes the input to other
neurons in the next layer of the network, and this process continues until the final layer
produces the output of the network. After passing through one or more hidden layers, this
data is transformed into valuable data for the output layer. Finally, the output layer provides
an output in the form of the artificial neural network's response to the data that comes in.
Units are linked to one another from one layer to another in the bulk of neural
networks. Each of these links has weights that control how much one unit influences
another. The neural network learns more and more about the data as it moves from one unit
to another, ultimately producing an output from the output layer.
Difference between Machine Learning and Deep Learning:
Machine learning and deep learning are both subsets of artificial intelligence, but there
are many similarities and differences between them.
Machine Learning | Deep Learning
Can work on a smaller amount of data. | Requires a larger volume of data compared to machine learning.
Takes less time to train the model. | Takes more time to train the model.
Less complex and easy to interpret the results. | More complex; it works like a black box, so interpretations of the results are not easy.
Types of neural networks:
Deep Learning models are able to automatically learn features from the data,
which makes them well-suited for tasks such as image recognition, speech recognition,
and natural language processing. The most widely used architectures in deep learning are
feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural
networks (RNNs).
Feedforward Neural Networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image
classification, speech recognition, and natural language processing.
Convolutional Neural Networks (CNNs) are designed specifically for image and video
recognition tasks. CNNs are able to automatically learn features from the images, which
makes them well-suited for tasks such as image classification, object detection, and image
segmentation.
Recurrent Neural Networks (RNNs) are a type of neural network that is able to process
sequential data, such as time series and natural language. RNNs are able to maintain an
internal state that captures information about the previous inputs, which makes them well-
suited for tasks such as speech recognition, natural language processing, and language
translation.
Applications of Deep Learning:
The main applications of deep learning can be divided into computer vision, natural
language processing (NLP), and reinforcement learning.
Computer vision
In computer vision, deep learning models enable machines to identify and
understand visual data. Some of the main applications of deep learning in computer vision
include:
Object detection and recognition: Deep learning models can be used to identify
and locate objects within images and videos, making it possible for machines to
perform tasks such as self-driving, surveillance, and robotics.
Image classification: Deep learning models can be used to classify images into
categories such as animals, plants, and buildings. This is used in applications such
as medical imaging, quality control, and image retrieval.
Challenges in Deep Learning:
Deep learning has made significant advancements in various fields, but there are still
some challenges that need to be addressed. Here are some of the main challenges in
deep learning:
Data availability: Deep learning models need large volumes of data to learn from, so
gathering enough data for training is a big concern.
Advantages of Deep Learning:
Disadvantages of Deep Learning:
1. High computational requirements: Deep Learning models require large amounts of
data and computational resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a
large amount of labeled data for training, which can be expensive and time-
consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making
it difficult to understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit to the training data,
resulting in poor performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making
it difficult to understand how they work and how they arrived at their predictions.
In summary, while Deep Learning offers many advantages, including
high accuracy and scalability, it also has some disadvantages, such as high
computational requirements, the need for large amounts of labeled data, and the
lack of interpretability. These factors need to be carefully considered
when deciding whether to use Deep Learning for a specific task.
Historical Trends in Deep Learning:
Deep learning has experienced significant historical trends since its
inception. Here are some key milestones and trends that have
shaped the field:
4. Rise of Convolutional Neural Networks (CNNs): In the late 1990s and early
2000s, CNNs gained prominence in the field of computer vision.
• The LeNet-5 architecture developed by Yann LeCun revolutionized
image recognition tasks and demonstrated the potential of deep learning
in visual perception.
5. Big Data and GPUs: The early 2010s marked a turning point for
deep learning with the advent of big data and the availability of powerful
Graphics Processing Units (GPUs).
• The abundance of labeled data, combined with GPU acceleration,
enabled the training of large-scale deep neural networks and
significantly improved performance.
6. ImageNet and Deep Learning Renaissance: The ImageNet Large Scale
Visual Recognition Challenge in 2012, won by a deep neural network known as
AlexNet, brought deep learning into the spotlight.
• This event sparked a renaissance in the field, encouraging
researchers to explore deep learning architectures and techniques
across various domains.
7. Deep Learning in Natural Language Processing (NLP): Deep learning
10. Explainability and Interpretability: As deep learning models
have become increasingly complex, researchers have focused on
improving their explainability and interpretability.
• Techniques like attention mechanisms, saliency maps, and
model-agnostic interpretability methods aim to shed light on the
decision-making processes of deep learning models.
Why DL is Growing:
• The processing power needed for deep learning is readily becoming available using
GPUs, distributed computing and powerful CPUs.
• Moreover, as the amount of data grows, Deep Learning models seem to outperform
Machine Learning models.
• Focus on customization and real-time decisions.
• Deep Learning models are able to extract features (super variables) automatically,
without significant manual feature engineering.
Process in ML/DL:
Artificial Neural Networks:
Artificial Neural Networks contain artificial neurons which are called units. These
units are arranged in a series of layers that together constitute the whole Artificial Neural
Network in a system.
A layer can have only a dozen units or millions of units, as this depends on how
complex the neural network needs to be to learn the hidden patterns in the dataset.
Commonly, an Artificial Neural Network has an input layer, an output layer as well as hidden
layers.
The input layer receives data from the outside world which the neural network needs
to analyze or learn about. Then this data passes through one or multiple hidden layers that
transform the input into data that is valuable for the output layer. Finally, the output layer
provides an output in the form of a response of the Artificial Neural Networks to input data
provided.
In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has weights that determine the influence of one unit on another
unit. As the data transfers from one unit to another, the neural network learns more and more
about the data, which eventually results in an output from the output layer.
The structures and operations of human neurons serve as the basis for artificial neural
networks. They are also known as neural networks or neural nets. The input layer of an artificial
neural network is the first layer, and it receives input from external sources and releases it to
the hidden layer, which is the second layer. In the hidden layer, each neuron receives input
from the previous layer neurons, computes the weighted sum, and sends it to the neurons in
the next layer.
These connections are weighted, which means that the effects of the inputs from the previous
layer are optimized more or less by assigning a different weight to each input, and the weights
are adjusted during the training process to improve model performance.
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal
brains, so they share a lot of similarities in structure and function.
Structure: The structure of artificial neural networks is inspired by biological
neurons. A biological neuron has a cell body or soma to process the impulses,
dendrites to receive them, and an axon that transfers them to other neurons. The
input nodes of artificial neural networks receive input signals, the hidden layer
nodes compute these input signals, and the output layer nodes compute the final
output by processing the hidden layer's results using activation functions.
Biological Neuron | Artificial Neuron
Dendrite | Inputs
Cell nucleus or Soma | Nodes
Synapses | Weights
Axon | Output
Synapses: Synapses are the links between biological neurons that enable the
transmission of impulses from dendrites to the cell body. In artificial neurons,
synapses are the weights that join the one-layer nodes to the next-layer nodes.
The strength of the links is determined by the weight value.
Learning: In biological neurons, learning happens in the cell body nucleus or
soma, which processes the impulses. An action potential is
produced and travels through the axons if the impulses are powerful enough to
reach the threshold. This becomes possible by synaptic plasticity, which represents
the ability of synapses to become stronger or weaker over time in reaction to
changes in their activity. In artificial neural networks, backpropagation is the
technique used for learning, which adjusts the weights
between nodes according to the error or differences between predicted and actual
outcomes.
Biological Neuron | Artificial Neuron
Synaptic plasticity | Backpropagation
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you
want to teach an ANN to recognize a cat. Then it is shown thousands of different images of
cats so that the network can learn to identify a cat. Once the ANN has been trained enough
using images of cats, then you need to check if it can identify cat images correctly. This is
done by making the ANN classify the images it is provided by deciding whether they are cat
images or not. The output obtained by the ANN is corroborated by a human-provided
description of whether the image is a cat image or not.
If the ANN identifies incorrectly, then backpropagation is used to adjust whatever it
has learned during training. Backpropagation is done by fine-tuning the weights of the
connections in ANN units based on the error rate obtained. This process continues until the
artificial neural network can correctly recognize a cat in an image with the minimal possible
error rate.
What are the types of Artificial Neural Networks?
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in social media. For
example, let's take the 'People you may know' feature on Facebook that
suggests people that you might know in real life so that you can send them friend requests.
Well, this magical effect is achieved by using Artificial Neural Networks that
analyze your profile, your interests, your current friends, and also their friends and
various other factors to calculate the people you might potentially know. Another
common application of machine learning in social media is facial recognition.
This is done by finding around 100 reference points on the person's face and then
matching them with those already available in the database using convolutional
neural networks.
2. Marketing and Sales: When you log on to e-commerce sites like Amazon and
Flipkart, they will recommend products to buy based on your previous browsing
history. Similarly, suppose you love pasta; then apps like Zomato and Swiggy
will show you restaurant recommendations based on your tastes and previous
order history. This is true across all new-age marketing segments like book
sites, movie services, hospitality sites, etc., and it is done by using personalized
marketing. This uses Artificial Neural Networks to identify the customer
likes, dislikes, previous shopping history, etc., and then tailor the marketing
campaigns accordingly.
3. Healthcare: Artificial Neural Networks are used in oncology to train algorithms
that can identify cancerous tissue at the microscopic level with the same accuracy as trained
physicians. Various rare diseases may manifest in physical characteristics and can
be identified in their premature stages by using facial analysis on patient
photos. So the full-scale implementation of Artificial Neural Networks in the
healthcare environment can only enhance the diagnostic abilities of medical
experts and ultimately lead to the overall improvement in the quality of medical
care all over the world.
4. Personal Assistants: Applications like Siri, Alexa, Cortana, etc., which you may
have heard of depending on the phone you have; these are personal assistants.
Neural Network - Non-linear classification example using Neural Networks:
XOR / XNOR:
XOR problem with neural networks:
X | Y | Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Output = X.Y' + X'.Y
The linear separability of points
So here we can see that the pink dots and red triangle points in the
plot do not overlap each other, and a straight line can easily separate the
two classes: the region above the line can be considered as one
class, and the region below the line can be considered as the other
class.
Need for linear separability in neural networks
How to solve the XOR problem with neural networks:
Example: For X1 = 0 and X2 = 0, compute the output of the network.
Solution:
Considering X1 = 0 and X2 = 0,
H1 = ReLU(0·1 + 0·1 + 0) = 0
H2 = ReLU(0·1 + 0·1 + 0) = 0
So now we have obtained the values that were propagated from the
input layer to the hidden layer. Now, let us propagate from the hidden layer to
the output layer:
Y = ReLU(0·1 + 0·(−2)) = 0
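To make this concrete, below is a minimal Python/NumPy sketch of a two-unit hidden layer that solves XOR with ReLU activations. The particular weights and biases are an assumed example solution (they are not the exact values used in the worked example above), chosen so the network reproduces the XOR truth table.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# One possible set of weights that solves XOR (assumed for illustration).
W_hidden = np.array([[1., 1.],
                     [1., 1.]])      # weights from the two inputs to H1 and H2
b_hidden = np.array([0., -1.])       # biases for H1 and H2
w_out = np.array([1., -2.])          # weights from H1 and H2 to the output Y

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(np.array(x) @ W_hidden + b_hidden)   # hidden layer activations
    y = relu(h @ w_out)                           # output unit
    print(x, int(y))                              # prints 0, 1, 1, 0 -- the XOR truth table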
A perceptron uses a step function that returns +1 if the weighted sum of its inputs
is greater than or equal to 0, and −1 otherwise.
A regular neural network looks like this:
The perceptron consists of 4 parts:
o Input values or one input layer: The input layer of the perceptron is made of artificial
input neurons and takes the initial data into the system for further processing.
o Weights and Bias:
Weight: It represents the dimension or strength of the connection between
units. If the weight from node 1 to node 2 has a greater magnitude, then neuron 1
has a more considerable influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an
additional parameter whose task is to modify the output along with the weighted
sum of the inputs to the other neuron.
o Net sum: It calculates the total sum.
o Activation Function: Whether a neuron is activated or not is determined by an
activation function. The activation function calculates a weighted sum of the inputs and
further adds the bias to it to give the result.
A standard neural network looks like the below diagram.
How does it work?
The perceptron works in these simple steps, which are given below:
a. In the first step, all the inputs x are multiplied by their weights w.
b. In this step, add all the multiplied values and call them the weighted sum.
c. In the last step, apply the weighted sum to the correct activation function.
Example:
A unit step activation function.
Based on the layers, perceptron models are divided into two types of
neural networks as follows:
o Single Layer Perceptron
o Multi-Layer Perceptron
Single Layer Perceptron
The single-layer perceptron was the first neural network model, proposed in
1958 by Frank Rosenblatt. It is one of the earliest models for learning. Our goal is to
find a linear decision function determined by the weight vector w and the bias
parameter b.
In a single-layer perceptron, the neuron's local memory contains a vector of
weights. The output of the single-layer perceptron is calculated by taking the sum of
the input vector, with each component multiplied by the corresponding element of the
weight vector. The value so obtained is then fed as the input to an activation
function.
Now, we have to do the following necessary steps to train the perceptron:
o The weights are initialized with random values at the start of
each training run.
o For each element of the training set, the error is calculated as the difference
between the desired output and the actual output. The calculated error is used
to adjust the weights.
o The process is repeated until the error made on the entire training set is less
than the specified threshold, or until the maximum number of iterations has been
reached.
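The training procedure just described can be sketched in a few lines of Python/NumPy. The AND-gate data, learning rate and epoch count below are illustrative assumptions; the weights are initialized to zeros here to keep the demo deterministic, whereas the notes describe random initialization.

import numpy as np

def step(z):
    # unit step activation: +1 if the weighted sum >= 0, otherwise -1
    return 1 if z >= 0 else -1

def train_perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])   # zeros for a deterministic demo; the notes use random values
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - step(np.dot(w, xi) + b)  # desired output minus actual output
            w += lr * error * xi                      # adjust the weights using the error
            b += lr * error
    return w, b

# toy AND-gate dataset with +1/-1 labels (hypothetical example data)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])   # expected: [-1, -1, -1, 1]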
Multi-Layer Perceptron
A multi-layer perceptron has one input layer with one neuron (or node) for each input,
one output layer with a single node for each output, and it can have any number of hidden
layers, where each hidden layer can have any number of nodes. A schematic diagram of a
Multi-Layer Perceptron (MLP) is depicted below.
In the multi-layer perceptron diagram above, we can see that there are three inputs and
thus three input nodes, and the hidden layer has three nodes. The output layer gives two
outputs, therefore there are two output nodes. The nodes in the input layer take input and
forward it for further processing; in the diagram above, the nodes in the input layer forward their
output to each of the three nodes in the hidden layer, and in the same way, the hidden layer
processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The
sigmoid activation function takes real values as input and converts them to numbers between
0 and 1 using the sigmoid formula.
Feed Forward Network:
Why are neural networks used?
Machine learning models are built on assumptions, such as the one where X and Y are
related. An inductive bias of linear regression is the linear relationship between X and Y. In
this way, a line or hyperplane gets fitted to the data.
What is a feed forward neural network?
Feed forward neural networks are artificial neural networks in which nodes do not form
loops. This type of neural network is also known as a multi-layer neural network, as all
information is only passed forward.
During data flow, input nodes receive data, which travels through hidden layers, and
exits through output nodes. No links exist in the network that could be used to send information
back from the output node.
A feed forward neural network approximates functions in the following way:
An algorithm calculates classifiers by using the formula y = f*(x).
Input x is therefore assigned to category y.
According to the feed forward model, y = f(x; θ). This value determines the
closest approximation of the function.
Feed forward neural networks serve as the basis for object detection in photos, as
shown in the Google Photos app.
What is the working principle of a feed forward neural network?
When the feed forward neural network gets simplified, it can appear as a single-layer
perceptron.
This model multiplies inputs with weights as they enter the layer. Afterward, the
weighted input values get added together to get the sum. As long as the sum of the values
rises above a certain threshold, set at zero, the output value is usually 1, while if it falls below
the threshold, it is usually -1.
As a feed forward neural network model, the single-layer perceptron often gets used
for classification. Machine learning can also get integrated into single-layer perceptrons.
Through training, neural networks can adjust their weights based on a property called the delta
rule, which helps them compare their outputs with the intended values.
Layers of a feed forward neural network
Input layer:
The neurons of this layer receive input and pass it on to the other layers of the
network. The number of features or attributes in the dataset must match the number of
neurons in the input layer.
Output layer:
According to the type of model being built, this layer represents the forecasted
feature.
Hidden layer:
Input and output layers get separated by hidden layers. Depending on the type of
model, there may be several hidden layers.
Neuron weights:
Neurons:
Artificial neurons, which are adapted from biological neurons, get used in feed
forward networks. A neural network consists of artificial neurons. Neurons function in
two ways: first, they create weighted input sums, and second, they activate the sums
to normalize them.
Activation Function:
Neurons are responsible for making decisions in this area. According to the
activation function, the neurons determine whether to make a linear or nonlinear
decision. Since the signal passes through so many layers, the activation function prevents
the cascading effect from increasing neuron outputs.
a) Sigmoid:
Input values get mapped to output values between 0 and 1.
b) Tanh:
Input values get mapped to output values between -1 and 1.
c) Rectified Linear Unit (ReLU):
Only positive values are allowed to pass through the function; negative
values get mapped to 0.
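A small NumPy sketch of the three activation functions just described; the sample input values are arbitrary.

import numpy as np

def sigmoid(x):
    # maps any real input to an output between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # maps any real input to an output between -1 and 1
    return np.tanh(x)

def relu(x):
    # passes positive values through unchanged; negative values map to 0
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), sep="\n")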
Functions in a feed forward neural network:
Cost function
Following is a definition of the mean squared error cost function:
C(w, b) = (1 / 2n) Σx ‖y(x) − a‖²
where
w = the weights gathered in the network
b = biases
n = number of inputs for training
a = output vector
x = input
y(x) = desired output for input x
‖v‖ = the norm (length) of vector v
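Assuming the standard quadratic form given above, a minimal NumPy sketch of this cost is shown below; the example outputs and targets are made up for illustration.

import numpy as np

def mse_cost(predictions, targets):
    # C = (1 / 2n) * sum over training inputs of ||y(x) - a||^2
    n = len(targets)
    return np.sum(np.linalg.norm(targets - predictions, axis=1) ** 2) / (2 * n)

# hypothetical outputs for 3 training inputs with 2 output units each
a = np.array([[0.8, 0.1], [0.3, 0.6], [0.9, 0.2]])
y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(mse_cost(a, y))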
Loss function
The number of neurons in the output layer is equal to the number of classes. The loss
function shows the difference between the predicted and actual probability distributions.
The cross-entropy loss for binary classification is
L = −[y log(p) + (1 − y) log(1 − p)],
and for multi-class classification the cross-entropy loss is
L = −Σc yc log(pc).
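A short NumPy sketch of both cross-entropy losses described above; the probabilities and labels used are illustrative only.

import numpy as np

def binary_cross_entropy(y_true, p):
    # L = -[ y*log(p) + (1 - y)*log(1 - p) ], averaged over samples
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_true, p):
    # L = -sum_c y_c * log(p_c), averaged over samples
    return -np.mean(np.sum(y_true * np.log(p), axis=1))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
print(categorical_cross_entropy(np.array([[0, 1, 0]]), np.array([[0.1, 0.8, 0.1]])))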
Gradient learning algorithm
The gradient descent algorithm repeatedly takes a step in the direction of the negative
gradient: to decrease the function, it subtracts the scaled gradient value (to increase, it would
add). As an example, a weight update can be written as
w := w − η ∂C/∂w, where η is the learning rate.
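A minimal sketch of this update rule on a one-dimensional function; the function, learning rate, and iteration count are assumptions chosen only to illustrate subtracting the gradient.

# gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)
    w = w - learning_rate * grad   # subtract the gradient to decrease the function
print(w)                            # approaches the minimum at w = 3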
Output units
In the output layer, output units are those units that provide the desired output
or prediction, thereby fulfilling the task that the neural network needs to complete.
There is a close relationship between the choice of output units and the cost
function. Any unit that can serve as a hidden unit can also serve as an output unit in a
neural network.
Advantages of feed forward neural networks
Machine learning can be boosted with feed forward neural networks' simplified
architecture.
Multiple networks in the feed forward architecture operate independently, with a
moderated intermediary.
Complex tasks need several neurons in the network.
Neural networks can handle and process nonlinear data easily compared to
perceptrons and sigmoid neurons, which are otherwise complex.
A neural network deals with the complicated problem of decision
boundaries.
Depending on the data, the neural network architecture can vary. For
example, convolutional neural networks (CNNs) perform exceptionally
well in image processing, whereas recurrent neural networks (RNNs)
perform well in text and voice processing.
Neural networks need Graphics Processing Units (GPUs) to handle large
datasets with high computational and hardware performance. GPU-backed
environments such as Kaggle Notebooks and Google Colab Notebooks are
widely used for this purpose.
Applications of feed forward neural networks:
A few applications of feed forward neural networks are given below.
A) Physiological feedforward system
It is possible to identify feedforward management in this situation because the central
involuntary system regulates the heartbeat before exercise.
B) Gene regulation and feedforward
Detecting non-temporary changes to the atmosphere is a function of this motif as a
feedforward system. You can find the majority of this pattern in the illustrious networks.
C) Automation and machine management
Automation control using feedforward is one of the disciplines in automation.
D) Parallel feedforward compensation with derivative
An open-loop transfer converts non-minimum phase systems into minimum phase systems
using this technique.
Understanding the math behind neural networks
Typical deep learning algorithms are neural networks (NNs). As a result of their
unique structure, their popularity results from their 'deep' understanding of data.
Furthermore, NNs are flexible in terms of complexity and structure. Despite all the
advanced stuff, they can't work without the basic elements: they may work better with the
advanced stuff, but the underlying structure remains the same.
Deep Feed-forward networks:
The following is a simplified visualization:
In a matrix format, it looks as follows:
In the third step, a vector of ones gets multiplied by the output of our hidden
layer:
Using the output value, we can calculate the result. Understanding these
fundamental concepts will make building NNs much easier, and you will be amazed at
how quickly you can do it. Every layer's output becomes the following layer's input.
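The matrix-format computation described above can be sketched as follows in NumPy; the layer sizes, the ReLU activation, and the use of a vector of ones as the output weights are assumptions made for illustration.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# hypothetical shapes: 4 input features, 3 hidden units, 1 output unit
x = np.array([[0.2, 0.5, 0.1, 0.9]])   # input row vector (1 x 4)
W1 = np.random.randn(4, 3)              # input-to-hidden weights
b1 = np.zeros(3)
W2 = np.ones((3, 1))                     # vector of ones multiplying the hidden output
b2 = np.zeros(1)

h = relu(x @ W1 + b1)                    # hidden layer output
y = h @ W2 + b2                          # every layer's output becomes the next layer's input
print(y)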
The architecture of the network:
The network can approximate any Borel measurable function from one finite-
dimensional space to another with some non-zero amount of error, provided there are
enough hidden units. It simply states that we can always represent any function using
a multi-layer perceptron (MLP), regardless of what function we try to learn.
Thus, we now know there will always be an MLP to solve our problem, but
there is no specific method for finding it. It is impossible to say in advance whether it will be
possible to solve the given problem using N layers with M hidden units.
Research is still ongoing, and for now, the only way to determine this
configuration is by experimenting with it. While it is challenging to find the appropriate
architecture, we may need to try many configurations before finding the one that can
represent the target function.
What is backpropagation in a feed forward neural network?
The goal when learning a neural network is to reduce the cost function on the training
data. The cost function is determined by the network weights and biases of all the neurons in
each layer. Backpropagation is used to calculate the gradient of the cost function iteratively,
and the weights and biases are then updated in the opposite direction of the gradient to
reduce the cost.
In the backpropagation formulas, the error is defined as follows:
Below is the full derivation of the formulas. For each formula below, L stands
for the output layer, g for the activation function, ∇ for the gradient, and W[l]T for the layer l
weights transposed.
The first equation shows how to calculate the error at the output layer for
sample j. Following that, we can use the second equation to calculate the error in the
layer just before the output layer.
Based on the error values for the next layer, the second equation can calculate
the error in any layer. Because this algorithm calculates errors backward, it is known
as backpropagation. For sample j, the third and fourth equations give the gradient of the
loss function with respect to the biases and the weights.
Stochastic Gradient Descent (SGD):
Gradient Descent is an iterative optimization process that searches for an objective
function's optimum value (minimum/maximum). It is one of the most widely used methods
for changing a model's parameters in order to reduce a cost function.
1. Stochastic Gradient Descent (SGD):
Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is
used for optimizing machine learning models. It addresses the computational inefficiency of
traditional Gradient Descent methods when dealing with large datasets in machine learning
projects.
In SGD, instead of using the entire dataset for each iteration, only a single random
training example (or a small batch) is selected to calculate the gradient and update the model
parameters. This random selection introduces randomness into the optimization process,
hence the term "stochastic" in Stochastic Gradient Descent.
The advantage of using SGD is its computational efficiency, especially when dealing with
large datasets. By using a single example or a small batch, the computational cost per iteration
is significantly reduced compared to traditional Gradient Descent methods that require
processing the entire dataset.
Stochastic Gradient Descent Algorithm:
Initialization: Randomly initialize the parameters of the model.
Set Parameters: Determine the number of iterations and the learning rate (alpha) for
updating the parameters.
Stochastic Gradient Descent Loop: Repeat the following steps until the
model converges or reaches the maximum number of iterations:
a. Shuffle the training dataset to introduce randomness.
b. Iterate over each training example (or a small batch) in the shuffled order.
c. Compute the gradient of the cost function with respect to the model parameters
using the current training example (or batch).
d. Update the model parameters by taking a step in the direction of the
negative gradient, scaled by the learning rate.
e. Evaluate the convergence criteria, such as the difference in the cost function
between iterations of the gradient.
Return Optimized Parameters: Once the convergence criteria are met
or the maximum number of iterations is reached, return the optimized model parameters.
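A minimal sketch of this loop for a linear model with a squared-error cost; the synthetic data, learning rate, and epoch count are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.5          # true weights and bias used to generate data

w = rng.normal(size=2)                        # 1. randomly initialize the parameters
b = 0.0
lr, epochs = 0.05, 20                         # 2. set learning rate and number of iterations

for _ in range(epochs):                       # 3. stochastic gradient descent loop
    idx = rng.permutation(len(X))             # a. shuffle the training set
    for i in idx:                             # b. iterate over single examples
        err = X[i] @ w + b - y[i]
        grad_w = err * X[i]                   # c. gradient of the squared-error cost
        grad_b = err
        w -= lr * grad_w                      # d. step in the direction of the negative gradient
        b -= lr * grad_b

print(w, b)                                   # 4. optimized parameters, close to [2, -1] and 0.5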
In neural networks, a hidden layer is located between the input and output of the
algorithm, in which the function applies weights to the inputs and directs them through
an activation function as the output. In short, the hidden layers perform nonlinear
transformations of the inputs entered into the network. Hidden layers vary depending on the
function of the neural network, and similarly, the layers may vary depending on their
associated weights.
How does a Hidden Layer work?
Hidden layers, simply put, are layers of mathematical functions each designed to
produce an output specific to an intended result. For example, some forms of hidden layers
are known as squashing functions. These functions are particularly useful when the intended
output of the algorithm is a probability, because they take an input and produce an output value
between 0 and 1, the range for defining probability.
Hidden layers allow the function of a neural network to be broken down into
specific transformations of the data. Each hidden layer function is specialized to produce a
defined output. For example, hidden layer functions that are used to identify human eyes
and ears may be used in conjunction by subsequent layers to identify faces in images. While
the functions to identify eyes alone are not enough to independently recognize objects, they
can function jointly within a neural network.
Hidden Layers and Machine Learning:
Hidden layers are very common in neural networks, however their use and architecture
often vary from case to case. As referenced above, hidden layers can be separated by their
functional characteristics. For example, in a CNN used for object recognition, a hidden layer
that is used to identify wheels cannot solely identify a car; however, when placed in
conjunction with additional layers used to identify windows, a large metallic body, and
headlights, the neural network can then make predictions and identify possible cars within
visual data.
Choosing Hidden Layers
1. Well, if the data is linearly separable then you don't need any hidden layers
at all.
3. If the data has large dimensions or features then, to get an optimum
solution, 3 to 5 hidden layers can be used.
Choosing Nodes in Hidden Layers
Once the hidden layers have been decided, the next task is to choose the
number of nodes in each hidden layer.
1. The number of hidden neurons should be between the size of
the input layer and the output layer.
2. The most appropriate number of hidden neurons is
sqrt(input layer nodes * output layer nodes)
The above rules of thumb are only a general use case and they can be
moulded according to the use case. Additionally, the number of nodes in hidden
layers can increase in subsequent layers, and the number of hidden layers
can also be more than the ideal case.
This all depends upon the use case and problem statement that we
are dealing with.
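A tiny sketch of the sqrt rule of thumb above; the example layer sizes are arbitrary.

import math

def suggested_hidden_neurons(n_inputs, n_outputs):
    # rule of thumb from the notes: sqrt(input layer nodes * output layer nodes)
    return round(math.sqrt(n_inputs * n_outputs))

print(suggested_hidden_neurons(64, 10))   # about 25 hidden neurons for 64 inputs and 10 outputs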
Architecture Design:
Types of neural network models are listed below:
Perceptron
Feed Forward Neural Network
Multilayer Perceptron
Convolutional Neural Network
Radial Basis Functional Neural Network
Recurrent Neural Network
LSTM - Long Short-Term Memory
Sequence to Sequence Models
Modular Neural Network
An Introduction to Artificial Neural Networks
Artificial neural networks are inspired by the biological neurons within the human
body which activate under certain circumstances, resulting in a related action performed by the
body in response. Artificial neural nets consist of various layers of interconnected artificial
neurons powered by activation functions that help in switching them ON/OFF. Like
traditional machine learning algorithms, here too, there are certain values that neural nets learn
in the training phase.
Briefly, each neuron receives a multiplied version of inputs and random weights, which is
then added with a static bias value (unique to each neuron layer); this is then passed
to an appropriate activation function which decides the final value to be given out of the
neuron. There are various activation functions available as per the nature of the input values.
Once the output is generated from the final neural net layer, the loss function (input vs output)
is calculated, and backpropagation is performed where the weights are adjusted to make the
loss minimum. Finding optimal values of weights is what the overall operation focuses on.
Please refer to the following for better understanding.
Weights are numeric values that are multiplied by inputs. In backpropagation, they are
modified to reduce the loss. In simple words, weights are machine-learned values from neural
networks. They self-adjust depending on the difference between predicted outputs vs training
inputs.
Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.
Input layer represents the dimensions of the input vector.
Hidden layer represents the intermediary nodes that divide the input space into
regions with (soft) boundaries. It takes in a set of weighted inputs and produces
output through an activation function.
Output layer represents the output of the neural network.
Backpropagation:
Backpropagation Process in a Deep Neural Network:
The main feature of backpropagation is its iterative, recursive and efficient
method of calculating the updated weights to improve the network until
it is able to perform the task for which it is being trained. Derivatives of the
activation function need to be known at network design time for
backpropagation to be possible.
Input values
X1 = 0.05
X2 = 0.10
Initial weights
W1 = 0.15    W5 = 0.40
W2 = 0.20    W6 = 0.45
W3 = 0.25    W7 = 0.50
W4 = 0.30    W8 = 0.55
Bias values
b1 = 0.35    b2 = 0.60
Target values
T1 = 0.01
T2 = 0.99
Now, we first calculate the values of H1 and H2 by a forward pass.
Forward Pass
To find the value of H1, we first multiply the input values by the corresponding weights:
H1 = x1×w1 + x2×w2 + b1
H1 = 0.05×0.15 + 0.10×0.20 + 0.35
H1 = 0.3775
To calculate the final output of H1, we apply the sigmoid function:
H1final = 1 / (1 + e^(−0.3775)) = 0.593269992
We will calculate the value of H2 in the same way as H1:
H2 = x1×w3 + x2×w4 + b1
H2 = 0.05×0.25 + 0.10×0.30 + 0.35
H2 = 0.3925
To calculate the final output of H2, we apply the sigmoid function:
H2final = 1 / (1 + e^(−0.3925)) = 0.596884378
Now we calculate the outputs y1 and y2 at the output layer:
y1 = H1final×w5 + H2final×w6 + b2
y1 = 0.593269992×0.40 + 0.596884378×0.45 + 0.60
y1 = 1.10590597
To calculate the final output of y1, we apply the sigmoid function:
y1final = 1 / (1 + e^(−1.10590597)) = 0.75136507
We will calculate the value of y2 in the same way as y1:
y2 = H1final×w7 + H2final×w8 + b2
y2 = 0.593269992×0.50 + 0.596884378×0.55 + 0.60
y2 = 1.2249214
To calculate the final output of y2, we apply the sigmoid function:
y2final = 1 / (1 + e^(−1.2249214)) = 0.772928465
Our target values are 0.01 and 0.99, so our y1 and y2 values do not match
the target values T1 and T2. Now, we will find the total error, which is simply the
sum of the squared differences between the target outputs and the actual outputs.
The total error is calculated as
Etotal = Σ ½ (target − output)²
So, the total error is
Etotal = ½(0.01 − 0.75136507)² + ½(0.99 − 0.772928465)² = 0.298371109
Now, we will backpropagate this error to update the weights using a backward
pass.
Backward pass at the output layer
To update a weight, we calculate the error corresponding to that weight with
the help of the total error. The error on weight w is calculated by differentiating the total
error with respect to w.
We perform the backward process, so first consider the last weight w5:
∂Etotal/∂w5 = ∂Etotal/∂y1final × ∂y1final/∂y1 × ∂y1/∂w5
Now, we calculate each term one by one to differentiate Etotal with respect to w5:
∂Etotal/∂y1final = −(T1 − y1final) = 0.74136507
∂y1final/∂y1 = y1final(1 − y1final) = 0.186815602
∂y1/∂w5 = H1final = 0.593269992
Putting these values together:
∂Etotal/∂w5 = 0.74136507 × 0.186815602 × 0.593269992 = 0.082167041
Now, we calculate the updated weight w5new with the help of the following
formula (with the learning rate η taken as 0.5):
w5new = w5 − η × ∂Etotal/∂w5 = 0.4 − 0.5 × 0.082167041 = 0.35891648
In the same way, we calculate w6new, w7new, and w8new, and this gives us the following
values:
w5new = 0.35891648
w6new = 0.408666186
w7new = 0.511301270
w8new = 0.561370121
Backward pass at the hidden layer
Now, we will backpropagate to our hidden layer and update the weights w1,
w2, w3, and w4 as we have done with the w5, w6, w7, and w8 weights. We will calculate
the error at w1 as:
∂Etotal/∂w1 = ∂Etotal/∂H1final × ∂H1final/∂H1 × ∂H1/∂w1
Now, we calculate each term one by one to differentiate Etotal with respect to w1.
We first split Etotal into E1 and E2, because there is no explicit H1final term in Etotal:
∂Etotal/∂H1final = ∂E1/∂H1final + ∂E2/∂H1final
We split these again, because E1 and E2 contain no direct H1final term either.
Splitting is done as
∂E1/∂H1final = ∂E1/∂y1final × ∂y1final/∂y1 × ∂y1/∂H1final
∂E2/∂H1final = ∂E2/∂y2final × ∂y2final/∂y2 × ∂y2/∂H1final
We again split both terms and find the value of each part by substituting the values.
Now, we find these values by putting values into equations (18) and (19). From
equation (18):
From equation (8):
From equation (19):
Putting the value of e^(−y2) into equation (23):
From equation (21):
Now, from equations (16) and (17):
We now have the first term; we still need to figure out the remaining terms
∂H1final/∂H1 and ∂H1/∂w1.
Putting the value of e^(−H1) into equation (30):
Now, we calculate the updated weight w1new with the help of the following
formula:
w1new = w1 − η × ∂Etotal/∂w1
In the same way, we calculate w2new, w3new, and w4new, and this gives us the following
values:
w1new = 0.149780716
w2new = 0.19956143
w3new = 0.24975114
w4new = 0.29950229
We have updated all the weights. We found the error 0.298371109 on the
network when we fed forward the 0.05 and 0.1 inputs. In the first round of
backpropagation, the total error is down to 0.291027924. When we repeat this process
10,000 times, the error falls to a very small value, and the output neurons
generate 0.015912196 and 0.984065734, i.e., values close to our targets when we
feed forward the 0.05 and 0.1 inputs.
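The forward pass, the total error, and the w5 update from the worked example above can be reproduced with the short NumPy sketch below. The learning rate of 0.5 is an assumption inferred from the updated weight values (it reproduces w5new = 0.35891648).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# inputs, weights, biases and targets as given in the worked example above
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# forward pass
h1 = sigmoid(x1*w1 + x2*w2 + b1)   # ~0.593269992
h2 = sigmoid(x1*w3 + x2*w4 + b1)   # ~0.596884378
y1 = sigmoid(h1*w5 + h2*w6 + b2)   # ~0.751365070
y2 = sigmoid(h1*w7 + h2*w8 + b2)   # ~0.772928465

E_total = 0.5*(T1 - y1)**2 + 0.5*(T2 - y2)**2   # ~0.298371109

# backward pass for w5: dE/dw5 = (y1 - T1) * y1*(1 - y1) * h1
dE_dw5 = (y1 - T1) * y1 * (1 - y1) * h1
lr = 0.5                                        # assumed learning rate
w5_new = w5 - lr * dE_dw5                       # ~0.358916480
print(E_total, w5_new)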
Deep learning frameworks and libraries:
Deep Learning Frameworks:
Keras, TensorFlow and PyTorch are among the top three frameworks that are
preferred by data scientists as well as beginners in the field of deep learning.
This comparison of Keras vs TensorFlow vs PyTorch will provide you with crisp
knowledge about the top deep learning frameworks and help you find out which
one is suitable for you. In this section you will get a complete insight into the above
three frameworks in the following sequence:
Introduction to Keras, TensorFlow & PyTorch
Comparison Factors
Final Verdict
Introduction
Keras
PyTorch
Comparison Factors
All the three frameworks are related to each other and also have certain basic
differences that distinguish them from one another.
The parameters that distinguish them:
Level of API
Speed
Architecture
Debugging
Dataset
Popularity
Level of API
TensorFlow is a framework that provides both high and low level APIs.
PyTorch, on the other hand, is a lower-level API focused on direct work with array
expressions. It has gained immense interest recently, becoming a preferred
solution for academic research and for applications of deep learning that require
optimizing custom expressions.
Speed
Architecture
Debugging
Dataset
Popularity
With the increasing demand in the field of data science, there has been an
enormous growth of deep learning technology in the industry. With this, all the three
frameworks have gained quite a lot of popularity. Keras tops the list,
followed by TensorFlow and PyTorch, and it has gained popularity owing to its
simplicity when compared to the other two.
These were the parameters that distinguish all the three frameworks, but there is no
absolute answer as to which one is better. The choice ultimately comes down to:
Technical background
Requirements and
Ease of Use
Final Verdict
Now coming to the final verdict of Keras vs TensorFlow vs PyTorch, let's have
a look at the situations that are most preferable for each one of these three deep
learning frameworks.
Keras is most suitable for:
Rapid Prototyping
Small Dataset
Multiple back-end support
TensorFlow is most suitable for:
Large Dataset
High Performance
Functionality
Object Detection
PyTorch is most suitable for:
Flexibility
Short Training Duration
Debugging capabilities
UNIT-II:
CONVOLUTIONAL NEURAL NETWORK (CNN): Introduction to CNNs
and their applications in computer vision, CNN basic architecture,
Activation functions - sigmoid, tanh, ReLU, Softmax layer, Types of
pooling layers, Training of CNN in TensorFlow, various popular CNN
architectures: VGG, GoogLeNet, ResNet etc., Dropout, Normalization,
Data augmentation
Introduction to CNNs and their applications in computer vision:
Since the 1950s, the early days of AI, researchers have struggled to make
a system that can understand visual data. In the following years, this field came to be
known as Computer Vision. In 2012, computer vision took a quantum leap
when a group of researchers from the University of Toronto developed an AI
model that surpassed the best image recognition algorithms, and that too by a
large margin.
The AI system, which became known as AlexNet (named after its main
creator, Alex Krizhevsky), won the 2012 ImageNet computer vision contest with
an amazing 85 percent accuracy. The runner-up scored a modest 74 percent on the
test.
Background of CNNs
CNNs were first developed and used around the 1980s. The most that a
CNN could do at that time was recognize handwritten digits. It was mostly used
in the postal sector to read zip codes, pin codes, etc. The important thing to
remember about any deep learning model is that it requires a large amount of
data to train and also requires a lot of computing resources. This was a major
drawback for CNNs at that period, and hence CNNs were only limited to the
postal sector and failed to enter the world of machine learning.
In the past few decades, deep learning has proved to be a very powerful
tool because of its ability to handle large amounts of data. The interest in using
hidden layers has surpassed traditional techniques, especially in pattern
recognition. One of the most popular deep neural networks is the Convolutional
Neural Network (also known as CNN or ConvNet), especially
when it comes to computer vision applications.
In 2012, Alex Krizhevsky realized that it was time to bring back the branch
of deep learning that uses multi-layered neural networks. The availability of large
sets of data, to be more specific ImageNet datasets with millions of labeled images,
and an abundance of computing resources enabled researchers to revive CNNs.
What Is a CNN?
Now when we think of a neural network we think about matrix multiplications, but
that is not the case with a ConvNet. It uses a special technique called convolution.
In mathematics, convolution is a mathematical operation on two functions that
produces a third function that expresses how the shape of one is modified by the
other.
The bottom line is that the role of the ConvNet is to reduce the images into a form
that is easier to process, without losing features that are crucial for good prediction.
How does it work?
For simplicity, consider grayscale images to understand how CNNs
work.
When an image is passed through a ConvNet, each layer generates several activation
functions that are passed on to the next layer.
The final layer outputs a set of probability scores (values between 0
and 1) that specify how likely the image is to belong to a "class." For
instance, if you have a ConvNet that detects cats, dogs, and horses, the
output of the final layer is the probability that the input image contains any of
those animals.
What Is a Pooling Layer?
Benefits of Using CNNs for Machine and Deep Learning
Deep learning is a form of machine learning that requires a neural network with a minimum of
three layers. Networks with multiple layers are more accurate than single-layer networks. Deep learning
applications often use CNNs or RNNs (recurrent neural networks).
The CNN architecture is especially useful for image recognition and image classification, as well
as other computer vision tasks, because CNNs can process large amounts of data and produce highly
accurate predictions. CNNs can learn the features of an object through multiple iterations, eliminating the
need for manual feature engineering tasks like feature extraction.
It is possible to retrain a CNN for a new recognition task or build a new model based on an existing
network with trained weights. This is known as transfer learning. This enables ML model developers to
apply CNNs to different use cases without starting from scratch.
What Are Convolutional Neural Networks (CNNs)?
The connectivity pattern in CNNs is inspired by the visual cortex in the human brain, where
neurons respond to specific regions or receptive fields in the visual space. This architecture enables CNNs
to effectively capture spatial relationships and patterns in images. By stacking multiple convolutional and
pooling layers, CNNs can learn increasingly complex features, leading to high accuracy in tasks like image
classification, object detection, and segmentation.
Convolutional Neural Network Architecture Model
Convolutional neural networks are known for their superiority over other artificial neural networks,
given their ability to process visual, textual, and audio data. The CNN architecture comprises three main
layers: convolutional layers, pooling layers, and a fully connected (FC) layer.
There can be multiple convolutional and pooling layers. The more layers in the network, the
greater the complexity and (theoretically) the accuracy of the model. Each
layer that processes the input data increases the model's ability to recognize objects and patterns in the data.
The Convolutional Layer
Convolutional layers are the key building block of the network, where most of the computations
are carried out. A convolutional layer works by applying a filter to the input data to identify features. This
filter, known as a feature detector, checks the image input's receptive fields for a given feature. This
operation is referred to as convolution.
The filter is a two-dimensional array of weights that represents part of a two-dimensional image. A
filter is typically a 3×3 matrix, although there are other possible sizes. The filter is applied to a region
within the input image and calculates a dot product between its weights and the pixels, which is fed to an
output array. The filter then shifts and repeats the process until it has covered the whole image. The final
output of all the filter passes is called the feature map.
The CNN typically applies the ReLU (Rectified Linear Unit) transformation to each feature map
after every convolution to introduce nonlinearity to the ML model. A convolutional layer is typically
followed by a pooling layer. Together, the convolutional and pooling layers make up a convolutional block.
Additional convolution blocks will follow the first block, creating a hierarchical structure
with later layers learning from the earlier layers. As an example, a CNN model might train to detect cars in
images. Cars can be viewed as the sum of their parts, including the wheels, boot, and windscreen. Each
feature of a car equates to a low-level pattern identified by the neural network, which then combines these
parts to create a high-level pattern.
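A minimal NumPy sketch of the convolution operation just described (a filter sliding over the image and taking a dot product at each position); the 5×5 image and the 3×3 edge filter below are made-up examples.

import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image and take the dot product at each position
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # hypothetical 5x5 grayscale image
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                   # 3x3 vertical-edge filter
feature_map = convolve2d(image, kernel)              # 3x3 feature map
print(feature_map)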
The Pooling Layers
A pooling or downsampling layer reduces the dimensionality of the input. Like a convolutional
operation, pooling operations use a filter to sweep the whole input image, but it doesn't use weights. The
filter instead uses an aggregation function to populate the output array based on the receptive field's values.
There are two key types of pooling:
Average pooling: The filter calculates the receptive field's average value when it scans the input.
Max pooling: The filter sends the pixel with the maximum value to populate the output array. This
approach is more common than average pooling.
Pooling layers are important despite causing some information to be lost, because they help reduce the
complexity and increase the efficiency of the CNN. Pooling also reduces the risk of overfitting.
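A short NumPy sketch of max and average pooling over non-overlapping 2×2 windows; the feature-map values are illustrative.

import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    # non-overlapping pooling with a size x size window
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i+size, j:j+size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 0.],
               [3., 4., 8., 6.]])
print(pool2d(fm, mode="max"))      # max pooling keeps the largest value in each window
print(pool2d(fm, mode="average"))  # average pooling takes the mean of each window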
The Fully Connected (FC) Layer
The final layer of a CNN is a fully connected layer.
The FC layer performs classification tasks using the features that the previous layers and filters extracted. Instead of ReLU, the FC layer typically uses a softmax function, which classifies inputs and produces a probability score between 0 and 1 for each class.
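As a concrete illustration (not part of the original notes), the following is a minimal Keras sketch that stacks the three layer types described above; the input shape, filter counts, and class count are arbitrary assumptions for the example.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),            # e.g. grayscale images
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),       # downsample the feature maps
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),      # class probability scores
])
model.summary()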
Basic Architecture of CNN:
There are two main parts to a CNN architecture: a convolution/pooling part that extracts features, and a fully connected part that performs classification. In total, three types of layers make up the CNN: convolutional layers, pooling layers, and fully connected (FC) layers.
1. Convolutional Layer
This layer is the first layer used to extract the various features from the input images. In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size MxM. By sliding the filter over the input image, the dot product is taken between the filter and the parts of the input image with respect to the size of the filter (MxM).
The convolution layer in CNN passes the result to the next layer once the convolution operation has been applied to the input. Convolutional layers in CNN benefit a lot as they ensure the spatial relationship between the pixels is kept intact.
2. Pooling Layer
A pooling layer follows a convolutional layer and shrinks the feature map. In Max Pooling, the largest element is taken from the feature map. Average Pooling calculates the average of the elements in a predefined-sized region of the feature map. The pooling layer usually serves as a bridge between the Convolutional Layer and the FC Layer.
3. Fully Connected Layer
In this layer, the input from the previous layers is flattened and fed to the FC layer. The flattened vector then passes through a few more FC layers, where the mathematical operations usually take place; at this stage, the classification process begins. The reason two layers are connected is that two fully connected layers perform better than a single connected layer. These layers in CNN reduce the need for human supervision.
4. Dropout
Usually, when all the features are connected to the FC layer, it can cause overfitting on the training dataset. Overfitting occurs when a particular model works so well on the training data that it causes a negative impact on the model's performance when used on new data.
5. Activation Functions
Finally, one of the most important parameters of the CNN model is the activation function. Activation functions are used to learn and approximate any kind of continuous and complex relationship between variables of the network. In simple words, they decide which information of the model should fire in the forward direction and which should not at the end of the network.
They add non-linearity to the network. There are several commonly used activation functions such as ReLU, Softmax, tanh and Sigmoid, and each of these functions has a specific usage. For a binary classification CNN model, sigmoid and softmax functions are preferred, and for a multi-class classification, softmax is generally used. In simple terms, activation functions in a CNN model determine whether a neuron should be activated or not; they decide whether the input to the node is important or not for making a prediction, using simple mathematical operations.
Activation Functions
The popular activation functions are:
a) Binary Step Function
The binary step function depends on a threshold value that decides whether a neuron should be activated or not. The input fed to the activation function is compared to a certain threshold; if the input is greater than it, then the neuron is activated, else it is deactivated, meaning that its output is not passed on to the next hidden layer.
Mathematically, it can be represented as:
f(x) = 1 if x >= 0, and f(x) = 0 if x < 0
The limitations of the binary step function are as follows:
It cannot provide multi-value outputs; for example, it cannot be used for multi-class classification problems.
The gradient of the step function is zero, which causes a hindrance in the backpropagation process.
b) Linear Activation Function:
The linear activation function, also known as "no activation" or the "identity function" (multiplied by 1.0), is where the activation is proportional to the input. The function doesn't do anything to the weighted sum of the input; it simply returns the value it was given.
Mathematically, it can be represented as:
f(x) = x
However, a linear activation function has two major problems:
It's not possible to use backpropagation, as the derivative of the function is a constant and has no relation to the input x.
All layers of the neural network will collapse into one if a linear activation function is used. No matter how many layers there are in the neural network, the last layer will still be a linear function of the first layer. So, essentially, a linear activation function turns the neural network into just one layer.
Non-Linear Activation Functions
The linear activation function shown above is simply a linear regression model. Because of its limited power, it does not allow the model to create complex mappings between the network's inputs and outputs.
Non-linear activation functions solve the following limitations of linear activation functions:
They allow backpropagation because now the derivative function is related to the input, and it's possible to go back and understand which weights in the input neurons can provide a better prediction.
They allow the stacking of multiple layers of neurons, as the output is now a non-linear combination of the input passed through multiple layers. Any output can then be represented as a functional computation in a neural network.
Below are several non-linear neural network activation functions and their characteristics.
a) Sigmoid / Logistic Activation Function
This function takes any real value as input and outputs values in the range of 0 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
Mathematically, it can be represented as:
f(x) = 1 / (1 + e^-x)
Here is why the sigmoid/logistic activation function is one of the most widely used functions:
It is commonly used for models where we have to predict a probability as the output. Since probability exists only between the range of 0 and 1, sigmoid is the right choice because of its range.
The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in output values. This is represented by the S-shape of the sigmoid activation function.
The limitations of the sigmoid function are discussed below:
The derivative of the function is f'(x) = sigmoid(x) * (1 - sigmoid(x)).
The gradient values are only significant for inputs roughly in the range -3 to 3; the graph gets much flatter elsewhere. For values greater than 3 or less than -3, the function has very small gradients. As the gradient value approaches zero, the network ceases to learn and suffers from the vanishing gradient problem.
The output of the sigmoid function is not symmetric around zero, which makes the training of the neural network more difficult and unstable.
b) Tanh Function (Hyperbolic Tangent)
The tanh function is very similar to the sigmoid/logistic activation function, and even has the same S-shape, with the difference being an output range of -1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.
Mathematically, it can be represented as:
f(x) = (e^x - e^-x) / (e^x + e^-x)
Advantages of using this activation function are:
The output of the tanh activation function is zero centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
It is usually used in the hidden layers of a neural network, as its values lie between -1 and 1; therefore, the mean of the hidden-layer outputs comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.
Note: Although both sigmoid and tanh face the vanishing gradient issue, tanh is zero centered, and the gradients are not restricted to move in a certain direction. Therefore, in practice, tanh nonlinearity is always preferred to sigmoid nonlinearity.
c) ReLU Function
ReLU stands for Rectified Linear Unit. Although it gives an impression of a linear function, ReLU has a derivative function and allows for backpropagation while simultaneously being computationally efficient.
The main catch here is that the ReLU function does not activate all the neurons at the same time. A neuron is only deactivated if the output of the linear transformation is less than 0.
Mathematically, it can be represented as:
f(x) = max(0, x)
The advantages of using ReLU as an activation function are as follows:
Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
ReLU accelerates the convergence of gradient descent towards the global minimum of the loss function due to its linear, non-saturating property.
The limitations faced by this function are:
The Dying ReLU problem. The negative side of the graph makes the gradient value zero. Due to this reason, during the backpropagation process, the weights and biases for some neurons are not updated, which can create dead neurons that never get activated.
All the negative input values become zero immediately, which decreases the model's ability to fit or train from the data properly.
Note: For building the most reliable ML models, split your data into train, validation, and test sets.
d) Leaky ReLU Function
Leaky ReLU is an improved version of the ReLU function designed to solve the Dying ReLU problem, as it has a small positive slope in the negative area.
Mathematically, it can be represented as:
f(x) = max(0.01*x, x)
The advantages of Leaky ReLU are the same as those of ReLU, in addition to the fact that it enables backpropagation even for negative input values. By making this minor modification for negative input values, the gradient on the left side of the graph comes out to be a non-zero value. Therefore, we no longer encounter dead neurons in that region.
The derivative of the Leaky ReLU function is 1 for positive inputs and 0.01 for negative inputs.
The limitations that this function faces include:
The predictions may not be consistent for negative input values.
The gradient for negative values is a small value, which makes the learning of model parameters time-consuming.
e) Parametric ReLU Function
Parametric ReLU is another variant of ReLU that aims to solve the problem of the gradient becoming zero for the left half of the axis. This function provides the slope of the negative part of the function as an argument a. By performing backpropagation, the most appropriate value of a is learnt.
Mathematically, it can be represented as:
f(x) = x for x >= 0, and f(x) = a*x for x < 0
where "a" is the slope parameter for negative values.
The parameterized ReLU function is used when the leaky ReLU function still fails at solving the problem of dead neurons and the relevant information is not successfully passed to the next layer.
This function's limitation is that it may perform differently for different problems depending upon the value of the slope parameter a.
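To make the formulas above concrete, here is a small NumPy sketch (not from the notes) of these activation functions; the 0.01 Leaky ReLU slope and the sample inputs are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # outputs in (0, 1)

def tanh(x):
    return np.tanh(x)                           # outputs in (-1, 1), zero centered

def relu(x):
    return np.maximum(0.0, x)                   # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)        # small fixed slope for negative inputs

def parametric_relu(x, a):
    return np.where(x > 0, x, a * x)            # slope a is learned during training

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), sep="\n")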
Types of Pooling Layers:
A Convolutional Neural Network (CNN) is a special type of Artificial Neural Network that is usually used for image recognition and processing due to its ability to recognize patterns in images. It eliminates the need to extract features from visual data manually. It learns images by sliding a filter of some size over them, learning not just the features from the data but also keeping translation invariance.
The typical structure of a CNN consists of three basic layers:
1. Convolutional layer: These layers generate a feature map by sliding a filter over the input image and recognizing patterns in images.
2. Pooling layers: These layers downsample the feature map to introduce translation invariance, which reduces the overfitting of the CNN model.
3. Fully Connected Dense Layer: This layer contains the same number of units as the number of classes and an output activation function such as "softmax" or "sigmoid".
What are Pooling layers?
Pooling layers are one of the building blocks of Convolutional Neural Networks. Where convolutional layers extract features from images, pooling layers consolidate the features learned by CNNs. Their purpose is to gradually shrink the representation's spatial dimension to minimize the number of parameters and computations in the network.
Why are Pooling layers needed?
The feature map produced by the filters of convolutional layers is location-dependent. For example, if an object in an image has shifted a bit, it might not be recognizable by the convolutional layer, because the feature map records the precise positions of features in the input. What pooling layers provide is "translational invariance", which makes the CNN invariant to translations, i.e., even if the input of the CNN is translated, the CNN will still be able to recognize the features in the input.
How do Pooling layers achieve that?
A Pooling layer is added after the Convolutional layer(s), as seen in the structure of a
CNN above. It down samples the output of the Convolutional layers by sliding the filter of
some size with some stride size and calculating the maximum or average of the input.
Therearetwotypesofpoolingthatareused:
1. Max pooling: This works by selecting the maximum value from every pool. Max Pooling retains
themost prominentfeaturesof the feature map, and the returned image is sharper than the original
image.
2. Average pooling: This pooling layer works by getting the average of the pool. Average pooling
retains theaverage valuesof features of the feature map. It smoothes the image while keeping the
essence of the feature in an image.
First, create a sample input matrix (for example, a NumPy array reshaped into a 4D tensor) to apply the pooling layers to.
Max Pooling
Create a MaxPool2D layer with pool_size=2 and strides=2. Apply the MaxPool2D layer to the matrix, and you will get the max-pooled output in tensor form. By applying it to the matrix, the max pooling layer will go through the matrix, computing the max of each 2x2 pool with a jump of 2. Print the shape, and squeeze out the dimensions of size 1 from the shape of the tensor.
Average Pooling
Create an AveragePooling2D layer with the same pool_size of 2 and strides of 2. Apply the AveragePooling2D layer to the matrix. By applying it to the matrix, the average pooling layer will go through the matrix, computing the average of each 2x2 pool with a jump of 2. Print the shape of the matrix and use squeeze (e.g., tf.squeeze) to convert the output into a readable form by removing all dimensions of size 1.
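A minimal sketch of the steps just described, assuming TensorFlow/Keras; the 4x4 input matrix and its values are hypothetical.

import numpy as np
import tensorflow as tf

# A 4x4 input matrix reshaped to (batch, height, width, channels) = (1, 4, 4, 1).
matrix = np.array([[3., 1., 4., 1.],
                   [5., 9., 2., 6.],
                   [5., 3., 5., 8.],
                   [9., 7., 9., 3.]], dtype="float32").reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)

print(tf.squeeze(max_pool(matrix)))   # 2x2 matrix of per-pool maxima
print(tf.squeeze(avg_pool(matrix)))   # 2x2 matrix of per-pool averages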
The pooling layers slide over the input matrix and compute the maximum or the average of each pool, for max pooling and average pooling respectively.
Global Pooling Layers
Global Pooling Layers often replace the classifier’s fully connected or Flatten layer.
The model instead ends with a convolutional layer that produces as many feature maps as
there are target classes and performs global average pooling on each of the feature maps to
combine each feature map into a single value.
Create the same NumPy array but with a different shape. By keeping the same shape
as above, the Global Pooling layers will reduce them to one value.
Global Average Pooling
Considering a tensor of shape h*w*n, the output of the Global Average Pooling layer is a single value across h*w that summarizes the presence of the feature. Instead of downsizing the patches of the input feature map, the Global Average Pooling layer downsizes the whole h*w into 1 value by taking the average.
Global Max Pooling
With a tensor of shape h*w*n, the output of the Global Max Pooling layer is a single value across h*w that summarizes the presence of a feature. Instead of downsizing the patches of the input feature map, the Global Max Pooling layer downsizes the whole h*w into 1 value by taking the maximum.
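A short illustrative sketch (assuming Keras and a randomly generated feature-map tensor) of the two global pooling layers described above:

import numpy as np
import tensorflow as tf

# A feature-map tensor of shape (batch, h, w, n) = (1, 4, 4, 3).
feature_maps = np.random.rand(1, 4, 4, 3).astype("float32")

gap = tf.keras.layers.GlobalAveragePooling2D()
gmp = tf.keras.layers.GlobalMaxPooling2D()

print(gap(feature_maps).shape)   # (1, 3): one average per feature map
print(gmp(feature_maps).shape)   # (1, 3): one maximum per feature map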
Training of CNN in TensorFlow
Once we are familiar with the building blocks of CNNs, we can build one with TensorFlow. We can use the MNIST dataset for image classification.
Here, the code is executed in Google Colab (an online editor for machine learning).
These are the steps used to train the CNN.
Steps:
Step 1: Upload Dataset
Step 2: Input layer
Step 3: Convolutional layer
Step 4: Pooling layer
Step 5: Second Convolutional layer and Pooling Layer
Step 6: Dense layer
Step 7: Logit Layer
Step 1: Upload Dataset
The MNIST dataset is available through scikit-learn. In current scikit-learn versions it can be loaded with fetch_openml('mnist_784') (the older fetch_mldata('MNIST Original') API has been removed).
Create a test/train set
We need to split the dataset with train_test_split.
Scale the features
Finally, we scale the features with the help of MinMaxScaler.
import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Download MNIST (70,000 28x28 grayscale digit images) from OpenML.
mnist = fetch_openml('mnist_784', as_frame=False)
print(mnist.data.shape)
print(mnist.target.shape)

# Split into train and test sets.
A_train, A_test, B_train, B_test = train_test_split(
    mnist.data, mnist.target, test_size=0.2, random_state=45)
B_train = B_train.astype(int)
B_test = B_test.astype(int)
batch_size = len(A_train)
print(A_train.shape, B_train.shape, B_test.shape)

# Rescale the pixel values to the [0, 1] range.
scaler = MinMaxScaler()
# Fit the scaler on the training set and transform both sets.
A_train_scaled = scaler.fit_transform(A_train.astype(np.float64))
A_test_scaled = scaler.transform(A_test.astype(np.float64))

feature_columns = [tf.feature_column.numeric_column('x', shape=A_train_scaled.shape[1:])]
print(A_train_scaled.shape[1:])
Defining the CNN (Convolutional Neural Network)
1. A convolutional layer: Apply a number of filters to the feature map. After convolution, we use a ReLU activation function to add non-linearity to the network.
2. Pooling layer: The next step after the convolution is to downsample the feature map. The objective is to reduce the dimensionality of the feature map to prevent overfitting and improve the computation speed. Max pooling is the traditional technique, which splits feature maps into subregions and only keeps the maximum values.
3. Fully connected layers: All neurons from the previous layers are connected to the next layers. The CNN classifies the label according to the features extracted by the convolutional layers and reduced by the pooling layers.
CNN Architecture
o Convolutional Layer: applies 14 5x5 filters (extracting 5x5-pixel sub-regions), with ReLU activation function
o Pooling Layer: performs max pooling with a 2x2 filter and stride of 2 (which specifies that pooled regions do not overlap)
o Convolutional Layer: applies 36 5x5 filters, with ReLU activation function
o Pooling Layer: again, performs max pooling with a 2x2 filter and stride of 2
o Dense Layer: 1,764 neurons, with a dropout regularization rate of 0.4 (meaning a probability of 0.4 that any given element will be dropped during training)
o Dense Layer (Logits Layer): ten neurons, one for each digit target class (0-9)
Important modules to use in creating a CNN:
1. conv2d(): Constructs a two-dimensional convolutional layer, with the number of filters, filter kernel size, padding, and activation function as arguments.
2. max_pooling2d(): Constructs a two-dimensional pooling layer using the max-pooling algorithm.
3. dense(): Constructs a dense layer with the hidden layers and units.
We can define a function to build the CNN.
The following represents the steps to construct every building block before wrapping everything in the function.
Step 2: Input layer

# Input layer
def cnn_model_fn(features, labels, mode):
    input_layer = tf.reshape(tensor=features["x"], shape=[-1, 28, 28, 1])

To reshape the data, we can use the module tf.reshape. We need to declare the tensor to reshape and the shape of the tensor. The first argument is the feature of the data, which is defined in the argument of the function.
Step 3: Convolutional Layer

# first CNN layer
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=18,
    kernel_size=[7, 7],
    padding="same",
    activation=tf.nn.relu)

The first convolutional layer has 18 filters with a kernel size of 7x7 and "same" padding. With same padding, the output tensor and input tensor have the same width and height; TensorFlow adds zeros in the rows and columns to ensure the same size. We use the ReLU activation function. The output size will be [batch_size, 28, 28, 18].
Step 4: Pooling layer
The next step after the convolution is the pooling computation. The pooling computation reduces the spatial size of the data. We can use the module max_pooling2d with a pool size of 2x2 and a stride of 2. We use the previous layer as input. The output size will be [batch_size, 14, 14, 18].

# first Pooling Layer
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
Step 5: Second Convolutional Layer and Pooling Layer
The second convolutional layer has 36 filters, giving an output of size [batch_size, 14, 14, 36]. The pooling layer has the same size as before, and the output shape is [batch_size, 7, 7, 36].

conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=36,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
Step 6: Fully connected (Dense) Layer

pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])
dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
dropout = tf.layers.dropout(
    inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

The feature map from the pooling layer is flattened and fed to a dense layer with ReLU activation, followed by dropout that is active only during training.
Step 7: Logits Layer
Finally, we define the last layer with the prediction of the model. The output shape is equal to the batch size by 10, the total number of digit classes.

# Logit Layer
logits = tf.layers.dense(inputs=dropout, units=10)

We can create a dictionary that contains the classes and the probability of each class. The module tf.argmax() returns the index of the highest value of the logits layer. The softmax function returns the probability of every class.
Popular CNN architectures - VGG, GoogLeNet, ResNet:
Types of Convolutional Neural Network Algorithms
LeNet
LeNet is a pioneering CNN designed for recognizing handwritten characters. It was proposed by
Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner in the late 1990s. LeNet consists of a
series of convolutional and pooling layers, as well as a fully connected layer and softmax classifier. It was
among the first successful applications of deep learning for computer vision. It has been used by banks to
identify numbers written on cheques in grayscale input images.
VGG
VGG (Visual Geometry Group) is a research group within the Department of Engineering Science at the University of Oxford. The VGG group is well known for its work in computer vision, particularly in
the area of convolutional neural networks (CNNs).
One of the most famous contributions from the VGG group is the VGG model, also known as
VGGNet. The VGG model is a deep neural network that achieved state-of-the-art performance on the
ImageNet Large Scale Visual Recognition Challenge in 2014, and has been widely used as a benchmark for
image classification and object detection tasks.
The VGG model is characterized by its use of small convolutional filters (3×3) and deep
architecture (up to 19 layers), which enables it to learn increasingly complex features from input images.
The VGG model also uses max pooling layers to reduce the spatial resolution of the feature maps and
increase the receptive field, which can improve its ability to recognize objects of varying scales and
orientations.
The VGG model has inspired many subsequent research efforts in deep learning, including the
development of even deeper neural networks and the use of residual connections to improve gradient flow
and training stability.
ResNet
ResNet (short for “Residual Neural Network”) is a family of deep convolutional neural networks
designed to overcome the problem of vanishing gradients that is common in very deep networks. The idea behind ResNet is to use "residual blocks" that allow for the direct propagation of gradients through the
network, enabling the training of very deep networks.
A residual block consists of two or more convolutional layers followed by an activation function,
combined with a shortcut connection that bypasses the convolutional layers and adds the original input
directly to the output of the convolutional layers after the activation function.
This allows the network to learn residual functions that represent the difference between the
convolutional layers’ input and output, rather than trying to learn the entire mapping directly. The use of
residual blocks enables the training of very deep networks, with hundreds or thousands of layers,
significantly alleviating the issue of vanishing gradients.
GoogLeNet
GoogLeNet is notable for its use of the Inception module, which consists of multiple parallel
convolutional layers with different filter sizes, followed by a pooling layer, and concatenation of the
outputs. This design allows the network to learn features at multiple scales and resolutions, while keeping
the computational cost manageable. The network also includes auxiliary classifiers at intermediate layers,
which encourage the network to learn more discriminative features and prevent overfitting.
GoogLeNet builds upon the ideas of previous convolutional neural networks, including LeNet,
which was one of the first successful applications of deep learning in computer vision. However,
GoogLeNet is much deeper and more complex than LeNet.
Dropout:
The term "dropout" refers to dropping out the nodes (input and hidden layer) in a neural network (as seen in Figure 1). All the forward and backward connections with a dropped node are temporarily removed, thus creating a new network architecture out of the parent network. The nodes are dropped with a dropout probability of p.
Consider a given input x: {1, 2, 3, 4, 5} to a fully connected layer, and suppose we have a dropout layer with probability p = 0.2 (or keep probability = 0.8). During the forward propagation (training) from the input x, 20% of the nodes would be dropped, i.e. x could become {1, 0, 3, 4, 5} or {1, 2, 0, 4, 5} and so on. Similarly, it is applied to the hidden layers.
For instance, if the hidden layers have 1000 neurons (nodes) and a dropout is applied with drop probability = 0.5, then 500 neurons would be randomly dropped in every iteration (batch).
Generally, for the input layers, the keep probability, i.e. 1 - drop probability, is closer to 1, with 0.8 suggested as a good choice by the authors. For the hidden layers, the greater the drop probability, the more sparse the model; 0.5 is often the optimal value, which means dropping 50% of the nodes.
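As a small hedged illustration (not from the notes), dropout is typically added as a layer in Keras; the drop rate of 0.5 and the layer sizes below are arbitrary choices matching the example above.

import tensorflow as tf

# The argument to Dropout is the drop probability: each unit's output is zeroed
# with probability 0.5 during training only; at inference the layer does nothing.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(1000, activation="relu"),
    tf.keras.layers.Dropout(0.5),                     # hidden-layer dropout
    tf.keras.layers.Dense(10, activation="softmax"),
])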
How does Dropout solve the Overfitting problem?
In the overfitting problem, the model learns the statistical noise. To be precise, the main motive of training is to decrease the loss function, given all the units (neurons). So in overfitting, a unit may change in a way that fixes up the mistakes of the other units. This leads to complex co-adaptations, which in turn lead to the overfitting problem, because this complex co-adaptation fails to generalise on the unseen dataset.
By randomly dropping units, dropout prevents such co-adaptations from forming. This ensures that the model generalises and hence reduces the overfitting problem.
Figure 2: (a) Hidden layer features without dropout; (b) Hidden layer features with dropout
From Figure 2, we can easily make out that the hidden layer with dropout is learning more of the generalised features than the co-adaptations in the layer without dropout. It is quite apparent that dropout breaks such inter-unit relations and focuses more on generalisation.
Dropout Implementation
Figure 3: (a) A unit (neuron) during training is present with a probability p and is connected to the next layer with weights 'w'; (b) A unit during inference/prediction is always present and is connected to the next layer with weights 'pw'
In the standard neural network, during the forward propagation we have the following equations (Figure 4: Forward propagation of a standard neural network):
z(l+1) = w(l) y(l) + b(l)
y(l+1) = f(z(l+1))
where:
z: the vector of outputs from layer (l+1) before activation
y: the vector of outputs from layer l
w: the weights of layer l
b: the bias of layer l
Further, with the activation function f, z is transformed into the output for layer (l+1). Now, if we have dropout, the forward propagation equations change in the following way (Figure 5: Forward propagation of a layer with dropout):
r(l) ~ Bernoulli(p)
y~(l) = r(l) * y(l)
z(l+1) = w(l) y~(l) + b(l)
y(l+1) = f(z(l+1))
So, during training each unit is retained with probability p and dropped otherwise, and at test time the outgoing weights are scaled by p (as in Figure 3). This prevents the units from co-adapting and thus avoids overfitting.
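Below is a minimal NumPy sketch (not from the notes) of the masking described above; it uses the common "inverted dropout" variant, which divides by the keep probability at training time so that no weight rescaling is needed at inference. The keep probability of 0.8 and the sample vector are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y, keep_prob=0.8, training=True):
    # y: activations of a layer; keep_prob = 1 - drop probability.
    if not training:
        return y                                        # inference: no masking needed
    mask = rng.binomial(1, keep_prob, size=y.shape)     # Bernoulli(keep_prob) mask r
    return (y * mask) / keep_prob                       # scale so the expected value is unchanged

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(dropout_forward(y))                  # some entries zeroed, the rest scaled up
print(dropout_forward(y, training=False))  # unchanged at inference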
Normalization:
Normalization is a pre-processing technique used to standardize data, in other words, to bring different sources of data into the same range. Not normalizing the data before training can cause problems in our network, making it drastically harder to train and decreasing its learning speed.
A common way to normalize is the standard score:
x_norm = (x - m) / s
with x being the data point to normalize, m the mean of the data set, and s the standard deviation of the data set. Now, each data point mimics a standard normal distribution. With all the features on the same scale, none of them will have a bias, and therefore, our models will learn better.
In Batch Norm, we use this last technique to normalize batches of data inside the network itself.
Batch Normalization
Batch Norm is a normalization technique applied between the layers of a neural network instead of to the raw data. It is done along mini-batches instead of the full data set. It serves to speed up training and allows higher learning rates, making learning easier.
z_N = (z - m_z) / s_z
with m_z the mean of the neurons' output over the mini-batch and s_z the standard deviation of the neurons' output over the mini-batch.
How Is It Applied?
Consider a regular feed-forward neural network, with x the inputs, z the outputs of the neurons, a the outputs of the activation functions, and y the output of the network.
Batch Norm is applied to the neurons' output just before applying the activation function. Usually, a neuron without Batch Norm would be computed as follows:
z = g(w, x) + b ;  a = f(z)
with g(w, x) the linear transformation of the neuron, w the weights of the neuron, b the bias of the neuron, and f the activation function. Adding Batch Norm, it looks as:
z = g(w, x) ;  z_N = ((z - m_z) / s_z) * gamma + beta ;  a = f(z_N)
where gamma and beta are learnable model parameters that rescale and shift the normalized value (the bias b can be omitted, since beta takes over its role).
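A minimal Keras sketch (an illustrative assumption, not the notes' own code) placing Batch Norm between a layer's linear output and its activation, matching the order described above; the layer sizes are arbitrary.

import tensorflow as tf

# BatchNormalization sits between the linear output and the activation,
# matching the z -> normalize -> gamma*z_N + beta -> activation order above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, use_bias=False),     # bias is redundant with beta
    tf.keras.layers.BatchNormalization(),           # learns gamma and beta per unit
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])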
Data Augmentation:
Algorithms can use machine learning to identify different objects and classify
them for image recognition. This evolving technology includes using Data
Augmentation to produce better-performing models. Machine learning models need
to identify an object in any condition, even if it is rotated, zoomed in, or a grainy
image. Researchers needed an artificial way of adding training data with realistic
modifications.
Data augmentation is the addition of new data artificially derived from existing training data. Techniques include resizing, flipping, rotating, cropping, padding, etc. It helps to address issues like overfitting and data scarcity, and it makes the model robust with better performance. Data Augmentation provides many possibilities to alter the original image and can be useful to add enough data for larger models.
Data Augmentation in a CNN:
Convolutional Neural Networks (CNNs) can do amazing things if there is sufficient data. However, selecting the correct amount of training data for all of the features that need to be trained is a difficult question. If the user does not have enough, the network can overfit on the training data. Augmentation provides training examples with a variety of sizes, poses, zoom, lighting, noise, etc.
Data Augmentation Techniques and their Data Augmentation Factor:
Flipping: 2-4x (in each direction)
Rotation: arbitrary
Translation: arbitrary
Scaling: arbitrary
Salt and Pepper Noise Addition: at least 2x (depends on the implementation)
A table outlining the factor by which different methods multiply the existing training data.
Data Augmentation Techniques:
Some libraries perform Data Augmentation by actually copying the training images and saving these copies as part of the total. This produces new training examples to feed to the machine learning model. Other libraries simply define a set of transforms to perform on the input training data. These transforms are applied on the fly during training. As a result, the space the optimizer is searching is increased. This has the advantage that it does not require extra disk space to augment the training data.
Image Data Augmentation involves techniques such as:
a) Flips:
By flipping images, the optimizer will not become biased that particular features of an image are only on one side. To do this augmentation, the original training image is flipped vertically or horizontally over one axis of the image. As a result, the features continually change directions.
Stella the Puppy sitting on a car seat; Stella the Puppy flipped over the vertical axis.
b) Rotation:
Rotating images is another useful way to augment the data. For rotation, the background color is commonly fixed so that it can blend when the image is rotated. Otherwise, the model can assume the background change is a distinct feature. This works best when the background is the same in all rotated images.
Stella the Puppy sitting on a car seat; Stella the Puppy rotated 90 degrees.
c) Translation:
Stella the Puppy sitting on a car seat; Stella the Puppy translated and cropped so she's only partly visible.
d) Scaling:
Stella the Puppy sitting on a car seat; Stella the Puppy scaled up to be even larger than she is in real life.
e) Salt and Pepper Noise Addition:
Stella the Puppy sitting on a car seat; Stella the Puppy with Salt and Pepper noise added to the image.
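The on-the-fly style of augmentation described above can be sketched with Keras preprocessing layers (a hedged example, assuming a recent TensorFlow version; the flip/rotation/translation/zoom amounts are illustrative assumptions):

import tensorflow as tf

# On-the-fly augmentation: transforms are applied to each batch during training,
# so no extra disk space is needed for the augmented images.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),           # rotate by up to +/- 10% of a full turn
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # shift height/width by up to 10%
    tf.keras.layers.RandomZoom(0.2),               # zoom in/out by up to 20%
])

images = tf.random.uniform((8, 64, 64, 3))         # a dummy batch of images
augmented = augment(images, training=True)
print(augmented.shape)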
Benefits of Data Augmentation in a CNN
Drawbacks of Data Augmentation:
Data Augmentation is not useful when the variety required by the application
cannot be artificially generated. For example, if one were training a bird recognition
model and the training data contained only red birds. The training data could be
augmented by generating pictures with the color of the bird varied.
However, the artificial augmentation method may not capture the realistic color
details of birds when there is not enough variety of data to start with. For example, if
the augmentation method simply varied red for blue or green, etc. Realistic non-red
birds may have more complex color variations and the model may fail to recognize
the color. Having sufficient data is still important if one wants Data Augmentation to
work properly.
UNIT-III
RECURRENT NEURAL NETWORK (RNN): Introduction to
RNNs and their applications in sequential data analysis, Back
propagation through time (BPTT), Vanishing Gradient Problem,
gradient clipping, Long Short-Term Memory (LSTM) Networks,
Gated Recurrent Units, Bidirectional LSTMs, Bidirectional RNNs.
Introduction to RNNs and their applications in sequential data analysis:
A Deep Learning approach for modelling sequential data is the RNN. RNNs were the standard suggestion for working with sequential data. A model that uses separate parameters for each element of a sequence cannot share what it has learned across positions, and may also be unable to generalize to variable-length sequences.
Recurrent Neural Networks use the same weights for each element of the sequence, decreasing the number of parameters and allowing the model to generalize to sequences of varying lengths. Because of their design, RNNs also generalize to structured data other than sequential data, such as geographical or graphical data.
What is a Recurrent Neural Network (RNN)?
Neural networks imitate the function of the human brain in the fields of AI, machine learning, and deep learning, allowing computer programs to recognize patterns and solve common issues. Because recurrent neural networks keep an internal state, they can anticipate sequential data in a way that other algorithms can't.
The Architecture of a Traditional RNN
RNNs are a type of neural network that has hidden states and allows
past outputs to be used as inputs. They usually go like this:
Below are some examples of RNN architectures.
How do Recurrent Neural Networks work?
The input layer x receives and processes the neural network's input before passing it on to the middle layer.
Multiple hidden layers can be found in the middle layer h, each with its own activation functions, weights, and biases. In an ordinary feed-forward network, the parameters of the different hidden layers are not affected by the preceding layer, i.e. there is no memory in the neural network. An RNN, by contrast, reuses the same parameters at every time step and passes a hidden state forward, which is what gives it memory.
Common Activation Functions:
The following are some of the most commonly utilized functions:
Sigmoid: expressed by the formula g(z) = 1 / (1 + e^-z).
Tanh: expressed by the formula g(z) = (e^z - e^-z) / (e^z + e^-z).
ReLU: expressed by the formula g(z) = max(0, z).
Applications of RNN Networks:
1. Machine Translation:
2. Text Creation:
RNNs can also be used to build a deep learning model for text generation. Based on the previous sequence of words/characters used in the text, a trained model learns the likelihood of occurrence of a word/character. A model can be trained at the character, n-gram, sentence, or paragraph level.
3. Captioning of images:
4. Recognition of Speech:
5. Forecasting of Time Series:
Recurrent Neural Network Vs Feedforward Neural Network:
Backpropagation Through Time - RNN:
Backpropagation is a training algorithm that we use for training neural
networks. When preparing a neural network, we are tuning the network's
weights to minimize the error concerning the available actual values with the
help of the Backpropagation algorithm. Backpropagation is a supervised learning
algorithm as we find errors concerning already given values.
The backpropagation training algorithm aims to modify the weights of a
neural network to minimize the error of the network results compared to some
expected output in response to corresponding inputs.
The general algorithm of Backpropagation is as follows:
1. We first feed the training input data and propagate it through the network to get an output.
2. Compare the predicted outcomes to the expected results and calculate the error.
3. Then, we calculate the derivatives of the error with respect to the network weights.
4. We use these calculated derivatives to adjust the weights to minimize the error.
5. Repeat the process until the error is minimized.
In simple words, Backpropagation is an algorithm where the information of the cost function is passed through the neural network in the backward direction. The Backpropagation training algorithm is ideal for training feed-forward neural networks on fixed-sized input-output pairs.
Unrolling The Recurrent Neural Network
A Recurrent Neural Network deals with sequential data. An RNN predicts outputs using not only the current inputs but also by considering those that occurred before. In other words, the current outcome depends on the current input and a memory element (which accounts for the past inputs).
The figure below depicts the architecture of an RNN.
We use Backpropagation for training such networks, with a slight change. We don't independently train the network at a specific time "t". We train it at a particular time "t" as well as for all that has happened before time "t", like t-1, t-2, t-3.
S1, S2, S3 are the hidden states at times t1, t2, t3, respectively, and Ws is the associated weight matrix.
x1, x2, x3 are the inputs at times t1, t2, t3, respectively, and Wx is the associated weight matrix.
Y1, Y2, Y3 are the outcomes at times t1, t2, t3, respectively, and Wy is the associated weight matrix.
At time t0, we feed input x0 to the network and get output y0. At time t1, we feed input x1 together with the previous state. From the figure, we can see that to calculate the outcome, the network uses the current input x and the cell state from the previous timestamp. The hidden state and the output at each step are computed as:
St = f(Wx * xt + Ws * S(t-1))
Yt = g(Wy * St)
where f and g are the activation functions applied at the hidden state and the output, respectively.
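As a small hedged sketch (not from the notes), the unrolled forward pass above can be written in NumPy; tanh for f, an identity output for g, and all dimensions below are illustrative assumptions, while the names Wx, Ws, Wy follow the notes.

import numpy as np

def rnn_forward(xs, Wx, Ws, Wy, s0):
    """Unroll a simple RNN over the sequence xs, reusing the same weights at every step."""
    s, states, outputs = s0, [], []
    for x in xs:
        s = np.tanh(Wx @ x + Ws @ s)   # hidden state: St = f(Wx*xt + Ws*S(t-1))
        y = Wy @ s                      # output: Yt = g(Wy*St), here g = identity
        states.append(s)
        outputs.append(y)
    return states, outputs

rng = np.random.default_rng(0)
Wx, Ws, Wy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]        # a sequence of 5 input vectors
states, outputs = rnn_forward(xs, Wx, Ws, Wy, np.zeros(4))
print(outputs[-1])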
Backpropagation Through Time
Ws, Wx, and Wy do not change across the timestamps, which means that for all inputs in a sequence, the values of these weights are the same.
The error at each timestamp t is defined as Et (for example, the squared difference between the desired output and the predicted output Yt).
The points to consider are:
What is the total loss for this network?
How do we update the weights Ws, Wx, and Wy?
The total loss we have to calculate is the sum over all timestamps, i.e., E = E0 + E1 + E2 + E3 + ... Now we calculate the error gradient with respect to Ws, Wx, and Wy. It is relatively easy to calculate the loss derivative with respect to Wy, as the derivative only depends on the current timestamp values:
dEt/dWy = (dEt/dYt) * (dYt/dWy)
Calculating the derivative of the loss with respect to Ws and Wx is more complex, because St depends on S(t-1), which in turn depends on Ws. The general expression can be written as:
dEt/dWs = sum over k from 0 to t of (dEt/dYt) * (dYt/dSt) * (dSt/dSk) * (dSk/dWs)
Similarly, for Wx, it can be written as:
dEt/dWx = sum over k from 0 to t of (dEt/dYt) * (dYt/dSt) * (dSt/dSk) * (dSk/dWx)
We feed a sequence of timestamps of input and output pairs to the network.
Then, we unroll the network, and calculate and accumulate errors across each timestamp.
Finally, we roll up the network and update the weights.
Repeat the process.
Limitations of BPTT:
BPTT has difficulty with local optima. Local optima are a more significant
issue with recurrent neural networks than feed-forward neural networks. The
recurrent feedback in such networks creates chaotic responses in the error
surface, which causes local optima to occur frequently and in the wrong locations on
the error surface.
When using BPTT in RNN, we face problems such as exploding gradient
and vanishing gradient. To avoid issues such as exploding gradient, we use a
gradient clipping method to check if the gradient value is greater than the
threshold or not at each timestamp. If it is, we normalize it. This helps to tackle
exploding gradient.
We can use BPTT up to a limited number of steps like 8 or 10. If we
backpropagate further, the gradient becomes too negligible and is a Vanishing
gradient problem. To avoid the vanishing gradient problem, some of the possible
solutions are:
Using the ReLU activation function in place of the tanh or sigmoid activation function.
Properly initializing the weight matrix can reduce the effect of vanishing gradients. For example, using an identity matrix helps us tackle this problem.
Using gated cells such as LSTMs or GRUs.
Vanishing Gradient Problem:
Firstly, information travels through time in RNNs, which means that information from previous time points is used as input for the next time points.
Secondly, we can calculate the cost function, or the error, at each time point. Basically, during the training, your cost function compares your outcomes (red circles on the image below) to your desired output. As a result, you have these values throughout the time series, for every single one of these red circles.
The focus is on one error term et. We calculate the cost function et and then propagate the cost function back through the network because of the need to update the weights.
The problem relates to updating wrec (the recurrent weight), the weight that is used to connect the hidden layers to themselves in the unrolled temporal loop.
For instance, to get from xt-3 to xt-2 we multiply xt-3 by wrec. Then, to get from xt-2 to xt-1 we again multiply xt-2 by wrec. So, we multiply with the same exact weight multiple times, and this is where the problem arises: when we multiply something by a small number, the value decreases very quickly.
As we know, weights are assigned at the start of the neural network with the
random values, which are close to zero, and from there the network trains them up.
But, when you start with wrec close to zero and multiply xt, xt-1, xt-2, xt-3, … by this
value, your gradient becomes less and less with each multiplication.
What does this mean for the network?
The lower the gradient is, the harder it is for the network to update the weights
and the longer it takes to get to the final result.
For instance, 1000 epochs might be enough to get the final weight for the time
point t, but insufficient for training the weights for the time point t-3 due to a very low
gradient at this point. However, the problem is not only that half of the network is not
trained properly.
The output of the earlier layers is used as the input for the further layers. Thus,
the training for the time point t is happening all along based on inputs that are coming
fromuntrained layers. So, because of the vanishing gradient, the whole network is not
being trained properly.
To sum up, if wrec is small, you have the vanishing gradient problem, and if wrec is large, you have the exploding gradient problem. For the vanishing gradient problem, the further you go through the network, the lower your gradient is and the harder it is to train the weights, which has a domino effect on all of the further weights throughout
the network.
That was the main roadblock to using Recurrent Neural Networks. However,
the possible solutions to this problem are as follows:
Solutions to the vanishing gradient problem
In case of exploding gradient, you can:
Stop backpropagating after a certain point, which is usually not optimal because not all of the weights get updated.
Penalize or artificially reduce the gradient.
Put a maximum limit on the gradient.
In case of vanishing gradient, you can:
Initialize the weights so that the potential for vanishing gradient is minimized.
Use Echo State Networks, which are designed to solve the vanishing gradient problem.
Use Long Short-Term Memory Networks (LSTMs).
Gradient Clipping and Long Short-Term Memory (LSTM) Networks:
Training a neural network can become unstable given the choice of error function, learning rate, or even the scale of the target variable. Large updates to weights during training can cause a numerical overflow or underflow, often referred to as "exploding gradients."
A common and relatively easy solution to the exploding gradients problem is to change the derivative of the error before propagating it backward through the network and using it to update the weights. Two approaches include rescaling the gradients given a chosen vector norm and clipping gradient values that exceed a preferred range. Together, these methods are referred to as "gradient clipping."
Training neural networks can become unstable, leading to a numerical overflow or underflow referred to as exploding gradients.
The training process can be made stable by changing the error gradients, either by scaling the vector norm or by clipping gradient values to a range. An MLP model for a regression predictive modeling problem with exploding gradients can then be updated to have a stable training process using these gradient clipping methods.
Exploding Gradients and Clipping
Neural networks are trained using the stochastic gradient descent optimization algorithm. This requires first the estimation of the loss on one or more training examples, then the calculation of the derivative of the loss, which is propagated backward through the network in order to update the weights. Weights are updated using a fraction of the back-propagated error controlled by the "learning rate".
It is possible for the updates to the weights to be so large that the weights either overflow or underflow their numerical precision. In practice, the weights can take on the value of "NaN" or "Inf" when they overflow or underflow, and for practical purposes the network will be useless from that point forward, forever predicting NaN values as signals flow through the invalid weights.
The difficulty that arises is that when the parameter gradient is very large, a gradient descent parameter update could throw the parameters very far, into a region where the objective function is larger, undoing much of the work that had been done to reach the current solution. A poor choice of learning rate can also result in large weight updates.
One difficulty when training LSTM with the full gradient is that the derivatives
sometimes become excessively large, leading to numerical problems. To prevent
this, [we] clipped the derivative of the loss with respect to the network inputs to the
LSTM layers (before the sigmoid and tanh functions are applied) to lie within a
predefined range.
There are two main methods for updating the error derivative, as follows:
Gradient Scaling.
Gradient Clipping.
Gradient scaling involves normalizing the error gradient vector such that the vector norm (magnitude) equals a defined value, such as 1.0. One simple mechanism to deal with a sudden increase in the norm of the gradients is to rescale them whenever they go over a threshold.
Experimental analysis reveals that for a given task and model size, training is
not very sensitive to this [gradient norm] hyperparameter and the algorithm behaves
well even for rather small thresholds.
It is common to use the same gradient clipping configuration for all layers in
the network. Nevertheless, there are examples where a larger range of error
gradients are permitted in the output layer compared to hidden layers.
The output derivatives [...] were clipped in the range [-100, 100], and the LSTM derivatives were clipped in the range [-10, 10]. Clipping the output gradients proved vital for numerical stability; even so, the networks sometimes had numerical problems late on in training, after they had started overfitting on the training data.
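As a hedged illustration (not from the notes), Keras optimizers expose both behaviours through the clipnorm and clipvalue arguments; the learning rate and thresholds below are arbitrary assumptions.

import tensorflow as tf

# Gradient scaling: rescale the whole gradient vector whenever its L2 norm exceeds 1.0.
opt_scaled = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# Gradient clipping: clip each gradient value element-wise to the range [-0.5, 0.5].
opt_clipped = tf.keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=opt_scaled, loss="mse")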
Gated Recurrent Unit (GRU):
A Gated Recurrent Unit (GRU) is a Recurrent Neural Network (RNN) architecturetype.
Like other RNNs, a GRU can process sequential data such as time series, natural language,
and speech. The main difference between a GRU and other RNN architectures, such as the
Long Short-Term Memory (LSTM) network, is how the network handles information flow
through time.
Example:
"My mom gave me a bicycle on my birthday because she knew that I wanted to go biking with my friends."
As can be observed from the above sentence, words that affect each other can be further apart. For example, "bicycle" and "go biking" are closely related but are placed further apart in the sentence. An RNN network finds tracking the state with such a long context difficult. It needs to find out what information is important. However, a GRU cell greatly alleviates this problem.
A GRU cell can keep track of the relevant information even when the related contexts are placed further apart. To understand how, let us look at how the GRU cell in the GRU architecture is built.
Understanding the GRU Cell:
The GRU cell is the basic building block of a GRU network. It comprises three main components: an update gate, a reset gate, and a candidate hidden state.
One of the key advantages of the GRU cell is its simplicity. Since it has fewer
parameters than a long short-term memory (LSTM) cell, it is faster to train and run and less
prone to overfitting.
Additionally, one thing to remember is that the GRU cell architecture is simple, the
cell itself is a black box, and the final decision on how much we should consider the past
state and how much should be forgotten is taken by this GRU cell.
GRU vs LSTM
Structure: GRU has a simpler structure with two gates (update and reset gate); LSTM has a more complex structure with three gates (input, forget, and output gate).
Parameters: GRU has fewer parameters (3 weight matrices); LSTM has more parameters (4 weight matrices).
Training: GRU is faster to train; LSTM is slower to train.
Space Complexity: In most cases, GRU tends to use fewer memory resources due to its simpler structure and fewer parameters, and is thus better suited for large datasets or sequences; LSTM has a more complex structure and a larger number of parameters, so it might require more memory resources and could be less effective for large datasets or sequences.
Performance: GRU generally performs similarly to LSTM on many tasks, but in some cases GRU has been shown to outperform LSTM and vice versa, so it is better to try both and see which works better for your dataset and task; LSTM generally performs well on many tasks but is more computationally expensive and requires more memory resources, and it has advantages over GRU in natural language understanding and machine translation tasks.
The Architecture of GRU
A GRU cell keeps track of the important information maintained throughout the network. A GRU network achieves this with the following two gates:
Reset Gate
Update Gate
At each time step, the GRU cell takes two inputs:
1. The previous hidden state
2. The input at the current timestamp.
The cell combines these and passes them through the update and reset gates. To get the output at the current timestep, we pass the resulting hidden state through a dense layer with softmax activation to predict the output. A new hidden state is thus obtained and then passed on to the next time step.
Update gate
The update gate determines what information the current GRU cell will pass on to the next GRU cell. It helps in keeping track of the most important information.
Obtaining the output of the Update Gate in a GRU cell:
The input to the update gate is the hidden state at the previous timestep, h(t-1), and the current input xt. Both have their own weights associated with them, which are learned during the training process. Let us say that the weight associated with h(t-1) is U(z), and that of xt is W(z). The output of the update gate zt is given by:
zt = σ(W(z) xt + U(z) h(t-1))
Reset gate
Obtaining the output of the Reset Gate in a GRU cell:
The input to the reset gate is the hidden state at the previous timestep, h(t-1), and the current input xt. Both have their own weights associated with them, which are learned during the training process. Let the weight associated with h(t-1) be U(r), and that of xt be W(r). The output of the reset gate rt is given by:
rt = σ(W(r) xt + U(r) h(t-1))
It is important to note that the weights associated with the hidden state at the previous timestep and the current input are different for the two gates. The values for these weights are learned during the training process.
How Does GRU Work?
Gated Recurrent Unit (GRU) networks process sequential data, such as time series or natural language, by passing the hidden state from one time step to the next. The hidden state is a vector that captures the information from the past time steps relevant to the current prediction.
Candidate Hidden State
ht' = tanh(W xt + rt ⊙ U h(t-1))
Here, W is the weight associated with the current input,
rt is the output of the reset gate,
U is the weight associated with the hidden state of the previous time step, and
ht' is the candidate hidden state.
Hidden state
The following formula gives the new hidden state and depends on the update gate and the candidate hidden state:
ht = zt ⊙ h(t-1) + (1 - zt) ⊙ ht'
Here, zt is the output of the update gate, ht' is the candidate hidden state, and h(t-1) is the hidden state at the previous timestep.
It can be observed that whenever zt is 0, the information at the previous hidden state is forgotten and it is updated with the value of the new candidate hidden state (as 1 - zt will be 1). If zt is 1, then the information from the previous hidden state is maintained. This is how the most relevant information is passed from one state to the next.
Forward Propagation in a GRU Cell
In a Gated Recurrent Unit (GRU) cell, the forward propagation process includes several steps:
Calculate the output of the update gate (zt) using the update gate formula.
Calculate the output of the reset gate (rt) using the reset gate formula.
Calculate the candidate hidden state.
Calculate the new hidden state.
This is how forward propagation happens in a GRU cell in a GRU network. Next, the process of how the weights are learnt in a GRU network to make the right prediction has to be understood.
Backpropagation in a GRU Cell
Let each hidden layer (orange colour) represent a GRU cell.
In the above image, it is observed that whenever the network predicts wrongly, the network compares the prediction with the original label, and the loss is then propagated throughout the network. The weights' values are adjusted so that the value of the loss function used to compute the loss is at a minimum. During this time, the weights and biases associated with the hidden layers and the input are fine-tuned.
Analogy between LSTM and GRU in terms of architecture and performance:
LSTM and GRU are two types of recurrent neural networks (RNNs) that can handle
sequential data, such as text, speech, or video. They are designed to overcome the problem of
vanishing or exploding gradients that affect the training of standard RNNs. However, they
have different architectures and performance characteristics that make them suitable for
different applications. In this article, you will learn about the differences and similarities
between LSTM and GRU in terms of architecture and performance.
LSTM Architecture
LSTM stands for long short-term memory, and it consists of a series of memory cells
that can store and update information over long time steps. Each memory cell has three
gates: an input gate, an output gate, and a forget gate. The input gate decides what
information to add to the cell state, the output gate decides what information to output
from the cell state, and the forget gate decides what information to discard from the cell
state. The gates are learned by the network based on the input and the previous hidden
state.
GRU Architecture
GRU stands for gated recurrent unit, and it is a simplified version of LSTM. It has only two gates: a reset gate and an update gate. The reset gate decides how much of the previous hidden state to keep, and the update gate decides how much of the new input to incorporate into the hidden state. The hidden state also acts as the cell state and the output, so there is no separate output gate. The GRU is easier to implement and requires fewer parameters than the LSTM.
Performance Comparison
The performance of LSTM and GRU depends on the task, the data, and the
hyperparameters. Generally, LSTM is more powerful and flexible than GRU, but it is also
more complex and prone to overfitting. GRU is faster and more efficient than LSTM, but it
may not capture long-term dependencies as well as LSTM. Some empirical studies have
shown that LSTM and GRU perform similarly on many natural language processing tasks,
such as sentiment analysis, machine translation, and text generation. However, some tasks
may benefit from the specific features of LSTM or GRU, such as image captioning, speech
recognition, or video analysis.
Similarities Between LSTM and GRU
Despite their differences, LSTM and GRU share some common characteristics. They both use gating mechanisms to control the flow of information and to avoid the vanishing or exploding gradient problem. They both can learn long-term dependencies and capture sequential patterns in the data. They both can be stacked into multiple layers to increase the depth and complexity of the network.
They both can be combined with other neural network architectures, such as
convolutional neural networks (CNNs) or attention mechanisms, to enhance their
performance.
Differences Between LSTM and GRU
The main differences between LSTM and GRU lie in their architectures and their
trade-offs. LSTM has more gates and more parameters than GRU, which gives it more
flexibility and expressiveness, but also more computational cost and risk of overfitting. GRU
has fewer gates and fewer parameters than LSTM, which makes it simpler and faster, but
also less powerful and adaptable.
LSTM has a separate cell state and output, which allows it to store and output
different information, while GRU has a single hidden state that serves both purposes, which
may limit its capacity. LSTM and GRU may also have different sensitivities to the
hyperparameters, such as the learning rate, the dropout rate, or the sequence length.
Bidirectional LSTM
Introduction:
To understand the working of Bi-LSTM, the working of the unit cell of LSTM and the LSTM network has to be understood first. LSTM stands for long short-term memory. In 1997, Hochreiter and Schmidhuber introduced LSTM networks. These are among the most commonly used recurrent neural networks.
Need for LSTM
Sequential data is better handled by recurrent neural networks, but sometimes it is also necessary to store the result of the previous data. For example, "I will play cricket" and "I can play cricket" are two different sentences with different meanings. The meaning of the sentence depends on a single word, so it is necessary to store the data of previous words. But no such memory is available in a simple RNN. To solve this problem, LSTM is adopted.
The Architecture of the LSTM Unit
The LSTM unit has three gates.
a) Input gate
First, the current state x(t) and the previous hidden state h(t-1) are passed into the input gate, i.e., the second sigmoid function. The x(t) and h(t-1) values are transformed to values between 0 and 1, where 0 means not important and 1 means important. Next, the same current and hidden state information is passed through the tanh function. The output of the tanh function ranges from -1 to 1, and it helps to regulate the network. The output values generated from the two activation functions are then ready for point-by-point multiplication.
b) Forget gate
The forget gate decides which information needs to be kept for further processing and which can be ignored. The hidden state h(t-1) and the current input x(t) are passed through a sigmoid function, which generates values between 0 and 1 that decide whether a part of the previous output is necessary (by giving an output closer to 1).
c) Output gate
The output gate helps in deciding the value of the next hidden state. This state contains information on previous inputs. First, the current and previous hidden state values are passed into the third sigmoid function. Then the new cell state generated from the cell state is passed through the tanh function. Both these outputs are multiplied point-by-point. Based upon the final value, the network decides which information the hidden state should carry. This hidden state is used for prediction.
Finally, the new cell state and the new hidden state are carried over to the next step. To conclude, the forget gate determines which relevant information from the prior steps is needed, the input gate decides what relevant information can be added from the current step, and the output gate finalizes the next hidden state.
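The gate computations described above can be written compactly. Below is a minimal sketch of a single LSTM time step using PyTorch tensors; the weight and bias names (W_i, W_f, W_o, W_c, b_i, ...) and their shapes are illustrative assumptions, not something defined in these notes:

import torch

def lstm_step(x_t, h_prev, c_prev, W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c):
    # Concatenate the current input x(t) with the previous hidden state h(t-1).
    z = torch.cat([x_t, h_prev], dim=-1)
    i_t = torch.sigmoid(z @ W_i + b_i)      # input gate: 0 = not important, 1 = important
    f_t = torch.sigmoid(z @ W_f + b_f)      # forget gate: how much of c(t-1) to keep
    o_t = torch.sigmoid(z @ W_o + b_o)      # output gate: how much of the cell to expose
    c_hat = torch.tanh(z @ W_c + b_c)       # candidate values, regulated to (-1, 1)
    c_t = f_t * c_prev + i_t * c_hat        # new cell state (point-by-point multiplication)
    h_t = o_t * torch.tanh(c_t)             # new hidden state, used for prediction
    return h_t, c_t

# Assumed shapes: each W_* is (input_dim + hidden_dim, hidden_dim), each b_* is (hidden_dim,).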
How do LSTMs work?
The Long Short-Term Memory architecture was inspired by an analysis of error flow in existing RNNs, which revealed that long time lags were inaccessible to existing designs because the backpropagated error either blows up or decays exponentially.
An LSTM layer is made up of memory blocks that are recurrently linked. These
blocks can be thought of as a differentiable version of a digital computer's memory
chips. Each one has recurrently connected memory cells as well as three multiplicative
units – the input, output, and forget gates – that offer continuous analogs of the cells'
write, read, and reset operations.
What is Bi-LSTM?
Bidirectional LSTM networks function by presenting each training sequence
forward and backward to two independent LSTM networks, both of which are coupled to
the same output layer. This means that the Bi-LSTM contains comprehensive, sequential
information about all points before and after each point in a particular sequence.
In other words, rather than encoding the sequence in the forward direction only, we encode it in the backward direction as well and concatenate the results from both directions. In this way, the network understands the words before and after the specific word.
Below is the basic architecture of Bi-LSTM.
Working of Bi-LSTM:
Consider the sentence "I will swim today". The below image represents the encoded representation of the sentence in the Bi-LSTM network.
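A minimal PyTorch sketch of a bidirectional LSTM is given below; the vocabulary size, embedding size, hidden size and token indices are assumptions chosen only to make the example runnable. Setting bidirectional=True runs one LSTM forward and one backward over the sequence and concatenates their outputs, as described above:

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 32, 64   # assumed sizes

embedding = nn.Embedding(vocab_size, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

# Toy index encoding of a four-word sentence such as "I will swim today" (indices made up).
tokens = torch.tensor([[1, 2, 3, 4]])              # shape (batch=1, seq_len=4)
x = embedding(tokens)                              # shape (1, 4, embed_dim)

out, (h_n, c_n) = bilstm(x)
print(out.shape)   # (1, 4, 2 * hidden_dim): forward and backward outputs concatenated
print(h_n.shape)   # (2, 1, hidden_dim): final hidden state of each direction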
UNIT-IV
GENERATIVE ADVERSARIAL NETWORKS (GANs):
Generative models, Concept and principles of GANs, Architecture of GANs (generator and discriminator networks), Comparison between discriminative and generative models, Generative Adversarial Networks (GANs), Applications of GANs
Generative Adversarial Networks and its models
Introduction:
A Generative Adversarial Network (GAN) consists of two neural networks trained against each other. The name summarizes the idea:
Generative – The model learns how the data are generated, so that new samples resembling the training data can be produced and inspected visually.
Adversarial – The training of the model is done in an adversarial setting.
Networks – Deep neural networks are used for training purposes.
The generator network takes random input (typically noise) and generates samples, such as images, text, or audio, that resemble the training data it was trained on. The discriminator network, on the other hand, tries to distinguish between real and generated samples. It is trained with real samples from the training data and with fake samples produced by the generator.
The two networks are trained in competition with each other: the generator aims to produce samples that fool the discriminator, while the discriminator tries to improve its ability to distinguish between real and generated data. Ideally, this process converges to a point where the generator is capable of generating high-quality samples that are difficult for the discriminator to tell apart from real data.
GANs have shown impressive results in image synthesis, text generation, and even video generation. They have been used for tasks like generating realistic images, creating deepfakes, enhancing low-resolution images, and more. GANs have greatly advanced the field of generative modeling and have opened up new possibilities for creative applications in artificial intelligence.
Why were GANs Developed?
Most neural networks can be fooled into misclassifying things by adding a small amount of noise to the data. This raised the question of whether it is possible to build something with which a neural network can start producing new patterns that look like the training samples. Thus, GANs were built, which generate new, realistic samples similar to the original data.
Components of Generative Adversarial Networks (GANs):
What is the Geometric Intuition behind the working of GANs?
The two major components of GANs are the Generator and the Discriminator. The generator plays the role of a thief: it generates fake samples based on the original samples and tries to fool the discriminator into accepting fake as real. The discriminator, on the other hand, plays the role of the police: its job is to identify abnormalities in the samples created by the generator and classify them as fake or real. This competition between the two components goes on until a level of perfection is achieved, where the discriminator can no longer reliably tell the generator's samples apart from real ones.
Here, the generative model captures the distribution of the data and is trained in such a manner that it generates new samples which try to maximize the probability of the discriminator making a mistake. The discriminator, on the other hand, is based on a model that estimates the probability that the sample it receives comes from the training data and not from the generator, and it tries to classify samples accurately. Hence, the GAN is formulated as a minimax game in which the discriminator tries to maximize its reward V(D, G) while the generator tries to minimize it. The figure below illustrates this minimax formulation.
How are the two neural networks built, and how are training and prediction done?
The output of the generator is connected to the input of the discriminator; the discriminator makes a prediction on it, and this feedback is propagated back to update the networks.
Training & Prediction of Generative Adversarial Networks (GANs):
Step-1) Define the Problem
The problem statement is key to the success of the project, so the first step is to define the problem. GANs are applied to many different kinds of problems, so you need to define what you are creating, for example audio, a poem, text, or an image.
Step-2) Select the Architecture of the GAN
There are many different types of GAN and, based on the scenario(s), a suitable GAN architecture is chosen.
Step-3) Train the Discriminator on the Real Dataset
The discriminator is first trained on the real dataset. The provided data is without noise and contains only real images; for fake images, the discriminator uses instances created by the generator as negative examples.
Discriminator Training:
It classifies both real and fake data.
The discriminator loss helps improve its performance and penalizes it when it misclassifies real data as fake or fake data as real. The weights of the discriminator are updated through the discriminator loss.
Step-4) Train the Generator
Provide some fake input (noise) to the generator; it uses this random noise to generate fake outputs. During generator training, the network takes random noise as input and tries to transform it into meaningful data. Getting meaningful output from the generator takes time and runs over many epochs:
Get random noise and produce a generator output on the noise sample.
Predict the generator output from the discriminator as original or fake.
Calculate the discriminator loss.
Perform backpropagation through both the discriminator and the generator to calculate gradients.
Use the gradients to update the generator weights.
Step-5) Train the Discriminator on Fake Data
The samples generated by the generator are passed to the discriminator, which predicts whether the data passed to it is fake or real and provides feedback to the generator again.
Step-6) Train the Generator with the Output of the Discriminator
The generator is trained again using the discriminator's feedback to improve performance. This is an iterative process that continues running until the generator succeeds in producing samples the discriminator can no longer distinguish from real data.
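The six steps above can be summarized in code. The following is a minimal, illustrative PyTorch sketch; the network sizes, learning rates and the 784-dimensional flattened-image input are assumptions made only for this example:

import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784     # assumed sizes (e.g. flattened 28x28 images)

# Step-2: choose a (very small) architecture for the generator and the discriminator.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    batch = real_batch.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Steps 3 and 5: train the discriminator on real data and on generated (fake) data.
    fake = G(torch.randn(batch, latent_dim)).detach()        # detach: do not update G here
    d_loss = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Steps 4 and 6: train the generator using the discriminator's feedback.
    g_loss = bce(D(G(torch.randn(batch, latent_dim))), ones)  # generator wants fakes called real
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()

# Example call with a random placeholder batch standing in for real training images.
d_l, g_l = train_step(torch.randn(16, data_dim))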
Generative Adversarial Networks (GANs) Loss Function:
The loss function is minimized and maximized in an iterative process. The generator tries to minimize the following loss function while the discriminator tries to maximize it. It is the same as a minimax game, if you have ever played one.
D(x) is the discriminator's estimate of the probability that real data instance x is real.
Ex is the expected value over all real data instances.
G(z) is the generator's output when given noise z.
D(G(z)) is the discriminator's estimate of the probability that a fake instance is real.
Ez is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)).
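Putting these terms together, the standard GAN minimax objective, written in the notation defined above, is:

min over G, max over D:   V(D, G) = Ex[ log D(x) ] + Ez[ log(1 - D(G(z))) ]

The discriminator tries to maximize V(D, G), while the generator tries to minimize it by pushing D(G(z)) towards 1.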
Challenges Faced by Generative Adversarial Networks (GANs):
Different Types of Generative Adversarial Networks (GANs):
1) DC GAN – Deep Convolutional GAN. It is one of the most used, powerful, and successful GAN architectures. It is built from convolutional layers without max pooling, and the layers in this network are not completely connected.
2) Conditional GAN (CGAN) – A deep learning neural network in which some additional parameters are used. Labels are also put in the inputs of the discriminator in order to help the discriminator classify the input correctly.
3) Least Square GAN (LSGAN) – A type of GAN that adopts the least-squares loss function for the discriminator. Minimizing the objective of an LSGAN results in minimizing the Pearson chi-squared divergence.
4) Auxiliary Classifier GAN (ACGAN) – It is the same as CGAN and an advanced
version of it. It says that the Discriminator should not only classify the image as real
or fake but should also provide the source or class label of the input image.
5) Dual Video Discriminator GAN (DVD-GAN) – A generative adversarial network for video generation built upon the BigGAN architecture. DVD-GAN uses two discriminators: a spatial discriminator and a temporal discriminator.
7) Cycle GAN – Released in 2017, it performs the task of image-to-image translation. For example, having trained it on a horse image dataset, we can translate horse images into zebra images.
Top Generative Adversarial Networks Applications:
1) Generate Examples for Image Datasets: GANs can be used to generate new examples for image datasets in various domains, such as medical imaging, satellite imagery, and more. By generating new examples, researchers can augment existing datasets and improve the performance of machine learning models.
Rendered images can be used to augment existing image datasets or to create entirely new datasets.
8) Face Frontal View Generation: GANs can generate frontal views of faces from
images that show the face at an angle. We can use it to improve face recognition
algorithms' performance or to synthesize pictures for use in other applications.
9) Generate New Human Poses: GANs can generate images of people in new
poses, including poses that are difficult or impossible for humans to achieve. This can be used to create new content or to augment existing image datasets.
10) Photos to Emojis: GANs can be used to convert photographs of people into
emojis, creating a more personalized and expressive form of communication.
11) Photograph Editing: GANs can be used to edit photographs in various ways,
such as changing the background, adding or removing objects, or altering the
appearance of people or animals in the image.
12) Face Aging: GANs can be used to generate images of people at different ages,
allowing users to visualize how they might look in the future or to see what they might have looked like in the past.
Differences Between Discriminative and Generative Models
1) Core Idea
2) Mathematical Intuition
3) Applications
Since these models use different approaches to machine learning, both are suited for specific tasks: generative models are useful for unsupervised learning tasks, whereas discriminative models are useful for supervised learning tasks.
GANs (Generative adversarial networks) can be thought of as a competition between the
generator, which is a component of the generative model, and the discriminator, so
basically, it is generative vs. discriminative model.
4) Outliers
Outliers have a greater impact on generative models than on discriminative models.
5) Computational Cost
Comparison Between Discriminative and Generative Models:
1) Based on Performance
2) Based on Missing Data
3) Based on the Accuracy Score
4) Based on Applications
Generative Models vs Discriminative Models:
Machine learning (ML) and Deep Learning (DL) are two of the most exciting fields in computer science today. With these technologies, machines are given the ability to learn from past data and predict or make decisions from future, unseen data.
The inspiration comes from the human mind, how we use past experiences to
help us make informed decisions in the present and the future. And while there are
already many applications of ML and DL, the future possibilities are endless.
Quintillions of data are generated all over the world almost daily, so getting
fresh data is easy. But in order to work with this gigantic amount of data, we need
new algorithms or we need to scale up existing ones.
1. Discriminative models
2. Generative models
Discriminative model
Outliers have little to no effect on these models, which makes them a better choice than generative models in that respect, although misclassification problems can still be a major drawback.
Here are some examples and a brief description of the widely used
discriminative models:
2. Support vector machines: This isa powerful learning algorithm with applicationsin
both regression and classification scenarios. An n-dimensional space containing the
data points is divided into classes by decision boundaries using support vectors. The
best boundary is called a hyperplane.
3. Decision trees: A graphical tree-like model is used to map decisions and their
probable outcomes. It could be thought of as a robust version of If-else statements.
Generative model
As the name suggests, generative models can be used to generate new data
points. These models are usually used in unsupervised machine learning problems.
Generative models go in-depth to model the actual data distribution and learn the
different data points, rather than model just the decision boundary between classes.
These models are sensitive to outliers, which is their only drawback when
compared to discriminative models. The mathematics behind generative models is
quite intuitive too. The method is not direct like in the case of discriminative models.
To calculate P(Y|X), they first estimate the prior probability P(Y) and the likelihood probability P(X|Y) from the data provided. Putting these values into Bayes' theorem's equation, we get an accurate value for P(Y|X).
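Written out, the Bayes' theorem calculation referred to above is:

P(Y|X) = P(X|Y) * P(Y) / P(X)

where P(X) can be obtained by summing P(X|Y) * P(Y) over all possible classes Y.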
Some examples as well as a description of generative models are as follows:
1. Bayesian network: Also known as Bayes’ network, this model uses a directed
acyclic graph (DAG) to draw Bayesian inferences over a set of random variables to
calculate probabilities. It has many applications like prediction, anomaly detection,
time series prediction, etc.
2. Autoregressive model: Mainly used for time series modeling, it finds a correlation
between past behaviors to predict future behaviors.
Some other examples include Naive Bayes, Markov random field, hidden Markov model (HMM), latent Dirichlet allocation (LDA), etc.
Discriminative vs generative: Which is the best fit for Deep Learning?
Discriminative models divide the data space into classes by learning the
boundaries, whereas generative models understand how the data is embedded into
the space. Both the approaches are widely different, which makes them suited for
specific tasks.
Deep learning has mostly been using supervised machine learning algorithms
like Artificial Neural Networks (ANNs), convolutional neural networks (CNNs), and
Recurrent Neural Networks (RNNs). ANN is the earliest in the trio and leverages
artificial neurons, backpropagation, weights, and biases to identifypatterns based on
the inputs. CNN is mostly used for image recognition and computer vision tasks. It
works by pooling important features from an input image. RNN, which is the latest of
the three, is used in advanced fields like natural language processing, handwriting
recognition, time series analysis, etc.
This Person Does Not Exist - Random Face Generator is an interesting website that
uses a type of generative model called StyleGAN to create realistic human faces,
even though the people in these images don’t exist!
UNIT-V
Auto-encoders:
Autoencoders are a type of deep learning algorithm that are designed to
receive an input and transform it into a different representation. They play an
important part in image construction. Artificial Intelligence encircles a wide range of
technologies and techniques that enable computer systems to solve problems like
Data Compression which is used in computer vision, computer networks, computer
architecture, and many other fields.
What Are Autoencoders?
A similar machine learning algorithm, PCA (Principal Component Analysis), which performs the same task, also co-exists.
Autoencoders: Their Emergence
Autoencoders are preferred over PCA because:
An autoencoder can learn non-linear transformations with a non-linear activation function and multiple layers.
It doesn't have to learn only dense layers; it can use convolutional layers, which is better for video, image and series data.
It is more efficient to learn several layers with an autoencoder than to learn one huge transformation with PCA.
An autoencoder provides a representation of each layer as the output.
It can make use of pre-trained layers from another model to apply transfer learning to enhance the encoder/decoder.
Applications of Autoencoders
1) Image Coloring
Autoencoders are used for converting any black and white picture into a
colored image. Depending on what is in the picture, it is possible to tell what thecolor
should be.
2) Feature variation
It extracts only the required features of an image and generates the output by
removing any noise or unnecessary interruption.
3) Dimensionality Reduction
The reconstructed image is the same as our input but with reduced
dimensions. It helps in providing the similar image with a reduced pixel value.
4) Denoising Image
The input seen by the autoencoder is not the raw input but a stochastically
corrupted version. A denoising autoencoder is thus trained to reconstruct the original
input from the noisy version.
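This idea can be sketched in a few lines of PyTorch; the Gaussian corruption and the noise level 0.3 are assumptions chosen for illustration, and model stands for any autoencoder (such as the one sketched later in this unit). The model receives a corrupted input but is penalized against the clean original:

import torch
import torch.nn.functional as F

def denoising_loss(model, clean_batch, noise_std=0.3):
    noisy = clean_batch + noise_std * torch.randn_like(clean_batch)   # stochastic corruption of the input
    reconstruction = model(noisy)                                     # the autoencoder sees only the noisy version
    return F.mse_loss(reconstruction, clean_batch)                    # but is trained to reproduce the clean input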
5) Watermark Removal
It is also used for removing watermarks from images or to remove any object while filming a video or a movie.
Architecture of Autoencoders
An Autoencoder consists of three layers:
1. Encoder
2. Code
3. Decoder
Encoder: This part of the network compresses the input into a latent-space representation. It encodes the input image as a compressed representation in a reduced dimension. The compressed image is the distorted version of the original image.
Code: This part of the network represents the compressed input which is fed to the decoder.
Decoder: This part of the network decodes the encoded image back to the original dimension; the decoded image is a lossy reconstruction of the original image.
The layer between the encoder and decoder, i.e. the code, is also known as the Bottleneck. This is a well-designed approach to decide which aspects of the observed data are relevant information and which aspects can be discarded. It does this by balancing two criteria:
Compactness of representation, measured as the compressibility.
Retention of some behaviourally relevant variables from the input.
Training an auto-encoder for data compression and reconstruction:
The encoder network takes the input data and maps it to a lower-dimensional
representation. This lower-dimensional representation is the compressed data. The
decoder network takes this compressed data and maps it back to the original input
data. The decoder network is essentially the inverse of the encoder network.
The bottleneck layer is the layer in the middle of the autoencoder thatcontains
the compressed data. This layer is much smaller than the input data, which
is what allows for compression. The size of the bottleneck layer determines the
amount of compression that can be achieved. Autoencoders differ from other deep
learning architectures, such as convolutional neural networks (CNNs) and recurrent
neural networks (RNNs), in that they do not require labeled data. Autoencoders can
learn the underlying structure of the data without any explicit labels.
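The following is a minimal PyTorch sketch of this architecture; the 784-dimensional input and 32-dimensional bottleneck are assumed sizes, chosen only to make the compression concrete:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: compresses the input down to the bottleneck ("code") dimension.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: maps the code back to the original dimension (inverse of the encoder).
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)        # compressed representation (bottleneck)
        return self.decoder(code)     # reconstruction of the input

model = Autoencoder()
criterion = nn.MSELoss()              # reconstruction error: no labels are needed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 784)               # placeholder batch of flattened images
loss = criterion(model(x), x)         # the target is the input itself
optimizer.zero_grad(); loss.backward(); optimizer.step()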
Image Compression with Autoencoders
There are two types of image compression: lossless and lossy. Lossless
compression methods preserve all of the data in the original image, while lossy
compression methods discard some of the data to achieve higher compressionrates.
Autoencoders can be used for both lossless and lossy compression. Lossless
compression can be achieved by using a bottleneck layer that is the same size asthe
input data. In thiscase, the autoencoderessentiallylearns to encode anddecode the
input data without any loss of information.
Image Reconstruction with Autoencoders
Autoencoders are a type of neural network that can be used for image
compression and reconstruction. The process involves compressing an image into a
smaller representation and then reconstructing it back to its original form. Image
reconstruction is the process of creating an image from compressed data.
Explanation of image reconstruction from compressed data:
How autoencoders can be used for image reconstruction:
Examples of image reconstruction using autoencoders:
Autoencoder-based reconstruction techniques efficiency evaluation:
One common metric evaluates the quality of a reconstructed image by comparing it to the original image, while SSIM measures the structural similarity between the reconstructed and original images.
Variations of Autoencoders for Image Compression and Reconstruction
1) Denoising autoencoders:
2) Variational autoencoders:
3) Convolutional autoencoders:
Comparison of the effectiveness of different types of autoencoders for image compression & reconstruction:
Real-Time Examples:
Applications of Autoencoders for Image Compression and Reconstruction
1) Medical Imaging:
2) Video Compression:
Autoencoders have also been used for video compression, where the goal isto
compress a sequence of images into a compact representation that can be
transmitted or stored efficiently. One example of this is the video codec AV1, which
uses a combination ofautoencodersand traditional compression methods to achieve
higher compression rates while maintaining video quality. The autoencoder
component of the codec is used to learn spatial and temporal features of the video
frames, which are then used to reduce redundancy in the video data.
3) Autonomous Vehicles:
Autoencoders are also useful for autonomous vehicle applications, where the
goal is to compress high-resolution camera images captured by the vehicle’ssensors
while preserving critical information for navigation and obstacle detection. For
example, researchers have developed an autoencoder-based approach for
compressing images captured by a self-driving car, which achieved high compression ratios while preserving the accuracy of obstacle detection. This can have
significant implications for improving the performance and reliability of autonomous
vehicles, especially in scenarios where high-bandwidth communication is not
available.
4) Social Media and Web Applications:
Autoencoders have also been used in social media and web applications,
where the goal is to reduce the size of image files to improve website loading times
and reduce bandwidth usage. For example, Facebook uses an autoencoder-based
approach for compressing images uploaded to their platform, which achieves high
compression ratios while preserving image quality. This has led to faster loading
times for images on the platform and reduced data usage for users.
Relationship between Autoencoders and GANs:
Autoencoders and GANs are both powerful techniques for learning from data in an unsupervised way, but they have some differences and trade-offs. Autoencoders are easier to train and more stable, but they tend to produce blurry or distorted reconstructions or generations. GANs are harder to train and more prone to mode collapse, where they produce only a few modes of the data distribution, but they can generate sharper and more realistic samples. Depending on your task and your data, you might prefer one or the other, or even combine them in a hybrid model.
Autoencoders are unsupervised models: they are not trained on labeled data. Instead, they are trained on unlabeled data and learn to reconstruct the input data. GANs are trained with an adversarial objective: the generator is trained to generate data that looks like the real training data, and the discriminator is trained to distinguish between real and fake data. Autoencoders are typically used for tasks such as image denoising and compression, while GANs are typically used for tasks such as image generation and image-to-image translation.
Hybrid Models: Encoder-Decoder GANs:
How can you combine GANs and autoencoders to create hybrid models for various tasks?
Generative adversarial networks (GANs) and autoencoders are two powerful types of artificial neural networks that can learn from data and generate new samples. But what if you could combine them to create hybrid models that can perform various tasks, such as image synthesis, anomaly detection, or domain adaptation?
GANs and autoencoders
GANs are composed of two networks: a generator and a discriminator. The generator tries to create realistic samples from random noise, while the discriminator tries to distinguish between real and fake samples. The two networks compete with each other, improving their skills over time. Autoencoders are also composed of two networks: an encoder, which compresses the input data into a latent representation, and a decoder, which reconstructs the input data from the representation. The goal is to minimize the reconstruction error, while learning useful features from the data.
Hybrid models
Hybrid models are models that combine GANs and autoencoders in different ways, depending on the task and the objective. For example, you can use an autoencoder as the generator of a GAN, and train it to fool the discriminator while also minimizing the reconstruction error. This way, we can generate realistic samples that are similar to the input data, but also have some variations. Alternatively, you can use a GAN as the encoder of an autoencoder, and train it to encode the input data into a latent space that is compatible with the discriminator. This way, you can learn a latent representation that captures the structure of the data.
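As an illustrative sketch of the first idea (an autoencoder used as the generator of a GAN), the loss below combines the reconstruction error with an adversarial term; the weighting factor lambda_adv and the assumption that the discriminator outputs a probability in (0, 1) are choices made only for this example:

import torch
import torch.nn as nn

bce = nn.BCELoss()
mse = nn.MSELoss()

def hybrid_generator_loss(autoencoder, discriminator, real_batch, lambda_adv=0.1):
    # Reconstruction objective: the autoencoder should reproduce its input.
    reconstruction = autoencoder(real_batch)
    rec_loss = mse(reconstruction, real_batch)
    # Adversarial objective: the reconstruction should also fool the discriminator.
    ones = torch.ones(real_batch.size(0), 1)
    adv_loss = bce(discriminator(reconstruction), ones)
    return rec_loss + lambda_adv * adv_loss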
Image synthesis
One of the most common tasks for hybrid models is image synthesis, which is
the process of creating new images from existing ones, or from scratch. For example,
you can use a hybrid model to synthesize images of faces, animals, or landscapes, by
using an autoencoder as the generator of a GAN, and feeding it with real images or
random noise. This way, you can create diverse and realistic images that preserve the
attributes of the input data, but also have some variations. You can also use a hybrid model for translation between two image domains, by feeding it with images from both domains. This way, you can learn a common latent
space that can be used to transfer the style or the attributes of one domain to
another.
Anomaly detection
Another task for hybrid models is anomaly detection, which is the process of identifying data points that deviate from the normal pattern; the reconstruction error or the discriminator score can serve as a measure of anomaly. You can also use a hybrid model to detect anomalies in time
series, such as sensor readings, or financial transactions, by using a GAN as the
encoder ofan autoencoder, and feeding it with normal time series. This way, you can
train the GAN to encode normal time series well, but fail to encode abnormal time
series. Then, we can use the latent space or the discriminator score as a measure of
anomaly.
Domain adaptation
A third task for hybrid models is domain adaptation, which is the process of adapting a model trained on one domain to another domain without requiring labeled data from the target domain. For example, you can use a hybrid model and feed it with images from both domains. This way, you can train the GAN to encode both
domains into a shared latent space that is invariant to the domain differences.