Encoders

Autoencoders are neural networks designed to replicate input data by compressing it into a lower-dimensional representation and then reconstructing it. They consist of an encoder, a code, and a decoder, with the goal of producing an output identical to the input. Various types of autoencoders, including undercomplete, regularized, stochastic, denoising, and contractive, serve different purposes in feature extraction and robustness against noise.

Uploaded by

lakshmiaswithagundabathula

AUTO ENCODERS

Dr. R. Shiva Shankar, Assistant Professor, Dept of CSE, SRKREC(A)

AUTO ENCODERS
 Autoencoder is a type of neural network where the output layer has the same dimensionality as
the input layer.

 In simpler words, the number of output units in the output layer is equal to the number of
input units in the input layer.

 An autoencoder replicates the data from the input to the output in an unsupervised manner and
is therefore sometimes referred to as a replicator neural network.

 The autoencoders reconstruct each dimension of the input by passing it through the network.

 It may seem trivial to use a neural network for the purpose of replicating the input, but during
the replication process, the size of the input is reduced into its smaller representation.

 The middle layers of the neural network have fewer units than the input or output layers.

 Therefore, the middle layers hold the reduced representation of the input. The output is
reconstructed from this reduced representation of the input.
Architecture of Auto Encoders
An autoencoder consists of three components:

 Encoder: The encoder is a feedforward network (fully connected in the simplest case) that compresses
the input, for example an image, into a latent-space representation of reduced dimension. Because this
compression is lossy, the image reconstructed from the code is generally an approximation of the original.

 Code: This part of the network contains the reduced representation of the input that is fed into the
decoder.

 Decoder: Decoder is also a feed forward network like the encoder and has a similar structure to
the encoder. This network is responsible for reconstructing the input back to the original
dimensions from the code.

 First, the input goes through the encoder, where it is compressed and stored in the layer called
the code; the decoder then reconstructs the input from the code.

 The main objective of the autoencoder is to produce an output as close as possible to the input.

 Note that the decoder architecture is the mirror image of the encoder.

 This is not a requirement, but it's typically the case.

 The only requirement is the dimensionality of the input and output must be the same.
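
The encoder, code, and decoder described above can be sketched as a single forward pass. This is a minimal illustration, not a trained model: the layer sizes, the tanh activation, and the names `encode` and `decode` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_code = 8, 3  # assumed sizes; the code is smaller than the input

# Encoder and decoder weights (randomly initialised, untrained).
W_enc = rng.normal(scale=0.1, size=(n_input, n_code))
b_enc = np.zeros(n_code)
W_dec = rng.normal(scale=0.1, size=(n_code, n_input))
b_dec = np.zeros(n_input)

def encode(x):
    """Compress the input into the code (latent representation)."""
    return np.tanh(x @ W_enc + b_enc)

def decode(h):
    """Map the code back to the original input dimensionality."""
    return h @ W_dec + b_dec

x = rng.normal(size=(5, n_input))  # a batch of 5 inputs
h = encode(x)                      # the code
x_hat = decode(h)                  # the reconstruction

print(h.shape)      # (5, 3)
print(x_hat.shape)  # (5, 8)
```

The only structural constraint visible here is the one stated above: `x_hat` has the same dimensionality as `x`.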

Undercomplete
 The simplest architecture for constructing an autoencoder is to constrain the number of nodes
present in the hidden layer(s) of the network, limiting the amount of information that can flow
through the network.

 By penalizing the network according to the reconstruction error, our model can learn the most
important attributes of the input data and how to best reconstruct the original input from an
"encoded" state. Ideally, this encoding will learn and describe latent attributes of the input data.

 Because neural networks are capable of learning nonlinear relationships, this can be thought of as
a more powerful (nonlinear) generalization of PCA.

 Whereas PCA attempts to discover a lower dimensional hyperplane which describes the original
data, autoencoders are capable of learning nonlinear manifolds (a manifold is defined in simple
terms as a continuous, non-intersecting surface).

 An undercomplete autoencoder has no explicit regularization term - we simply train our model
according to the reconstruction loss.

 Thus, our only way to ensure that the model isn't memorizing the input data is to ensure that
we've sufficiently restricted the number of nodes in the hidden layer(s).

 For deep autoencoders, we must also be aware of the capacity of our encoder and decoder models.

 Even if the "bottleneck layer" is only one hidden node, it's still possible for our model to memorize
the training data provided that the encoder and decoder models have sufficient capability to learn
some arbitrary function which can map the data to an index.

The objective of an undercomplete autoencoder is to capture the most important features
present in the data.

 Undercomplete autoencoders have a hidden layer of smaller dimension than the input
layer.
 This forces the network to extract the important features of the data. Training minimizes a loss function
L(x, g(f(x))) that penalizes the reconstruction g(f(x)) for being different from the input x, where f is the
encoder and g is the decoder.
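
As a rough sketch of training on the reconstruction loss alone, the example here fits a linear undercomplete autoencoder with plain gradient descent. The toy data, the mean-squared-error loss, the layer sizes, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 points in 4-D that lie on a 2-D linear subspace.
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 4))
X = Z @ A

n_input, n_code = 4, 2
W_enc = rng.normal(scale=0.5, size=(n_input, n_code))  # encoder f(x) = x W_enc
W_dec = rng.normal(scale=0.5, size=(n_code, n_input))  # decoder g(h) = h W_dec

def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared error between x and g(f(x))."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial_loss = reconstruction_loss(X, W_enc, W_dec)

lr = 0.02
for _ in range(2000):
    H = X @ W_enc                    # codes
    E = H @ W_dec - X                # reconstruction error
    G = 2.0 * E / E.size             # d(loss) / d(reconstruction)
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

final_loss = reconstruction_loss(X, W_enc, W_dec)
print(initial_loss, final_loss)
```

No regularization term appears anywhere in the loop; the two-unit code is the only thing limiting what the network can pass through.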

Advantages

 Undercomplete autoencoders do not need an explicit regularization term: the bottleneck itself limits how
much information can pass through, which discourages simply copying the input to the output.

Drawbacks

 If the encoder and decoder have too much capacity for the amount of training data available, the model can overfit.

Regularized
 Undercomplete autoencoders, with code dimension less than the input dimension, can learn the
most salient features of the data distribution.

 We have seen that these autoencoders fail to learn anything useful if the encoder and decoder are
given too much capacity.

 A similar problem occurs if the hidden code is allowed to have dimension equal to the input, and in
the overcomplete case in which the hidden code has dimension greater than the input.

 In these cases, even a linear encoder and a linear decoder can learn to copy the input to the
output without learning anything useful about the data distribution.

 Ideally, one could train any architecture of autoencoder successfully, choosing the code dimension
and the capacity of the encoder and decoder based on the complexity of distribution to be modeled.

 Regularized autoencoders provide the ability to do so.

 Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code
size small, regularized autoencoders use a loss function that encourages the model to have other
properties besides the ability to copy its input to its output.

 These other properties include sparsity of the representation, smallness of the derivative of the
representation, and robustness to noise or to missing inputs.

 A regularized autoencoder can be nonlinear and overcomplete but still learn something useful
about the data distribution, even if the model capacity is great enough to learn a trivial identity
function.

 In addition to the methods described here, which are most naturally interpreted as regularized
autoencoders, nearly any generative model with latent variables and equipped with an inference
procedure (for computing latent representations given input) may be viewed as a particular form of
autoencoder.

 Two generative modeling approaches that emphasize this connection with autoencoders are the
descendants of the Helmholtz machine, such as the variational autoencoder and the generative
stochastic networks.

 These models naturally learn high-capacity, overcomplete encodings of the input and do not
require regularization for these encodings to be useful.

 Their encodings are naturally useful because the models were trained to approximately maximize
the probability of the training data rather than to copy the input to the output.

Stochastic

 Autoencoders are just feedforward networks.

 The same loss functions and output unit types that can be used for traditional feedforward
networks are also used for autoencoders.

 A general strategy for designing the output units and the loss function of a feedforward network is
to define an output distribution p(y | x) and minimize the negative log-likelihood −log p(y | x).

 In that setting, y is a vector of targets, such as class labels.

 In an autoencoder, x is now the target as well as the input.

 Given a hidden code h, we may think of the decoder as providing a conditional distribution
p_decoder(x | h).

 We may then train the autoencoder by minimizing −log p_decoder(x | h).

 The exact form of this loss function will change depending on the form of p_decoder.
 As with traditional feedforward networks, we usually use linear output units to parametrize the
mean of a Gaussian distribution if x is real valued.

 In that case, the negative log-likelihood yields a mean squared error criterion.

 Similarly, binary x values correspond to a Bernoulli distribution whose parameters are given by a
sigmoid output unit, discrete x values correspond to a softmax distribution, and so on.

 Typically, the output variables are treated as being conditionally independent given h so that this
probability distribution is inexpensive to evaluate, but some techniques, such as mixture density
outputs, allow tractable modeling of outputs with correlations.
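
To make the link between the output distribution and the loss concrete, the sketch here writes out the negative log-likelihood for two of the cases above. It assumes a Gaussian with unit variance and conditionally independent outputs; the function names are illustrative only.

```python
import numpy as np

def gaussian_nll(x, mean):
    """-log N(x; mean, I), summed over dimensions (unit variance assumed)."""
    d = x.shape[-1]
    return 0.5 * np.sum((x - mean) ** 2, axis=-1) + 0.5 * d * np.log(2 * np.pi)

def bernoulli_nll(x, logits):
    """-log p(x) for independent Bernoulli outputs with p = sigmoid(logits)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=-1)

# Real-valued x: the NLL is half the squared error plus a constant.
x = np.array([0.5, -1.0, 2.0])
mean = np.array([0.0, -1.0, 1.0])   # decoder output (linear units)
squared_error = np.sum((x - mean) ** 2)
constant = 0.5 * x.size * np.log(2 * np.pi)
print(np.isclose(gaussian_nll(x, mean), 0.5 * squared_error + constant))

# Binary x: the NLL is the binary cross-entropy.
x_bin = np.array([1.0, 0.0, 1.0])
logits = np.zeros(3)                # sigmoid(0) = 0.5 for every output
print(bernoulli_nll(x_bin, logits))
```

Since the constant does not depend on the decoder output, minimizing the Gaussian negative log-likelihood is equivalent to minimizing the squared error, which is why a mean squared error criterion results.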

 To make a more radical departure from the feedforward networks we have seen previously,
we can also generalize the notion of an encoding function f(x) to an encoding distribution
p_encoder(h | x), as illustrated in the figure.

Figure: The structure of a stochastic autoencoder, in which both the encoder and the decoder are
not simple functions but instead involve some noise injection, meaning that their output can
be seen as sampled from a distribution, p_encoder(h | x) for the encoder and
p_decoder(x | h) for the decoder.

 Any latent variable model p_model(h, x) defines a stochastic encoder
p_encoder(h | x) = p_model(h | x) and a stochastic decoder
p_decoder(x | h) = p_model(x | h).

 In general, the encoder and decoder distributions are not necessarily conditional
distributions compatible with a unique joint distribution p_model(x, h).

Denoising
 Denoising autoencoders create a corrupted copy of the input by introducing some noise.

 This prevents the autoencoder from simply copying the input to the output without learning features
of the data.

 These autoencoders take a partially corrupted input while training to recover the original
undistorted input.

 The model learns a vector field for mapping the input data towards a lower dimensional manifold
which describes the natural data to cancel out the added noise.
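
A minimal sketch of the corruption step and the training target follows. It assumes masking noise (randomly zeroing inputs) and a mean squared error loss; `corrupt` and `denoising_loss` are hypothetical helper names, and the identity mapping stands in for a real network.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, drop_prob, rng):
    """Masking noise: set each input element to zero with probability drop_prob."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def denoising_loss(x_clean, x_reconstructed):
    """Mean squared error measured against the clean input."""
    return np.mean((x_reconstructed - x_clean) ** 2)

x = rng.normal(size=(4, 6))                   # clean inputs
x_noisy = corrupt(x, drop_prob=0.3, rng=rng)  # what the network actually sees

# Placeholder for a trained network: here the "reconstruction" is just the noisy input.
x_hat = x_noisy
loss = denoising_loss(x, x_hat)
print(loss)
```

The network receives `x_noisy`, but the loss compares its output with the clean `x`, so copying the input is no longer a perfect solution.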

Advantages

 It was introduced to achieve good representation. Such a representation is one that can be
obtained robustly from a corrupted input and that will be useful for recovering the corresponding
clean input.
 The input can be corrupted randomly by setting some of the input values to zero; the remaining
values are copied unchanged into the noisy input.
 Training minimizes the loss between the network's output and the original, uncorrupted input.
 Setting up a single-thread denoising autoencoder is easy.

Drawbacks

 To train an autoencoder to denoise data, it is necessary to perform a preliminary stochastic
mapping that corrupts the data, and to use the corrupted data as input.
 This model isn't able to develop a mapping which memorizes the training data because our input
and target output are no longer the same.

Contractive
 The objective of a contractive autoencoder is to have a robust learned representation which is less
sensitive to small variations in the data.

 Robustness of the representation is achieved by adding a penalty term to the loss function.

 The contractive autoencoder is another regularization technique, just like sparse and denoising
autoencoders.

 Here, the regularizer is the squared Frobenius norm of the Jacobian matrix of the
encoder activations with respect to the input.

 The squared Frobenius norm of the Jacobian of the hidden layer with respect to the input
is simply the sum of the squares of all its elements.
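
For a single sigmoid encoder layer h = sigmoid(Wx + b), the penalty has a closed form, because the Jacobian entry for hidden unit j and input i is h_j(1 − h_j)W_ji. The sketch here assumes that one-layer encoder and checks the closed form against a finite-difference Jacobian; the sizes and names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """Sum of squared Jacobian entries for the encoder h = sigmoid(W x + b)."""
    h = sigmoid(W @ x + b)
    dh = h * (1.0 - h)                       # derivative of the sigmoid at each hidden unit
    return np.sum(dh ** 2 * np.sum(W ** 2, axis=1))

def numerical_jacobian(x, W, b, eps=1e-6):
    """Central-difference estimate of dh/dx, used only as a check."""
    J = np.zeros((W.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (sigmoid(W @ (x + step) + b) - sigmoid(W @ (x - step) + b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 5 inputs, 3 hidden units (assumed sizes)
b = rng.normal(size=3)
x = rng.normal(size=5)

penalty = contractive_penalty(x, W, b)
check = np.sum(numerical_jacobian(x, W, b) ** 2)
print(penalty, check)
```

In training, this penalty would be multiplied by a weighting coefficient and added to the reconstruction loss.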
Advantages

 A contractive autoencoder can be a better choice than a denoising autoencoder for learning useful
features.

 This model learns an encoding in which similar inputs have similar encodings.

 Hence, we're forcing the model to learn how to contract a neighborhood of inputs into a
smaller neighborhood of outputs.

Optimization for Deep Learning
 Optimizers are algorithms or methods used to change the attributes of your neural network such
as weights and learning rate in order to reduce the losses.

 Optimizers help to get results faster.

 The optimizer you use defines how the weights or learning rate of your neural network should be
changed to reduce the losses.

 Optimization algorithms or strategies are responsible for reducing the losses and providing the
most accurate results possible.
We'll learn about different types of optimizers and their advantages:

Gradient Descent

 It is the most basic but most used optimization algorithm. It's used heavily in linear regression
and classification algorithms. Neural networks are also trained with gradient descent, using gradients
computed by backpropagation.
 It is a first-order optimization algorithm, which depends on the first-order derivative of the loss
function.
 It calculates which way the weights should be altered so that the function can reach a
minimum.
 Through backpropagation, the loss is propagated from one layer to another, and the model's
parameters, also known as weights, are modified depending on the losses so that the
loss can be minimized.
θ = θ−α⋅∇J(θ)
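
The update rule can be tried on a toy problem. The sketch assumes a one-dimensional loss J(θ) = (θ − 3)², whose gradient is 2(θ − 3) and whose minimum is at θ = 3; the loss, the starting point, and the learning rate are illustrative choices.

```python
def grad_J(theta):
    """Gradient of the assumed loss J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter
alpha = 0.1   # learning rate

for _ in range(100):
    theta = theta - alpha * grad_J(theta)   # theta = theta - alpha * grad J(theta)

print(theta)  # approaches 3.0
```

With these values each step shrinks the distance to the minimum by a factor of 0.8, so the iterate converges quickly.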
Advantages

 Easy computation.
 Easy to implement.
 Easy to understand.

Disadvantages

 May get trapped in local minima.

 Weights are changed only after calculating the gradient on the whole dataset. So, if the dataset is very
large, it may take a very long time to converge to the minimum.

 Requires large memory to calculate gradient on the whole dataset.

Back Propagation

 Backprop is an abbreviation for “backward propagation of errors”.

 It is used in conjunction with gradient descent: backpropagation is the practical implementation of the
gradient computation.

 Backpropagation (BP) was an important finding in the history of neural networks.

 This method calculates the gradient of the loss function with respect to the weights in the
network.

 This gradient is fed to the optimisation method, which updates the existing weights
to minimise the loss function.

 Computing the loss function requires a known
output, or the desired output, for each input value.

 Backpropagation has found applications in areas like classification problems, function
approximation, time-space approximation, time-series prediction, face recognition, ALVINN-Enhancing
training, etc.
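
The gradient computation can be illustrated on a tiny two-layer network. This sketch assumes a tanh hidden layer, a linear output, and a squared-error loss, and it checks one backpropagated gradient entry against a finite-difference estimate; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))              # 4 examples, 3 inputs
y = rng.normal(size=(4, 2))              # desired outputs
W1 = rng.normal(scale=0.5, size=(3, 5))  # input -> hidden
W2 = rng.normal(scale=0.5, size=(5, 2))  # hidden -> output

def loss_fn(W1, W2):
    h = np.tanh(x @ W1)
    y_hat = h @ W2
    return 0.5 * np.sum((y_hat - y) ** 2)

def backprop(W1, W2):
    # Forward pass, keeping intermediate values.
    h = np.tanh(x @ W1)
    y_hat = h @ W2
    # Backward pass: apply the chain rule layer by layer.
    d_yhat = y_hat - y              # dL/dy_hat
    dW2 = h.T @ d_yhat              # dL/dW2
    d_h = d_yhat @ W2.T             # error propagated back to the hidden layer
    d_pre = d_h * (1.0 - h ** 2)    # through tanh, whose derivative is 1 - tanh^2
    dW1 = x.T @ d_pre               # dL/dW1
    return dW1, dW2

dW1, dW2 = backprop(W1, W2)

# Finite-difference check of a single weight.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_fn(W1_plus, W2) - loss_fn(W1_minus, W2)) / (2 * eps)
print(dW1[0, 0], numeric)
```

The gradients `dW1` and `dW2` are what an optimizer such as gradient descent would then use to update the weights.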
Convergence Theorem

 For a neural network model in which each neuron performs a threshold logic function, the model
always converges to a stable state while operating in serial mode, and to a cycle of
length two while operating in fully parallel mode.

 There are mainly two types of convergence results for gradient descent. One of them states that if all
the iterates are bounded, then GD with a proper constant step size converges.

 There are many types of convergence theorems, such as the perceptron convergence theorem,
multi-layered (neural network) convergence theorems, convergence theory for deep learning via
over-parameterisation, and convergence results for neural networks via electrodynamics.

Learning Rate

 In neural network optimisation there are two main goals: one is to converge faster, and the
other is to improve a particular metric of interest.

 A faster method doesn't necessarily generalise better or improve the metric of interest, which is
different from the optimisation loss.

 Because of this, one has to try an optimisation idea that improves the convergence speed and accept
that idea only if it passes a specific 'performance check'.

 The learning rate is a tuning parameter in an optimisation algorithm that
determines the step size at each iteration while moving towards a minimum of the loss function.

 It represents the speed at which a machine learning model learns, since it influences how much
old information is overridden by newly acquired information.

 The learning rate is denoted by η or α.

 A learning rate schedule changes the step size during learning, between
iterations/epochs. This is mainly done with two parameters: decay and momentum. Common schedule types are
time-based, step-based and exponential.
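
The three schedule types named above are often written in the following forms. The exact formulas and parameter names vary between libraries, so treat these as one common convention rather than a fixed definition.

```python
import math

def time_based(lr0, decay, epoch):
    """Learning rate shrinks as 1 / (1 + decay * epoch)."""
    return lr0 / (1.0 + decay * epoch)

def step_based(lr0, drop, epochs_per_drop, epoch):
    """Learning rate is multiplied by `drop` once every `epochs_per_drop` epochs."""
    return lr0 * drop ** math.floor(epoch / epochs_per_drop)

def exponential(lr0, decay, epoch):
    """Learning rate decays as exp(-decay * epoch)."""
    return lr0 * math.exp(-decay * epoch)

for epoch in (0, 10, 25):
    print(epoch,
          time_based(0.1, 0.1, epoch),
          step_based(0.1, 0.5, 10, epoch),
          exponential(0.1, 0.1, epoch))
```

All three start from the same initial rate `lr0` and differ only in how quickly, and how smoothly, the step size is reduced.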
Initialisation

 Initialisation is one of the significant tricks for training deep neural networks.
 A large portion of the parameter space consists of exploding-gradient or vanishing-gradient regions,
and an algorithm initialised in these regions will fail.
 It is therefore important to pick the initial point in a good region to start with.

Types of initialisation

 Naive initialisation: The suitable region to pick the initial point is unknown, so the first step is to
find a simple initial point.
 One choice is the all-zero initial point; another is a sparse initial point, in which only a
small portion of the weights are non-zero; a third is to draw the weights from a certain random
distribution.

 LeCun initialisation and Xavier initialisation, which are designed for sigmoid activation functions.

 Kaiming initialisation for ReLU activation.

 Layered-Sequential Unit-Variance (LSUV), which shows empirical benefits for some problems.
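
The named schemes differ mainly in how the variance of the random weights is scaled by the layer's fan-in and fan-out. The sketch uses the normal-distribution variants, which is an assumption, since uniform variants also exist; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lecun_normal(fan_in, fan_out, rng):
    """LeCun initialisation: variance 1 / fan_in."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out, rng):
    """Xavier (Glorot) initialisation: variance 2 / (fan_in + fan_out)."""
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def kaiming_normal(fan_in, fan_out, rng):
    """Kaiming (He) initialisation for ReLU: variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W_lecun = lecun_normal(1000, 500, rng)
W_xavier = xavier_normal(1000, 500, rng)
W_kaiming = kaiming_normal(1000, 500, rng)

print(W_lecun.std(), W_xavier.std(), W_kaiming.std())
```

The factor of 2 in the Kaiming variant compensates for ReLU zeroing roughly half of the pre-activations.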

Normalisation

 Normalisation can be viewed as the extension to initialisation, so instead of merely modifying the
initial point, this method changes the network for all the next iterates that follow.

 Batch Normalisation is a standard technique today. It reduces internal covariate shift.

 Covariate shift occurs when an algorithm has learned an X-to-Y mapping and the distribution
of X then changes, so that the model has to be retrained.

 BatchNorm also allows each layer of a network to learn more
independently of the other layers.

 A benefit of BatchNorm is that it allows a larger learning rate.

 The networks which do not have BatchNorm have larger isolated eigenvalues, while those with
BatchNorm have no such issues with isolated eigenvalues.

 BatchNorm, however, does not work very well with mini-batches which do not have similar
statistics because the mean/variance for each mini-batch is computed as an approximation of the
mean/variance for all samples.
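
The training-time computation can be sketched as follows. This covers only the forward pass on one mini-batch, with a learnable scale `gamma` and shift `beta`; the running statistics used at inference time are omitted, and the batch itself is made-up data.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature using the mean and variance of the current mini-batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = 5.0 * rng.normal(size=(64, 4)) + 3.0   # a mini-batch with shifted, scaled features

out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

Because `mean` and `var` come from the mini-batch itself, batches with very different statistics produce very different normalisations, which is the limitation noted in the last bullet.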

Stochastic Gradient Descent

 It's a variant of Gradient Descent. It updates the model's parameters more frequently.
 In this method, the model parameters are altered after computing the loss on each training example.
 So, if the dataset contains 1000 rows, SGD will update the model parameters 1000 times in one
pass over the dataset instead of once as in Gradient Descent.
θ = θ − α⋅∇J(θ; x(i), y(i)), where {x(i), y(i)} are the training examples.

 Because the model parameters are updated so frequently, they have high variance, and the
loss function fluctuates with varying intensity.
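
The per-example update can be sketched on a toy linear regression problem with 1000 rows, matching the count used above. The data, the squared-error loss, and the learning rate are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000
x = rng.normal(size=n)
y = 2.0 * x + 1.0          # noiseless targets from the line y = 2x + 1

w, b = 0.0, 0.0            # parameters theta = (w, b)
alpha = 0.01               # learning rate
updates = 0

# One pass over the dataset in random order, updating after every example.
for i in rng.permutation(n):
    err = (w * x[i] + b) - y[i]     # prediction error on example i
    w -= alpha * err * x[i]         # gradient of 0.5 * err**2 with respect to w
    b -= alpha * err                # gradient of 0.5 * err**2 with respect to b
    updates += 1

print(updates, w, b)
```

Full-batch gradient descent would have made a single update in the same pass; here there is one update per row.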

Advantages
 Frequent updates of the model parameters; hence, it often converges in less time.
 Requires less memory, as the gradient is computed on one example at a time rather than on the whole dataset.
 May find new minima.
Disadvantages
 High variance in model parameters.
 May overshoot even after reaching the global minimum.
 To get the same convergence as gradient descent, the learning rate needs to be reduced slowly.

Momentum

 Momentum was invented to reduce the high variance in SGD and smooth the convergence.
 It accelerates the convergence towards the relevant direction and reduces the fluctuation in the
irrelevant direction.
 One more hyperparameter is used in this method, known as momentum and symbolized by ‘γ’.

V(t) = γV(t−1) + α⋅∇J(θ)

 Now, the weights are updated by θ=θ−V(t).

 The momentum term γ is usually set to 0.9 or a similar value.
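
The two update equations can be run on the same kind of toy problem used for plain gradient descent. The loss J(θ) = (θ − 3)², the starting point, and the learning rate are assumptions; γ = 0.9 follows the value suggested above.

```python
def grad_J(theta):
    """Gradient of the assumed loss J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

theta = 0.0    # parameter
v = 0.0        # velocity V(t)
alpha = 0.1    # learning rate
gamma = 0.9    # momentum term

for _ in range(500):
    v = gamma * v + alpha * grad_J(theta)   # V(t) = gamma * V(t-1) + alpha * grad J(theta)
    theta = theta - v                       # theta = theta - V(t)

print(theta)  # approaches 3.0
```

The velocity accumulates past gradients, so the iterate overshoots and oscillates around the minimum before settling.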

Advantages

 Reduces the oscillations and high variance of the parameters.

 Converges faster than gradient descent.

Disadvantages

 One more hyper-parameter is added which needs to be selected manually and accurately.
Adam

 Adam (Adaptive Moment Estimation) works with moments of first and second order.
 The intuition behind Adam is that we don't want to roll so fast that we jump over
the minimum; we want to decrease the velocity a little bit for a careful search.

 In addition to storing an exponentially decaying average of past squared gradients V(t), like AdaDelta,
Adam also keeps an exponentially decaying average of past gradients M(t).
 M(t) and V(t) are estimates of the first moment, which is the mean, and the second moment, which
is the uncentered variance, of the gradients respectively.

First and second order of momentum, where g(t) = ∇J(θ) is the gradient at step t:

M(t) = β1⋅M(t−1) + (1−β1)⋅g(t)
V(t) = β2⋅V(t−1) + (1−β2)⋅g(t)²

 Here, we take bias-corrected versions of M(t) and V(t) so that E[m(t)] can be equal to E[g(t)], where E[f(x)] is
the expected value of f(x):

M_hat(t) = M(t) / (1 − β1^t)
V_hat(t) = V(t) / (1 − β2^t)

 To update the parameters:

θ = θ − α⋅M_hat(t) / (√V_hat(t) + ϵ)

 The usual values are 0.9 for β1, 0.999 for β2, and 10^(−8) for ‘ϵ’.
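
A minimal sketch of the standard Adam update on a one-dimensional problem follows. The loss J(θ) = (θ − 3)², the learning rate, and the number of steps are assumptions for illustration; β1, β2 and ϵ use the usual default values.

```python
import math

def grad_J(theta):
    """Gradient of the assumed loss J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

theta = 0.0
m, v = 0.0, 0.0                          # first and second moment estimates M(t), V(t)
alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    g = grad_J(theta)
    m = beta1 * m + (1 - beta1) * g          # decaying average of gradients
    v = beta2 * v + (1 - beta2) * g * g      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)

print(theta)  # approaches 3.0
```

The bias correction matters most in the first few steps, when `m` and `v` are still close to their zero initial values.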

Advantages

 The method is very fast and converges rapidly.

 Rectifies vanishing learning rate, high variance.

Disadvantages
 Computationally costly.

Thank You
