
Variational Autoencoders (VAEs)

Module 2
What is an Autoencoder?
An autoencoder is a type of artificial neural network used to learn efficient
representations of data, typically for the purpose of dimensionality reduction, data
compression, or unsupervised learning. Autoencoders are unsupervised learning
models because they don’t require labeled data; instead, they rely on reconstructing
their input data.
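As a minimal illustration (a PyTorch sketch, not part of the original slides; layer and latent sizes are arbitrary assumptions for flattened 28x28 images), an autoencoder is simply an encoder that compresses the input into a small latent vector and a decoder that reconstructs the input from it:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal dense autoencoder: 784 -> 32 -> 784 (sizes are illustrative)."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses the input to a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder reconstructs the input from the code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # assumes inputs scaled to [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)
```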
Introduction to Autoencoders

Autoencoders: The Sneaky Idea
Necessary conditions to learn a representation

• Data should have dependencies across dimensions.
• If the dimensions are all independent, it is impossible to learn a lower-dimensional representation.
PCA vs Encoders

• Both perform dimensionality reduction.
• PCA learns only linear relationships.
• Encoders can learn non-linear relationships.
• An encoder is equivalent to PCA if it uses only linear activation functions (see the sketch below).
• Decoders decompress the representation back to the original domain.
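A small sketch of these two points (not from the slides; the synthetic data, layer sizes, and training settings are assumptions): data with dependencies across dimensions (rank-2 data embedded in 10 dimensions) can be compressed to 2 dimensions by PCA, and a purely linear autoencoder trained with MSE learns the same 2-dimensional subspace.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# Synthetic correlated data: rank-2 structure embedded in 10 dimensions.
rng = np.random.default_rng(0)
X = (rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))).astype(np.float32)

# PCA with 2 components reconstructs this rank-2 data almost perfectly.
pca = PCA(n_components=2).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# Linear autoencoder: no nonlinearities, same bottleneck size of 2.
model = nn.Sequential(nn.Linear(10, 2), nn.Linear(2, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
Xt = torch.from_numpy(X)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(Xt), Xt)
    loss.backward()
    opt.step()

# Both should reach low reconstruction error on this rank-2 data.
print(np.mean((X - X_pca) ** 2), loss.item())
```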
How can we train an Autoencoder?

• By backpropagation.
• Minimize the reconstruction error (a minimal training sketch follows below).

What do we ask of an Autoencoder?

• Sensitive enough to the input data to reconstruct it.
• Insensitive enough to the input data not to overfit it.
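A minimal training-loop sketch (assuming the Autoencoder class sketched earlier, a standard PyTorch data loader, and images flattened to vectors in [0, 1]): the network is trained by backpropagation to minimize a reconstruction error such as mean squared error.

```python
import torch
import torch.nn as nn

def train_autoencoder(model, data_loader, epochs=10, lr=1e-3):
    """Train by backpropagation to minimize reconstruction error (MSE)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for x, _ in data_loader:        # labels are ignored: unsupervised
            x = x.view(x.size(0), -1)   # flatten images to vectors
            x_hat = model(x)            # reconstruct the input
            loss = loss_fn(x_hat, x)    # reconstruction error
            opt.zero_grad()
            loss.backward()             # backpropagation
            opt.step()
        print(f"epoch {epoch}: loss={loss.item():.4f}")
```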
Deep Autoencoder
Deep Convolutional Autoencoder

• Similar architecture to a standard AE
• Uses convolutional layers
• Encoder: Convolution + Leaky ReLU + Batch Normalization
• Decoder: Transposed convolution + Leaky ReLU + Batch Normalization
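A sketch of that convolutional variant (channel counts and the 1x28x28 input size are assumptions, not from the slides): convolution + Leaky ReLU + batch normalization in the encoder, transposed convolution + Leaky ReLU + batch normalization in the decoder.

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional autoencoder for 1x28x28 inputs (sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(32),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(16),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```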
Autoencoder Applications

• Generation
• Denoising
• Anomaly Detection
Generation with AEs
Denoising with AEs
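For denoising, a minimal sketch (assuming additive Gaussian noise and the dense autoencoder sketched earlier): the model receives a corrupted input but the reconstruction loss is computed against the clean original, so it learns to remove the noise.

```python
import torch
import torch.nn as nn

def denoising_step(model, x, optimizer, noise_std=0.3):
    """One training step of a denoising autoencoder."""
    x = x.view(x.size(0), -1)
    x_noisy = (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)  # corrupt the input
    x_hat = model(x_noisy)                   # reconstruct from the noisy version
    loss = nn.functional.mse_loss(x_hat, x)  # compare against the clean input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```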
Generative AI models work with different types of data
Variational Autoencoders
What is a Variational Autoencoder?
• The variational autoencoder was proposed in 2013 by Diederik P. Kingma and Max Welling.
• A variational autoencoder (VAE) provides a probabilistic manner for describing an observation in latent space. Thus, rather than building an encoder that outputs a single value to describe each latent state attribute, we formulate the encoder to describe a probability distribution for each latent attribute.
• It has many applications, such as data compression and synthetic data creation.
Variational Autoencoders

• A variational autoencoder differs from a plain autoencoder in that it provides a statistical manner for describing the samples of the dataset in latent space.
• In a variational autoencoder, the encoder outputs a probability distribution in the bottleneck layer instead of a single output value (see the sketch below).
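A sketch of that difference in the bottleneck (layer and latent sizes are assumptions): instead of a single latent vector, the VAE encoder outputs the mean and log-variance of a Gaussian distribution for each latent attribute.

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Encoder that outputs a distribution (mu, log_var) rather than a point."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.fc_log_var = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_log_var(h)
```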
Architecture of Variational Autoencoders

• The encoder-decoder architecture lies at the heart of Variational Autoencoders (VAEs), distinguishing them from traditional autoencoders. The encoder network takes raw input data and transforms it into a probability distribution within the latent space.
• The latent code generated by the encoder is a probabilistic encoding, allowing the VAE to express not just a single point in the latent space but a distribution of potential representations.
• The decoder network, in turn, takes a sampled point from the latent distribution and reconstructs it back into data space.
• During training, the model refines both the encoder and decoder parameters to minimize the reconstruction loss, that is, the disparity between the input data and the decoded output. The goal is not just to achieve accurate reconstruction but also to regularize the latent space, ensuring that it conforms to a specified distribution.
Architecture of Variational Autoencoders
• The process involves a delicate balance between two essential components: the reconstruction loss and the regularization term, often represented by the Kullback-Leibler (KL) divergence.
• The reconstruction loss compels the model to accurately reconstruct the input, while
the regularization term encourages the latent space to adhere to the chosen
distribution, preventing overfitting and promoting generalization.
• By iteratively adjusting these parameters during training, the VAE learns to encode
input data into a meaningful latent space representation.
• This optimized latent code encapsulates the underlying features and structures of
the data, facilitating precise reconstruction. The probabilistic nature of the latent
space also enables the generation of novel samples by drawing random points from
the learned distribution.
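A sketch of the sampling step described above (assuming the VAEEncoder from the earlier sketch and a matching decoder): a point is drawn from the predicted distribution using the reparameterization trick z = mu + sigma * eps, decoded back into data space, and novel samples are generated by decoding random points drawn from the prior.

```python
import torch

def reparameterize(mu, log_var):
    """Draw z ~ N(mu, sigma^2) in a way that keeps gradients flowing."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + eps * std

def reconstruct(encoder, decoder, x):
    """Encode to a distribution, sample a latent point, and decode it."""
    mu, log_var = encoder(x)
    z = reparameterize(mu, log_var)   # sample from the latent distribution
    return decoder(z), mu, log_var

def generate(decoder, num_samples, latent_dim=20):
    """Generate novel samples by decoding random points from the prior N(0, I)."""
    z = torch.randn(num_samples, latent_dim)
    return decoder(z)
```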
Mathematics behind Variational Autoencoders
• The variational autoencoder uses the KL-divergence in its loss function; the goal is to minimize the difference between an approximate distribution and the original distribution of the dataset.
• Suppose we have a latent variable z and we want to generate an observation x from it. In other words, we want to calculate the posterior p(z|x).
• By Bayes' rule,

    p(z|x) = p(x|z) p(z) / p(x),   where   p(x) = ∫ p(x|z) p(z) dz.

  However, computing p(x) requires integrating over all possible latent values, which can be quite difficult. This usually makes p(z|x) an intractable distribution. Hence, we need to approximate p(z|x) with a simpler distribution q(z|x) to make it tractable.
Mathematics behind Variational Autoencoders
• To make q(z|x) a good approximation of p(z|x), we minimize the KL-divergence, which measures how different the two distributions are:

    min  KL( q(z|x) || p(z|x) ).

• By simplifying, the above minimization problem is equivalent to the following maximization problem:

    max  E_{z ~ q(z|x)} [ log p(x|z) ] - KL( q(z|x) || p(z) ).

  The first term represents the reconstruction likelihood, and the second term ensures that our learned distribution q(z|x) is similar to the true prior distribution p(z).
• Thus, our total loss consists of two terms, one being the reconstruction error and the other the KL-divergence loss:

    Loss = -E_{z ~ q(z|x)} [ log p(x|z) ] + KL( q(z|x) || p(z) ).
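A minimal sketch of this loss in code (a PyTorch sketch, not from the slides; it assumes the decoder outputs values in [0, 1], so binary cross-entropy serves as the reconstruction term, and a standard normal prior N(0, I)):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, log_var):
    """Reconstruction error + KL( q(z|x) || N(0, I) )."""
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dims.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```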
Limitations of VAEs

• Blurry Image Generation: VAEs generate blurry images because they optimize a likelihood-based reconstruction loss, which often leads to averaging pixel values. The output images lack sharp details.
• Trade-off Between Reconstruction Quality and Latent Space Regularization: The KL divergence term forces the latent space to follow a specific prior distribution (e.g., Gaussian), which can reduce the quality of reconstructions by making them less specific.
• Mode Averaging: VAEs tend to average different data modes instead of capturing distinct variations, resulting in unrealistic or less diverse outputs.
• Poorly Structured Latent Space for Complex Data: For highly complex data (e.g., realistic human faces), VAEs struggle to capture meaningful latent representations due to their probabilistic nature.
• Inefficient Sampling: Since VAEs rely on a fixed prior (like a Gaussian), samples from the latent space may not align well with real-world data distributions, leading to unrealistic generations.
• KL Divergence Instability: Tuning the weight of the KL divergence term is tricky. Too much regularization leads to poor reconstructions, while too little reduces generative diversity.
• Overly Smoothed Outputs: VAEs encourage smooth and continuous latent spaces, which can lead to overly smooth generations that lack fine detail.
• Difficulties in High-Resolution Image Synthesis: Standard VAEs struggle with generating high-resolution images due to the inherent limitations of their decoder structure.
• Difficulty Handling Discrete Data: VAEs perform well on continuous data but face challenges when working with discrete data (e.g., text generation, categorical data) because of the limitations of the reparameterization trick.
• Lack of Sharp Latent Representations: The latent variables in VAEs often encode blurry or ambiguous representations due to the variational posterior approximation.
• Higher Computational Cost than Standard Autoencoders: The extra KL divergence computation and sampling steps make VAEs computationally more expensive than standard autoencoders.
• Variability in Training Stability: While VAEs are generally more stable than GANs, balancing their loss terms requires careful tuning of hyperparameters.
• Suboptimal Representation Learning: Although VAEs enforce structured representations, they might not always capture the most meaningful features for downstream tasks like classification.
• Unclear Evaluation Metrics: Evaluating VAEs is challenging because likelihood-based metrics don't always correlate well with perceptual quality.
• Need for Post-Processing: Extra steps, such as adversarial training or other post-processing techniques, are often needed to sharpen the generated outputs.
Why choose GANs over VAEs

• High-Quality, Sharp Images: GANs use an adversarial loss, which encourages the generator to produce high-quality, sharp images rather than blurry reconstructions.
• No Explicit Latent Space Regularization: GANs do not enforce a predefined prior on the latent space, allowing for more flexible and diverse data generation.
• Better Mode Coverage: Advanced GAN variants (e.g., WGAN, BigGAN) mitigate mode collapse and capture multiple data modes effectively.
• More Realistic Generations: GANs directly learn the data distribution and refine outputs using the discriminator, resulting in more realistic images.
• Strong Performance in High-Resolution Image Generation: Models like StyleGAN can generate ultra-high-resolution, photorealistic images.
• Better Suitability for Image Translation and Super-Resolution: GANs are widely used for style transfer, face aging, and super-resolution tasks where VAEs struggle.
• More Flexible Latent Representations: Unlike VAEs, GANs do not enforce smooth distributions, allowing for sharper and more structured feature representations.
• Works Well with Discrete Data: GANs have been successfully adapted for text generation (e.g., SeqGAN).
• More Popular in Creative Applications: GANs are used extensively in AI art, deepfake generation, and media content creation due to their high fidelity.
• No Need for KL Divergence Optimization: GANs do not suffer from the KL divergence balancing problem, making them more effective at capturing fine details.
