STUDENT HANDOUT - 7
Variational Autoencoders (VAEs)
Topic Name: Variational Autoencoders (VAEs)
Includes: Autoencoder Review • VAE Architecture (Encoder, Latent Space, Decoder) •
Reparameterization Trick • VAE vs. GAN
🤖 Beyond Generation: Learning Meaningful Latent Spaces with VAEs
You've explored GANs, which are powerful for generating realistic images, and Transformer
Decoders for text generation. Now, let's introduce another fundamental generative model:
Variational Autoencoders (VAEs). While GANs excel at creating highly realistic samples, VAEs are
particularly good at learning a structured, continuous, and interpretable "latent space" of the data,
which enables more controlled and diverse generation.
🔍 A Quick Review: Autoencoders
Before diving into VAEs, let's briefly recall Autoencoders (AEs). An Autoencoder is a type of neural
network designed to learn efficient data codings (representations) in an unsupervised manner. It
consists of two main parts:
● Encoder: Maps input data (e.g., an image) to a lower-dimensional representation called the
latent space or bottleneck layer.
● Decoder: Takes the latent space representation and reconstructs the original input data.
The goal of a standard Autoencoder is to learn a compressed representation such that the
reconstructed output is as close as possible to the original input.
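As a concrete reference point, below is a minimal sketch of a standard Autoencoder in PyTorch. The layer widths, the 784-dimensional (flattened 28×28) input, and the 32-dimensional bottleneck are illustrative choices, not requirements.

```python
import torch
import torch.nn as nn

# A minimal fully connected autoencoder for flattened 28x28 images.
# The 32-dimensional bottleneck is an arbitrary illustrative choice.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),               # bottleneck layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)       # compress the input to a latent code
        return self.decoder(z)    # reconstruct the input from the code

# Training minimizes a reconstruction loss, e.g.:
# loss = nn.functional.mse_loss(model(x), x)
```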
📌 Variational Autoencoder (VAE) Architecture
VAEs extend the concept of Autoencoders by introducing a probabilistic approach to the latent
space. Instead of mapping an input to a single point in the latent space, the encoder of a VAE maps it
to a probability distribution (specifically, a Gaussian distribution defined by its mean and
variance). This probabilistic encoding is what makes VAEs generative and allows for smooth
interpolation and sampling in the latent space.
The VAE architecture comprises:
1. Encoder (q(z∣x) - "Recognition Model"):
○ Takes an input x (e.g., an image).
○ Outputs two vectors:
■ Mean (μ): Represents the central point of the latent distribution.
■ Log Variance (log σ²): Represents the spread of the latent distribution (the log of the variance is often used for numerical stability).
○ Instead of directly outputting a latent vector z, the encoder outputs the parameters
of a distribution from which z is sampled.
2. Latent Space (z):
○ A vector z is sampled from the probability distribution defined by the mean and
variance outputs from the encoder. This sampling step is crucial for the generative
aspect and ensures the latent space is continuous.
3. Decoder (p(x∣z) - "Generative Model"):
○ Takes a sampled latent vector z as input.
○ Reconstructs the original data x' (e.g., an image) from this latent vector.
○ Its goal is to learn to map points in the latent space back to realistic data samples.
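To make these three components concrete, here is a minimal PyTorch sketch of a VAE. The layer widths, the choice of fully connected layers, and latent_dim=20 are illustrative assumptions, not prescribed values.

```python
import torch
import torch.nn as nn

# Minimal VAE for flattened 28x28 images (illustrative sizes).
class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # outputs mean (mu)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # outputs log variance
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)   # parameters of q(z|x)

    def reparameterize(self, mu, logvar):
        # Sampling step (explained in the Reparameterization Trick section below).
        std = torch.exp(0.5 * logvar)    # sigma = exp(0.5 * log sigma^2)
        eps = torch.randn_like(std)      # eps ~ N(0, I)
        return mu + std * eps            # z = mu + sigma * eps

    def decode(self, z):
        return self.dec(z)               # p(x|z): reconstruct x' from z

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
```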
The Training Objective (Loss Function):
VAEs are trained by maximizing a special objective called the Evidence Lower Bound (ELBO), which is equivalent to minimizing a loss with two main components:
1. Reconstruction Loss: Measures how well the decoder reconstructs the original input
(similar to a standard Autoencoder). This pushes the decoder to generate realistic outputs.
2. KL Divergence Loss: Measures the difference between the latent distribution output by the encoder (defined by μ and log σ²) and a simple prior distribution (usually a standard normal distribution). This term acts as a regularizer, forcing the latent space to be well-structured and continuous, preventing "dead" regions and encouraging similar inputs to cluster closely in the latent space.
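A common way to write the negative ELBO in code, assuming the decoder outputs pixel values in [0, 1] (so binary cross-entropy is used for the reconstruction term), is sketched below; the function name `vae_loss` is just illustrative.

```python
import torch
import torch.nn.functional as F

# Negative ELBO for a VAE whose decoder outputs values in [0, 1].
# Other data types would use a different reconstruction term (e.g. MSE).
def vae_loss(x_recon, x, mu, logvar):
    # 1. Reconstruction loss: how well the decoder rebuilds the input.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # 2. KL divergence between N(mu, sigma^2) and the standard normal prior,
    #    in closed form: -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```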
📌 The Reparameterization Trick
A challenge in training VAEs is that the sampling operation (z ∼ N(μ, σ²)) is not differentiable, so gradients cannot be backpropagated through it. The Reparameterization Trick solves this problem:
Instead of sampling z directly from N(μ, σ²), we sample a noise vector ϵ from a standard normal distribution N(0, 1) and then compute z as:
z = μ + σ ⋅ ϵ
This trick moves the stochastic (random) part outside the learnable path of the network, allowing gradients to flow from the loss through μ and σ back to the encoder.
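The small sketch below (using arbitrary 4-dimensional μ and log σ² tensors, purely for illustration) shows that after reparameterization, gradients indeed reach μ and log σ².

```python
import torch

mu = torch.zeros(4, requires_grad=True)        # illustrative latent mean
log_var = torch.zeros(4, requires_grad=True)   # illustrative log variance

# Reparameterization: sample eps from N(0, I), then form z = mu + sigma * eps.
eps = torch.randn(4)                 # the randomness lives outside the network
sigma = torch.exp(0.5 * log_var)
z = mu + sigma * eps                 # deterministic function of mu and sigma

# Gradients now flow from any loss on z back into mu and log_var.
z.sum().backward()
print(mu.grad)        # all ones: dz/dmu = 1
print(log_var.grad)   # 0.5 * sigma * eps for each dimension
```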
📌 VAE vs. GAN
While both VAEs and GANs are powerful generative models, they have different strengths and characteristics:
● Training Objective: A VAE maximizes the ELBO (reconstruction loss + KL divergence); a GAN is trained through an adversarial game between a generator and a discriminator.
● Latent Space: A VAE learns a structured, continuous latent space that supports interpolation and controlled generation; a GAN's latent space is typically less structured and harder to interpret.
● Sample Quality: GANs usually produce sharper, more realistic samples; VAE outputs tend to be somewhat blurrier.
● Training Stability: VAE training is generally stable; GAN training can be unstable and prone to mode collapse.
💻 Practical Applications of VAEs
● Controlled Image Generation: By manipulating specific dimensions in the latent space,
you can control attributes of generated images (e.g., generating faces with different
hairstyles, expressions, or age).
● Image Reconstruction & Denoising: VAEs can learn to reconstruct clean images from
noisy inputs.
● Anomaly Detection: Data points that fall into low-density regions of the learned latent space can be identified as anomalies (see the sketch after this list).
● Data Imputation: Filling in missing values in datasets.
● Drug Discovery: Generating novel molecular structures with desired properties.
● Transfer Learning: Pre-trained VAE encoders can provide useful features for downstream
tasks.
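For anomaly detection, one common (illustrative) score is the per-example negative ELBO of a trained VAE: inputs the model explains poorly get high scores. In the sketch below, `model` follows the earlier VAE sketch, and the threshold is a hypothetical value that would normally be tuned on held-out normal data.

```python
import torch
import torch.nn.functional as F

# Score inputs by how poorly a trained VAE explains them (higher = more anomalous).
@torch.no_grad()
def anomaly_scores(model, x_batch):
    x_recon, mu, logvar = model(x_batch)
    # Per-example reconstruction term.
    recon = F.binary_cross_entropy(x_recon, x_batch, reduction="none").sum(dim=1)
    # Per-example KL term (distance of the encoding from the prior).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return recon + kl

# scores = anomaly_scores(model, x_batch)
# flagged = scores > 200.0   # example threshold, chosen on held-out data
```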
🧾 Key Takeaways
● VAEs are generative models that learn a probabilistic mapping from input data to a
continuous latent space.
● The Encoder maps an input to the parameters (μ, σ²) of a latent distribution, and the Decoder reconstructs data from samples z drawn from this distribution.
● The Reparameterization Trick allows backpropagation through the sampling process.
● VAEs excel at learning structured and interpretable latent spaces, enabling controlled
generation and interpolation, which differentiates them from GANs.
● While GANs might produce sharper images, VAEs offer better generative control and
training stability.