Official Open
Visual Generative AI Application
Variational Autoencoder (VAE)
Week 2
AY 24/25
SPECIALIST DIPLOMA IN APPLIED GENERATIVE AI (SDGAI)
Objectives
• By the end of this module, learners will be able to:
• Describe the concepts of variational autoencoders
• Train a VAE model to generate output
Data Distribution and Images
What makes a picture of a cat look like a cat?
• Each “datapoint” (image) has thousands or millions of dimensions (pixels).
• There are dependencies between pixels, e.g., nearby pixels have similar colours and are organized into objects.
• A generative model needs to “capture” these dependencies.

[Figure: the data distribution over images]
Generative Model Formulation
Given a data distribution, we want to generate new samples that are “similar” to the given data distribution.

[Figure: some initial conditions/distribution → Generative Model → generated data distribution]
Taxonomy of Generative Models
Choices
• Explicit density estimation: explicitly define and solve for the generated data distribution.
• Implicit density estimation: learn a model that can sample from the generated data distribution without explicitly defining it.

[Figure: taxonomy tree]
• Generative Models
  • Explicit Density
    • Tractable Density: Fully Visible Belief Nets (PixelRNNs)
    • Approximate Density: Variational Autoencoders, Restricted Boltzmann Machines
  • Implicit Density
    • Markov Chain
    • Generative Adversarial Networks
Autoencoder
• The autoencoder’s main components:
  • an encoder,
  • a latent feature representation, and
  • a decoder.
• We want the autoencoder to reconstruct the input well. Still, at the same time, it should create a latent representation that is useful and meaningful.

Latent Representation (Intuition)
• Learning how to write numbers does not require us to learn the gray values of each pixel in the input image.
• Instead, we extract the essential information that will allow us to solve the problem.
• The latent representation (how to write each number) is very useful for various tasks, such as understanding the essential features of a dataset.
Unsupervised Representation Learning
• An unsupervised learning approach to learn a lower-dimensional feature representation from unlabelled training data (via an encoder).
• The latent feature space usually has a lower dimension than that of the input data (dimensionality reduction).

[Figure: Input Data → Encoder → Latent Feature]
Autoencoder to Construct Input Data
• The decoder reconstructs data from latent features.
• Latent features capture factors of variation in the training data.
• It is important to ensure that the features in the latent space are trained such that they can reconstruct the original data (via the decoder).

[Figure: Input Data → Encoder → Latent Feature → Decoder → (Reconstructed) Input Data]
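The encoder–decoder pipeline above can be sketched in plain NumPy. The linear maps, random (untrained) weights, and the dimensions chosen here (784-d input, 32-d latent) are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Minimal autoencoder forward pass with random, untrained weights.
rng = np.random.default_rng(0)

input_dim, latent_dim = 784, 32
W_enc = rng.normal(0, 0.01, (latent_dim, input_dim))  # encoder weights
W_dec = rng.normal(0, 0.01, (input_dim, latent_dim))  # decoder weights

def encode(x):
    # Encoder: map the input down to a lower-dimensional latent feature
    return np.tanh(W_enc @ x)

def decode(z):
    # Decoder: reconstruct the input from the latent feature
    return W_dec @ z

x = rng.random(input_dim)   # a fake "image" flattened to a vector
z = encode(x)               # latent feature (32-d)
x_hat = decode(z)           # reconstruction (784-d)
```

Training would adjust `W_enc` and `W_dec` so that `x_hat` matches `x`; here the point is only the shape of the pipeline.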
Performance: Loss Function
• The L2 loss function compares the differences between the input data and the reconstructed data.
• No labels are used: the model is trained such that the latent features can be used to reconstruct the original data.

[Figure: Input Data → Encoder → Latent Feature → Decoder → (Reconstructed) Input Data → L2 Loss Function]
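As a minimal sketch, the L2 loss can be computed as the mean squared difference between the input and its reconstruction; the vectors below are made-up examples:

```python
import numpy as np

# L2 (mean squared error) reconstruction loss:
# L = (1/N) * sum_i (x_i - x_hat_i)^2
def l2_loss(x, x_hat):
    return np.mean((x - x_hat) ** 2)

x = np.array([1.0, 0.5, 0.0, 0.25])      # original input
x_hat = np.array([0.9, 0.5, 0.1, 0.25])  # reconstruction
loss = l2_loss(x, x_hat)                 # → 0.005
```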
Introducing Variational Autoencoders
• Autoencoders are not generative models and have issues with choosing the latent dimension.
• Variational autoencoders sample from the latent space to generate data.

[Figure: Input Data → Encoder → Latent Feature → Decoder → (Reconstructed) Input Data]
Diederik P. Kingma and Max Welling (2019), “An Introduction to Variational Autoencoders”, Foundations and Trends in Machine Learning
Main idea behind Variational AutoEncoder
• Assume the distribution of the latent space representation is Gaussian.
• “Sample” from the latent space distribution based on the given distribution parameters (mean, covariance).
• The decoder generates an output based on this sampled latent vector.
• The encoder attempts to estimate the parameters of the (assumed) distribution.

[Figure: a prior distribution over the latent space, assumed Gaussian with mean and covariance; the encoder estimates the distribution parameters, samples are drawn from the estimated distribution, and the decoder generates from the samples]
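Sampling from the assumed Gaussian is usually implemented with the reparameterization trick from Kingma and Welling's paper: z = μ + σ·ε with ε ~ N(0, I), which keeps the sample differentiable with respect to the encoder's outputs. A minimal NumPy sketch, where the μ and log-variance values are illustrative stand-ins for encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0])       # mean estimated by the encoder
log_var = np.array([0.0, -2.0])  # log-variance estimated by the encoder

def sample_latent(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps      # differentiable w.r.t. mu and log_var

z = sample_latent(mu, log_var, rng)  # one draw from N(mu, diag(sigma^2))
```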
Training a Variational Autoencoder
• For each minibatch of input data, compute the forward pass and the backpropagation to update the weights.

[Figure: forward pass — image → encoder (estimates the mean and covariance) → sampled latent space → decoder (constructs an image from the sampled latent vector) → output image; backpropagation flows in the reverse direction]
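The loss minimized in this forward/backward loop is, in the standard VAE formulation the slides cite, a reconstruction term plus a KL divergence that pulls the estimated Gaussian toward the standard normal prior. A NumPy sketch with illustrative values (not a full training loop):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction term: squared error between input and output
    recon = np.sum((x - x_hat) ** 2)
    # KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian:
    # -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon + kl

x = np.array([1.0, 0.0, 0.5])
x_hat = np.array([0.8, 0.1, 0.5])
mu = np.array([0.0, 0.0])
log_var = np.array([0.0, 0.0])
loss = vae_loss(x, x_hat, mu, log_var)  # KL term is 0 when mu=0, log_var=0
```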
Generation with Variational Autoencoder
• Sample from the latent space, then construct an image from the sampled latent vector via the decoder.
• Different dimensions of z (the sampled latent space) encode interpretable variations of the images (e.g. different digits, or degree of smile vs. head pose).

[Figure: a grid of generated images, varying dimension 1 and dimension 2 of the sampled latent space]

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014
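At generation time only the decoder is used: draw z from the prior N(0, I) and decode it. A NumPy sketch with an untrained, purely illustrative linear decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, output_dim = 2, 784
W_dec = rng.normal(0, 0.01, (output_dim, latent_dim))  # stand-in decoder weights

def generate(n_samples):
    # Sample latent vectors from the standard normal prior, z ~ N(0, I),
    # then map each one through the decoder to an "image" vector.
    z = rng.standard_normal((n_samples, latent_dim))
    return z @ W_dec.T

images = generate(5)  # five generated samples, each 784-d
```

Because the prior is the same N(0, I) the KL term regularized toward during training, latent vectors sampled this way decode to plausible outputs in a trained model.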
Practical 2
Variational Autoencoders