Representation Learning
and Latent Variable Models
ISOMAP: Embedding by Local Structure
• Construct neighborhood graph
k-nearest-neighbor graph or ε-neighborhood graph
• Compute shortest-path distances
Floyd-Warshall algorithm or Dijkstra
• MDS or Stress Majorization
Omit for time:
Already in our toolbox! Locally-linear embedding (LLE)
Tenenbaum, de Silva, Langford.
“A Global Geometric Framework for Nonlinear Dimensionality Reduction.” Science (2000).
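A minimal sketch of these three steps, assuming scikit-learn and SciPy are available (the helper name isomap_embed and its defaults are illustrative, not from the lecture):

from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

def isomap_embed(X, n_neighbors=10, n_components=2):
    # 1. Neighborhood graph: k nearest neighbors, edges weighted by Euclidean distance
    knn = kneighbors_graph(X, n_neighbors, mode="distance")
    # 2. Geodesic distances: all-pairs shortest paths on the graph (Dijkstra)
    #    (assumes the graph is connected; otherwise some distances are infinite)
    geodesic = shortest_path(knn, method="D", directed=False)
    # 3. Metric MDS on the geodesic distance matrix
    mds = MDS(n_components=n_components, dissimilarity="precomputed")
    return mds.fit_transform(geodesic)

# Example: flatten a noisy "Swiss roll" into 2-D
from sklearn.datasets import make_swiss_roll
X, _ = make_swiss_roll(n_samples=500, random_state=0)
Y = isomap_embed(X)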
NLP: Popular Pretext Task
• Masked language modeling on large corpora learns representations that transfer to downstream tasks
• The pretext-specific head is stripped before reuse; the rest of the network can be fine-tuned on a new task
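A toy illustration of the masking step (pure Python; the [MASK] token and 15% rate are generic BERT-style choices, not from the lecture):

import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)      # the model sees the mask...
            targets.append(tok)      # ...and must predict the original token
        else:
            inputs.append(tok)
            targets.append(None)     # no loss on unmasked positions
    return inputs, targets

print(mask_tokens("the cat sat on the mat".split()))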
Today
[Figures: the “Swiss roll” and “two moons” datasets]
Can we learn useful features directly from the data itself?
...and use them for generative modeling?
Focus: Latent Variable Models
Big idea:
An unknown (simple/low-dimensional) latent variable is controlling the generation of an observable variable.
Typical setting: Probabilistic model explaining how a dataset was generated.
Simple example:
Gaussian mixture models (GMMs)
https://www.blog.dailydoseofds.com/p/gaussian-mixture-models-the-flexible
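A toy sketch of this generative story (all numbers below are made up): a discrete latent z picks a mixture component, then the observable x is drawn from that component's Gaussian.

import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])                      # p(z = k): mixing proportions
means = np.array([[0.0, 0.0], [4.0, 4.0], [-3.0, 5.0]])  # component means
cov = np.eye(2)                                          # shared covariance, for simplicity

def sample_gmm(n):
    z = rng.choice(len(weights), size=n, p=weights)       # latent: which component generated each point
    x = np.stack([rng.multivariate_normal(means[k], cov) for k in z])  # observable
    return z, x

z, x = sample_gmm(1000)   # only x is observed; fitting a GMM means inferring z and the parameters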
Modern Latent Variable Models
[Figure from Lilian Weng's blog: overview of modern generative latent variable models; “Today” marks the models covered in this lecture]
Plan for Today
• Autoencoders
  ◦ (Slightly) new neural network architecture
  ◦ New loss function
• Variational autoencoders
  ◦ Autoencoders with some noise
• Alternatives
  ◦ Other latent variable models
Credit
Many images borrowed from
“Understanding Variational Autoencoders (VAEs)” (Rocca)
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
Math simplified from
“An Introduction to Variational Autoencoders” (Kingma and Welling)
https://arxiv.org/pdf/1906.02691.pdf
Plan for Today
• Autoencoders
  ◦ (Slightly) new neural network architecture
  ◦ New loss function
• Variational autoencoders
  ◦ Autoencoders with some noise
• Alternatives
  ◦ Other latent variable models
Autoencoders
[Figure: encoder-decoder network with a narrow middle layer, the “bottleneck”]
On the board:
• PCA as a special case
• Latent dimension: effect
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
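A minimal autoencoder sketch, assuming PyTorch (layer sizes and names are illustrative, not from the lecture). The narrow latent layer is the bottleneck, and the new loss function is just reconstruction error:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),              # bottleneck: low-dimensional latent code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)                          # compress
        return self.decoder(z)                       # reconstruct

model = Autoencoder()
x = torch.rand(32, 784)                              # fake batch of flattened images
loss = nn.functional.mse_loss(model(x), x)           # loss = reconstruction error

Restricting the network to linear layers with squared-error loss recovers the same subspace as PCA, which is the sense in which PCA is a special case.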
Memorization in Autoencoders
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
Latent Structure in Autoencoders
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
Plan for Today
• Autoencoders
  ◦ (Slightly) new neural network architecture
  ◦ New loss function
• Variational autoencoders
  ◦ Autoencoders with some noise
• Alternatives
  ◦ Other latent variable models
Autoencoders for Sampling?
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
VAE: Big Idea
Wiggle around in the latent space before reconstructing!
https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73
Balance Two Terms
• Reconstruction term: how well the decoder rebuilds x from the sampled latent code
• Regularization term: how close the encoder's distribution over the latent code stays to the prior
Averaged over x from the dataset
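In the notation of Kingma and Welling (the symbols below are theirs, not the slides'), the per-example loss being balanced is

\mathcal{L}(x) \;=\; \underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[-\log p_\theta(x \mid z)\big]}_{\text{reconstruction}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularization}}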
Probability Review
On the Board: Probabilistic Story
Rough outline:
• Decoder probabilistic model
• Maximum likelihood estimation
• ELBO bound
• “Reparameterization trick”
• Back where we started
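A sketch of where this outline lands, assuming PyTorch (architecture and sizes are illustrative). The ELBO bound \log p_\theta(x) \ge -\mathcal{L}(x) turns maximum likelihood into minimizing the two-term loss above, and the reparameterization trick writes the sample as z = \mu + \sigma \odot \varepsilon with \varepsilon \sim \mathcal{N}(0, I), so gradients can flow back into the encoder:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)         # encoder mean
        self.logvar = nn.Linear(128, latent_dim)     # encoder log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                   # randomness lives outside the network...
        z = mu + torch.exp(0.5 * logvar) * eps       # ...so gradients reach mu and logvar
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")  # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())           # KL to N(0, I)
    return recon + kl

model = VAE()
x = torch.rand(32, 784)                              # fake batch with values in [0, 1]
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)                # back where we started: an autoencoder loss plus noise and a KL term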
Plan for Today
• Autoencoders
  ◦ (Slightly) new neural network architecture
  ◦ New loss function
• Variational autoencoders
  ◦ Autoencoders with some noise
• Alternatives
  ◦ Other latent variable models
Many Alternatives
Representation Learning
and Latent Variable Models