7. VARIATIONAL AUTOENCODERS
The variational autoencoder or VAE is a directed model that uses
learned approximate inference and can be trained purely with gradient-based
methods.
To generate a sample from the model, the VAE first draws a sample z from
the code distribution pmodel(z).
The sample is then run through a differentiable generator network g(z).
Finally, x is sampled from a distribution pmodel(x; g(z)) = pmodel(x | z).
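As a concrete illustration, here is a minimal PyTorch sketch of this ancestral sampling procedure, assuming a hypothetical trained decoder g that maps a latent code to the parameters of pmodel(x | z); a standard normal prior and a factorized Bernoulli output distribution are assumed here for illustration, since neither is fixed by the text above.

import torch

def sample_from_vae(g, n_samples, latent_dim):
    # z ~ pmodel(z); a standard normal prior is assumed here
    z = torch.randn(n_samples, latent_dim)
    # g(z) gives the parameters of pmodel(x | z); here, Bernoulli probabilities
    probs = torch.sigmoid(g(z))
    # x ~ pmodel(x | z)
    return torch.bernoulli(probs)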
However, during training, the approximate inference network (or encoder)
q(z | x) is used to obtain z and pmodel(x | z) is then viewed as a decoder
network.
The key insight behind variational autoencoders is that they may be
trained by maximizing the variational lower bound L(q) associated with
data point x:
\begin{align}
\mathcal{L}(q) &= \mathbb{E}_{z \sim q(z \mid x)} \log p_{\text{model}}(z, x) + H(q(z \mid x)) \\
&= \mathbb{E}_{z \sim q(z \mid x)} \log p_{\text{model}}(x \mid z)
   - D_{\mathrm{KL}}\!\left(q(z \mid x) \,\|\, p_{\text{model}}(z)\right) \\
&\leq \log p_{\text{model}}(x).
\end{align}
When q is chosen to be a Gaussian distribution, with noise added to a
predicted mean value, maximizing this entropy term encourages increasing the
standard deviation of this noise.
More generally, this entropy term encourages the variational posterior to place
high probability mass on many z values that could have generated x.
The second term tries to make the approximate posterior distribution q(z |
x) and the model prior pmodel(z) approach each other.
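For the common choice of a diagonal Gaussian posterior q(z | x) = N(μ, diag(σ²)) and a standard normal prior pmodel(z) = N(0, I), an assumption made here for illustration since the prior is not fixed above, both terms have closed forms:

\begin{align*}
H(q(z \mid x)) &= \sum_{i=1}^{d} \frac{1}{2}\log\left(2\pi e\,\sigma_i^2\right), \\
D_{\mathrm{KL}}\!\left(q(z \mid x)\,\|\,p_{\text{model}}(z)\right)
  &= \frac{1}{2}\sum_{i=1}^{d}\left(\mu_i^2 + \sigma_i^2 - \log\sigma_i^2 - 1\right).
\end{align*}

The first expression grows with each σi, matching the observation that the entropy term rewards larger noise standard deviations; the second is minimized when μ = 0 and σ = 1, pulling q toward the prior.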
Traditional approaches to variational inference and learning infer q via an
optimization algorithm.
These approaches are slow and often require the ability to compute Ez∼q
log pmodel(z, x) in closed form.
The main idea behind the variational autoencoder is to train a parametric
encoder (also sometimes called an inference network or recognition
model) that produces the parameters of q.
So long as z is a continuous variable, we can then back-propagate through
samples of z drawn from q(z | x) = q(z; f(x; θ)) in order to obtain a
gradient with respect to θ.
Learning then consists solely of maximizing L with respect to the
parameters of the encoder and decoder.
All of the expectations in L may be approximated by Monte Carlo
sampling.
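A minimal PyTorch sketch of one such gradient step is given below, under assumptions not fixed by the text above: the encoder f outputs the mean and log-variance of a diagonal Gaussian q(z | x), the decoder g outputs Bernoulli logits for pmodel(x | z), and the prior is N(0, I). All names (f, g, optimizer) are illustrative.

import torch
import torch.nn.functional as F

def elbo_step(f, g, x, optimizer):
    # Encoder produces the parameters of q(z | x) = N(mu, diag(exp(log_var))).
    mu, log_var = f(x).chunk(2, dim=-1)

    # Reparameterization: z = mu + sigma * eps, so gradients flow back
    # through the sample of z to the encoder parameters theta.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Single-sample Monte Carlo estimate of E_q[log pmodel(x | z)].
    logits = g(z)
    rec = -F.binary_cross_entropy_with_logits(
        logits, x, reduction="none").sum(dim=-1)

    # Closed-form KL between q(z | x) and the standard normal prior.
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=-1)

    loss = (kl - rec).mean()   # minimizing -L(q) maximizes the bound
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()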
The variational autoencoder approach is elegant, theoretically pleasing,
and simple to implement.
It also obtains excellent results and is among the state-of-the-art approaches
to generative modeling.
Its main drawback is that samples from variational autoencoders trained on
images tend to be somewhat blurry.
The causes of this phenomenon are not yet known. One possibility is that
the blurriness is an intrinsic effect of maximum likelihood, which minimizes
DKL(pdata ‖ pmodel).
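The connection invoked here is the standard identity that maximizing the expected log-likelihood is equivalent to minimizing this divergence:

\begin{align*}
\arg\max_{\theta}\; \mathbb{E}_{x \sim p_{\text{data}}} \log p_{\text{model}}(x; \theta)
  = \arg\min_{\theta}\; D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_{\text{model}}\right),
\end{align*}

since the divergence equals E_{x∼pdata}[log pdata(x)] − E_{x∼pdata}[log pmodel(x)] and the first term does not depend on θ.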
VAE Framework
The VAE framework is very straightforward to extend to a wide range of
model architectures.
This is a key advantage over Boltzmann machines, which require extremely
careful model design to maintain tractability.
VAEs work very well with a diverse family of differentiable operators.
One particularly sophisticated VAE is the deep recurrent attention
writer or DRAW model.
DRAW uses a recurrent encoder and recurrent decoder combined with an
attention mechanism.
The generation process for the DRAW model consists of sequentially
visiting different small image patches and drawing the values of the pixels at
those points.
VAEs can also be extended to generate sequences by defining variational
RNNs, which use a recurrent encoder and decoder within the VAE framework.
Generating a sample from a traditional RNN involves non-deterministic
operations only at the output space.
Variational RNNs also have random variability at the potentially more
abstract level captured by the VAE latent variables.
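To make the contrast concrete, here is a sampling-only sketch of a variational RNN in PyTorch. The module names (prior_net, decoder, cell) and the exact conditioning structure are hypothetical simplifications, not the published architecture; the point is only that a latent z is drawn at every timestep in addition to the output-space sampling a traditional RNN would perform.

import torch

def sample_variational_rnn(prior_net, decoder, cell, h, steps):
    xs = []
    for _ in range(steps):
        # Latent randomness at every step, conditioned on the RNN state h.
        mu, log_var = prior_net(h).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

        # Output-space sampling, as in a traditional RNN.
        probs = torch.sigmoid(decoder(torch.cat([z, h], dim=-1)))
        x = torch.bernoulli(probs)

        # Carry both the emitted x_t and the latent z_t forward in time.
        h = cell(torch.cat([x, z], dim=-1), h)
        xs.append(x)
    return torch.stack(xs, dim=1)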