Autoencoders
Introduction [1]
Autoencoders (AE) are neural networks that aim to copy their inputs to their outputs.
They work by compressing the input into a latent-
space representation, and then reconstructing the
output from this representation.
This kind of network is composed of two parts:
Encoder: This is the part of the network that compresses the input
into a latent-space representation. It can be represented by an
encoding function h=f(x).
Decoder: This part aims to reconstruct the input from the latent
space representation. It can be represented by a decoding function
r=g(h).
The autoencoder as a whole can thus be described by the function g(f(x)) = r, where you want r to be as close as possible to the original input x.
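To make the notation concrete, here is a purely illustrative NumPy sketch of an (untrained) encoder f and decoder g; the 784-dimensional input and 32-dimensional code size are assumptions, not fixed by the slides.

import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(784, 32))   # assumed sizes: 784-dim input, 32-dim code
W_dec = rng.normal(size=(32, 784))

def f(x):                 # encoder: h = f(x)
    return np.maximum(0, x @ W_enc)

def g(h):                 # decoder: r = g(h)
    return h @ W_dec

x = rng.normal(size=(1, 784))
r = g(f(x))               # the reconstruction; training would make r close to x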
Why copy the input to the output? [1]
If the only purpose of autoencoders were to copy the input to the output, they would be useless.
Indeed, we hope that, by training the autoencoder
to copy the input to the output, the latent
representation h will take on useful properties.
This can be achieved by creating constraints on
the copying task.
One way to obtain useful features from the autoencoder is to constrain h to have a smaller dimension than x; in this case the autoencoder is called undercomplete.
By training an undercomplete representation, we
force the autoencoder to learn the most salient
features of the training data.
If the autoencoder is given too much capacity, it
can learn to perform the copying task without
extracting any useful information about the
distribution of the data.
This can also occur if the dimension of the latent representation is the same as that of the input, and in the overcomplete case, where the dimension of the latent representation is greater than that of the input.
In these cases, even a linear encoder and linear
decoder can learn to copy the input to the output
without learning anything useful about the data
distribution.
Ideally, one could train any architecture of
autoencoder successfully, choosing the code
dimension and the capacity of the encoder and
decoder based on the complexity of the distribution to be modelled.
Types of Autoencoders [1]
Vanilla autoencoder
Multilayer autoencoder
Convolutional autoencoder
Regularized autoencoder
Vanilla autoencoder
In its simplest form, the autoencoder is a three-layer net, i.e. a neural net with one hidden layer.
The input and output are the same, and we learn how to
reconstruct the input, for example using the adam
optimizer and the mean squared error loss function.
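The fit and predict calls below assume a model along the lines of the following minimal sketch, compiled with adam and mean squared error; the 784-dimensional input and 32-dimensional code are illustrative assumptions (e.g. flattened 28x28 images).

from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(784,))                        # assumed input size
code = layers.Dense(32, activation="relu")(inputs)         # the single hidden (code) layer
outputs = layers.Dense(784, activation="sigmoid")(code)    # reconstruction of the input

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")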
autoencoder.fit(x_train, x_train, epochs=5)
reconstructed = autoencoder.predict(x_test)
Practical Advice [2]:
We have total control over the architecture of the
autoencoder.
We can make it very powerful by increasing the number of
layers, nodes per layer and most importantly the code size.
Increasing these hyperparameters will let the autoencoder learn more complex codings.
But we should be careful not to make it too powerful.
Otherwise the autoencoder will simply learn to copy its
inputs to the output, without learning any meaningful
representation.
It will just mimic the identity function.
The autoencoder will reconstruct the training data
perfectly, but it will be overfitting without being able to
generalize to new instances, which is not what we want.
This is why we prefer a “sandwich” architecture, and deliberately keep the code size small.
Since the coding layer has a lower dimensionality than the
input data, the autoencoder is said to be undercomplete.
It won’t be able to directly copy its inputs to the output,
and will be forced to learn intelligent features.
If the input data has a pattern, for example the digit “1”
usually contains a somewhat straight line and the digit “0” is
circular, it will learn this fact and encode it in a more compact
form.
If the input data were completely random, without any internal correlation or dependency, then an undercomplete autoencoder would not be able to recover it perfectly.
But luckily, in the real world there is a lot of dependency.
Multilayer autoencoder
autoencoder.fit(x_train, x_train, epochs=5)
reconstructed = autoencoder.predict(x_test)
Now our implementation uses 3 hidden layers instead of
just one.
Any of the hidden layers can be picked as the feature
representation but we will make the network symmetrical
and use the middle-most layer.
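A sketch of such a symmetric network with three hidden layers (the 784/128/32 sizes are illustrative assumptions) might be:

from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(784,))
h = layers.Dense(128, activation="relu")(inputs)
code = layers.Dense(32, activation="relu")(h)              # middle-most layer: the feature representation
h = layers.Dense(128, activation="relu")(code)
outputs = layers.Dense(784, activation="sigmoid")(h)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")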
Convolutional autoencoder
Notice: padding=“valid”
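The full convolutional architecture is not listed here; the sketch below, modeled on the common Keras convolutional autoencoder for 28x28x1 images, is one possibility (layer sizes are assumptions). Note the single padding="valid" convolution in the decoder, which trims the 16x16 feature maps back to 14x14 and is presumably the layer the note above refers to.

from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D((2, 2), padding="same")(x)                    # 14x14
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)                    # 7x7
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D((2, 2), padding="same")(x)              # 4x4 code

x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(encoded)
x = layers.UpSampling2D((2, 2))(x)                                    # 8x8
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.UpSampling2D((2, 2))(x)                                    # 16x16
x = layers.Conv2D(16, (3, 3), activation="relu", padding="valid")(x)  # 14x14
x = layers.UpSampling2D((2, 2))(x)                                    # 28x28
outputs = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# x_train and x_test must be shaped (n, 28, 28, 1) before the fit/predict calls below.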
autoencoder.fit(x_train, x_train, epochs=5)
reconstructed = autoencoder.predict(x_test)
Regularized autoencoder
There are other ways we can constrain the reconstruction of an autoencoder than to impose a hidden layer of smaller dimension than the input.
Rather than limiting the model capacity by keeping the
encoder and decoder shallow and the code size small,
regularized autoencoders use a loss function that
encourages the model to have other properties besides the
ability to copy its input to its output.
In practice, we usually find two types of regularized
autoencoder:
the sparse autoencoder and
the denoising autoencoder.
Sparse autoencoder
Recall that when we apply a weight regularizer, we add a term to the loss function, such as ∑_i ∑_j |w_ij| for L1 regularization or ∑_i ∑_j w_ij^2 for L2 regularization. This ensures that the weights stay small, so the model is simpler and we can avoid overfitting.
Here, in the sparse autoencoder, we instead regularize the outputs (that is, the activations) of the neurons, so they are small and many are zero, leading to a sparse representation.
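A sketch of how this could be done in Keras is given below, using an L1 activity regularizer on the code layer (the 1e-5 coefficient and layer sizes are illustrative assumptions); the fit and predict calls below then stay exactly the same.

from tensorflow.keras import layers, regularizers, Model

inputs = layers.Input(shape=(784,))
code = layers.Dense(32, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)  # penalize activations
outputs = layers.Dense(784, activation="sigmoid")(code)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")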
autoencoder.fit(x_train, x_train, epochs=5)
reconstructed = autoencoder.predict(x_test)
Denoising autoencoder [2]
Keeping the code layer small forced our autoencoder to
learn an intelligent representation of the data.
There is another way to force the autoencoder to learn
useful features, which is adding random noise to its inputs
and making it recover the original noise-free data.
This way the autoencoder can’t simply copy the input to
its output because the input also contains random noise.
We are asking it to subtract the noise and produce the
underlying meaningful data. This is called a denoising
autoencoder.
In the example figure, the top row contains the original images.
We add random Gaussian noise to them and the noisy
data becomes the input to the autoencoder.
The autoencoder doesn’t see the original image at all.
But then we expect the autoencoder to regenerate the
noise-free original image.
There is only one small difference between the implementation of a denoising autoencoder and the regular one. The architecture doesn't change at all; only the fit call does.
We trained the regular autoencoder as follows:
autoencoder.fit(x_train, x_train)
The denoising autoencoder is trained as:
autoencoder.fit(x_train_noisy, x_train)
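A sketch of how x_train_noisy might be produced (the 0.3 noise level is an illustrative assumption):

import numpy as np

noise_factor = 0.3
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)   # keep pixel values in [0, 1]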
Stacked Autoencoders [3]
Suppose you wished to train a stacked autoencoder with 2
hidden layers for classification of MNIST digits.
First, you would train a sparse autoencoder on the raw inputs x^(k) to learn primary features h^(1)(k) on the raw input.
Next, you would feed the raw input into this trained sparse autoencoder, obtaining the primary feature activations h^(1)(k) for each of the inputs x^(k).
You would then use these primary features as the "raw input" to another sparse autoencoder to learn secondary features h^(2)(k) on these primary features.
Following this, you would feed the primary features into the second sparse autoencoder to obtain the secondary feature activations h^(2)(k) for each of the primary features h^(1)(k) (which correspond to the primary features of the corresponding inputs x^(k)).
You would then treat these secondary features as "raw
input" to a softmax classifier, training it to map secondary
features to digit labels.
Finally, you would combine all three layers together to form a
stacked autoencoder with 2 hidden layers and a final softmax
classifier layer capable of classifying the MNIST digits as
desired.
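A rough Keras sketch of this greedy layer-wise procedure is given below. For brevity it uses plain bottleneck autoencoders rather than the sparsity penalty described above, and the layer sizes, epoch counts and the availability of integer labels y_train are all illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

def train_autoencoder(x, code_dim):
    # Train a one-hidden-layer autoencoder on x and return its encoder part.
    inp = layers.Input(shape=(x.shape[1],))
    code = layers.Dense(code_dim, activation="relu")(inp)
    out = layers.Dense(x.shape[1], activation="sigmoid")(code)
    ae = Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x, x, epochs=5, verbose=0)
    return Model(inp, code)                 # encoder only: x -> h

enc1 = train_autoencoder(x_train, 128)      # primary features h^(1)
h1 = enc1.predict(x_train)
enc2 = train_autoencoder(h1, 64)            # secondary features h^(2)
h2 = enc2.predict(h1)

# Softmax classifier trained on the secondary features
clf_in = layers.Input(shape=(64,))
clf_out = layers.Dense(10, activation="softmax")(clf_in)
clf = Model(clf_in, clf_out)
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
clf.fit(h2, y_train, epochs=5, verbose=0)

# Combine the two encoders and the classifier into one stacked network,
# which can then be fine-tuned end-to-end on the labels.
stacked = tf.keras.Sequential([enc1, enc2, clf])
stacked.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
stacked.fit(x_train, y_train, epochs=5, verbose=0)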
A stacked autoencoder enjoys all the benefits of any deep
network of greater expressive power.
Further, it often captures a useful "hierarchical grouping" or
"part-whole decomposition" of the input.
To see this, recall that an autoencoder tends to learn
features that form a good representation of its input.
The first layer of a stacked autoencoder tends to learn first-
order features in the raw input (such as edges in an image).
The second layer of a stacked autoencoder tends to learn
second-order features corresponding to patterns in the
appearance of first-order features (e.g., in terms of what
edges tend to occur together--for example, to form contour or
corner detectors).
Higher layers of the stacked autoencoder tend to learn even
higher-order features.
Stacked Autoencoders [4]
Fine Tuning Stacked Autoencoders [4]
Stacked Denoising Autoencoders [4]
References
1. https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f
2. https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
3. http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
4. https://in.mathworks.com/help/deeplearning/examples/train-stacked-autoencoders-for-image-classification.html
Disclaimer
These slides are not original and have been
prepared from various sources for teaching
purposes.