Autoencoders, Extensions, and Applications
Piyush Rai
IIT Kanpur
Outline
Introduction to Autoencoders
Autoencoder Variants and Extensions
Some Applications of Autoencoders
Autoencoders for Recommender Systems
Autoencoder
Similar to a standard feedforward neural network, with a key difference:
Unsupervised: there is no “label” at the output layer; the output layer simply tries to “recreate” the input
Defined by two (possibly nonlinear) mapping functions: an encoding function f and a decoding function g
h = f(x) denotes an encoding (possibly nonlinear) of the input x
x̂ = g(h) = g(f(x)) denotes the reconstruction (or the “decoding”) of the input x
For an autoencoder, f and g are learned with the goal of minimizing the difference between x̂ and x
Autoencoder for Feature Learning
The learned code h = f (x) can be used as a new feature representation of the input x
Therefore autoencoders can also be used for “feature learning”
Note: The number of hidden units (i.e., the size of the encoding) can also be larger than the input dimension
A Simple Autoencoder
Let’s assume a D × 1 input x ∈ R^D, and a single hidden layer with a K × 1 code h ∈ R^K
We can then define a simple linear autoencoder as
h = f(x) = Wx + b
x̂ = g(h) = W∗h + c
where f is defined by W ∈ R^{K×D} and b ∈ R^{K×1}, and g is defined by W∗ ∈ R^{D×K} and c ∈ R^{D×1}
Note: If we learn f, g to minimize the squared error ||x̂ − x||², then the linear autoencoder with W∗ = W⊤ is optimal, and is equivalent to Principal Component Analysis (PCA)
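As a quick illustration, here is a minimal NumPy sketch of this linear autoencoder with tied weights (W∗ = W⊤); the variable names mirror the notation above and the dimensions are arbitrary placeholders:

import numpy as np

# Minimal linear autoencoder sketch; D = input dimension, K = code dimension
D, K = 8, 3
rng = np.random.default_rng(0)

W = rng.normal(size=(K, D))   # encoder weights (K x D)
b = np.zeros(K)               # encoder bias
W_star = W.T                  # decoder weights; "tied weights" W* = W^T
c = np.zeros(D)               # decoder bias

def f(x):
    return W @ x + b          # encoding h = f(x) = Wx + b

def g(h):
    return W_star @ h + c     # reconstruction x_hat = g(h) = W* h + c

x = rng.normal(size=D)
x_hat = g(f(x))
print(np.sum((x_hat - x) ** 2))   # squared reconstruction error ||x_hat - x||^2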
Autoencoder: Zooming in..
W: the K × D matrix of weights on edges between the input and hidden layers
W_kd is the weight of the edge connecting input-layer node d to hidden-layer node k
W∗: the D × K matrix of weights on edges between the hidden and output layers
W∗_dk is the weight of the edge connecting hidden-layer node k to output-layer node d
If W∗ = W⊤, the autoencoder architecture is said to have “tied weights”
Nonlinear Autoencoders
The hidden nodes can also be nonlinear transforms of the inputs, e.g.,
Can define h as a linear transform of x followed by a nonlinearity (e.g., sigmoid, ReLU)
h = sigmoid(Wx + b)
where the nonlinearity sigmoid(z) = 1/(1 + exp(−z)) squashes a real-valued z to lie between 0 and 1
Most commonly used autoencoders use such nonlinear transforms
Note: If the inputs x ∈ {0, 1}^D are binary, it may be more appropriate to also define x̂ as
x̂ = sigmoid(W∗h + c)
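A small NumPy sketch of such a nonlinear autoencoder for binary inputs (the sigmoid on the output layer follows the note above; the weights here are random placeholders, not learned):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes z to lie between 0 and 1

D, K = 8, 3
rng = np.random.default_rng(0)
W, b = rng.normal(size=(K, D)), np.zeros(K)         # encoder parameters
W_star, c = rng.normal(size=(D, K)), np.zeros(D)    # decoder parameters

x = rng.integers(0, 2, size=D).astype(float)        # a binary input x in {0,1}^D
h = sigmoid(W @ x + b)                               # h = sigmoid(Wx + b)
x_hat = sigmoid(W_star @ h + c)                      # x_hat = sigmoid(W* h + c)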
What’s Learned by an Autoencoder?
Figure below: The K × D matrix W learned on digits data. Each tiny block visualizes a row of W
Thus W captures the possible “patterns” in the training data (akin to the K basis vectors in PCA)
For any input x, the encoding h tells us how much each of these K features is present in x
Training the Autoencoder
To train the autoencoder, we need to define a loss function `(x̂, x)
The loss function (a function of the parameters W, b, W∗, c) can be defined in various ways
In general, it is defined in terms of the difference between x̂ and x (reconstruction error)
For a single input x = [x_1, . . . , x_D] and its reconstruction x̂ = [x̂_1, . . . , x̂_D]:
ℓ(x̂, x) = Σ_{d=1}^{D} (x̂_d − x_d)²   (squared loss; used if the inputs are real-valued)
ℓ(x̂, x) = − Σ_{d=1}^{D} [x_d log(x̂_d) + (1 − x_d) log(1 − x̂_d)]   (cross-entropy loss; used if the inputs are binary)
We find (W, b, W∗, c) by minimizing the reconstruction error (summed over all training data)
This can be done using backpropagation
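Below is a rough training sketch of this procedure, assuming PyTorch; the layer sizes, optimizer settings, and the random placeholder data are illustrative only. The squared loss is used here; for binary inputs the cross-entropy loss above would be the natural replacement:

import torch
import torch.nn as nn

D, K = 784, 64                                             # e.g., flattened 28x28 images
encoder = nn.Sequential(nn.Linear(D, K), nn.Sigmoid())     # f: x -> h
decoder = nn.Sequential(nn.Linear(K, D), nn.Sigmoid())     # g: h -> x_hat
params = list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

X = torch.rand(1000, D)                                    # placeholder training data in [0, 1]
for epoch in range(10):
    for xb in X.split(100):                                # mini-batches
        x_hat = decoder(encoder(xb))
        loss = ((x_hat - xb) ** 2).sum(dim=1).mean()       # squared reconstruction error
        opt.zero_grad()
        loss.backward()                                    # gradients via backpropagation
        opt.step()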
Undercomplete, Overcomplete, and Need for Regularization
In both the undercomplete (K < D) and overcomplete (K ≥ D) cases, it is important to control the capacity of the encoder and decoder
Undercomplete: Imagine K = 1 and very powerful f and g. We can achieve very small reconstruction error, but the learned code will not capture any interesting properties of the data
Overcomplete: Imagine K ≥ D and trivial (identity) functions f and g. We can achieve even zero reconstruction error, but the learned code will not capture any interesting properties of the data
It is therefore important to regularize the functions as well as the learned code, and not just focus on minimizing the reconstruction error
Regularized Autoencoders
Several ways to regularize the model, e.g.
Make the learned code sparse (Sparse Autoencoders)
Make the model robust against noisy/incomplete inputs (Denoising Autoencoders)
Make the model robust against small changes in the input (Contractive Autoencoders)
Sparse Autoencoders
Make the learned code sparse (Sparse Autoencoders). Done by adding a sparsity penalty on h
Loss Function: ℓ(x̂, x) + Ω(h)
where Ω(h) = Σ_{k=1}^{K} |h_k| is the ℓ1 norm of h
Sparse autoencoder is learned by minimizing the above regularized loss function
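A small sketch of this regularized objective (PyTorch assumed); the penalty weight lam is an illustrative hyperparameter, not part of the notation above:

import torch

def sparse_ae_loss(x, x_hat, h, lam=1e-3):
    # reconstruction error plus the L1 sparsity penalty Omega(h) = sum_k |h_k|
    recon = ((x_hat - x) ** 2).sum(dim=1).mean()
    sparsity = h.abs().sum(dim=1).mean()
    return recon + lam * sparsity

The training loop is otherwise the same as the one sketched earlier; only the loss changes.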
Denoising Autoencoders
First add some noise (e.g., Gaussian noise) to the original input x
Let’s denote x̃ as the corrupted version of x
The encoder f operates on x̃, i.e., h = f (x̃)
However, we still want the reconstruction x̂ = g(f(x̃)) to be close to the original uncorrupted input x
Since the corruption is stochastic, we minimize the expected loss: E_{x̃∼p(x̃|x)}[ℓ(x̂, x)] + Ω(h)
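One training step of this scheme might look as follows (PyTorch assumed; Gaussian corruption with an illustrative noise level, and the Ω(h) term is omitted for brevity):

import torch

def denoising_step(encoder, decoder, x, noise_std=0.3):
    x_tilde = x + noise_std * torch.randn_like(x)     # corrupted input x_tilde
    h = encoder(x_tilde)                              # encode the corrupted input: h = f(x_tilde)
    x_hat = decoder(h)                                # reconstruction x_hat = g(h)
    return ((x_hat - x) ** 2).sum(dim=1).mean()       # compare against the clean, uncorrupted x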
Deep/Stacked Autoencoders
Most autoencoders can be extended to have more than one hidden layer
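For example, a deep autoencoder with two hidden layers on each side could look like this (PyTorch assumed; the widths 256 and 64 are arbitrary):

import torch.nn as nn

D = 784
encoder = nn.Sequential(nn.Linear(D, 256), nn.ReLU(),
                        nn.Linear(256, 64), nn.ReLU())       # x -> 64-dimensional code h
decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                        nn.Linear(256, D), nn.Sigmoid())      # h -> x_hat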
Stochastic Autoencoders
Can also define the encoder and decoder functions using probability distributions
p_encoder(h|x) and p_decoder(x|h)
The choice of distributions depends on the type of data being modeled and on the type of encodings
This gives a probabilistic approach to designing autoencoders
The negative log-likelihood − log p_decoder(x|h) is equivalent to the reconstruction error
Can also use a prior distribution p(h) on the encodings (equivalent to a regularizer)
Such ideas have been used to design generative models based on autoencoders
Variational Autoencoder (VAE) is a popular example of such a model
Generative models like VAE can be used to “generate” new data using a random code h
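For example, with a Gaussian decoder p_decoder(x|h) = N(x; g(h), I), the negative log-likelihood reduces to the squared reconstruction error up to an additive constant:
− log p_decoder(x|h) = (1/2) ||x − g(h)||² + constant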
Variational Autoencoders (VAE)
Learns a distribution (e.g., a Gaussian) on the encoding1
Unlike standard AE, a VAE model learns to generate plausible data from random encodings
1 http://www.birving.com/presentations/autoencoders/
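A rough sketch of the VAE computation (PyTorch assumed; sizes illustrative): the encoder outputs the mean and log-variance of a Gaussian over the code, a code is sampled via the reparameterization trick, and the loss adds a KL term that keeps the code distribution close to the prior N(0, I):

import torch
import torch.nn as nn

D, K = 784, 20
enc = nn.Linear(D, 2 * K)                                  # outputs mean and log-variance of q(h|x)
dec = nn.Sequential(nn.Linear(K, D), nn.Sigmoid())         # decoder g: h -> x_hat

def vae_loss(x):                                           # x assumed to lie in [0, 1]
    mu, logvar = enc(x).chunk(2, dim=-1)
    h = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # reparameterization trick
    x_hat = dec(h)
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(h|x) || N(0, I))
    return recon + kl

x_new = dec(torch.randn(1, K))   # generate new data by decoding a random code from the prior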
Some Applications of Autoencoders
(Unsupervised) Feature learning and Dimensionality reduction
Denoising and inpainting
Pre-training of deep neural networks
Recommender systems applications
Feature learning and Dimensionality Reduction
Example: A deep AE for low-dim feature learning on 784-dimensional MNIST images2
2 Figure credit: Hinton and Salakhutdinov
Feature learning and Dimensionality Reduction
Example: Low-dim feature learning for 2000-dimensional bag-of-words documents
Denoising and Inpainting
Applications in Recommender Systems
Recommender Systems
Assume we are given a partially known N × M ratings matrix R of N users on M items (movies)
Denote by r^(u) the (partially known) M × 1 ratings vector of user u
Denote by r^(i) the (partially known) N × 1 ratings vector of item i
How can we use this data to build a recommender system?
Recommender Systems via Matrix Completion
An idea: If the predicted value of a user’s rating for a movie is high, then we should ideally recommend this movie to the user
Thus, if we can “reconstruct” the missing entries in R, we can use this method to recommend movies to users. Using an autoencoder can help us do this!
An Autoencoder based Approach
Using the rating vectors {r^(u)}_{u=1}^{N} of all users, can learn an autoencoder
Note: During backprop, only update weights in W that are connected to the observed ratings3
Once learned, the model can predict (reconstruct) the missing ratings
3 AutoRec: Autoencoders Meet Collaborative Filtering (Sedhain et al, WWW 2015)
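A toy sketch of this idea (PyTorch assumed; sizes and ratings are random placeholders). Masking the loss to the observed entries is one simple way to approximate the rule of only updating weights tied to observed ratings:

import torch
import torch.nn as nn

N, M, K = 100, 500, 50                              # users, items, code size
encoder = nn.Sequential(nn.Linear(M, K), nn.Sigmoid())
decoder = nn.Linear(K, M)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

R = torch.randint(0, 6, (N, M)).float()             # toy ratings matrix; 0 marks "missing"
mask = (R > 0).float()                              # 1 where a rating is observed

for epoch in range(10):
    R_hat = decoder(encoder(R * mask))              # reconstruct each user's rating vector
    loss = (((R_hat - R) * mask) ** 2).sum() / mask.sum()   # error only on observed ratings
    opt.zero_grad()
    loss.backward()
    opt.step()

# Predicted (missing) ratings can then be read off from R_hat where mask == 0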
Another Autoencoder based Approach
Another approach is to combine (denoising) autoencoders with a matrix factorization model4
Idea: The rating of user u on item i can be defined using the inner-product based similarity of their features learned via an autoencoder: R_ui = f(h^(u)⊤ h^(i)), where f is some compatibility function
Denoting {h^(u)}_{u=1}^{N} = U and {h^(i)}_{i=1}^{M} = V, we can write R = UV⊤
4 Deep Collaborative Filtering via Marginalized Denoising Auto-encoder (Li et al, CIKM 2015)
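A tiny sketch of the scoring step (PyTorch assumed; the user and item codes here are random stand-ins for codes that would come from the autoencoders, and f is taken to be the identity):

import torch

N, M, K = 100, 500, 50
U = torch.randn(N, K)     # stacked user codes {h^(u)}
V = torch.randn(M, K)     # stacked item codes {h^(i)}

R_hat = U @ V.T           # R = U V^T; entry (u, i) is the predicted rating R_ui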
Other Approaches on Autoencoders for Recommender Systems
Several recent papers on similar autoencoder based ideas
Collaborative Denoising Auto-Encoders for Top-N Recommender Systems (Wu et al, WSDM 2016)
Collaborative Deep Learning for Recommender Systems (Wang et al, KDD 2015)
Also possible to incorporate side information about the users and/or items (Wang et al, KDD 2015)
Autoencoders: Summary
Simple and powerful for (nonlinear) feature learning
Learned features are able to capture salient properties of data
Several extensions (sparse, denoising, stochastic, etc.)
Can also be stacked to create “deep” autoencoders
Recent focus on autoencoders that are based on generative models of data
Example: Variational Autoencoders