Machine Learning
Matrix Factorization
Dong-Kyu Chae
PI of the Data Intelligence Lab @HYU
Department of Computer Science
Hanyang University
Problem Setting
❑ Suppose we have a sparse user-item rating matrix
❑ Which rating will be given on each empty cell?
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Simple approach: popularity sorting, high average rating, …
▪ Simple and easy-to-implement
▪ Global recommendation (non-personalized)
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Heuristic approach: Neighborhood-based
▪ Look for the K most similar users
▪ Aggregate their ratings on the target movie, e.g., via the mean-centered weighted average (as sketched below):
$\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{u' \in N} \mathrm{sim}(u, u') \, (r_{u'i} - \bar{r}_{u'})}{\sum_{u' \in N} |\mathrm{sim}(u, u')|}$
• $N$: set of neighbors of user $u$
• $\mathrm{sim}(u, u')$: similarity between users $u$ and $u'$
• $\bar{r}_u$: average of the ratings of user $u$
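A minimal NumPy sketch of such a neighborhood-based predictor, assuming cosine similarity over co-rated items and the mean-centered weighted average above (the function name and defaults are illustrative, not from the slides):

```python
import numpy as np

def predict_rating(R, u, i, k=5):
    """User-based k-NN prediction of R[u, i].
    R: (users x items) array where 0 means 'not rated'."""
    rated = R > 0
    means = np.array([R[v][rated[v]].mean() if rated[v].any() else 0.0
                      for v in range(R.shape[0])])
    sims = []                                   # similarities to users who rated item i
    for v in range(R.shape[0]):
        if v == u or not rated[v, i]:
            continue
        common = rated[u] & rated[v]            # items co-rated by u and v
        if not common.any():
            continue
        a, b = R[u, common], R[v, common]
        sims.append((a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), v))
    neighbors = sorted(sims, reverse=True)[:k]  # the k most similar users
    if not neighbors:
        return means[u]
    num = sum(s * (R[v, i] - means[v]) for s, v in neighbors)
    den = sum(abs(s) for s, v in neighbors)
    return means[u] + num / den
```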
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Machine learning approach?
▪ What is the ground truth? The observed ratings
▪ What is our model? Matrix factorization! (a.k.a. latent factor models)
Matrix Factorization
❑ Ratings come from interaction between latent factors
❑ Assumption: there are user latent factors and item latent factors whose interaction produces the rating score.
[Figure: user latent factors [a, b, c] and item latent factors [a′, b′, c′] combine through a linear interaction (dot product) to produce ratings, e.g., [a, b, c] · [a′, b′, c′] = 5 or 3]
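A toy numerical check of this assumption (the factor values are made up for illustration):

```python
import numpy as np

u_factors = np.array([1.0, 2.0, 0.5])   # user latent factors [a, b, c] (illustrative)
v_factors = np.array([2.0, 1.0, 2.0])   # item latent factors [a', b', c'] (illustrative)
print(u_factors @ v_factors)            # linear interaction: 1*2 + 2*1 + 0.5*2 = 5.0
```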
Matrix Factorization
❑ Latent factors
❑ Features that describe characteristics of users and items hidden in data
❑ We don't know the explicit meaning of each factor in the latent matrices, but there might be "some" meaning
[Figure: $R \approx U \cdot V^{\mathsf T}$, with $U$ a low-rank matrix of user latent factors and $V$ a low-rank matrix of item latent factors; e.g., $U_{2,1}$ encodes how much user 2 cares about the first factor, and $V_{3,1}$ how much item 3 satisfies the first factor]
Matrix Factorization
❑ Perspective of latent space
Matrix Factorization
❑ How can I know such latent factor values?
❑ "Learn" the latent factors $U$ and $V$ that could have generated the ratings
❑ All the latent factors in $U$ and $V$ are model parameters
❑ They are randomly initialized, then learned through some optimization technique
Objective: $\min_{U,V} \sum_{(u,i) \in \Omega} \big( r_{ui} - U_u V_i^{\mathsf T} \big)^2$, where $\Omega$ is the set of observed ratings
Matrix Factorization
❑ Rating prediction with the fully learned $U$ and $V$
❑ We can reconstruct the "dense" rating matrix as $\hat{R} = U V^{\mathsf T}$
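A short NumPy sketch of this reconstruction step (the matrix sizes are illustrative):

```python
import numpy as np

m, n, k = 6, 4, 3                 # users, items, latent dimension (illustrative)
U = np.random.rand(m, k)          # learned user latent factors
V = np.random.rand(n, k)          # learned item latent factors
R_hat = U @ V.T                   # dense (m x n) matrix of predicted ratings
print(R_hat[1, 2])                # predicted rating of user 1 for item 2: U[1] @ V[2]
```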
Matrix Factorization
❑ Objective function: minimizing the sum of squared errors
❑ Practically, there is an overfitting issue: unconstrained factors can grow to extreme values (e.g., 100 and −81) that fit the observed ratings exactly but generalize poorly
Matrix Factorization
❑ Objective function with the regularization term
❑ The goodness-of-fit term reduces the prediction error
❑ The regularization term alleviates the overfitting problem
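The equation itself did not survive extraction; a standard form of this objective, consistent with the SGD and ALS derivations that follow, is:

```latex
\min_{U,V} \underbrace{\sum_{(u,i) \in \Omega} \left( r_{ui} - U_u V_i^{\mathsf T} \right)^2}_{\text{goodness of fit}}
         + \underbrace{\lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)}_{\text{regularization}}
```

where $\Omega$ is the set of observed (user, item) pairs and $\lambda$ controls the regularization strength.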
❑ Two ways to train the MF model
❑ Stochastic gradient descent (SGD)
▪ Training U and V simultaneously
❑ Alternating least squares (ALS)
▪ Fix one of U and V, and then optimize the other
Stochastic Gradient Descent
❑ Objective: $\min_{U,V} \sum_{(u,i) \in \Omega} \big( r_{ui} - U_u V_i^{\mathsf T} \big)^2 + \lambda \big( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \big)$
❑ Process:
1. Initialize U and V randomly
2. Repeat:
1. Choose an observed rating $(u, i, r_{ui})$ at random
2. The loss on the chosen $(u, i)$ is:
$L = \frac{1}{2}\big( (r_{ui} - U_u V_i^{\mathsf T})^2 + \lambda \lVert U_u \rVert^2 + \lambda \lVert V_i \rVert^2 \big)$
3. Update via gradient descent with learning rate $\eta$:
$U_u \leftarrow U_u - \eta \frac{\partial L}{\partial U_u}, \qquad V_i \leftarrow V_i - \eta \frac{\partial L}{\partial V_i}$
Stochastic Gradient Descent
❑ Partial derivative of the loss $L = \frac{1}{2}\big( e_{ui}^2 + \lambda \lVert U_u \rVert^2 + \lambda \lVert V_i \rVert^2 \big)$, where $e_{ui} = r_{ui} - U_u V_i^{\mathsf T}$:
$\frac{\partial L}{\partial U_u} = \frac{1}{2}\big( 2 e_{ui} \cdot e_{ui}' + 2\lambda U_u \big) = e_{ui} \cdot e_{ui}' + \lambda U_u = e_{ui} \cdot (-V_i^{\mathsf T}) + \lambda U_u$
$\frac{\partial L}{\partial V_i} = \frac{1}{2}\big( 2 e_{ui} \cdot e_{ui}' + 2\lambda V_i \big) = e_{ui} \cdot e_{ui}' + \lambda V_i = e_{ui} \cdot (-U_u) + \lambda V_i$
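A minimal NumPy sketch of this SGD training loop, using the gradients derived above (the function name and hyperparameters are illustrative):

```python
import random
import numpy as np

def train_mf_sgd(ratings, m, n, k=10, lr=0.01, lam=0.1, epochs=20):
    """ratings: list of (u, i, r) triples for the observed cells only."""
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, k))       # user latent factors
    V = rng.normal(scale=0.1, size=(n, k))       # item latent factors
    for _ in range(epochs):
        random.shuffle(ratings)                  # stochastic: visit ratings in random order
        for u, i, r in ratings:
            e = r - U[u] @ V[i]                  # prediction error e_ui
            # dL/dU_u = -e_ui * V_i + lam * U_u  (and symmetrically for V_i)
            U[u] += lr * (e * V[i] - lam * U[u])
            V[i] += lr * (e * U[u] - lam * V[i])
    return U, V

# Toy usage: 3 users, 3 items, a few observed ratings
U, V = train_mf_sgd([(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0)], m=3, n=3)
print(U @ V.T)   # reconstructed dense rating matrix
```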
Alternating Least Squares
❑ Fix one of $U$ and $V$; the loss surface with respect to the other becomes a simple convex quadratic, so it can be solved in closed form by setting
$\frac{\partial E}{\partial U_u} = 0$
Alternating Least Squares
❑ Objective: $\min_{U,V} \sum_{(u,i) \in \Omega} \big( r_{ui} - U_u V_i^{\mathsf T} \big)^2 + \lambda \big( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \big)$
❑ Procedure (see the sketch after this list)
1. Initialize the user latent factors $U$ randomly
2. For each item $i$, let $r_i$ be the vector of ratings of that item. Compute (here, $U$ is constant):
$V_i^{\mathsf T} = (U^{\mathsf T} U + \lambda I)^{-1} U^{\mathsf T} r_i$
3. For each user $u$, let $r_u$ be the vector of ratings of that user. Compute (here, $V$ is constant):
$U_u^{\mathsf T} = (V^{\mathsf T} V + \lambda I)^{-1} V^{\mathsf T} r_u$
4. Repeat steps 2 and 3 until convergence
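A compact NumPy sketch of these alternating updates. For brevity it treats missing entries as zeros when forming $U^{\mathsf T} r_i$ and $V^{\mathsf T} r_u$; a faithful implementation would restrict each least-squares solve to the observed ratings:

```python
import numpy as np

def train_mf_als(R, k=10, lam=0.1, iters=10):
    """R: dense (m x n) rating matrix with 0 for missing entries (simplified ALS)."""
    m, n = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    I = lam * np.eye(k)
    for _ in range(iters):
        # Fix U, solve for all items at once: V_i = (U^T U + lam I)^{-1} U^T r_i
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
        # Fix V, solve for all users at once: U_u = (V^T V + lam I)^{-1} V^T r_u
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
    return U, V
```

Each `np.linalg.solve` call handles every item (or user) column in one batched linear solve, which is why ALS parallelizes well.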
Alternating Least Squares
❑ Partial derivative of the loss
❑ When $U$ is constant, the loss for item $i$ is:
$L = (r_i - U V_i^{\mathsf T})^{\mathsf T} (r_i - U V_i^{\mathsf T}) + \lambda V_i V_i^{\mathsf T}$
Let $V_i^{\mathsf T} = \beta$. Then,
$L = (r_i - U \beta)^{\mathsf T} (r_i - U \beta) + \lambda \beta^{\mathsf T} \beta$
Setting $\frac{\partial L}{\partial \beta} = 0$ gives
$\beta = V_i^{\mathsf T} = (U^{\mathsf T} U + \lambda I)^{-1} U^{\mathsf T} r_i$
(repeated for $i = 1$ to $n$)
Alternating Least Squares
❑ Partial derivative of the loss
❑ When $V$ is constant, let $r_u = R_u^{\mathsf T}$. Then we can rewrite the loss:
$L = (r_u - V U_u^{\mathsf T})^{\mathsf T} (r_u - V U_u^{\mathsf T}) + \lambda U_u U_u^{\mathsf T}$
Let $U_u^{\mathsf T} = \beta$. Then,
$L = (r_u - V \beta)^{\mathsf T} (r_u - V \beta) + \lambda \beta^{\mathsf T} \beta$
Setting $\frac{\partial L}{\partial \beta} = 0$ gives
$\beta = U_u^{\mathsf T} = (V^{\mathsf T} V + \lambda I)^{-1} V^{\mathsf T} r_u$
(repeated for $u = 1$ to $m$)
Autoencoder for MF
❑ Same objective function, different model
❑ Matrix factorization => Autoencoder
❑ Two types of Autoencoders for recommendation
❑ user-based and item-based
❑ Users and items are represented as vectors
[Figure: an input rating vector, e.g., [x, 3, …, x, 5] with x marking missing entries, is encoded into a learned embedding and then decoded back into a complete, reconstructed rating vector]
❑ Reconstruction loss:
$\mathcal{L}_{\mathrm{mse}} = \sum_i \big( \mathbf{r}_i - \hat{\mathbf{r}}_i \big)^2 \cdot \mathbf{y}_i$
• $\mathbf{r}_i$: actual ratings
• $\hat{\mathbf{r}}_i$: predicted ratings
• $\mathbf{y}_i \in \{0, 1\}$: indicator of observed entries
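A minimal PyTorch sketch of an item-based autoencoder in this spirit (an AutoRec-style model; the class name, layer sizes, and training loop are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

class AutoRec(nn.Module):
    """Item-based autoencoder: reconstructs an item's rating vector over all users."""
    def __init__(self, num_users, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(num_users, hidden_dim)   # rating vector -> embedding
        self.decoder = nn.Linear(hidden_dim, num_users)   # embedding -> reconstruction

    def forward(self, r):
        return self.decoder(torch.sigmoid(self.encoder(r)))

def masked_mse(r, r_hat, y):
    # L_mse = sum of (r - r_hat)^2 * y, so only observed cells contribute
    return ((r - r_hat) ** 2 * y).sum()

R = torch.randint(0, 6, (100, 50)).float()   # toy (users x items) ratings, 0 = missing
Y = (R > 0).float()                          # indicator of observed entries
model = AutoRec(num_users=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):                          # a few passes over all item columns
    loss = masked_mse(R.T, model(R.T), Y.T)  # each row of R.T is one item's ratings
    opt.zero_grad(); loss.backward(); opt.step()
```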
Thank You