NumPy Matrix Factorization (npMF) is a Python package, depending only on NumPy, that provides a unified interface to several constrained and unconstrained low-rank matrix factorization methods.
npMF currently implements the following algorithms:
- Stochastic gradient descent (SGD) for MF, with and without biases
- Alternating least squares (ALS) for MF, with and without biases
- Alternating nonnegative least squares (ANLS) for NMF
- Bounded matrix factorization (BMF)
- Probabilistic matrix factorization (PMF)
Each of these methods is also extended to take a matrix of confidence levels that gives each entry a different weight, as is common in implicit-feedback recommender systems.
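
As a rough sketch (not the package's exact loss, whose regularization and bias handling vary from method to method), the weighted, biased variants all minimize an objective of the form

$$
\min_{W, Z, b^{u}, b^{i}} \sum_{(u,i)\,:\,M_{ui}\neq 0} C_{ui}\left(M_{ui} - w_u^\top z_i - b^{u}_u - b^{i}_i\right)^2 + \lambda_{\mathrm{user}}\lVert W\rVert_F^2 + \lambda_{\mathrm{item}}\lVert Z\rVert_F^2,
$$

where $C_{ui}=1$ when no confidence matrix is given, and the bias terms are dropped in the bias-free variants.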
A few initialization, quality scoring and learning rate decay functions are also included in npMF.
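
For intuition about the decay schedules used in the examples below, here is a plain-NumPy sketch of the standard formulas that usually go by the names inverse time decay and exponential decay; npMF's own implementations in `npmf.learning_rate_decay` may differ in detail.

```python
import numpy as np

# Illustration only: textbook inverse-time and exponential decay schedules,
# not necessarily npMF's exact implementations.
def inverse_time_decay_sketch(lr0, step, decay_rate, decay_steps):
    return lr0 / (1.0 + decay_rate * step / decay_steps)

def exponential_decay_sketch(lr0, step, decay_rate, decay_steps):
    return lr0 * decay_rate ** (step / decay_steps)

steps = np.arange(2000)
lr_inv = inverse_time_decay_sketch(0.1, steps, 1 / 1.2, 2000)
lr_exp = exponential_decay_sketch(0.1, steps, 1 / 1.2, 2000)
```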

Given a user-item ratings matrix $M \in \mathbb{R}^{m \times n}$ whose missing entries are denoted by 0s, we can find a rank-$k$ approximation $\hat{M} = W Z^\top$, with $W \in \mathbb{R}^{m \times k}$ (user features) and $Z \in \mathbb{R}^{n \times k}$ (item features), by using SGD as follows:

import npmf.models
from npmf.learning_rate_decay import inverse_time_decay
# data
M = ...
# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000
# factorize data matrix
W, Z, user_biases, item_biases, loss, err_train, pred_fn = \
npmf.models.sgd(M, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter),
lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
M_hat = pred_fn(W, Z, user_biases, item_biases)
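
A quick sanity check on the fit can be done with plain NumPy by comparing `M_hat` to `M` on the observed entries only (npMF also ships ready-made metrics such as `npmf.error_metrics.rmse`, used further below):

```python
import numpy as np

# Evaluate the reconstruction on the observed entries only
# (0 marks a missing rating, matching nanvalue=0 above).
observed = M != nanvalue
train_rmse = np.sqrt(np.mean((M[observed] - M_hat[observed]) ** 2))
print(train_rmse)
```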

If we also have access to a confidence matrix $C \in \mathbb{R}^{m \times n}$ giving a different weight to each observed entry of $M$, we can find a rank-$k$ approximation $\hat{M} = \tilde{W} \tilde{Z}^\top$, where $\tilde{W}$ is obtained by prepending the vector of user biases to $W$ (and analogously for the other factor), by using SGD as follows:

from npmf.weighted_models import sgd_bias_weight
from npmf.learning_rate_decay import exponential_decay
# data
M = ...
C = ...
# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000
# factorize data matrix
W, Z, user_biases, item_biases, loss, err_train, pred_fn = \
sgd_bias_weight(M, confidence=C, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
decay_fn=lambda lr, step: exponential_decay(lr, step, decay_rate, max_iter),
lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
M_hat = pred_fn(W, Z, user_biases, item_biases)
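
As a side note, a common way to build such a confidence matrix for implicit-feedback data is to let it grow with the observed interaction counts, in the spirit of Hu, Koren and Volinsky's classic weighting; the sketch below is purely illustrative and the `counts` data is made up:

```python
import numpy as np

# Illustration only: confidences that grow linearly with interaction counts.
# `counts` stands in for how often each user interacted with each item.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=1.0, size=M.shape)  # placeholder data
alpha = 40.0
C = 1.0 + alpha * counts
```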

We can keep track of all of a model's parameters, as well as its different errors, with a wrapper class as follows:

import npmf.models
import npmf.error_metrics
from npmf.learning_rate_decay import inverse_time_decay
from npmf.wrapper_classes import MF
# data
train_matrix = ...
valid_matrix = ...
test_matrix = ...
# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000
# instantiate model class
SGD = MF(npmf.models.sgd, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter, False),
lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
# train
SGD.fit(train_matrix)
# predict
predicted_matrix = SGD.predict()
# retrieve factors
W = SGD.user_features
Z = SGD.item_features
# evaluate scores
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=train_matrix, err_type='train')
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=valid_matrix, err_type='validation')
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=test_matrix, err_type='test')
print(SGD.train_errors)
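
With the reconstructed matrix in hand, producing recommendations is plain NumPy. For example, a hypothetical top-N list for a single user, excluding items that user has already rated:

```python
import numpy as np

# Illustration: top-N items for one user from the reconstructed matrix,
# masking entries already observed in the training data (0 = missing).
user = 0
n_rec = 10
scores = predicted_matrix[user].copy()
scores[train_matrix[user] != nanvalue] = -np.inf
top_items = np.argsort(scores)[::-1][:n_rec]
print(top_items)
```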

We can use a cross-validation class to keep track of every parameter of a model and evaluate its performance over multiple splits:

import numpy as np
import npmf.models
import npmf.error_metrics
from npmf.learning_rate_decay import inverse_time_decay
from npmf.wrapper_classes import CvMF
# data
train_matrices = [...]
valid_matrices = [...]
# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000
# instantiate model class
cvSGD = CvMF(npmf.models.sgd_bias, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter, False),
lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
# fit the model
cvSGD.fit(train_matrices)
# training accuracy
cvSGD.score(err_fn=npmf.error_metrics.rmse, matrices_list=train_matrices, err_type='train',
agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
cvSGD.score(err_fn=npmf.error_metrics.mae, matrices_list=train_matrices, err_type='train',
agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
# validation accuracy
cvSGD.score(err_fn=npmf.error_metrics.rmse, matrices_list=valid_matrices, err_type='validation',
agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
cvSGD.score(err_fn=npmf.error_metrics.mae, matrices_list=valid_matrices, err_type='validation',
agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
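
The lists of training and validation matrices can be built however the data demands; one simple illustrative option is to repeatedly hold out a random fraction of the observed entries of `M`:

```python
import numpy as np

# Illustration: K random splits that hold out 20% of the observed entries
# (0 marks a missing rating, as in the examples above).
rng = np.random.default_rng(0)
K, holdout = 5, 0.2
rows, cols = np.nonzero(M)
train_matrices, valid_matrices = [], []
for _ in range(K):
    idx = rng.choice(rows.size, size=int(holdout * rows.size), replace=False)
    train, valid = M.copy(), np.zeros_like(M)
    valid[rows[idx], cols[idx]] = M[rows[idx], cols[idx]]
    train[rows[idx], cols[idx]] = 0
    train_matrices.append(train)
    valid_matrices.append(valid)
```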

The project was started at Technicolor AI Lab in early 2018. It is now distributed under the CC BY-NC 4.0 license. See LICENSE for details.