e-bug/npmf

npMF

NumPy Matrix Factorization (npMF) is a Python package that depends only on NumPy and provides a unified interface to several constrained and unconstrained low-rank matrix factorization methods.

npMF currently implements the following algorithms:

Each of these methods is also extended to take a matrix of confidence levels to give each entry a different weight, as commonly required in implicit feedback recommender systems.
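
To illustrate what per-entry confidence weighting means, here is a minimal NumPy sketch of a confidence-weighted squared error. The function name `weighted_squared_error` is hypothetical and not part of npMF, and the package's own loss may include regularization or other terms not shown here.

```python
import numpy as np

def weighted_squared_error(M, M_hat, C, nanvalue=0):
    """Confidence-weighted sum of squared errors over observed entries.

    Hypothetical helper for illustration only; not part of npMF.
    """
    observed = M != nanvalue            # mask of known ratings
    residual = (M - M_hat) * observed   # missing entries contribute nothing
    return float(np.sum(C * residual ** 2))

M = np.array([[5.0, 0.0], [3.0, 1.0]])      # 0 marks a missing rating
M_hat = np.array([[4.0, 2.0], [3.0, 2.0]])  # some reconstruction of M
C = np.array([[2.0, 1.0], [1.0, 0.5]])      # per-entry confidence levels
loss = weighted_squared_error(M, M_hat, C)
```

Entries with a larger confidence pull the fit harder toward their observed value, while missing entries are ignored regardless of their weight.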

A few initialization, quality scoring and learning rate decay functions are also included in npMF.
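
As a rough idea of what a learning rate decay function does, the sketch below uses the commonly cited forms of inverse time decay and exponential decay. The names ending in `_sketch` are hypothetical, and the functions in `npmf.learning_rate_decay` may use different formulas or argument conventions.

```python
def inverse_time_decay_sketch(lr0, step, decay_rate, decay_steps):
    """Common textbook form: lr0 / (1 + decay_rate * step / decay_steps)."""
    return lr0 / (1.0 + decay_rate * step / decay_steps)

def exponential_decay_sketch(lr0, step, decay_rate, decay_steps):
    """Common textbook form: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)

# Learning rate at the start and at the end of a 2000-step run.
lr_start = inverse_time_decay_sketch(0.1, 0, 1 / 1.2, 2000)
lr_end = inverse_time_decay_sketch(0.1, 2000, 1 / 1.2, 2000)
```

With a decay rate below 1, both schedules shrink the step size monotonically as training progresses.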

Usage

Using a model

Given a user-item ratings matrix M whose missing entries are denoted by 0s, we can find a rank-k approximation M_hat, built from a user factor matrix W and an item factor matrix Z, by using SGD as follows:

import npmf.models
from npmf.learning_rate_decay import inverse_time_decay

# data
M = ...

# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000

# factorize data matrix
W, Z, user_biases, item_biases, loss, err_train, pred_fn = \
    npmf.models.sgd(M, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
                    decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter),
                    lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
M_hat = pred_fn(W, Z, user_biases, item_biases)
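
The data matrix is left as a placeholder in this README. One way to build it from a list of ratings could look like the sketch below, where the helper `ratings_to_matrix` and the triplet layout `(user, item, rating)` are assumptions for illustration and not part of npMF.

```python
import numpy as np

def ratings_to_matrix(triplets, num_users, num_items, nanvalue=0):
    """Build a dense user-item matrix; unrated cells hold `nanvalue`.

    Hypothetical helper for illustration only; not part of npMF.
    """
    M = np.full((num_users, num_items), nanvalue, dtype=float)
    for user, item, rating in triplets:
        M[user, item] = rating
    return M

# Three observed ratings in a 2-user, 3-item problem.
triplets = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
M = ratings_to_matrix(triplets, num_users=2, num_items=3)
```

Because 0 marks a missing entry here, a genuine rating of 0 cannot be represented; in that case a different `nanvalue` would be needed.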

Using a weighted model

If we also have access to a confidence matrix C that gives a different weight to each observed entry in M, we can find a rank-k approximation M_hat in which the user factor is obtained by prepending the vector of user biases to W (and analogously for the item factor), by using SGD as follows:

from npmf.weighted_models import sgd_bias_weight
from npmf.learning_rate_decay import exponential_decay

# data
M = ...
# confidence matrix, one weight per entry of M
C = ...

# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000

# factorize data matrix
W, Z, user_biases, item_biases, loss, err_train, pred_fn = \
    sgd_bias_weight(M, confidence=C, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
                    decay_fn=lambda lr, step: exponential_decay(lr, step, decay_rate, max_iter),
                    lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)
M_hat = pred_fn(W, Z, user_biases, item_biases)
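
How the confidence matrix is obtained depends on the application. A heuristic often used with implicit feedback is to let confidence grow linearly with the number of interactions. The sketch below assumes that scheme; `confidence_from_counts` and `alpha` are hypothetical names, and npMF does not prescribe this construction.

```python
import numpy as np

def confidence_from_counts(counts, alpha=40.0):
    """Linear confidence heuristic: 1 + alpha * counts.

    Hypothetical helper for illustration only; not part of npMF.
    """
    counts = np.asarray(counts, dtype=float)
    return 1.0 + alpha * counts

# Interaction counts (e.g. clicks or plays) per user-item pair.
counts = np.array([[0, 3], [1, 0]])
C = confidence_from_counts(counts, alpha=2.0)
```

The resulting matrix has the same shape as the count matrix and assigns a baseline confidence of 1 to pairs with no interactions.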

Using a class for a model

We can keep track of every parameter of a model, as well as its different errors, in a class as follows:

import npmf.models
import npmf.error_metrics
from npmf.learning_rate_decay import inverse_time_decay
from npmf.wrapper_classes import MF

# data
train_matrix = ...
valid_matrix = ...
test_matrix = ...

# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000

# instantiate model class
SGD = MF(npmf.models.sgd, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
         decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter, False),
         lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)

# train
SGD.fit(train_matrix)
# predict
predicted_matrix = SGD.predict()
# retrieve factors
W = SGD.user_features 
Z = SGD.item_features
# evaluate scores
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=train_matrix, err_type='train')
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=valid_matrix, err_type='validation')
SGD.score(err_fn=npmf.error_metrics.rmse, matrix=test_matrix, err_type='test')

print(SGD.train_errors)
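
For intuition about the scoring step, an RMSE restricted to observed entries could be computed as in the sketch below. `masked_rmse` is a hypothetical stand-in, and `npmf.error_metrics.rmse` may differ in signature and in how it treats missing entries.

```python
import numpy as np

def masked_rmse(M, M_hat, nanvalue=0):
    """Root mean squared error over the observed entries of M.

    Hypothetical helper for illustration only; not part of npMF.
    """
    observed = M != nanvalue
    if not observed.any():
        return float("nan")   # nothing to score
    diff = M[observed] - M_hat[observed]
    return float(np.sqrt(np.mean(diff ** 2)))

M = np.array([[5.0, 0.0], [3.0, 1.0]])      # 0 marks a missing rating
M_hat = np.array([[4.0, 2.0], [3.0, 3.0]])  # predicted matrix
score = masked_rmse(M, M_hat)
```

Only the three observed entries enter the average, so the prediction for the missing cell has no effect on the score.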

Using a class for cross-validating a model

We can use a cross-validation class to keep track of every parameter of a model and evaluate its performance over multiple splits:

import numpy as np
import npmf.models
import npmf.error_metrics
from npmf.learning_rate_decay import inverse_time_decay
from npmf.wrapper_classes import CvMF

# data
train_matrices = [...]
valid_matrices = [...]

# hyperparameters
k = 3
init_lr = 0.1
decay_rate = 1/1.2
lambda_u = 0.1
lambda_i = 0.1
nanvalue = 0
max_iter=2000

# instantiate model class
cvSGD = CvMF(npmf.models.sgd_bias, num_features=k, nanvalue=nanvalue, lr0=init_lr, batch_size=16,
             decay_fn=lambda lr, step: inverse_time_decay(lr, step, decay_rate, max_iter, False),
             lambda_user=lambda_u, lambda_item=lambda_i, max_iter=max_iter)

# fit the model
cvSGD.fit(train_matrices)
# training accuracy
cvSGD.score(err_fn=npmf.error_metrics.rmse, matrices_list=train_matrices, err_type='train', 
            agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
cvSGD.score(err_fn=npmf.error_metrics.mae, matrices_list=train_matrices, err_type='train', 
            agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
# validation accuracy
cvSGD.score(err_fn=npmf.error_metrics.rmse, matrices_list=valid_matrices, err_type='validation', 
            agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
cvSGD.score(err_fn=npmf.error_metrics.mae, matrices_list=valid_matrices, err_type='validation', 
            agg_fn=np.mean, dev_fn=npmf.error_metrics.se)
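
The lists of training and validation matrices are left as placeholders above. One possible way to produce them is to hold out a different subset of observed entries in each fold, as in this sketch. `kfold_split` is a hypothetical helper and not part of npMF.

```python
import numpy as np

def kfold_split(M, num_folds=5, nanvalue=0, seed=0):
    """Split the observed entries of M into folds.

    Returns two lists of matrices shaped like M: in each pair, held-out
    entries are blanked in the training matrix and kept in the validation one.
    Hypothetical helper for illustration only; not part of npMF.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(M != nanvalue)
    # Assign every observed entry to one fold, in near-equal proportions.
    fold_of_entry = rng.permutation(len(rows)) % num_folds
    train_matrices, valid_matrices = [], []
    for fold in range(num_folds):
        held_out = fold_of_entry == fold
        r, c = rows[held_out], cols[held_out]
        train = M.copy()
        train[r, c] = nanvalue            # hide held-out ratings
        valid = np.full_like(M, nanvalue)
        valid[r, c] = M[r, c]             # keep only held-out ratings
        train_matrices.append(train)
        valid_matrices.append(valid)
    return train_matrices, valid_matrices

M = np.arange(1.0, 13.0).reshape(3, 4)
train_matrices, valid_matrices = kfold_split(M, num_folds=3)
```

Each observed entry appears in exactly one validation matrix, so the folds together cover the whole data set without overlap.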

License

The project was started at Technicolor AI Lab in early 2018. It is now distributed under the CC BY-NC 4.0 license. See LICENSE for details.
