# LIBMF

LIBMF - large-scale sparse matrix factorization - for Ruby

Check out Disco for higher-level collaborative filtering
## Installation

Add this line to your application's Gemfile:

```ruby
gem "libmf"
```

## Getting Started

Prep your data in the format `row_index, column_index, value`:
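Row and column indices must be non-negative integers. If your raw data uses arbitrary IDs (usernames, product codes), one way to assign contiguous indices is a hash with a default block - a plain-Ruby sketch, not part of the gem's API:

```ruby
# Assign each distinct ID the next contiguous integer index.
# Plain Ruby; not part of the LIBMF gem.
row_index = Hash.new { |h, k| h[k] = h.size }
col_index = Hash.new { |h, k| h[k] = h.size }

ratings = [
  ["alice", "apple", 5.0],
  ["alice", "carrot", 3.5],
  ["bob", "banana", 4.0]
]

triples = ratings.map do |user, item, value|
  [row_index[user], col_index[item], value]
end
# triples => [[0, 0, 5.0], [0, 1, 3.5], [1, 2, 4.0]]
```

Each resulting triple can then be pushed onto a `Libmf::Matrix` as shown below.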
```ruby
data = Libmf::Matrix.new
data.push(0, 0, 5.0)
data.push(0, 2, 3.5)
data.push(1, 1, 4.0)
```

Create a model:
```ruby
model = Libmf::Model.new
model.fit(data)
```

Make predictions:

```ruby
model.predict(row_index, column_index)
```

Get the latent factors (these approximate the training matrix):
```ruby
model.p_factors
model.q_factors
```

Get the bias (average of all elements in the training matrix):

```ruby
model.bias
```

Save the model to a file:

```ruby
model.save("model.txt")
```

Load the model from a file:

```ruby
model = Libmf::Model.load("model.txt")
```

Pass a validation set:
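The validation set is held-out data the model is evaluated on during training. A plain-Ruby sketch of carving one out of an array of `[row, col, value]` triples (the 80/20 ratio is an arbitrary choice for illustration):

```ruby
# Split an array of [row, col, value] triples into train/validation.
# Plain Ruby; the 80/20 split is arbitrary. Shuffle first in practice.
triples = [[0, 0, 5.0], [0, 2, 3.5], [1, 1, 4.0], [1, 2, 2.0], [2, 0, 1.0]]
cutoff = (triples.size * 0.8).floor
train_triples = triples[0...cutoff]
valid_triples = triples[cutoff..]

# Each subset would then be pushed onto its own Libmf::Matrix, e.g.
# eval_set = Libmf::Matrix.new
# valid_triples.each { |row, col, value| eval_set.push(row, col, value) }
```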
```ruby
model.fit(data, eval_set: eval_set)
```

## Cross-Validation

Perform cross-validation:

```ruby
model.cv(data)
```

Specify the number of folds:

```ruby
model.cv(data, folds: 5)
```

## Parameters

Pass parameters - default values below:
```ruby
Libmf::Model.new(
  loss: :real_l2,      # loss function
  factors: 8,          # number of latent factors
  threads: 12,         # number of threads used
  bins: 25,            # number of bins
  iterations: 20,      # number of iterations
  lambda_p1: 0,        # coefficient of L1-norm regularization on P
  lambda_p2: 0.1,      # coefficient of L2-norm regularization on P
  lambda_q1: 0,        # coefficient of L1-norm regularization on Q
  lambda_q2: 0.1,      # coefficient of L2-norm regularization on Q
  learning_rate: 0.1,  # learning rate
  alpha: 1,            # importance of negative entries
  c: 0.0001,           # desired value of negative entries
  nmf: false,          # perform non-negative MF (NMF)
  quiet: false         # no outputs to stdout
)
```

For real-valued matrix factorization:
- `:real_l2` - squared error (L2-norm)
- `:real_l1` - absolute error (L1-norm)
- `:real_kl` - generalized KL-divergence
For binary matrix factorization:

- `:binary_log` - logarithmic error
- `:binary_l2` - squared hinge loss
- `:binary_l1` - hinge loss
For one-class matrix factorization:

- `:one_class_row` - row-oriented pair-wise logarithmic loss
- `:one_class_col` - column-oriented pair-wise logarithmic loss
- `:one_class_l2` - squared error (L2-norm)
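As a reference for the real-valued and binary options, here is a plain-Ruby sketch of the standard per-entry loss formulas. It assumes the usual conventions (real losses compare a prediction `z` to the true value `a`; binary losses use labels `y` of ±1) and is illustrative, not taken from the gem's internals:

```ruby
# Standard per-entry loss formulas (illustrative; assumes the usual
# conventions, not taken from the LIBMF source).
real_l2 = ->(a, z) { (a - z)**2 }                        # squared error
real_l1 = ->(a, z) { (a - z).abs }                       # absolute error
real_kl = ->(a, z) { a * Math.log(a / z) - a + z }       # generalized KL-divergence

binary_log = ->(y, z) { Math.log(1 + Math.exp(-y * z)) } # logarithmic error
binary_l2  = ->(y, z) { [0, 1 - y * z].max**2 }          # squared hinge loss
binary_l1  = ->(y, z) { [0, 1 - y * z].max }             # hinge loss

real_l2.call(4.0, 3.5)   # => 0.25
binary_l1.call(1, 0.25)  # => 0.75
```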
## Metrics

Calculate RMSE (for real-valued MF):

```ruby
model.rmse(data)
```

Calculate MAE (for real-valued MF):

```ruby
model.mae(data)
```

Calculate generalized KL-divergence (for non-negative real-valued MF):

```ruby
model.gkl(data)
```

Calculate logarithmic loss (for binary MF):

```ruby
model.logloss(data)
```

Calculate accuracy (for binary MF):

```ruby
model.accuracy(data)
```

Calculate MPR (for one-class MF):

```ruby
model.mpr(data, transpose)
```

Calculate AUC (for one-class MF):

```ruby
model.auc(data, transpose)
```

## Example

Download the MovieLens 100K dataset and use:
```ruby
require "csv"

train_set = Libmf::Matrix.new
valid_set = Libmf::Matrix.new
CSV.foreach("u.data", col_sep: "\t").with_index do |row, i|
  data = i < 80000 ? train_set : valid_set
  data.push(row[0].to_i, row[1].to_i, row[2].to_f)
end

model = Libmf::Model.new(factors: 20)
model.fit(train_set, eval_set: valid_set)
puts model.rmse(valid_set)
```

## Performance

For performance, read data directly from files:
```ruby
model.fit("train.txt", eval_set: "validate.txt")
model.cv("train.txt")
```

Data should be in the format `row_index column_index value`:

```txt
0 0 5.0
0 2 3.5
1 1 4.0
```

Get latent factors as Numo arrays:
```ruby
model.p_factors(format: :numo)
model.q_factors(format: :numo)
```

## History

View the changelog
## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:

```sh
git clone https://github.com/ankane/libmf-ruby.git
cd libmf-ruby
bundle install
bundle exec rake vendor:all
bundle exec rake test
```