Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data
- tested `pip` installs on both OSX and Linux
- updated `MANIFEST.in` to include all C source and headers in the sdist archive
- updated `setup.py` to favor building extensions from the generated C source rather than re-cythonizing the `.pyx` files, which is best practice according to the Cython docs (see the sketch below)
- the generated C source will be included in the `sdist` from now on
- no changes, just syncing things up
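
The `setup.py` pattern recommended by the Cython docs looks roughly like the following (a minimal sketch: the `rankfm._rankfm` module path and the `USE_CYTHON` switch are illustrative placeholders, not necessarily the project's actual layout):

```python
import os
from setuptools import setup, Extension

# build from the shipped, pre-generated C source by default; only
# re-cythonize the .pyx files when explicitly requested
USE_CYTHON = os.getenv("USE_CYTHON") == "1"
ext = ".pyx" if USE_CYTHON else ".c"

extensions = [Extension("rankfm._rankfm", ["rankfm/_rankfm" + ext])]

if USE_CYTHON:
    from Cython.Build import cythonize
    extensions = cythonize(extensions)

setup(name="rankfm", ext_modules=extensions)
```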
- rewrote the core `_fit()`, `_predict()`, `_recommend()` functions in Cython - the Cython `_fit()` function is 5X-10X faster than the original Numba version, and `predict()`/`recommend()` are about the same speed
- split `regularization` into two parameters: `alpha` to control the L2 regularization for the user/item indicators and `beta` to control the regularization for the user-features/item-features. In testing, the user-feature/item-feature weights tended to have exploding gradients and overwhelm the utility scores unless more strongly regularized, especially with fairly dense side features. Typically `beta` should be set fairly high (e.g. 0.1) to avoid numerical instability (usage sketch below)
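
A hedged usage sketch, assuming `alpha` and `beta` are exposed as constructor arguments and `fit()` accepts an `epochs` keyword (check the current signatures):

```python
import pandas as pd
from rankfm.rankfm import RankFM

# toy implicit-feedback interactions: one row per (user_id, item_id) event
interactions = pd.DataFrame({"user_id": [1, 1, 2, 2], "item_id": ["a", "b", "b", "c"]})

# lighter L2 penalty on the indicator weights (alpha), a much stronger
# one on any side-feature weights (beta) to keep their gradients stable
model = RankFM(factors=10, alpha=0.01, beta=0.1)
model.fit(interactions, epochs=20)
```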
- moved the `loss` param out of the private Numba internals and into the public `fit()` function
- changed `_init_interactions` to extend rather than replace the `user_items` dictionary item sets (see the sketch below)
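
The extend-vs-replace distinction matters when `fit()` is called more than once; in plain-Python terms (hypothetical variable names, not the library's internals):

```python
# state left behind by a previous fit: each user maps to a set of item indexes
user_items = {0: {1, 2}}
new_interactions = {0: {3}, 1: {4}}

# extend the existing sets rather than replacing them, so earlier
# interactions are still known to the model (e.g. excluded when sampling)
for user, items in new_interactions.items():
    user_items.setdefault(user, set()).update(items)

assert user_items == {0: {1, 2, 3}, 1: {4}}
```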
- `fit()` now skips the side-feature processing when no user/item features are supplied. This reduces training time by over 50% if just using the base interaction matrix (no additional user/item features)
- `similar_users()` and `similar_items()` were performing validation checks against the zero-based internal index (wrong) instead of the original ID values (correct) - this was causing a bunch of bogus assertion errors saying that the `item_id` wasn't in the training set (usage sketch below)
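
In other words, callers pass the original identifiers and validation now happens against those. A hedged usage sketch, assuming a `similar_items(item_id, n_items)` signature:

```python
import pandas as pd
from rankfm.rankfm import RankFM

interactions = pd.DataFrame({"user_id": [1, 1, 2, 2], "item_id": ["a", "b", "b", "c"]})

model = RankFM(factors=2)
model.fit(interactions, epochs=5)

# "b" is an original item ID from the training data; it is validated
# as-is rather than being compared against the zero-based index
print(model.similar_items("b", n_items=2))
```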
- added `loss` and `max_samples` parameters and refactored the `_fit()` function to elegantly (IMHO) handle both BPR and WARP loss (sketch below)
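
For intuition: BPR samples a single negative per update with unit weight, while WARP keeps sampling (up to `max_samples`) until it finds a margin-violating negative and weights the update by how hard that was, so one `_fit()` loop can cover both losses by swapping the sampling/weighting step. A simplified sketch of the WARP part (the general technique, not rankfm's Cython internals):

```python
import numpy as np

def sample_warp_negative(scores, observed_items, pos_item, max_samples, rng):
    """Sample negatives until one violates the margin, up to max_samples.

    Hypothetical helper illustrating the WARP idea. Returns a (neg_item,
    rank_weight) pair; neg_item is None if no margin violator was found.
    """
    n_items = scores.shape[0]
    for sampled in range(1, max_samples + 1):
        neg_item = int(rng.integers(n_items))
        if neg_item == pos_item or neg_item in observed_items:
            continue  # never treat an observed item as a negative
        if scores[neg_item] > scores[pos_item] - 1.0:
            # the more draws it took to find a violator, the lower the
            # estimated rank of the positive, and the smaller the weight
            rank_weight = np.log(max((n_items - 1) / sampled, 1.0))
            return neg_item, rank_weight
    return None, 0.0
```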
- `[user_id, item_id]` interaction data were not getting loaded/indexed correctly - fixed by using the previously created input class determination utility (sketch below)
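
A sketch of what such an input-class determination step typically does: normalize either a pandas DataFrame or a raw numpy array of `[user_id, item_id]` pairs before building the internal index mappings (hypothetical function, illustrating the idea rather than the actual helper):

```python
import numpy as np
import pandas as pd

def interactions_to_array(interactions):
    """Normalize [user_id, item_id] input to a 2-column numpy array.

    Hypothetical utility: dispatch on the input class so downstream
    indexing code only ever sees one representation.
    """
    if isinstance(interactions, pd.DataFrame):
        return interactions.iloc[:, :2].to_numpy()
    if isinstance(interactions, np.ndarray):
        return interactions[:, :2]
    raise TypeError("interactions must be a pandas DataFrame or a numpy ndarray")
```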