RankFM Versions

Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data

v0.2.5

3 years ago

Added

  • working PyPI and GitHub pip installs on both OSX and Linux
  • wrapped the external Mersenne Twister C library to generate better random numbers for BPR/WARP training
  • added a MANIFEST.in to include all C source and headers in the sdist archive (see the sketch below)
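
A MANIFEST.in along these lines would accomplish this; the exact package paths are assumptions rather than copied from the repo:

```
# hypothetical MANIFEST.in entries - actual paths may differ
recursive-include rankfm *.pyx *.c *.h
```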

Changed

  • changed the logic in setup.py to favor building extensions from the generated C source rather than re-cythonizing the .pyx files, which is best practice according to the Cython docs (see the sketch after this list)
  • removed Cython as a formal dependency, as the generated C code will be included in the package sdist from now on
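
A minimal sketch of that setup.py pattern, assuming the extension module lives at rankfm/_rankfm (the actual module name and paths may differ):

```python
import os
from setuptools import setup, Extension

# prefer the pre-generated C source shipped in the sdist; fall back to
# re-cythonizing the .pyx only when the C file is absent (e.g. a raw checkout)
if os.path.exists("rankfm/_rankfm.c"):
    extensions = [Extension("rankfm._rankfm", sources=["rankfm/_rankfm.c"])]
else:
    from Cython.Build import cythonize
    extensions = cythonize(
        [Extension("rankfm._rankfm", sources=["rankfm/_rankfm.pyx"])]
    )

setup(name="rankfm", ext_modules=extensions)
```

This keeps Cython out of the install requirements for end users while still letting developers rebuild the C source from a git checkout.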

v0.2.3

3 years ago

Changed

  • needed to instruct setuptools to compile the generated .c file instead of the .pyx file, as the latter doesn't get added to the sdist
  • build tested and working now on both OSX and Linux

v0.2.2

3 years ago

no changes, just syncing things up.

v0.2.0

3 years ago

Added

  • Cython back-end for _fit(), _predict(), _recommend() - the Cython _fit() function is 5X-10X faster than the original Numba version, while _predict()/_recommend() run at about the same speed

Changed

  • split regularization into two parameters: alpha controls the L2 regularization for the user/item indicators, and beta controls the regularization for the user-features/item-features. In testing, the user-feature/item-feature weights tended to suffer exploding gradients and overwhelm the utility scores unless regularized more strongly, especially with fairly dense side features. beta should typically be set fairly high (e.g. 0.1) to avoid numerical instability - see the example below.
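
For illustration, a minimal sketch assuming a constructor with these argument names (only alpha and beta come from the notes above; the remaining arguments and the side-feature dataframe layout are assumptions):

```python
import pandas as pd
from rankfm.rankfm import RankFM

# toy data: one row per observed (user_id, item_id) interaction
interactions = pd.DataFrame({"user_id": [1, 1, 2], "item_id": ["a", "b", "a"]})
# item side features: first column is the item_id, the rest are features
item_features = pd.DataFrame({"item_id": ["a", "b"], "genre_x": [1, 0], "genre_y": [0, 1]})

# beta penalizes the side-feature weights more strongly than alpha
# penalizes the user/item indicator weights, per the guidance above
model = RankFM(factors=10, alpha=0.01, beta=0.1)
model.fit(interactions, item_features=item_features, epochs=5)
```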

v0.1.3

3 years ago

Changed

  • pulled the string loss parameter out of the private Numba internals and into the public fit() function
  • changed _init_interactions to extend rather than replace the user_items dictionary item sets
  • added conditional logic to skip the expensive user-feature/item-feature dot products if user and/or item features were not provided in the call to fit(). This reduces training time by over 50% when using just the base interaction matrix (no additional user/item features) - see the sketch below.
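
A hypothetical sketch of that guard (illustrative only, not rankfm's actual internals): the feature dot products are computed only when the side-feature matrices are non-empty.

```python
import numpy as np

def pointwise_utility(u, i, item_bias, v_user, v_item, x_uf, x_if, v_uf, v_if):
    # base interaction term - always computed
    score = item_bias[i] + np.dot(v_user[u], v_item[i])
    # feature terms - skipped entirely when fit() received no side features,
    # saving two expensive dot products per sampled (user, item) pair
    if x_uf.shape[1] > 0:
        score += np.dot(x_uf[u], v_uf @ v_item[i])
    if x_if.shape[1] > 0:
        score += np.dot(x_if[i], v_if @ v_user[u])
    return score
```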

Fixed

  • bug where similar_users() and similar_items() were validating the zero-based internal index (wrong) instead of the original ID value (correct) - this was causing a bunch of bogus assertion errors saying the item_id wasn't in the training set (see the sketch below)
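
In other words, the corrected check validates the raw ID before mapping it to an internal index - a hypothetical sketch with illustrative names:

```python
def lookup_item(item_id, item_to_index):
    # validate the original ID value, not the zero-based internal index
    assert item_id in item_to_index, f"item_id {item_id} not in training data"
    return item_to_index[item_id]  # map to the internal index only after validating
```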

v0.1.2

3 years ago

Added

  • WARP loss - while slower to train, it yields slightly better performance than BPR on dense interaction data and much better performance on highly sparse interaction data
  • new hyperparameters: loss and max_samples (see the example after this list)
  • re-wrote the Numba _fit() function to elegantly (IMHO) handle both BPR and WARP loss
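
Illustrative usage of the two new hyperparameters (the surrounding arguments are assumptions):

```python
from rankfm.rankfm import RankFM

# WARP draws up to max_samples candidate negative items per observed pair,
# stopping early once it finds one the model currently mis-ranks
model = RankFM(factors=10, loss="warp", max_samples=20)
```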

v0.1.1

3 years ago

Added

  • added support for sample weights - you can now pass importance weights in addition to interactions (see the example below)
  • automatically determine the input data class (np.ndarray vs. pd.DataFrame/pd.Series)
  • assert that all model weights are finite after each training epoch to fail fast on exploding weights
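
For example, assuming fit() takes a sample_weight argument aligned row-for-row with the interactions (the argument name is a guess):

```python
import numpy as np
import pandas as pd
from rankfm.rankfm import RankFM

interactions = pd.DataFrame({"user_id": [1, 1, 2], "item_id": ["a", "b", "a"]})
weights = np.array([1.0, 0.5, 2.0])  # one importance weight per interaction row

model = RankFM(factors=10)
model.fit(interactions, sample_weight=weights, epochs=5)
```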

Fixed

  • bug where pd.DataFrame interactions with columns not named [user_id, item_id] were not getting loaded/indexed correctly - fixed by using the new input-class determination utility

Changed

  • more efficient loops for updating item feature and user/item feature factor weights - this cuts training time by around 30% with no auxiliary features, and by 50%+ in the presence of auxiliary features

v0.1.0

3 years ago

Added

  • core package functionality
  • example notebook: quickstart.ipynb
  • source distribution and package wheel
  • basic test suite
  • CircleCI build, lint, test CI workflows