Pyltr Save

Python learning to rank (LTR) toolkit

Project README

pyltr

|pypi version| |Build status|

.. |pypi version| image:: https://img.shields.io/pypi/v/pyltr.svg :target: https://pypi.python.org/pypi/pyltr .. |Build status| image:: https://secure.travis-ci.org/jma127/pyltr.svg :target: http://travis-ci.org/jma127/pyltr

pyltr is a Python learning-to-rank toolkit with ranking models, evaluation metrics, data wrangling helpers, and more.

This software is licensed under the BSD 3-clause license (see LICENSE.txt).

The author may be contacted at ma127jerry <@t> gmail with general feedback, questions, or bug reports.

Example

Import pyltr::

import pyltr

Import a LETOR <http://research.microsoft.com/en-us/um/beijing/projects/letor/>_ dataset (e.g. MQ2007 <http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2007.rar>_ )::

with open('train.txt') as trainfile, \
        open('vali.txt') as valifile, \
        open('test.txt') as evalfile:
    TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile)

Train a LambdaMART <http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf>_ model, using validation set for early stopping and trimming::

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

Evaluate model on test data::

Epred = model.predict(EX)
print 'Random ranking:', metric.calc_mean_random(Eqids, Ey)
print 'Our model:', metric.calc_mean(Eqids, Ey, Epred)

Features

Below are some of the features currently implemented in pyltr.

Models

LambdaMART (pyltr.models.LambdaMART)
- Validation & early stopping
- Query subsampling

Metrics

(N)DCG (pyltr.metrics.DCG, pyltr.metrics.NDCG)
- pow2 and identity gain functions
ERR (pyltr.metrics.ERR)
- pow2 and identity gain functions
(M)AP (pyltr.metrics.AP)
Kendall's Tau (pyltr.metrics.KendallTau)
AUC-ROC -- Area under the ROC curve (pyltr.metrics.AUCROC)

Data Wrangling

Data loaders (e.g. pyltr.data.letor.read)
Query groupers and validators (pyltr.util.group.check_qids, pyltr.util.group.get_groups)

Running Tests

Use the run_tests.sh script to run all unit tests.

Building Docs

cd into the docs/ directory and run make html. Docs are generated in the docs/_build directory.

Contributing

Quality contributions or bugfixes are gratefully accepted. When submitting a pull request, please update AUTHOR.txt so you can be recognized for your work :).

By submitting a Github pull request, you consent to have your submitted code released under the terms of the project's license (see LICENSE.txt).

Open Source Agenda is not affiliated with "Pyltr" Project. README Source: jma127/pyltr

Stars

461

Open Issues

Last Commit

2 years ago

Repository

jma127/pyltr

License

BSD-3-Clause

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/pyltr"><img src="https://www.opensourceagenda.com/projects/pyltr/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022