🔬 Research Framework for Single and Multi-Players 🎰 Multi-Arms Bandits (MAB) Algorithms, implementing all the state-of-the-art algorithms for single-player (UCB, KL-UCB, Thompson...) and multi-player (MusicalChair, MEGA, rhoRand, MCTop/RandTopM etc).. Available on PyPI: https://pypi.org/project/SMPyBandits/ and documentation on
Open-Source Python package for Single- and Multi-Players multi-armed Bandits algorithms.
This repository contains the code of Lilian Besson's numerical environment, written in Python (2 or 3), for numerical simulations on :slot_machine: single-player and multi-players Multi-Armed Bandits (MAB) algorithms.
It contains the most complete collection of single-player (classical) bandit algorithms on the Internet (over 65!), as well as implementation of all the state-of-the-art multi-player algorithms.
I follow very actively the latest publications related to Multi-Armed Bandits (MAB) research, and usually implement quite quickly the new algorithms (see for instance, Exp3++, CORRAL and SparseUCB were each introduced by articles (for Exp3++, for CORRAL, for SparseUCB) presented at COLT in July 2017, LearnExp comes from a NIPS 2017 paper, and kl-UCB++ from an ALT 2017 paper.). More recent examples are klUCBswitch from a paper from May 2018, and also MusicalChairNoSensing from a paper from August 2018.
rhoRand
from 2009, MEGA
from 2015, MusicalChair
, and our state-of-the-art algorithms RandTopM
and MCTopM
, along with very recent algorithms SIC-MMAB
from arXiv:1809.08151 and MusicalChairNoSensing
from arXiv:1808.08416).With this numerical framework, simulations can run on a single CPU or a multi-core machine, and summary plots are automatically saved as high-quality PNG, PDF and EPS (ready for being used in research article).
Making new simulations is very easy, one only needs to write a configuration script and basically no code! See these examples (files named configuration_*.py
).
A complete Sphinx documentation for each algorithms and every piece of code (included constants in the configurations!) is available here: SMPyBandits.GitHub.io. (I will use ReadTheDocs for this project, but I won't use any continuous integration, don't even think of it!)
I (Lilian Besson) have started my PhD in October 2016, and this is a part of my on going research since December 2016.
I launched the documentation on March 2017, I wrote my first research articles using this framework in 2017 and decided to (finally) open-source my project in February 2018. / : /
If you use this package for your own work, please consider citing it with this piece of BibTeX:
@misc{SMPyBandits,
title = {{SMPyBandits: an Open-Source Research Framework for Single and Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python}},
author = {Lilian Besson},
year = {2018},
url = {https://github.com/SMPyBandits/SMPyBandits/},
howpublished = {Online at: \url{github.com/SMPyBandits/SMPyBandits}},
note = {Code at https://github.com/SMPyBandits/SMPyBandits/, documentation at https://smpybandits.github.io/}
}
I also wrote a small paper to present SMPyBandits, and I will send it to JMLR MLOSS. The paper can be consulted here on my website.
A DOI will arrive as soon as possible! I tried to publish a paper on both JOSS and MLOSS.
I designed and added the Aggregator
policy, in order to test its validity and performance.
It is a "simple" voting algorithm to combine multiple bandit algorithms into one.
Basically, it behaves like a simple MAB bandit just based on empirical means (even simpler than UCB), where arms are the child algorithms A_1 .. A_N
, each running in "parallel".
For more details, refer to this file:
Aggregation.md
and this research article.
PDF : BKM_IEEEWCNC_2018.pdf | HAL notice : BKM_IEEEWCNC_2018 | BibTeX : BKM_IEEEWCNC_2018.bib | Source code and documentation
There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting.
The basic difference is about collisions : at each time t
, if two or more user chose to sense the same channel, there is a collision. Collisions can be handled in different way from the base station point of view, and from each player point of view.
For more details, refer to this file:
MultiPlayers.md
and this research article.
PDF : BK__ALT_2018.pdf | HAL notice : BK__ALT_2018 | BibTeX : BK__ALT_2018.bib | Source code and documentation
I studied what Doubling Trick can and can't do to obtain efficient anytime version of non-anytime optimal Multi-Armed Bandits algorithms.
For more details, refer to this file:
DoublingTrick.md
and this research article.
PDF : BK__DoublingTricks_2018.pdf | HAL notice : BK__DoublingTricks_2018 | BibTeX : BK__DoublingTricks_2018.bib | Source code and documentation
With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.
For more details, refer to this file:
NonStationaryBandits.md
and this research article.
PDF : BK__COLT_2019.pdf | HAL notice : BK__COLT_2019 | BibTeX : BK__COLT_2019.bib | Source code and documentation
UCB
, kl-UCB, MOSS
and Thompson Sampling algorithms, as well as other less known algorithms (OCUCB
, BESA
, OSSB
etc).SparseWrapper
is a generalization of the SparseUCB from this article.kl-UCB++
(from this article), UCB-dagger
(from this article), or MOSS-anytime
(from this article).BlackBoxOpt
or UnsupervisedLearning
(using Gaussian processes to learn the arms distributions).Bernoulli
, bounded (truncated) or unbounded Gaussian
, Exponential
, Gamma
or Poisson
distributions, and more.MAB.MAB
), but there is also a perfect support for "Bayesian" problems where the mean vector µ1,…,µK change at every repetition (see MAB.DynamicMAB
).MAB.MarkovianMAB
, even though I didn't implement any policies tailored for Markovian problems.MAB.PieceWiseStationaryMAB
is already working well. Use it with policies designed for piece-wise stationary problems, like Discounted-Thompson, one of the CD-UCB algorithms, M-UCB, SlidingWindowUCB or Discounted-UCB, or SW-UCB#.API.md
).Evaluator
and EvaluatorMultiPlayers
classes, so the simulations are easily parallelized on multi-core machines. (Put n_jobs = -1
or PARALLEL = True
in the config file to use all your CPU cores, as it is by default).See this document:
How_to_run_the_code.md
for more details (or this documentation page).
TL;DR: this short bash snippet shows how to clone the code, install the requirements for Python 3 (in a virtualenv, and starts some simulation for N=100 repetitions of the default non-Bayesian Bernoulli-distributed problem, for K=9 arms, an horizon of T=10000 and on 4 CPUs (it should take about 20 minutes for each simulations):
cd /tmp/ # or wherever you want
git clone -c core.symlinks=true https://GitHub.com/SMPyBandits/SMPyBandits.git
cd SMPyBandits
# just be sure you have the latest virtualenv from Python 3
sudo pip3 install --upgrade --force-reinstall virtualenv
# create and active the virtualenv
virtualenv venv
. venv/bin/activate
type pip # check it is /tmp/SMPyBandits/venv/bin/pip
type python # check it is /tmp/SMPyBandits/venv/bin/python
# install the requirements in the virtualenv
pip install -r requirements_full.txt
# run a single-player simulation!
N=100 T=10000 K=9 N_JOBS=4 make single
# run a multi-player simulation!
N=100 T=10000 M=3 K=9 N_JOBS=4 make moremulti
You can also install it directly with pip
and from GitHub:
cd /tmp/ ; mkdir SMPyBandits ; cd SMPyBandits/
virtualenv venv
. venv/bin/activate
type pip # check it is /tmp/SMPyBandits/venv/bin/pip
type python # check it is /tmp/SMPyBandits/venv/bin/python
pip install git+https://github.com/SMPyBandits/SMPyBandits.git#egg=SMPyBandits[full]
- If speed matters to you and you want to use algorithms based on kl-UCB, you should take the time to build and install the fast C implementation of the utilities KL functions. Default is to use kullback.py, but using the C version from Policies/C/ really speeds up the computations. Just follow the instructions, it should work well (you need
gcc
to be installed).- And if speed matters, be sure that you have a working version of Numba, it is used by many small functions to (try to automatically) speed up the computations.
A pinned Nix environment is available for this experimental setup in the nix/pkgs/
directory.
From the root of the project:
$ nix-shell
nix-shell$ jupyter_notebook
nix-shell$ N=100 T=10000 K=9 N_JOBS=4 make single
The following one-liner lets you explore one of the example notebooks from any Nix-enabled machine, without cloning the repository:
$ nix-shell https://github.com/SMPYBandits/SMPyBandits/archive/master.tar.gz --run 'jupyter-notebook $EXAMPLE_NOTEBOOKS/Example_of_a_small_Multi-Player_Simulation__with_Centralized_Algorithms.ipynb'
I don't except issues or pull requests on this project, but you are welcome to.
Contributions (issues, questions, pull requests) are of course welcome, but this project is and will stay a personal environment designed for quick research experiments, and will never try to be an industry-ready module for applications of Multi-Armed Bandits algorithms. If you want to contribute, please have a look to the CONTRIBUTING.md file, and if you want to be more seriously involved, read the CODE_OF_CONDUCT.md file.
See this file
TODO.md
, and the issues on GitHub.
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson, with help from contributors.