======
raylab
======

Reinforcement learning algorithms in `RLlib <https://github.com/ray-project/ray/tree/master/rllib>`_ and `PyTorch <https://pytorch.org>`_.


.. code:: bash

      pip install raylab


Raylab provides agents and environments to be used with a normal RLlib/Tune setup. You can pass an agent's name (from the Algorithms_ section) to :code:`raylab info list` to list its top-level configurations:

.. code-block:: zsh

    raylab info list SoftAC

.. code-block::

    learning_starts: 0
        Hold this number of timesteps before first training operation.
    policy: {}
        Sub-configurations for the policy class.
    wandb: {}
        Configs for integration with Weights & Biases.

        Accepts arbitrary keyword arguments to pass to `wandb.init`.
        The defaults for `wandb.init` are:
        * name: `_name` property of the trainer.
        * config: full `config` attribute of the trainer
        * config_exclude_keys: `wandb` and `callbacks` configs
        * reinit: True

        Don't forget to:
          * install `wandb` via pip
          * login to W&B with the appropriate API key for your
          * set the `wandb/project` name in the config dict

        Check out the Quickstart for more information:
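
Following the :code:`wandb` description in the listing, a minimal sketch of the corresponding config entry could look like the fragment below. The project name and the tag are hypothetical placeholders, and the extra key is assumed to be forwarded to :code:`wandb.init` as the listing states.

.. code-block:: python

    # Hypothetical trainer config fragment enabling Weights & Biases logging.
    config = {
        "wandb": {
            "project": "my-raylab-project",  # placeholder project name
            "tags": ["SoftAC"],  # assumed to be passed on to wandb.init
        },
    }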

You can add the :code:`--rllib` flag to get the descriptions for all the options common to RLlib agents (or :code:`Trainer`\s).

Experiments can be launched from the command line with :code:`raylab experiment`, passing the path to a file with the agent's configuration through the :code:`--config` flag. The following command uses the `cartpole example <examples/PG/cartpole_defaults.py>`_ configuration file to launch an experiment with the vanilla Policy Gradient agent from the RLlib library.

.. code-block:: zsh

    raylab experiment PG --name PG -s training_iteration 10 --config examples/PG/cartpole_defaults.py
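
The file passed through :code:`--config` is a regular Python module. The sketch below shows what such a file might contain. It assumes the command imports the module and calls a :code:`get_config()` function that returns an RLlib-style config dict; the option values are illustrative and are not copied from the cartpole example.

.. code-block:: python

    # Hypothetical config module for `raylab experiment PG --config <this file>`.
    # Assumption: the CLI looks up and calls `get_config()` in this module.

    def get_config() -> dict:
        """Return the agent configuration as a plain dict."""
        return {
            "env": "CartPole-v0",     # Gym environment id
            "lr": 1e-3,               # optimizer learning rate
            "train_batch_size": 200,  # timesteps per training batch
            "num_workers": 0,         # collect samples in the trainer process
        }

Keeping the configuration in a function rather than a module-level constant leaves room to build the dict programmatically, for example to insert Tune search spaces.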

You can also launch an experiment from a Python script normally using Ray and Tune. The following shows how you may use Raylab to perform an experiment comparing different types of exploration for the NAF agent.

.. code-block:: python

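    import ray
    from ray import tune
    import raylab

    def main():
        raylab.register_all_agents()        # make Raylab agents known to Tune
        raylab.register_all_environments()  # likewise for Raylab environments
        ray.init()
        noise = ["raylab.utils.exploration.GaussianNoise",   # assumed class paths;
                 "raylab.utils.exploration.ParameterNoise"]  # adjust as needed
        tune.run(
            "NAF",
            local_dir="data/NAF",
            stop={"timesteps_total": 100000},
            config={
                "env": "CartPoleSwingUp-v0",
                "exploration_config": {"type": tune.grid_search(noise)},
            },
        )

    if __name__ == "__main__":
        main()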
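
A grid search asks Tune to run one trial per listed value. :code:`tune.grid_search` marks the values with a :code:`{"grid_search": [...]}` dict, and the plain-Python sketch below mimics how such markers expand into concrete trial configs. The :code:`expand_grid` helper and the :code:`NoiseA`/:code:`NoiseB` names are hypothetical and only illustrate the idea; Tune resolves grids internally with its own machinery.

.. code-block:: python

    import copy
    import itertools

    def expand_grid(config):
        """Yield one concrete config per combination of grid-search values."""
        grids = []  # (key path, candidate values) for every marker found

        def find(node, path):
            for key, value in node.items():
                if isinstance(value, dict):
                    if set(value) == {"grid_search"}:
                        grids.append((path + (key,), value["grid_search"]))
                    else:
                        find(value, path + (key,))

        find(config, ())
        for combo in itertools.product(*(values for _, values in grids)):
            trial = copy.deepcopy(config)
            for (path, _), choice in zip(grids, combo):
                node = trial
                for key in path[:-1]:
                    node = node[key]
                node[path[-1]] = choice  # replace the marker with one value
            yield trial

    config = {
        "env": "CartPoleSwingUp-v0",
        # Placeholder class names standing in for real exploration types
        "exploration_config": {"type": {"grid_search": ["NoiseA", "NoiseB"]}},
    }
    trials = list(expand_grid(config))

With a single two-valued grid, as in the exploration comparison, this produces two trial configs that differ only in the exploration type.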

You can then visualize the results using :code:`raylab dashboard`, passing the :code:`local_dir` used in the experiment. The dashboard lets you quickly filter and group results.

.. code-block:: zsh

    raylab dashboard data/NAF/

.. image:: https://i.imgur.com/bVc6WC5.png
   :align: center

You can find the best checkpoint according to a metric (:code:`episode_reward_mean` by default) using :code:`raylab find-best`.

.. code-block:: zsh

    raylab find-best data/NAF/
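
Conceptually, finding the best checkpoint amounts to taking the best value of a metric over the recorded results. The sketch below illustrates this on made-up records. The :code:`best_checkpoint` helper, the record layout, and the paths are all hypothetical and do not reflect how :code:`raylab find-best` is implemented.

.. code-block:: python

    def best_checkpoint(records, metric="episode_reward_mean", mode="max"):
        """Return the checkpoint path of the record with the best metric."""
        scored = [r for r in records if metric in r]
        if not scored:
            raise ValueError(f"no record reports {metric!r}")
        pick = max if mode == "max" else min
        return pick(scored, key=lambda r: r[metric])["checkpoint"]

    # Made-up results for illustration only
    records = [
        {"checkpoint": "trial_0/checkpoint_5", "episode_reward_mean": 120.0},
        {"checkpoint": "trial_1/checkpoint_5", "episode_reward_mean": 185.5},
        {"checkpoint": "trial_2/checkpoint_5", "episode_reward_mean": 97.3},
    ]
    best = best_checkpoint(records)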

Finally, you can pass a checkpoint to :code:`raylab rollout` to see the returns collected by the agent and render it if the environment supports a visual :code:`render()` method. For example, you can use the output of the :code:`find-best` command to see the best agent in action.

.. code-block:: zsh

    raylab rollout $(raylab find-best data/NAF/) --agent NAF


Algorithms
----------

+--------------------------------------------------------+-------------------------+
| Paper                                                  | Agent Name              |
+--------------------------------------------------------+-------------------------+
| `Actor Critic using Kronecker-factored Trust Region`_  | ACKTR                   |
+--------------------------------------------------------+-------------------------+
| `Trust Region Policy Optimization`_                    | TRPO                    |
+--------------------------------------------------------+-------------------------+
| `Normalized Advantage Function`_                       | NAF                     |
+--------------------------------------------------------+-------------------------+
| `Stochastic Value Gradients`_                          | SVG(inf)/SVG(1)/SoftSVG |
+--------------------------------------------------------+-------------------------+
| `Soft Actor-Critic`_                                   | SoftAC                  |
+--------------------------------------------------------+-------------------------+
| `Streamlined Off-Policy`_ (DDPG)                       | SOP                     |
+--------------------------------------------------------+-------------------------+
| `Model-Based Policy Optimization`_                     | MBPO                    |
+--------------------------------------------------------+-------------------------+
| `Model-based Action-Gradient-Estimator`_               | MAGE                    |
+--------------------------------------------------------+-------------------------+

.. _Actor Critic using Kronecker-factored Trust Region: https://arxiv.org/abs/1708.05144
.. _Trust Region Policy Optimization: http://proceedings.mlr.press/v37/schulman15.html
.. _Normalized Advantage Function: http://proceedings.mlr.press/v48/gu16.html
.. _Stochastic Value Gradients: http://papers.nips.cc/paper/5796-learning-continuous-control-policies-by-stochastic-value-gradients
.. _Soft Actor-Critic: http://proceedings.mlr.press/v80/haarnoja18b.html
.. _Model-Based Policy Optimization: http://arxiv.org/abs/1906.08253
.. _Streamlined Off-Policy: https://arxiv.org/abs/1910.02208
.. _Model-based Action-Gradient-Estimator: https://arxiv.org/abs/2004.14309

Command-line interface
----------------------

.. role:: bash(code)
   :language: bash

For a high-level description of the available utilities, run :bash:`raylab --help`:

.. code:: bash

    Usage: raylab [OPTIONS] COMMAND [ARGS]...

      RayLab: Reinforcement learning algorithms in RLlib.

    Options:
      --help  Show this message and exit.

    Commands:
      dashboard    Launch the experiment dashboard to monitor training progress.
      episodes     Launch the episode dashboard to monitor state and action...
      experiment   Launch a Tune experiment from a config file.
      find-best    Find the best experiment checkpoint as measured by a metric.
      info         View information about an agent's config parameters.
      rollout      Wrap `rllib rollout` with customized options.
      test-module  Launch dashboard to test generative models from a checkpoint.


The project is structured as follows::

    |-- agents            # Trainer and Policy classes
    |-- cli               # Command line utilities
    |-- envs              # Gym environment registry and utilities
    |-- logger            # Tune loggers
    |-- policy            # Extensions and customizations of RLlib's policy API
    |   |-- losses        # RL loss functions
    |   |-- modules       # PyTorch neural network modules for TorchPolicy
    |-- pytorch           # PyTorch extensions
    |-- utils             # Miscellaneous utilities
