Rl Baselines3 Zoo Versions Save

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

v2.3.0

1 month ago

Breaking Changes

Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
Upgraded to SB3 >= 2.3.0

Other

Added test dependencies to setup.py (@power-edge)
Simplify dependencies of requirements.txt (remove duplicates from setup.py)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.2.1...v2.3.0

v2.2.1

6 months ago

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Removed gym dependency, the package is still required for some pretrained agents.
Upgraded to SB3 >= 2.2.1
Upgraded to Huggingface-SB3 >= 3.0
Upgraded to pytablewriter >= 1.0

New Features

Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)

Bug fixes

Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)

Documentation

Other

Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
Switched to ruff for sorting imports
Updated tests to use shlex.split()
Fixed rl_zoo3/hyperparams_opt.py type hints
Fixed rl_zoo3/exp_manager.py type hints

v2.1.0

8 months ago

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Dropped python 3.7 support
SB3 now requires PyTorch 1.13+
Upgraded to SB3 >= 2.1.0
Upgraded to Huggingface-SB3 >= 2.3
Upgraded to Optuna >= 3.0
Upgraded to cloudpickle >= 2.2.1

New Features

Added python 3.11 support

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.0.0...v2.1.0

v2.0.0

10 months ago

Warning Stable-Baselines3 (SB3) v2.0 will be the last one supporting python 3.7 (end of life in June 2023). We highly recommended you to upgrade to Python >= 3.8.

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes

Fixed bug in HistoryWrapper, now returns the correct obs space limits
Upgraded to SB3 >= 2.0.0
Upgraded to Huggingface-SB3 >= 2.2.5
Upgraded to Gym API 0.26+, RL Zoo3 doesn't work anymore with Gym 0.21

New Features

Added Gymnasium support
Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper

Bug fixes

Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
Fixed record_video steps (before it was stepping in a closed env)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v1.8.0...v2.0.0

v1.8.0

1 year ago

Release 1.8.0 (2023-04-07)

We have run a massive and open source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark

New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

Warning Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

Breaking Changes

Upgraded to SB3 >= 1.8.0
Upgraded to new HerReplayBuffer implementation that supports multiple envs
Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.

New Features

Tuned hyperparameters for RecurrentPPO on Swimmer
Documentation is now built using Sphinx and hosted on read the doc
Open RL Benchmark

Bug fixes

Set highway-env version to 1.5 and setuptools to v65.5 for the CI
Removed use_auth_token for push to hub util
Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see https://github.com/openai/gym/pull/1304)
Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)

Documentation

Documentation is now built using Sphinx and hosted on read the doc: https://rl-baselines3-zoo.readthedocs.io/en/master/

Other

Added support for ruff (fast alternative to flake8) in the Makefile
Removed Gitlab CI file
Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
Switched to ruff and pyproject.toml
Removed online_sampling and max_episode_length argument when using HerReplayBuffer

v1.7.0

1 year ago

Release 1.7.0 (2023-01-10)

SB3 v1.7.0, added support for python config files

We are currently creating an open source benchmark, please read https://github.com/openrlbenchmark/openrlbenchmark/issues/7 if you want to help

Breaking Changes

--yaml-file argument was renamed to -conf (--conf-file) as now python file are supported too
Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))

New Features

Specifying custom policies in yaml file is now supported (@Rick-v-E)
Added monitor_kwargs parameter
Handle the env_kwargs of render:True under the hood for panda-gym v1 envs in enjoy replay to match visualzation behavior of other envs
Added support for python config file
Tuned hyperparameters for PPO on Swimmer
Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
Added a sb3 version tag to the wandb run

Bug fixes

Allow python -m rl_zoo3.cli to be called directly
Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv

Documentation

Other

scripts/plot_train.py plots models such that newer models appear on top of older ones.
Added additional type checking using mypy
Standardized the use of from gym import spaces

v1.6.2

1 year ago

Highlights

You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).

File: train.py (you can use python train.py --algo sbx_tqc --env Pendulum-v1 afterward)

import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Add new algorithm
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()

Breaking Changes

RL Zoo is now a python package
low pass filter was removed

New Features

RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy

v1.6.1

1 year ago

Breaking Changes

Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
Upgraded to sb3-contrib >= 1.6.1

New Features

Added --yaml-file argument option for train.pyto read hyperparameters from custom yaml files (@JohannesUl)

Bug fixes

Added custom_object parameter on record_video.py (@Affonso-Gui)
Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
In ExperimentManager _maybe_normalize set training to False for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
Added progress bar via the -P argument using tqdm and rich

v1.6.0

1 year ago

Release 1.6.0 (2022-08-05)

Breaking Changes

Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
Updated default --eval-freq from 10k to 25k steps
Update default horizon to 2 for the HistoryWrapper
Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
Upgrade to sb3-contrib >= 1.6.0

New Features

Support setting PyTorch's device with thye --device flag (@gregwar)
Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
Added vec_env_wrapper support in the config (works the same as env_wrapper)
Added Huggingface hub integration
Added RecurrentPPO support (aka ppo_lstm)
Added autodownload for "official" sb3 models from the hub
Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
Added MsPacman models

Bug fixes

Fix Reacher-v3 name in PPO hyperparameter file
Pinned ale-py==0.7.4 until new SB3 version is released
Fix enjoy / record videos with LSTM policy
Fix bug with environments that have a slash in their name (@ernestum)
Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games, if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs

Documentation

Other

When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)

v1.5.0

2 years ago

Release 1.5.0 (2022-03-25)

Support for Weight and Biases experiment tracking

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
Upgrade to sb3-contrib >= 1.5.0
Upgraded to gym 0.21

New Features

Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
Support experiment tracking via Weights and Biases via the --track flag (@vwxyzjn)
Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see https://github.com/DLR-RM/rl-baselines3-zoo/pull/216)

Bug fixes

Policies saved during during optimization with distributed Optuna load on new systems (@jkterry)
Fixed script for recording video that was not up to date with the enjoy script