A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
## Release 2.3.0

- Updated `setup.py` (@power-edge)
- Updated `requirements.txt` (removed duplicates from `setup.py`)

Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.2.1...v2.3.0
## Release 2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

- Removed the `gym` dependency; the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to `hyperparams_opt.py` (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Replaced the deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)` (see the sketch after this list)
- Switched to `shlex.split()` for parsing command strings
- Fixed `rl_zoo3/hyperparams_opt.py` type hints
- Fixed `rl_zoo3/exp_manager.py` type hints
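The Optuna migration is mechanical. A minimal sketch of the old and new sampling calls (the hyperparameter names and ranges here are illustrative, not the zoo's actual search space):

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # Deprecated: gamma = trial.suggest_uniform("gamma", 0.9, 0.9999)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    # Deprecated: lr = trial.suggest_loguniform("lr", 1e-5, 1e-2)
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    return gamma * lr  # dummy objective, just for the sketch


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=3)
```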
Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.0.0...v2.1.0
## Release 2.0.0

**Warning**: Stable-Baselines3 (SB3) v2.0 will be the last version to support Python 3.7 (end of life in June 2023). We highly recommend you upgrade to Python >= 3.8.

To upgrade:

```
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
```

or simply (the RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```

- Updated `CarRacing-v1` to `CarRacing-v2` in the hyperparameters
- Added a `--n-timesteps` argument to adjust the length of the video in `record_video` (see the usage sketch after this list)
- Fixed `record_video` steps (before, it was stepping in a closed env)
Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v1.8.0...v2.0.0
## Release 1.8.0

We have run a massive, open-source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark

New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

**Warning**: Stable-Baselines3 (SB3) v1.8.0 will be the last version to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

- Added support for the new `HerReplayBuffer` implementation that supports multiple envs (see the sketch after this list)
- Removed the `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts
- Upgraded the `highway-env` version to 1.5 and `setuptools` to v65.5 for the CI
- Removed `use_auth_token` from the push-to-hub util
- Changed the `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
- Switched to `ruff` (a fast alternative to flake8) in the Makefile
- Replaced the deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)` (same migration as in the Release 2.1.0 sketch above)
- Switched to `ruff` and `pyproject.toml`
- Removed the `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer`
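A minimal sketch of the new-style `HerReplayBuffer` usage; the env id is illustrative (it assumes panda-gym is installed), and note that `online_sampling`/`max_episode_length` are simply gone:

```python
import panda_gym  # noqa: F401  # illustrative: registers the Panda envs

from stable_baselines3 import SAC, HerReplayBuffer

model = SAC(
    "MultiInputPolicy",
    "PandaReach-v2",
    replay_buffer_class=HerReplayBuffer,
    # No more online_sampling/max_episode_length here:
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(1_000)
```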
## Release 1.7.0

We are currently creating an open-source benchmark; please read https://github.com/openrlbenchmark/openrlbenchmark/issues/7 if you want to help.

- Upgraded to SB3 >= 1.7.0 and added support for Python config files
- Renamed the `--yaml-file` argument to `-conf` (`--conf-file`), as Python files are now supported too
- Changed `net_arch=[dict(pi=.., vf=..)]` to `net_arch=dict(pi=.., vf=..)` to follow SB3 >= 1.7.0 (see the sketch after this list)
- Added the `monitor_kwargs` parameter
- Set `env_kwargs` of `render:True` under the hood for panda-gym v1 envs in the `enjoy` replay, to match the visualization behavior of other envs
- Added the `-tags/--wandb-tags` argument to `train.py` to add tags to the wandb run
- Allowed `python -m rl_zoo3.cli` to be called directly
- Fixed passing `--gym-package` when using subprocesses
- `scripts/plot_train.py` now plots models such that newer models appear on top of older ones
- Switched imports to `from gym import spaces`
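The `net_arch` change in practice, as a minimal sketch (layer sizes are illustrative):

```python
from stable_baselines3 import PPO

# Before SB3 1.7 (deprecated): net_arch=[dict(pi=[64, 64], vf=[64, 64])]
# Since SB3 1.7, the dict is passed directly:
policy_kwargs = dict(net_arch=dict(pi=[64, 64], vf=[64, 64]))
model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
model.learn(1_000)
```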
## RL Zoo3 is now a package!

You can now install the RL Zoo via pip (`pip install rl-zoo3`), and it comes with a basic command line interface (`rl_zoo3 train|enjoy|plot_train|all_plots`) that has the same interface as the scripts (`train.py|enjoy.py|...`).

You can also use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).

File: `train.py` (you can run `python train.py --algo sbx_tqc --env Pendulum-v1` afterward)

```python
import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Add the new algorithm and make it visible to the zoo's entry points
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()
```

- Added the `rl_zoo3 train` and `rl_zoo3 enjoy` command line entry points (see the usage sketch below)
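For instance, the following two commands should be equivalent (algo and env are illustrative):

```
rl_zoo3 train --algo ppo --env CartPole-v1
python train.py --algo ppo --env CartPole-v1
```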
## Earlier releases

- Added a `--yaml-file` argument option for `train.py` to read hyperparameters from custom yaml files (@JohannesUl)
- Fixed the `custom_object` parameter in `record_video.py` (@Affonso-Gui)
- Set `optimize_memory_usage` to `False` for DQN/QR-DQN in `record_video.py` (@Affonso-Gui)
- In `ExperimentManager`, `_maybe_normalize` now sets `training` to `False` for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in `EvalCallback`) (@pchalasani); see the sketch after this list
- Added a progress bar via the `-P` argument, using tqdm and rich
- Added `HistoryWrapper`
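The idea behind the `_maybe_normalize` fix, shown as a minimal standalone sketch with SB3's `VecNormalize` (env choice illustrative):

```python
import gym

from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

eval_env = DummyVecEnv([lambda: gym.make("Pendulum-v1")])
# training=False freezes the running mean/std, so evaluation episodes
# no longer update the normalization statistics; norm_reward=False
# reports the true (unnormalized) episode reward.
eval_env = VecNormalize(eval_env, training=False, norm_reward=False)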
- Added the `--device` flag (@gregwar)
- Added the `--max-total-trials` parameter to help with distributed optimization (@ernestum); see the sketch after this list
- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Fixed the `Reacher-v3` name in the PPO hyperparameter file
name in PPO hyperparameter fileoptimize_memory_usage
to False
for DQN/QR-DQN on Atari games,
if you want to save RAM, you need to deactivate handle_timeout_termination
in the replay_buffer_kwargs
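Conversely, a sketch of opting back into the RAM-saving buffer (env id illustrative); SB3 refuses `optimize_memory_usage=True` together with `handle_timeout_termination=True`, so the latter has to be disabled:

```python
from stable_baselines3 import DQN

model = DQN(
    "CnnPolicy",
    "BreakoutNoFrameskip-v4",
    buffer_size=50_000,
    optimize_memory_usage=True,  # RAM-saving replay buffer
    # Required with optimize_memory_usage=True:
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```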
"none"
, use NopPruner
instead of diverted MedianPruner
(@qgallouedec)Support for Weight and Biases experiment tracking
--track
flag (@vwxyzjn)RawStatisticsCallback
(@vwxyzjn, see https://github.com/DLR-RM/rl-baselines3-zoo/pull/216)
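A tracking sketch (algo, env, and project name are illustrative):

```
python train.py --algo sac --env Pendulum-v1 --track --wandb-project-name sb3
```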