PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
## v2.3.0

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
- Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:

```
pip install stable_baselines3 sb3_contrib --upgrade
```

or simply (RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```
The default hyperparameters of TD3 and DDPG have been changed to be more consistent with SAC:

```python
# SB3 < 2.3.0 default hyperparameters:
# model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
# SB3 >= 2.3.0:
model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)
```
> [!NOTE]
> Two inconsistencies remain: the default network architecture for TD3/DDPG is `[400, 300]` instead of `[256, 256]` as for SAC (for backward-compatibility reasons; see the report on the influence of the network size), and the default learning rate is `1e-3` instead of `3e-4` as for SAC (for performance reasons; see the W&B report on the influence of the learning rate).
The default `learning_starts` parameter of DQN has been changed to be consistent with the other off-policy algorithms:

```python
# SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to Atari defaults:
# model = DQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = DQN("MlpPolicy", env, learning_starts=100)
```
- For safety, `torch.load()` is now called with `weights_only=True` when loading torch tensors; policy `load()` still uses `weights_only=False`, as gymnasium imports are required for it to work.
- When using `huggingface_sb3`, you will now need to set `TRUST_REMOTE_CODE=True` when downloading models from the hub, as `pickle.load` is not safe (see the sketch below).
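A minimal sketch of the opt-in flow, assuming `huggingface_sb3` reads the flag from the environment; the repo id and filename are placeholders:

```python
import os

# Opt in explicitly: pickle.load can execute arbitrary code.
os.environ["TRUST_REMOTE_CODE"] = "True"

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Placeholder repo id and filename, for illustration only.
checkpoint = load_from_hub(repo_id="sb3/ppo-CartPole-v1", filename="ppo-CartPole-v1.zip")
model = PPO.load(checkpoint)
```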
- Log `rollout/success_rate` when available for on-policy algorithms (@corentinlger)
- Fixed `monitor_wrapper` argument that was not passed to the parent class, and `dones` argument that wasn't passed to `_update_into_buffer` (@corentinlger)
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to MaskablePPO (SB3-Contrib)
- Fixed `train_freq` type annotation for TQC and QRDQN (@Armandpl)
- Fixed `sb3_contrib/common/maskable/*.py` type annotations
- Fixed `sb3_contrib/ppo_mask/ppo_mask.py` type annotations
- Fixed `sb3_contrib/common/vec_env/async_eval.py` type annotations
- Added notes about MaskablePPO (evaluation and multi-process) (@icheered)
- Added test dependencies to `setup.py` (@power-edge)
- Simplified `requirements.txt` (removed duplicates from `setup.py`)
- Added support for `MultiDiscrete` and `MultiBinary` action spaces to PPO (SBX)
- Fixed the `train()` signature and updated type hints
- Added CrossQ (SBX)
- Added `render_mode="human"` in the README example (@marekm4)
- Updated the `log_interval` description in the base class (@rushitnshah)

Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.2.1...v2.3.0
## v2.2.1

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
- Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

```
pip install stable_baselines3 sb3_contrib --upgrade
```

or simply (RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```
> [!NOTE]
> Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751. Please use SB3 v2.2.1 and not v2.2.0.
- Switched to `ruff` for sorting imports (isort is no longer needed); black and ruff now require a minimum version
- Dropped `x is False` in favor of `not x`, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
- Improved the `env_checker` error message for an env wrongly detected as a GoalEnv (`compute_reward()` is defined)
- Added support for setting `options` at reset with VecEnv via the `set_options()` method. As with the seeding logic, options are reset at the end of an episode (@ReHoss). See the sketch below.
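A minimal sketch of the new reset-options API; the option key is hypothetical (CartPole ignores reset options):

```python
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv

vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
# Options are passed to the wrapped envs at the next reset and,
# like seeds, are cleared again at the end of an episode.
vec_env.set_options({"difficulty": 1})  # hypothetical option key
obs = vec_env.reset()
```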
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to on-policy algorithms (A2C and PPO)
- Fixed `_setup_learn()` in OffPolicyAlgorithm (@PatrickHelm)
- Calls `callback.update_locals()` before `callback.on_rollout_end()` in OnPolicyAlgorithm (@PatrickHelm)
- Fixed `render_mode`, which was not properly loaded when using `VecNormalize.load()`
- Fixed success reward dtype in `SimpleMultiObsEnv` (@NixGD)
- Added `set_options` for `AsyncEval` (SB3-Contrib)
- Added `rollout_buffer_class` and `rollout_buffer_kwargs` arguments to TRPO (SB3-Contrib)
- Removed the `gym` dependency; the package is still required for some pretrained agents.
- Added `--eval-env-kwargs` to `train.py` (@Quentin18)
- Added `ppo_lstm` to `hyperparams_opt.py` (@technocrat13)
- Upgraded to `pybullet_envs_gymnasium>=0.4.0`
- Replaced deprecated `optuna.suggest_uniform(...)` by `optuna.suggest_float(..., low=..., high=...)`
- Added DDPG and TD3 algorithms (SBX)
- Fixed `stable_baselines3/common/callbacks.py` type hints
- Fixed `stable_baselines3/common/utils.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_transpose.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_video_recorder.py` type hints
- Fixed `stable_baselines3/common/save_util.py` type hints
- Fixed `stable_baselines3/common/buffers.py` type hints
- Fixed `stable_baselines3/her/her_replay_buffer.py` type hints
- Added missing `.copy()` when storing new transitions
- Updated the `ActorCriticPolicy.extract_features()` signature by adding an optional `features_extractor` argument
- Updated the docs (`sphinx_autodoc_typehints`)
- Fixed `stable_baselines3/common/off_policy_algorithm.py` type hints
- Fixed `stable_baselines3/common/distributions.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_normalize.py` type hints
- Fixed `stable_baselines3/common/vec_env/__init__.py` type hints
- Fixed `stable_baselines3/common/policies.py` type hints
- Switched to `mypy` only for checking types

Full changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.1.0...v2.2.1
## v2.1.0

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
- Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

```
pip install stable_baselines3 sb3_contrib --upgrade
```

or simply (RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```
- Fixed MaskablePPO ignoring the `stats_window_size` argument (SB3-Contrib)
- Fixed `env_checker.py` warning messages for out-of-bounds values in complex observation spaces (@Gabo-Tor)
- Updated `test_spaces.py` tests

Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v2.0.0...v2.1.0
> [!WARNING]
> Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023). We highly recommend you upgrade to Python >= 3.8.
## v2.0.0

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
- Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

```
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
```

or simply (RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```
- Switched to Gymnasium as the primary backend; Gym 0.21 and 0.26 are still supported via the `shimmy` package (@carlosluis, @arjun-kg, @tlpss)
- The `online_sampling` argument of `HerReplayBuffer` was removed
- Removed the `stack_observation_space` method of `StackedObservations`
- Renamed environment output observations in `evaluate_policy` to prevent shadowing the input observations during callbacks (@npit)
- Updated the `HumanOutputFormat` file check: it now verifies whether the object is an instance of `io.TextIOBase` instead of only checking for the presence of a `write` method.
- `vec_env.seed(seed=seed)` will only be effective after the `env.reset()` call (see the sketch below).
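A minimal sketch of the new seeding behavior (seeds are stored and only take effect at the next reset):

```python
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv

vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
vec_env.seed(seed=42)  # stored, not applied yet
obs = vec_env.reset()  # the seed takes effect here
```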
- Gym envs are still supported via the `shimmy` package
- Renamed `CarRacing-v1` to `CarRacing-v2` in hyperparameters
- Added `--n-timesteps` argument to adjust the length of the video
- Fixed `record_video` steps (before, it was stepping in a closed env)
- Fixed `VecExtractDictObs` not handling terminal observations (@WeberSamuel)
- Requires NumPy `>=1.20` due to the use of `numpy.typing` (@troiganto)
- Fixed `target_update_interval` handling in DQN (@tobirohrer)
- Fixed checks on the output of `step()` when checking for `Inf` and `NaN` (@lutogniew)
- Fixed HER `truncate_last_trajectory()` (@lbergmann1)
- Fixed `stable_baselines3/a2c/*.py` type hints
- Fixed `stable_baselines3/ppo/*.py` type hints
- Fixed `stable_baselines3/sac/*.py` type hints
- Fixed `stable_baselines3/td3/*.py` type hints
- Fixed `stable_baselines3/common/base_class.py` type hints
- Fixed `stable_baselines3/common/logger.py` type hints
- Fixed `stable_baselines3/common/envs/*.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py` type hints
- Fixed `stable_baselines3/common/vec_env/base_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_frame_stack.py` type hints
- Fixed `stable_baselines3/common/vec_env/dummy_vec_env.py` type hints
- Fixed `stable_baselines3/common/vec_env/subproc_vec_env.py` type hints
- Changed the `VecEnv` and `VecEnvWrapper` `seed()` method return type from `List` to `Sequence`
- Documented the `VecEnv` API vs the Gym API
- Clarified `VecEnv` vs Gym env behavior
- Fixed the `EvalCallback` example (@sidney-tio)
- Added `pink-noise-rl` to the projects page
- Fixed a bug where `ortho_init` was ignored

Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v1.8.0...v2.0.0
> [!WARNING]
> Stable-Baselines3 (SB3) v1.8.0 will be the last version to use Gym as a backend. Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs). You can find a migration guide here. If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
## v1.8.0

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

```
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
```

or simply (RL Zoo depends on SB3 and SB3 Contrib):

```
pip install rl_zoo3 --upgrade
```
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs; `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- `HerReplayBuffer` was refactored to support multiprocessing; previous replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
- Added `repeat_action_probability` argument in `AtariWrapper`
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Updated `VecCheckNan`: the check is now active in the `env_checker()` (@DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (@FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher); see the sketch below
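A minimal sketch of the new smoothing knob; episode statistics such as `rollout/ep_rew_mean` are averaged over this many episodes:

```python
from stable_baselines3 import PPO

# Average logged episode stats over the last 20 episodes
# instead of the default 100:
model = PPO("MlpPolicy", "CartPole-v1", stats_window_size=20)
model.learn(total_timesteps=10_000)
```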
- Mentioned `check_env` in the `MaskablePPO` docs (@AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (SB3-Contrib) (@AlexPasqua)
- Added `dtype` (default to `float32`) to the noise for consistency with gym actions (@sidney-tio)
- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hints
- Switched from the `setup.cfg` to the `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari ROMs
- Renamed `load_parameters` to `set_parameters` in the docs (@DavyMorgan)
- Fixed the `A2C` docstring (@AlexPasqua)
- Clarified the `log_interval` description (@theSquaredError)

## v1.7.0

- SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
> [!WARNING]
> Shared layers in MLP policies (`mlp_extractor`) are now deprecated for PPO, A2C and TRPO. This feature will be removed in SB3 v1.8.0, and the behavior of `net_arch=[64, 64]` will then create separate networks with the same architecture, to be consistent with the off-policy algorithms (see the sketch below).
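A minimal sketch of the non-shared layout that `net_arch=[64, 64]` maps to from v1.8.0 on; the dict form makes the separate policy and value networks explicit:

```python
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    # Separate 64-64 networks for the policy (pi) and value function (vf):
    policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
)
```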
> [!NOTE]
> A2C and PPO models saved with SB3 < 1.7.0 will show a warning about missing keys in the state dict when loaded with SB3 >= 1.7.0. To suppress the warning, simply save the model again. You can find more info in issue #1233.
- Removed the deprecated `create_eval_env`, `eval_env`, `eval_log_path`, `n_eval_episodes` and `eval_freq` parameters; please use an `EvalCallback` instead (see the sketch after this list)
- Removed the deprecated `sde_net_arch` parameter
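A minimal sketch of the replacement, written with SB3 v2-style `gymnasium` imports (use `gym` on v1.x):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# Periodic evaluation now lives in a callback instead of learn() parameters:
eval_callback = EvalCallback(gym.make("CartPole-v1"), eval_freq=5_000, n_eval_episodes=5)
model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=50_000, callback=eval_callback)
```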
- Removed the `ret` attribute in `VecNormalize`; please use `returns` instead
- `VecNormalize` now updates the observation space when normalizing images
- Added `with_bias` argument to `create_mlp`
- Added support for multidimensional `spaces.MultiBinary` observations
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing `normalize_images=False`
- Added `normalized_image` parameter to `NatureCNN` and `CombinedExtractor` (see the sketch below)
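A minimal sketch of the new parameter; the env id is a placeholder for any image-based environment whose observations are already normalized (i.e., not uint8 in [0, 255]):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import NatureCNN

policy_kwargs = dict(
    features_extractor_class=NatureCNN,
    # Skip the uint8/[0, 255] sanity checks for already-normalized images:
    features_extractor_kwargs=dict(normalized_image=True),
)
model = PPO("CnnPolicy", "CarRacing-v2", policy_kwargs=policy_kwargs)
```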
- Fixed a bug in `RecurrentPPO` where the LSTM states were incorrectly reshaped for `n_lstm_layers > 1` (thanks @kolbytn)
- Fixed `RuntimeError: rnn: hx is not contiguous` while predicting terminal values for `RecurrentPPO` when `n_lstm_layers > 1`
- Added `monitor_kwargs` parameter
- Fixed `ProgressBarCallback` under-reporting (@dominicgkerr)
- Updated `evaluate_actions` in `ActorCriticPolicy` to reflect that entropy is an optional tensor (@Rocamonde)
- Fixed the type annotation of `policy` in `BaseAlgorithm` and `OffPolicyAlgorithm`
- Removed the `custom_objects` workaround
- Fixed the type annotation of `model` in `evaluate_policy`
- Added `Self` return type using `TypeVar`
- Fixed `normalize_images`, which was not passed to the parent class in some cases
- Fixed `load_from_vector`, which was broken with newer PyTorch versions when passing a PyTorch tensor
- You should now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Deprecated shared layers in `MlpExtractor` (@AlexPasqua)
- GoalEnvs are now detected based on the presence of the `compute_reward` method, rather than by their inheritance from `gym.GoalEnv`
- Replaced `CartPole-v0` by `CartPole-v1` in tests
- Fixed `tests/test_distributions.py` type hints
- Fixed `stable_baselines3/common/type_aliases.py` type hints
- Fixed `stable_baselines3/common/torch_layers.py` type hints
- Fixed `stable_baselines3/common/env_util.py` type hints
- Fixed `stable_baselines3/common/preprocessing.py` type hints
- Fixed `stable_baselines3/common/atari_wrappers.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_check_nan.py` type hints
- Updated `__init__.py` with the `__all__` attribute (@ZikangXiong)
- Set `np.bool = bool` so gym 0.21 is compatible with NumPy 1.24+
- Switched to `from gym import spaces`
- Updated the `get_system_info` output to avoid issues linked to copy-pasting in GitHub issues
- Renamed `env` to `vec_env` when the environment is vectorized
- Clarified `mlp_extractor`'s dimensions (@AlexPasqua)

## v1.6.2

- SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- RL Zoo3: https://github.com/DLR-RM/rl-baselines3-zoo

- RL Zoo is now a python package (`pip install rl_zoo3`)
- Added `progress_bar` argument in the `learn()` method, displayed using the TQDM and rich packages (see the sketch below)
)self.num_timesteps
was initialized properly only after the first call to on_step()
for callbacks~=4.13
to be compatible with gym=0.21
eval_env
, eval_freq
or create_eval_env
are used (see #925) (@tobirohrer)env_id
parameter in make_vec_env
and make_atari_env
(@AlexPasqua)wrapper_class
parameter in make_vec_env
(@AlexPasqua)SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- Added checkpoints for replay buffer and `VecNormalize` statistics (@anand-bala)
- Added option for `Monitor` to append to an existing file instead of overriding it (@sidney-tio); see the sketch below
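A minimal sketch, assuming the option is exposed as the `override_existing` flag of `Monitor`:

```python
import gymnasium as gym
from stable_baselines3.common.monitor import Monitor

# Append to the existing monitor file instead of overwriting it:
env = Monitor(gym.make("CartPole-v1"), filename="monitor", override_existing=False)
```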
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
- Fixed an issue where `PPO` gives NaN if the rollout buffer provides a batch of size 1 (@hughperkins)
- Fixed the issue that `predict` does not always return actions as `np.ndarray` (@qgallouedec)
- Fixed missing verbose parameter passing in the `EvalCallback` constructor (@burakdmb)
- Fixed the issue that, when updating the target network, the `running_mean` and `running_var` properties of batch-norm layers are not updated (@honglu2875)
- Fixed wrong type annotation in the `common.OffPolicyAlgorithm` initializer, where an instance instead of a class was required (@Rocamonde)
- Moved the `forward()` abstract method declaration from `common.policies.BaseModel` (already defined in `torch.nn.Module`) to fix type errors in subclasses (@Rocamonde)
- Fixed the return type of the `.load()` and `.learn()` methods in `BaseAlgorithm` so that they now use `TypeVar` (@Rocamonde)
- Fixed an error raised by duplicated keys in `common.logger.HumanOutputFormat` (@Rocamonde and @AdamGleave)
- Fixed `DictReplayBuffer.next_observations` typing (@qgallouedec)
- Added support for `device="auto"` in buffers and made it the default (@qgallouedec)
- Updated `ResultsWriter` (used internally by the `Monitor` wrapper) to automatically create missing directories when `filename` is a path (@dominicgkerr)

## v1.6.0

- SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- Removed the `register_policy` helper and the `policy_base` parameter, using `policy_aliases` static attributes instead (@Gregwar)
- When using `CnnPolicy` or `MultiInputPolicy` with SAC or DDPG/TD3, `share_features_extractor` is now set to `False` by default and `net_arch=[256, 256]` (instead of `net_arch=[]` as before); see the sketch below
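A minimal sketch making the new defaults explicit; the env id is a placeholder for any image-based continuous-control task:

```python
from stable_baselines3 import SAC

model = SAC(
    "CnnPolicy",
    "CarRacing-v2",
    # New defaults since v1.6.0, written out explicitly:
    policy_kwargs=dict(share_features_extractor=False, net_arch=[256, 256]),
)
```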
- Fixed a bug in `DummyVecEnv`'s and `SubprocVecEnv`'s seeding function; a `None` value was unchecked (@ScheiklP)
- Fixed a bug where `EvalCallback` would crash when trying to synchronize `VecNormalize` stats when observation normalization was disabled
- Fixed a bug in the `kl_divergence` check that would fail when using numpy arrays with a MultiCategorical distribution
- Upgraded the code syntax using `pyupgrade`
- Refactored `BaseAlgorithm._wrap_env` (@TibiGG)

## v1.5.0

- SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
- Added `StopTrainingOnNoModelImprovement` to the callback collection (@caburu); see the sketch below
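A minimal sketch combining it with `EvalCallback` via `callback_after_eval`:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import (
    EvalCallback,
    StopTrainingOnNoModelImprovement,
)

# Stop if the best eval reward has not improved for 3 consecutive evaluations
# (but run at least 5 evaluations first):
stop_callback = StopTrainingOnNoModelImprovement(max_no_improvement_evals=3, min_evals=5)
eval_callback = EvalCallback(gym.make("Pendulum-v1"), eval_freq=1_000,
                             callback_after_eval=stop_callback)
model = PPO("MlpPolicy", "Pendulum-v1")
model.learn(total_timesteps=100_000, callback=eval_callback)
```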
- Made `HumanOutputFormat` configurable, depending on the desired maximum width of the output
- Fixed a bug in `VecMonitor`: the monitor did not consider the `info_keywords` during stepping (@ScheiklP)
- Fixed a bug in `HumanOutputFormat`: distinct keys truncated to the same prefix would overwrite each other's value, resulting in only one being output. This now raises an error (it should only affect a small fraction of use cases with very long keys).
- Routed all `nn.Module` calls through implicit rather than explicit `forward`, as per PyTorch guidelines (@manuel-delverme)
- Fixed a bug in `VecNormalize` where an error occurred when `norm_obs` was set to `False` for environments with dictionary observations (@buoyancy99)
- Set the default `env` argument to `None` in `HerReplayBuffer.sample` (@qgallouedec)
- Fixed `batch_size` typing in `DQN` (@qgallouedec)
- Fixed sample normalization in `DictReplayBuffer` (@qgallouedec)
- Removed `remove_time_limit_termination` in off-policy algorithms since it was dead code (@Gregwar)
- Added a section about "Directly Accessing The Summary Writer" in the tensorboard integration docs (@xy9485)

Full Changelog: https://github.com/DLR-RM/stable-baselines3/compare/v1.4.0...v1.5.0