Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
- MaskablePPO was updated to match the latest SB3 PPO version (timeout handling and new method for the policy object)
- Added TRPO algorithm (@cyprienc)
- HerReplayBuffer is currently not supported by MaskablePPO
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend that you upgrade to Python >= 3.7.
- Removed sde_net_arch
- Added MaskablePPO algorithm (@kronion)
- MaskablePPO Dictionary Observation support (@glmcdona)
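The idea behind MaskablePPO's invalid-action masking can be sketched in a few lines. This is a minimal NumPy illustration assuming a discrete action space; `masked_softmax` is a hypothetical helper for exposition, not part of the sb3_contrib API (the library works on torch logits inside the policy).

```python
import numpy as np

def masked_softmax(logits, mask):
    """Illustrative sketch: zero out invalid actions before sampling.

    Setting the logits of invalid actions to -inf gives them exactly
    zero probability after the softmax, while the probabilities of the
    valid actions are renormalized. (Assumes at least one valid action,
    otherwise the result is NaN.)
    """
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked.max()   # for numerical stability
    exp = np.exp(shifted)             # exp(-inf) == 0.0
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
mask = np.array([True, False, True, False])  # actions 1 and 3 are invalid
probs = masked_softmax(logits, mask)
# probs[1] and probs[3] are exactly 0.0; the rest sum to 1
```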
- Added TQC. Blog post: https://araffin.github.io/post/sb3/
- Fixed QR-DQN predict method when using deterministic=False with image space
- Added TimeFeatureWrapper to the wrappers
- Added QR-DQN algorithm (@ku2482)
- Fixed bug in TQC when saving/loading the policy only with non-default number of quantiles
- Fixed bug in QR-DQN when calculating the target quantiles (@ku2482, @guyk1971)
- Updated TQC to match new SB3 version
- Moved quantile_huber_loss to common/utils.py (@ku2482)
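The quantile Huber loss shared by QR-DQN and TQC can be sketched as follows. This is a simplified NumPy version for a single 1-D set of predicted quantiles, not the library's exact implementation (which operates on batched torch tensors); the quantile fractions are assumed to be the usual midpoints (i + 0.5) / n.

```python
import numpy as np

def quantile_huber_loss(quantiles, targets, kappa=1.0):
    """Illustrative quantile Huber loss (QR-DQN / TQC style).

    quantiles: predicted quantile values, shape (n,)
    targets:   target values, shape (m,)
    The loss is asymmetric: each quantile i is weighted by
    |tau_i - 1{u < 0}|, which pushes quantile i toward the
    tau_i-th quantile of the target distribution.
    """
    n = quantiles.shape[-1]
    tau = (np.arange(n) + 0.5) / n            # quantile midpoints, shape (n,)
    # pairwise TD errors u[i, j] = targets[j] - quantiles[i]
    u = targets[None, :] - quantiles[:, None]
    abs_u = np.abs(u)
    # Huber loss: quadratic within kappa, linear outside
    huber = np.where(abs_u <= kappa,
                     0.5 * u ** 2,
                     kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(tau[:, None] - (u < 0).astype(float))
    return (weight * huber).mean()

# Perfect prediction -> zero loss; constant offset of 1 -> 0.25 here
quantile_huber_loss(np.zeros(3), np.zeros(3))  # 0.0
quantile_huber_loss(np.zeros(3), np.ones(3))   # 0.25
```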