An elegant PyTorch deep reinforcement learning library.
This release focuses on updating and improving Tianshou's internals (in particular, code quality) while introducing relatively few breaking changes (apart from things like the Python and dependency versions).

We view it as a significant step towards making Tianshou the go-to library both for RL researchers and for RL practitioners working on industry projects. This is the first release since the appliedAI Institute (the TransferLab division) decided to further develop Tianshou and provide long-term support.

- Deprecated `offpolicy_trainer` in favor of `OffpolicyTrainer(...).run()` (this affects all example scripts).
- Removed `**kwargs` from signatures and renamed internal attributes (like `critic1` -> `critic`).
- Added a new high-level API (see the example scripts ending in `_hl.py`).
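The move from functional trainers to trainer classes can be illustrated with a self-contained sketch (hypothetical names and simplified logic, not Tianshou's actual implementation):

```python
import warnings
from dataclasses import dataclass


@dataclass
class TrainerResult:
    """Toy stand-in for the statistics a trainer run returns."""
    epochs_run: int


class OffpolicyTrainerSketch:
    """Toy trainer: configuration in __init__, execution via run()."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch

    def run(self) -> TrainerResult:
        for _epoch in range(self.max_epoch):
            pass  # one training epoch would happen here
        return TrainerResult(epochs_run=self.max_epoch)


def offpolicy_trainer_sketch(max_epoch: int) -> TrainerResult:
    """Deprecated functional interface, kept as a thin wrapper."""
    warnings.warn(
        "offpolicy_trainer_sketch is deprecated; "
        "use OffpolicyTrainerSketch(...).run() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return OffpolicyTrainerSketch(max_epoch=max_epoch).run()


result = OffpolicyTrainerSketch(max_epoch=3).run()
```

Moving configuration into the constructor means a fully configured trainer object can be inspected or passed around before it is run, instead of a single monolithic function call.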
- `critic2` no longer has to be explicitly constructed and passed if it is supposed to be the same network as `critic` (formerly `critic1`).
- Added `BatchPrototypes` to cover the fields needed and returned by methods relying on batches in a backwards-compatible way.
- Removed `**kwargs` from policies' constructors.
- Removed `kwargs` and replaced dicts by dataclasses in several places.
- Used `Generic` to express the different kinds of stats that can be returned by `learn` and `update`.
- Typed `tests` and `examples`, close to passing mypy.
- Now uses `dist.mode` instead of inferring `loc` or `argmax` from the `dist_fn` input.

The team working on this release of Tianshou consisted of @opcode81, @MischaPanch, @maxhuettenrauch, @carlocagnetta, and @bordeauxred.
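The combination mentioned above — dataclasses instead of dicts, plus `Generic` to express the stats a policy's `learn` returns — can be sketched as follows (made-up class names, not Tianshou's actual types):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class TrainingStatsSketch:
    """Base class for statistics returned by a learn step."""
    loss: float


@dataclass
class ActorCriticStatsSketch(TrainingStatsSketch):
    """Algorithm-specific extension with additional fields."""
    actor_loss: float
    critic_loss: float


TStats = TypeVar("TStats", bound=TrainingStatsSketch)


class PolicySketch(Generic[TStats]):
    """Policies are generic in the stats type their learn() returns."""

    def learn(self) -> TStats:
        raise NotImplementedError


class A2CLikePolicy(PolicySketch[ActorCriticStatsSketch]):
    def learn(self) -> ActorCriticStatsSketch:
        actor_loss, critic_loss = 0.1, 0.2  # placeholder values
        return ActorCriticStatsSketch(
            loss=actor_loss + critic_loss,
            actor_loss=actor_loss,
            critic_loss=critic_loss,
        )


stats = A2CLikePolicy().learn()
```

With this pattern a type checker can infer the concrete stats type from the policy class, whereas an untyped dict would leave every field access unchecked.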
- Use `env.np_random.integers` instead of `env.np_random.randint` in Atari examples (#613, @ycheng517).
- Upgrade gym version to `>=0.23.1`, support `seed` and `return_info` arguments for reset (#613, @ycheng517).
- `utils.network`: change `action_dim` to `action_shape` (#602, @Squeemos).
- Use `wandb.init(..., sync_tensorboard=True)` (#558, #562).
- Add `tianshou.utils.deprecation` for a unified deprecation wrapper. (#575)

This release fixes the conda package publishing, supports more gym versions instead of only the newest one, and keeps the internal API compatible. See #536.
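A unified deprecation wrapper in the spirit of the `tianshou.utils.deprecation` helper mentioned above can be sketched like this (illustrative only, not the actual implementation):

```python
import warnings


def deprecation(msg: str) -> None:
    """Emit a uniform DeprecationWarning attributed to the caller's caller."""
    warnings.warn(msg, category=DeprecationWarning, stacklevel=3)


def new_api() -> str:
    return "ok"


def old_api() -> str:
    """Deprecated entry point, forwarding to the replacement."""
    deprecation("old_api is deprecated, use new_api instead.")
    return new_api()
```

Centralizing the call means every deprecated entry point warns with the same category and stack level, so downstream users get consistent, filterable warnings.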
- Add `writer.flush()` in `TensorboardLogger` to ensure real-time logging results (#485).
- Support `test_collector=None` in 3 trainers to turn off testing during training (#485).
- Use the `Critic` class for its critic, following conventions in other actor-critic policies (#485).
- Use the `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic (#485).
- Move offline RL examples to `examples/offline` and tests to `test/offline` (#485).