An elegant PyTorch deep reinforcement learning library.
This release focuses on updating and improving Tianshou's internals (in particular, code quality) while introducing relatively few breaking changes (apart from things like the Python and dependency versions).

We view it as a significant step towards making Tianshou the go-to library both for RL researchers and for RL practitioners working on industry projects. This is the first release since the appliedAI Institute (the TransferLab division) decided to further develop Tianshou and provide long-term support.

- Deprecated `offpolicy_trainer` in favor of `OffpolicyTrainer(...).run()` (this affects all example scripts).
- Removed `**kwargs` from signatures and renamed internal attributes (like `critic1` -> `critic`).
- Added a new high-level API (see the example scripts ending in `_hl.py`).
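The move from functional trainers to trainer classes can be illustrated with a self-contained sketch (hypothetical names and simplified logic, not Tianshou's actual implementation):

```python
import warnings
from dataclasses import dataclass


@dataclass
class TrainerResult:
    """Toy stand-in for the statistics a trainer run returns."""
    epochs_run: int


class OffpolicyTrainerSketch:
    """Toy trainer: configuration in __init__, execution via run()."""

    def __init__(self, max_epoch: int) -> None:
        self.max_epoch = max_epoch

    def run(self) -> TrainerResult:
        for _epoch in range(self.max_epoch):
            pass  # one training epoch would happen here
        return TrainerResult(epochs_run=self.max_epoch)


def offpolicy_trainer_sketch(max_epoch: int) -> TrainerResult:
    """Deprecated functional interface, kept as a thin wrapper."""
    warnings.warn(
        "offpolicy_trainer_sketch is deprecated; "
        "use OffpolicyTrainerSketch(...).run() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return OffpolicyTrainerSketch(max_epoch=max_epoch).run()


result = OffpolicyTrainerSketch(max_epoch=3).run()
```

Moving configuration into the constructor means a fully configured trainer object can be inspected or passed around before it is run, instead of a single monolithic function call.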
- `critic2` no longer has to be explicitly constructed and passed if it is supposed to be the same network as `critic` (formerly `critic1`).
- Added `BatchPrototypes` to cover the fields needed and returned by methods relying on batches in a backwards-compatible way.
- Removed `**kwargs` from policies' constructors.
- Removed `kwargs` and replaced dicts by dataclasses in several places.
- Used `Generic` to express the different kinds of stats that can be returned by `learn` and `update`.
- Typed `tests` and `examples`, close to passing mypy.
- Now uses `dist.mode` instead of inferring `loc` or `argmax` from the `dist_fn` input.

The team working on this release of Tianshou consisted of @opcode81, @MischaPanch, @maxhuettenrauch, @carlocagnetta, and @bordeauxred.
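The combination mentioned above — dataclasses instead of dicts, plus `Generic` to express the stats a policy's `learn` returns — can be sketched as follows (made-up class names, not Tianshou's actual types):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class TrainingStatsSketch:
    """Base class for statistics returned by a learn step."""
    loss: float


@dataclass
class ActorCriticStatsSketch(TrainingStatsSketch):
    """Algorithm-specific extension with additional fields."""
    actor_loss: float
    critic_loss: float


TStats = TypeVar("TStats", bound=TrainingStatsSketch)


class PolicySketch(Generic[TStats]):
    """Policies are generic in the stats type their learn() returns."""

    def learn(self) -> TStats:
        raise NotImplementedError


class A2CLikePolicy(PolicySketch[ActorCriticStatsSketch]):
    def learn(self) -> ActorCriticStatsSketch:
        actor_loss, critic_loss = 0.1, 0.2  # placeholder values
        return ActorCriticStatsSketch(
            loss=actor_loss + critic_loss,
            actor_loss=actor_loss,
            critic_loss=critic_loss,
        )


stats = A2CLikePolicy().learn()
```

With this pattern a type checker can infer the concrete stats type from the policy class, whereas an untyped dict would leave every field access unchecked.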
- Use `env.np_random.integers` instead of `env.np_random.randint` in Atari examples (#613, @ycheng517).
- Upgrade gym version to `>=0.23.1`, support `seed` and `return_info` arguments for reset (#613, @ycheng517).
- `utils.network`: change `action_dim` to `action_shape` (#602, @Squeemos).
- Use `wandb.init(..., sync_tensorboard=True)` (#558, #562).
- Add `tianshou.utils.deprecation` for a unified deprecation wrapper. (#575)

This release fixes the conda package publishing, supports more gym versions instead of only the newest one, and keeps the internal API compatible. See #536.
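A unified deprecation wrapper in the spirit of the `tianshou.utils.deprecation` helper mentioned above can be sketched like this (illustrative only, not the actual implementation):

```python
import warnings


def deprecation(msg: str) -> None:
    """Emit a uniform DeprecationWarning attributed to the caller's caller."""
    warnings.warn(msg, category=DeprecationWarning, stacklevel=3)


def new_api() -> str:
    return "ok"


def old_api() -> str:
    """Deprecated entry point, forwarding to the replacement."""
    deprecation("old_api is deprecated, use new_api instead.")
    return new_api()
```

Centralizing the call means every deprecated entry point warns with the same category and stack level, so downstream users get consistent, filterable warnings.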
- Add `writer.flush()` in `TensorboardLogger` to ensure real-time logging results (#485).
- Support `test_collector=None` in 3 trainers to turn off testing during training (#485).
- Use the `Critic` class for its critic, following conventions in other actor-critic policies (#485).
- Use the `ActorCritic` class for its optimizer to eliminate randomness caused by parameter sharing between actor and critic (#485).
- Move offline RL examples to `examples/offline` and tests to `test/offline` (#485).