Tianshou Release Notes

An elegant PyTorch deep reinforcement learning library.

v0.4.4

API Change

  1. add a new class DataParallelNet for multi-GPU training (#461)
  2. add ActorCritic for deterministic parameter grouping of shared-head actor-critic networks (#458); see the sketch after this list
  3. collector.collect() now returns 4 extra keys: rew/rew_std/len/len_std (previously this was done in the logger) (#459)
  4. rename WandBLogger -> WandbLogger (#441)
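
Below is a minimal sketch of the shared-head grouping, assuming the usual Net/Actor/Critic building blocks from tianshou.utils.net; the sizes and learning rate are illustrative:

```python
import torch

from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

# shared feature extractor with two separate heads (CartPole-sized, illustrative)
net = Net(state_shape=4, hidden_sizes=[64, 64])
actor = Actor(net, action_shape=2)
critic = Critic(net)

# ActorCritic deduplicates the shared parameters, so the optimizer
# does not receive (and update) the shared head twice
actor_critic = ActorCritic(actor, critic)
optim = torch.optim.Adam(actor_critic.parameters(), lr=3e-4)
```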

Bug Fix

  1. fix logging in atari examples (#444)

Enhancement

  1. save_fn() is now called at the beginning of the trainer (#459)
  2. create a new documentation page for the logger (#463)
  3. add save_data and restore_data to the wandb logger, allow more input arguments for wandb init, and integrate wandb into test/modelbase/test_psrl.py and examples/atari/atari_dqn.py (#441)

v0.4.3

Bug Fix

  1. fix an A2C/PPO optimizer bug when sharing the network head (#428)
  2. fix the PPO dual-clip implementation (#435)

Enhancement

  1. add Rainbow (#386)
  2. add WandbLogger (#427)
  3. add env_id in preprocess_fn (#391)
  4. update README, add new chart and bibtex (#406)
  5. add a Makefile; you can now use make commit-checks to automatically perform almost all checks (#432)
  6. add isort and yapf and apply them to the existing codebase (#432)
  7. add a spelling check via make spelling (#432)
  8. update contributing.rst (#432)

v0.4.2

Enhancement

  1. Add model-free DQN-family algorithms: IQN (#371), FQF (#376)
  2. Add model-free on-policy algorithms: NPG (#344, #347), TRPO (#337, #340)
  3. Add offline RL algorithms: CQL (#359), CRR (#367)
  4. Support deterministic evaluation for on-policy algorithms (#354)
  5. Make the trainer resumable (#350)
  6. Support different state sizes and fix an exception in venv.__del__ (#352, #384)
  7. Add a ViZDoom example (#384)
  8. Add a numerical analysis tool and interactive plots (#335, #341)

v0.4.1

API Change

  1. Add observation normalization in BaseVectorEnv (norm_obs, obs_rms, update_obs_rms and RunningMeanStd) (#308)
  2. Add policy.map_action to map the raw network output to the environment's action range (e.g., from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
  3. Add lr_scheduler support in on-policy algorithms, typically used with LambdaLR (#318); see the sketch after this list
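
A sketch of how the new lr_scheduler argument can be wired into a discrete-action PPO setup; the network sizes, decay schedule, and update count are illustrative, and the surrounding pieces are assumed to be the standard tianshou helpers:

```python
import gym
import torch
from torch.optim.lr_scheduler import LambdaLR

from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.discrete import Actor, Critic

env = gym.make("CartPole-v0")
net = Net(env.observation_space.shape, hidden_sizes=[64, 64])
actor = Actor(net, env.action_space.n)
critic = Critic(net)
# deduplicate the shared parameters before building the optimizer
optim = torch.optim.Adam(set(actor.parameters()).union(critic.parameters()), lr=3e-4)

# decay the learning rate linearly to zero over an assumed number of updates
max_update_num = 1000  # illustrative total number of policy updates
lr_scheduler = LambdaLR(optim, lr_lambda=lambda n: 1 - n / max_update_num)

policy = PPOPolicy(
    actor, critic, optim, dist_fn=torch.distributions.Categorical,
    action_space=env.action_space, lr_scheduler=lr_scheduler)
```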

Note

To adapt to this version, change action_range=... to action_space=env.action_space in policy initialization.
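
For a continuous-control policy such as SAC, the change looks roughly like this (the actor/critic networks and their optimizers are assumed to be constructed as before; only the changed argument is shown):

```python
# before (<= v0.4.0): numeric bounds were passed explicitly
policy = SACPolicy(
    actor, actor_optim, critic1, critic1_optim, critic2, critic2_optim,
    action_range=[env.action_space.low[0], env.action_space.high[0]])

# after (>= v0.4.1): pass the action space; the bounds are derived from it,
# and policy.map_action handles clipping / tanh squashing outside the buffer
policy = SACPolicy(
    actor, actor_optim, critic1, critic1_optim, critic2, critic2_optim,
    action_space=env.action_space)
```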

Bug Fix

  1. Fix incorrect behaviors with on-policy algorithms (an error when n/ep==0 and the reward shown in tqdm) (#306, #328)
  2. Fix a Q-value mask_action error for obs_next (#310)

Enhancement

  1. Release SOTA MuJoCo benchmarks (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
  2. Fix numpy>=1.20 typing issues (#323)
  3. Add cross-platform unit tests (#331)
  4. Add a test on how to deal with finite environments (#324)
  5. Add value normalization in on-policy algorithms (#319, #321)
  6. Separate advantage normalization and value normalization in PPO (#329)

v0.4.0

This release contains several API and behavior changes.

API Change

Buffer

  1. Add ReplayBufferManager, PrioritizedReplayBufferManager, VectorReplayBuffer, PrioritizedVectorReplayBuffer, CachedReplayBuffer (#278, #280);
  2. Change the buffer.add API from buffer.add(obs, act, rew, done, obs_next, info, policy, ...) to buffer.add(batch, buffer_ids) in order to add data more efficiently (#280); see the sketch after this list;
  3. Add a set_batch method in buffer (#278);
  4. Add a sample_index method, the same as sample but returning only the index instead of both the index and the batch data (#278);
  5. Add prev (one-step previous transition index), next (one-step next transition index) and unfinished_index (the index of the last modified transition whose done==False) (#278);
  6. Add an internal method _alloc_by_keys_diff in batch to support keys that appear on the fly (#280);
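
A rough sketch of the new add signature, assuming a VectorReplayBuffer shared by four parallel environments; the transition values are dummies:

```python
import numpy as np

from tianshou.data import Batch, VectorReplayBuffer

# one flat buffer of total size 1000, split internally across 4 envs
buf = VectorReplayBuffer(total_size=1000, buffer_num=4)

# transitions from two of the four environments, packed into a single Batch
batch = Batch(
    obs=np.zeros((2, 4)), act=np.zeros(2), rew=np.zeros(2),
    done=np.array([False, True]), obs_next=np.zeros((2, 4)),
    info=np.array([{}, {}]))

# buffer_ids tells the manager which sub-buffer each transition belongs to;
# the call returns (insert index, episode reward, episode length, episode start)
ptr, ep_rew, ep_len, ep_idx = buf.add(batch, buffer_ids=[0, 3])
```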

Collector

  1. Rewrite the original Collector and split the async functionality into AsyncCollector: Collector only supports sync mode, while AsyncCollector supports both modes (#280);
  2. Drop collector.collect(n_episode=List[int]) because the new collector can collect episodes without bias (#280);
  3. Move reward_metric from Collector to the trainer (#280);
  4. Change the Collector.collect logic: AsyncCollector.collect keeps the previous semantics, where collect(n_step or n_episode) may not collect exactly n_step or n_episode transitions; Collector.collect(n_step or n_episode) now collects exactly n_step transitions or n_episode episodes (#280); see the sketch after this list;
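
For example, with the rewritten sync Collector the requested amount of data is collected exactly (the DQN setup below is only illustrative):

```python
import gym
import torch

from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
net = Net(state_shape=4, action_shape=2, hidden_sizes=[64])
policy = DQNPolicy(net, torch.optim.Adam(net.parameters(), lr=1e-3))
buf = VectorReplayBuffer(total_size=20000, buffer_num=4)
collector = Collector(policy, envs, buf, exploration_noise=True)

# exactly 1000 transitions in total across the 4 environments
collector.collect(n_step=1000)
# exactly 8 complete episodes, without the old per-env length bias
collector.collect(n_episode=8)
```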

Policy

  1. Add a policy.exploration_noise(action, batch) -> action method instead of implementing exploration noise inside policy.forward() (#280); see the sketch after this list;
  2. Add a TimeLimit.truncated handler in compute_*_returns (#296);
  3. Remove the ignore_done flag (#296);
  4. Remove the reward_normalization option in off-policy algorithms (an Error is raised if it is set to True) (#298);
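
A hypothetical sketch of the new hook: a DDPG subclass that injects Gaussian noise in exploration_noise() instead of inside forward() (the noise scale is arbitrary):

```python
import numpy as np

from tianshou.policy import DDPGPolicy


class GaussianNoiseDDPGPolicy(DDPGPolicy):
    """Hypothetical subclass illustrating the exploration_noise hook."""

    def exploration_noise(self, act, batch):
        # called by the Collector on the actions returned by forward();
        # only active when the Collector is created with exploration_noise=True
        return act + np.random.normal(0.0, 0.1, size=np.shape(act))
```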

Trainer

  1. Change collect_per_step to step_per_collect (#293); see the sketch after this list;
  2. Add update_per_step and episode_per_collect (#293);
  3. onpolicy_trainer now supports either step_per_collect or episode_per_collect (#293)
  4. Add BasicLogger and LazyLogger to log data more conveniently (#295)
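
A sketch of the renamed trainer arguments, assuming policy, train_collector and test_collector are built as in the collector example above; all values are illustrative:

```python
from torch.utils.tensorboard import SummaryWriter

from tianshou.trainer import offpolicy_trainer
from tianshou.utils import BasicLogger

logger = BasicLogger(SummaryWriter("log/dqn"))

# step_per_collect replaces collect_per_step; update_per_step=0.1 means one
# gradient update per 10 collected environment steps
result = offpolicy_trainer(
    policy, train_collector, test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    episode_per_test=10, batch_size=64,
    update_per_step=0.1, logger=logger)
```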

Bug Fix

  1. Fix VectorEnv action_space seeding -- env.seed(seed) now also calls env.action_space.seed(seed); otherwise Collector.collect(..., random=True) would produce different results each time (#300, #303).

v0.3.2

Bug Fix

  1. fix networks under utils/discrete and utils/continuous not working under CUDA with torch<=1.6.0 (#289)
  2. fix two bugs in Batch: creating keys in Batch.__setitem__ now throws ValueError instead of KeyError; _create_value now allows a placeholder with the stack=False option (#284)

Enhancement

  1. Add QR-DQN algorithm (#276)
  2. small optimization of Batch.cat and Batch.stack (#284); they are now almost as fast as in v0.2.3

v0.3.1

API Change

  1. change utils.network args to support any form of MLP by default (#275): remove layer_num and hidden_layer_size, add hidden_sizes (a list of ints specifying the hidden layer sizes); see the sketch after this list
  2. add HDF5 save/load method for ReplayBuffer (#261)
  3. add offline_trainer (#263)
  4. move Atari-related network to examples/atari/atari_network.py (#275)
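
A small sketch of items 1 and 2, assuming the refactored Net constructor and the new HDF5 helpers; the sizes and file path are illustrative:

```python
from tianshou.data import ReplayBuffer
from tianshou.utils.net.common import Net

# hidden_sizes replaces layer_num/hidden_layer_size: one entry per hidden layer
net = Net(state_shape=(4,), action_shape=2, hidden_sizes=[128, 128, 128])

# HDF5 persistence for replay buffers
buf = ReplayBuffer(size=10000)
buf.save_hdf5("buffer.hdf5")
buf = ReplayBuffer.load_hdf5("buffer.hdf5")
```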

Bug Fix

  1. fix a potential bug in the discrete behavior cloning policy (#263)

Enhancement

  1. update SAC MuJoCo results (#246)
  2. add the C51 algorithm with benchmark results (#266)
  3. enable type checking in utils.network (#275)

v0.3.0.post1

Several bug fixes (trainer, tests, and docs).

v0.3.0

Since the code has changed significantly from v0.2.0, we are releasing it as version 0.3 from now on.

API Change

  1. add policy.updating and clarify the collecting state and updating state during training (#224)
  2. change train_fn(epoch) to train_fn(epoch, env_step) and test_fn(epoch) to test_fn(epoch, env_step) (#229); see the sketch after this list
  3. remove outdated APIs: collector.sample, collector.render, collector.seed, VectorEnv (#210)
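
For example, the extra env_step argument makes step-based schedules straightforward; policy here is assumed to be a DQN-style policy exposing set_eps():

```python
# epsilon annealing driven by the global environment step count
def train_fn(epoch, env_step):
    policy.set_eps(max(0.1, 1.0 - env_step / 50000))

# greedy evaluation during testing
def test_fn(epoch, env_step):
    policy.set_eps(0.0)

# both are passed to the trainer as train_fn=train_fn, test_fn=test_fn
```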

Bug Fix

  1. fix a bug in DDQN: target_q could not be sampled from np.random.rand (#224)
  2. fix a bug in DQN atari net: it should add a ReLU before the last layer (#224)
  3. fix a bug in collector timing (#224)
  4. fix a bug in the converter of Batch: deepcopy a Batch in to_numpy and to_torch (#213)
  5. ensure buffer.rew is of type float (#229)

Enhancement

  1. Anaconda support: conda install -c conda-forge tianshou (#228)
  2. add PSRL (#202)
  3. add SAC discrete (#216)
  4. add type checking in unit tests (#200)
  5. format code and update function signatures (#213)
  6. add pydocstyle and doc8 checks (#210)
  7. several documentation fixes (#210)

v0.3.0rc0

This is a pre-release for testing Anaconda packaging.