An elegant PyTorch deep reinforcement learning library.
- Add `make commit-checks` to automatically perform almost all checks (#432)
- Add `make spelling` (#432)
- Add observation normalization (`norm_obs`, `obs_rms`, `update_obs_rms`, and `RunningMeanStd`) (#308)
- Add `policy.map_action` to bound the raw action (e.g., map from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313); see the sketch below
- Add `lr_scheduler` in on-policy algorithms, typically for `LambdaLR` (#318)

To adapt to this version, change `action_range=...` to `action_space=env.action_space` in policy initialization.
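A minimal sketch of the kind of bounding `map_action` performs, written in plain NumPy; the [-1, 1] target range comes from the item above, while the function name and the choice between clipping and tanh squashing are illustrative assumptions, not Tianshou's exact implementation.

```python
import numpy as np

def map_action(raw_action: np.ndarray, method: str = "tanh") -> np.ndarray:
    """Map an unbounded raw action from (-inf, inf) into [-1, 1]."""
    if method == "clip":
        return np.clip(raw_action, -1.0, 1.0)      # hard clipping at the bounds
    return np.tanh(raw_action)                     # smooth tanh squashing

print(map_action(np.array([-5.0, 0.3, 7.0])))          # tanh squashing
print(map_action(np.array([-5.0, 0.3, 7.0]), "clip"))  # clipping
```

Per the note above, the mapped action is not what gets stored in the replay buffer; the buffer keeps the raw policy output.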
- Fix a bug (`n/ep==0` and the reward shown in tqdm) with on-policy algorithms (#306, #328)
- Fix `numpy>=1.20` typing issue (#323)

This release contains several API and behavior changes.
- Change the `buffer.add` API from `buffer.add(obs, act, rew, done, obs_next, info, policy, ...)` to `buffer.add(batch, buffer_ids)` in order to add data more efficiently (#280); a usage sketch follows this list;
- Add a `set_batch` method in the buffer (#278);
- Add a `sample_index` method, the same as `sample` but returning only the index instead of both the index and the batch data (#278);
- Add `prev` (one-step previous transition index), `next` (one-step next transition index), and `unfinished_index` (the last modified index whose `done==False`) (#278);
- Add `_alloc_by_keys_diff` in batch to support any form of keys popping up (#280);
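A minimal sketch of the new buffer interface, assuming the 0.4-series `tianshou.data` API; the toy transition values (a one-dimensional observation and a five-step episode) are made up for illustration.

```python
import numpy as np
from tianshou.data import Batch, ReplayBuffer

buf = ReplayBuffer(size=10)

# New-style add: a single Batch argument instead of positional fields.
for i in range(5):
    buf.add(Batch(
        obs=np.array([float(i)]),
        act=i,
        rew=1.0,
        done=(i == 4),                 # the fifth transition ends the episode
        obs_next=np.array([float(i + 1)]),
    ))

idx = buf.sample_index(3)              # indices only, no batch data
batch = buf[idx]                       # fetch the corresponding transitions
print(buf.prev(idx))                   # one-step previous transition indices
print(buf.next(idx))                   # one-step next transition indices
print(buf.unfinished_index())          # last modified indices with done == False
```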
- Remove `collector.collect(n_episode=List[int])` because the new collector can collect episodes without bias (#280);
- Move `reward_metric` from Collector to trainer (#280);
- Change the `Collector.collect` logic: `AsyncCollector.collect` keeps the semantics of the previous version, where `collect(n_step or n_episode)` will not collect exactly n_step or n_episode transitions, while `Collector.collect(n_step or n_episode)` now collects exactly n_step transitions or n_episode episodes (#280);
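To show the exact-count semantics end to end, here is a hedged sketch; the CartPole environment, the network size, the DQN policy, and all hyper-parameters are illustrative assumptions (not part of the changelog), and it presumes the 0.4-series API with the classic `gym` interface.

```python
import gym
import torch
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

# Four CartPole workers; sizes and hyper-parameters are arbitrary for the demo.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
net = Net(state_shape=4, action_shape=2, hidden_sizes=[64, 64])
policy = DQNPolicy(net, torch.optim.Adam(net.parameters(), lr=1e-3))
buffer = VectorReplayBuffer(total_size=20000, buffer_num=4)
collector = Collector(policy, envs, buffer)

print(collector.collect(n_step=1000))   # exactly 1000 transitions
print(collector.collect(n_episode=8))   # exactly 8 complete episodes
```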
- Add a `policy.exploration_noise(action, batch) -> action` method instead of implementing the noise inside `policy.forward()` (#280); a generic sketch follows this list;
- Add a `Timelimit.truncate` handler in `compute_*_returns` (#296);
- Remove the `ignore_done` flag (#296);
- Remove the `reward_normalization` option in off-policy algorithms (it will raise an Error if set to True) (#298);
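As a rough illustration of what such a hook does (not Tianshou's actual implementation), the sketch below perturbs an already-computed continuous action with Gaussian noise; the noise scale and the [-1, 1] clipping range are assumptions.

```python
import numpy as np

def exploration_noise(action: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Add zero-mean Gaussian exploration noise to a computed action."""
    noisy = action + np.random.normal(0.0, sigma, size=action.shape)
    return np.clip(noisy, -1.0, 1.0)   # keep the perturbed action in bounds

print(exploration_noise(np.array([0.2, -0.7])))
```

Keeping the noise in a separate hook, rather than inside `forward()`, makes it easy to turn exploration off at evaluation time.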
- Change `collect_per_step` to `step_per_collect` (#293);
- Add `update_per_step` and `episode_per_collect` (#293);
- `onpolicy_trainer` now supports either step-collect or episode-collect (#293)
- When calling `env.seed(seed)`, it will also call `env.action_space.seed(seed)`; otherwise, using `Collector.collect(..., random=True)` will produce different results each time (#300, #303).
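Seeding the action space is what makes random sampling (and therefore `Collector.collect(..., random=True)`) reproducible; a small sketch with a bare `gym` space, where the space and the seed are chosen arbitrarily.

```python
import gym

space = gym.spaces.Discrete(4)

space.seed(0)
first = [space.sample() for _ in range(5)]

space.seed(0)                    # re-seeding reproduces the same random actions
second = [space.sample() for _ in range(5)]

assert first == second
print(first)
```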
- Fix `utils/discrete` and `utils/continuous` not working well under CUDA+torch<=1.6.0 (#289)
- `Batch.__setitem__` now throws `ValueError` instead of `KeyError`; `_create_value` now allows a placeholder with the `stack=False` option (#284)
- Speed up `Batch.cat` and `Batch.stack` (#284); they are now almost as fast as in v0.2.3
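For reference, `Batch.cat` concatenates along the existing batch dimension while `Batch.stack` adds a new one; a small sketch with made-up data.

```python
import numpy as np
from tianshou.data import Batch

b1 = Batch(obs=np.zeros((2, 3)), rew=np.array([1.0, 2.0]))
b2 = Batch(obs=np.ones((2, 3)), rew=np.array([3.0, 4.0]))

cat = Batch.cat([b1, b2])        # obs: (4, 3), rew: (4,)
stack = Batch.stack([b1, b2])    # obs: (2, 2, 3), rew: (2, 2)
print(cat.obs.shape, stack.obs.shape)
```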
- Refactor `utils.network` args to support any form of MLP by default (#275); remove `layer_num` and `hidden_layer_size` and add `hidden_sizes` (a list of ints describing the network architecture); see the sketch below
- Add `examples/atari/atari_network.py` (#275)
- Add documentation for `utils.network` (#275)
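A short sketch of the refactored MLP construction, assuming `tianshou.utils.net.common.Net` and CartPole-like shapes chosen for illustration; the old `layer_num`/`hidden_layer_size` pair is replaced by an explicit `hidden_sizes` list.

```python
from tianshou.utils.net.common import Net

# The architecture is now spelled out directly: two hidden layers of 128 units.
net = Net(state_shape=4, action_shape=2, hidden_sizes=[128, 128])
print(net)
```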
Several bug fixes (trainer, tests, and docs).

Since at this point the code has largely changed from v0.2.0, we release version 0.3 from now on.
- Change `train_fn(epoch)` to `train_fn(epoch, env_step)` and `test_fn(epoch)` to `test_fn(epoch, env_step)` (#229)
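A minimal sketch of a hook using the new two-argument signature; the linear epsilon schedule and the DQN-style `policy.set_eps` call are illustrative assumptions, not part of the change itself.

```python
def make_train_fn(policy, decay_steps: int = 10_000, eps_min: float = 0.1):
    """Build a train_fn that decays exploration epsilon by global env step."""
    def train_fn(epoch: int, env_step: int) -> None:
        # New signature: both the current epoch and the cumulative env step.
        eps = max(eps_min, 1.0 - env_step / decay_steps)
        policy.set_eps(eps)   # assumes a DQN-style policy exposing set_eps()
    return train_fn
```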
- Support `conda install -c conda-forge tianshou` (#228)

This is a pre-release for testing Anaconda.