ChainerRL is a deep reinforcement learning library built on top of Chainer.
v0.8.0

This release will probably be the final major update under the name of ChainerRL. The development team is planning to switch its backend from Chainer to PyTorch and continue its development as OSS.

Important enhancements:
- `chainerrl.agents.SoftActorCritic` (Soft Actor-Critic) is added.
- `chainerrl.agents.DoubleIQN` (IQN with double Q-learning) is added.

Important bugfixes:
- A bug in `chainerrl.agents.CategoricalDoubleDQN` that made its behavior the same as that of `CategoricalDQN` is fixed.
- A bug in `PrioritizedReplayBuffer` with `normalize_by_max == 'batch'`, where the normalization was wrong, is fixed (see the sketch after this list).
- A bug in `batch_recurrent_experiences` regarding `next_action` is fixed (#528).
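For reference, a minimal sketch of the option involved in the `PrioritizedReplayBuffer` fix; the capacity value is illustrative:

```python
from chainerrl.replay_buffer import PrioritizedReplayBuffer

# With normalize_by_max == 'batch', importance weights are normalized by
# the maximum weight within each sampled batch.
rbuf = PrioritizedReplayBuffer(capacity=10 ** 5, normalize_by_max='batch')
```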
v0.7.0

Important enhancements:
- `chainerrl.agents.CategoricalDoubleDQN` is added.
- `chainerrl.agents.TD3` is added.
- Several examples now support recurrent models (via the `--recurrent` option).

Important bugfixes:
- A bug regarding `env.seed` is fixed.
- A bug where `examples/ale/train_dqn_ale.py` uses `LinearDecayEpsilonGreedy` even when NoisyNet is used is fixed.
- A bug where `examples/ale/train_dqn_ale.py` does not use the value specified by `--noisy-net-sigma` is fixed.
- A bug where `chainerrl.links.to_factorized_noisy` does not work correctly with `chainerrl.links.Sequence` is fixed.

Important destructive changes:
- `chainerrl.experiments.train_agent_async` now requires `eval_n_steps` (number of timesteps for each evaluation phase) and `eval_n_episodes` (number of episodes for each evaluation phase) to be explicitly specified, with one of them being `None`.
- `examples/ale/dqn_phi.py` is removed.
- `chainerrl.initializers.LeCunNormal` is removed. Use `chainer.initializers.LeCunNormal` instead.
- `gym>=0.12.2` is now supported, by no longer using underscore methods in gym wrappers (#462).

Enhancements:
- `examples/atari` (#437)
- `noisy_net_sigma` parameter (#465)
- Make `to_factorized_noisy` work with sequential links (#489; see the sketch below).
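A minimal sketch of `to_factorized_noisy`, assuming a small fully connected Q-function; the layer sizes and `sigma_scale` value are illustrative:

```python
import chainerrl

# A fully connected Q-function for a toy discrete-action task.
q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    ndim_obs=4, n_actions=2, n_hidden_channels=64, n_hidden_layers=2)

# Replace the Linear links inside q_func with FactorizedNoisyLinear,
# in place, for NoisyNet-style exploration.
chainerrl.links.to_factorized_noisy(q_func, sigma_scale=0.5)
```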
v0.6.0

Important enhancements:
- `chainerrl.agents.IQN` (Implicit Quantile Networks) is added.
- Requesting a reset while keeping `done=False` is supported via the `info` dict: when `env.step` returns an `info` dict with `info['needs_reset']=True`, the env is reset. This feature is useful for implementing a continuing env (see the sketch after this list).
- `examples/atari/dqn` now implements the same evaluation protocol as the Nature DQN paper.
- `examples/grasping` is added.
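A minimal sketch of a continuing env that uses this feature; the env itself is hypothetical and kept trivial on purpose:

```python
import gym
import numpy as np


class ContinuingEnv(gym.Env):
    """Toy env that never sets done=True but asks for periodic resets."""

    observation_space = gym.spaces.Box(
        low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.zeros(1, dtype=np.float32)
        reward = float(action)
        # done stays False; request a reset every 1000 steps instead.
        info = {'needs_reset': self.t >= 1000}
        return obs, reward, False, info
```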
Important bugfixes:
- A bug where `obs_normalizer` was not saved is fixed.
- A bug where the `argv` argument was ignored by `chainerrl.experiments.prepare_output_dir` is fixed.
- `TestTrainAgentAsync` (#363)

Important destructive changes:
- `train_agent_with_evaluation` and `train_agent_batch_with_evaluation` now require `eval_n_steps` (number of timesteps for each evaluation phase) and `eval_n_episodes` (number of episodes for each evaluation phase) to be explicitly specified, with one of them being `None` (see the first sketch after this list).
- `train_agent_with_evaluation`'s `max_episode_len` argument is renamed to `train_max_episode_len`.
- `ReplayBuffer.sample` now returns a list of lists of N experiences to support N-step returns (see the second sketch after this list).
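A minimal sketch of the new evaluation arguments; `agent` and `env` are placeholders for a configured ChainerRL agent and a gym env, and the numeric values are illustrative:

```python
from chainerrl.experiments import train_agent_with_evaluation

# `agent` and `env` are placeholders, not defined here.
train_agent_with_evaluation(
    agent=agent,
    env=env,
    steps=10 ** 6,                # total training timesteps
    eval_n_steps=None,            # evaluate per episode count...
    eval_n_episodes=10,           # ...so exactly one of the two is None
    eval_interval=10 ** 4,        # evaluate every 10k training steps
    outdir='results',
    train_max_episode_len=10 ** 4,  # renamed from max_episode_len
)
```

And a sketch of the new `ReplayBuffer.sample` return format, assuming N is set by the `num_steps` constructor argument; capacity and batch size are illustrative:

```python
from chainerrl.replay_buffer import ReplayBuffer

rbuf = ReplayBuffer(capacity=10 ** 5, num_steps=3)
# ... after filling the buffer via rbuf.append(...) ...
batch = rbuf.sample(32)  # a list of 32 lists of up to 3 experiences each
```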
v0.5.0

Important enhancements:
- A2C (`chainerrl.agents.A2C`) is added.
- `examples/ale/train_dqn_ale.py` now follows the "Tuned DoubleDQN" setting by default, and supports prioritized experience replay as an option.
- `examples/atari/train_dqn.py` is added as a basic example of applying DQN to Atari.

Important bugfixes:
- A bug in `chainerrl.agents.CategoricalDQN` that deteriorates performance is fixed.
- A bug in `atari_wrappers.LazyFrame` that unnecessarily increases memory usage is fixed.

Important destructive changes:
- The interfaces of `chainerrl.replay_buffer.PrioritizedReplayBuffer` and `chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer` are updated.
- The `eval_explorer` argument of `chainerrl.experiments.train_agent_*` is dropped (use `chainerrl.wrappers.RandomizeAction` for evaluation-time epsilon-greedy; see the sketch after this list).
- The interface of `chainerrl.agents.PPO` has changed a lot.
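A minimal sketch of evaluation-time epsilon-greedy via the wrapper; the env name and `random_fraction` value are illustrative:

```python
import gym
import chainerrl

# Take a uniformly random action with probability 0.05 during evaluation.
eval_env = chainerrl.wrappers.RandomizeAction(
    gym.make('CartPole-v0'), random_fraction=0.05)
```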
v0.4.0

Important enhancements:
- `chainerrl.agents.TRPO` is added.
- `chainerrl.agents.CategoricalDQN` is added.
- `chainerrl.links.FactorizedNoisyLinear` and `chainerrl.links.to_factorized_noisy` are added.

Important destructive changes:
- The `async` module is renamed `async_` for Python 3.7 support (see the sketch below).

Enhancements:
- Rename `async` to `async_` to support Python 3.7 (#286, thanks @mmilk1231!)
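The corresponding change in user code is just the trailing underscore; a minimal sketch, assuming the module lives under `chainerrl.misc`:

```python
# Before v0.4.0 (breaks on Python 3.7, where `async` is a keyword):
#     from chainerrl.misc import async
# After:
from chainerrl.misc import async_
```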
v0.3.0

Important enhancements:
- `chainerrl.agents.PPO` is added.

Important destructive changes:
- `__len__` now counts the number of transitions, not episodes.

Fixes:
- `optimizers/__init__.py` (#113)
- `train_dqn_ale.py` (#192)
- `train_dqn_gym.py` (#195)
- `train_a3c_ale.py` (#197)
- `__len__` (#155)