An experimentation framework for Reinforcement Learning using OpenAI Gym, TensorFlow, and Keras.
- Fix `np.exp` overflow of `SoftmaxPolicy`, `BoltzmannPolicy` by casting to `float64` instead of `float32`; assert `np.isfinite`. Add asserts for `*analysis.csv`. PR: #131
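For illustration, the overflow guard amounts to something like the sketch below (the function name and the max-shift trick are illustrative, not necessarily the repo's exact code):

```python
import numpy as np

def softmax_probs(q_values, tau=1.0):
    """Softmax over Q-values, cast to float64 so np.exp overflows far
    later than it would in float32."""
    q = np.asarray(q_values, dtype='float64') / tau
    q = q - q.max()                    # standard shift for extra stability
    probs = np.exp(q) / np.sum(np.exp(q))
    assert np.isfinite(probs).all()    # guard against inf/nan slipping through
    return probs

# e.g. sample an action from the resulting distribution
action = np.random.choice(4, p=softmax_probs([1.0, 2.0, 3.0, 4.0]))
```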
- Add `EpsilonGreedyNoisePolicy`. PR: #131
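Roughly, an epsilon-greedy noise policy looks like the following sketch (the class interface and annealing schedule are assumptions for illustration):

```python
import numpy as np

class EpsilonGreedyNoisePolicy:
    """Pick a random action with probability e, else the greedy one;
    anneal e linearly from init_e down to final_e over anneal_steps."""
    def __init__(self, n_actions, init_e=1.0, final_e=0.1, anneal_steps=10000):
        self.n_actions = n_actions
        self.e = init_e
        self.final_e = final_e
        self.decay = (init_e - final_e) / anneal_steps

    def select_action(self, q_values):
        if np.random.rand() < self.e:
            action = np.random.randint(self.n_actions)  # explore
        else:
            action = int(np.argmax(q_values))           # exploit
        self.e = max(self.final_e, self.e - self.decay)
        return action
```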
- Call `memory.update(errors)` throughout all agents. Default `max_mem_len` to `max_timestep * max_epis/3` if not specified. Use `abs` for the init reward. PR: #118
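The default sizing rule, as a small sketch (parameter names taken from the entry above):

```python
def default_max_mem_len(max_timestep, max_epis, max_mem_len=None):
    # if unspecified, cap the replay memory at a third of the
    # total possible timesteps across all episodes
    if max_mem_len is None:
        max_mem_len = max_timestep * max_epis // 3
    return max_mem_len
```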
- Add `ActorCritic` agent. Discrete policies: `ArgmaxPolicy, SoftmaxPolicy`; continuous: `BoundedPolicy, GaussianPolicy`. Solves `CartPole-v0`, `CartPole-v1`; yet to solve the others. PR: #118
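For the continuous case, a `GaussianPolicy`/`BoundedPolicy`-style sampler is roughly as follows (a sketch under assumed bounds, not the repo's implementation):

```python
import numpy as np

def gaussian_action(mean, std, low, high):
    """Sample from N(mean, std), then clip to the env's action bounds."""
    action = np.random.normal(loc=mean, scale=std)
    return np.clip(action, low, high)

# e.g. a 1-D action in [-2, 2] (Pendulum-v0-style bounds)
a = gaussian_action(mean=0.0, std=0.5, low=-2.0, high=2.0)
```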
- Add `DDPG` agent with custom tensorflow ops. Noise policies: `NoNoisePolicy, LinearNoisePolicy, GaussianWhiteNoisePolicy, OUNoisePolicy`. Solves `Pendulum-v0`. PR: #118
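`OUNoisePolicy` wraps the Ornstein-Uhlenbeck process, the classic exploration noise for DDPG's deterministic actor; generically it looks like this sketch:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that
    mean-reverts to mu, added to the actor's deterministic actions."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.3):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.ones(dim) * mu

    def sample(self):
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

noise = OUNoise(dim=1)
actor_output = np.array([0.0])   # stand-in for the actor network's output
noisy_action = actor_output + noise.sample()
```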
- Use `logger.warn` instead of raising an error when component locks are violated. Drop the separate recompile in `DoubleDQN` as it breaks; instead revert to the single-model recompile from DQN. PR: #120
- Add component locks: we have a lot of components, and not all of them are compatible with one another. When scheduling experiments and designing specs it is hard to keep them all in check. This adds component locks that automatically check all specs at import time, using the locks specified in `rl/spec/component_locks.json`. It follows the minimum description length design principle. When adding new components, be sure to update this file. PR: #121. Solves: #113, #114, #115
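A minimal sketch of how such an import-time check can work (the JSON schema and function name here are assumptions, not the actual `component_locks.json` format):

```python
import json
import logging

logger = logging.getLogger(__name__)

def check_component_locks(spec, locks_path='rl/spec/component_locks.json'):
    """Warn (rather than raise) when a spec combines components that a
    lock declares incompatible. Schema assumed: a list of
    {"components": [...], "incompatible_with": [...]} entries."""
    with open(locks_path) as f:
        locks = json.load(f)
    used = set(spec.values())
    for lock in locks:
        if used & set(lock['components']) and used & set(lock['incompatible_with']):
            logger.warn('component lock violated by spec: %s', spec)
```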
- Add `requirements.txt, environment.yml` for `python, virtualenv, conda` setups; integrate into Grunt. Add `quickstart_dqn` as the example quickstart in the doc. PR: #119
- Add the missing `recompile_model` call to the second model in `DoubleDQN`. PR: #108
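In Keras terms the fix is essentially to compile both networks, along these lines (loss and optimizer are illustrative, not necessarily the repo's choices):

```python
from keras.optimizers import Adam

def recompile_models(model, model_2, lr):
    """Recompile BOTH networks in DoubleDQN; missing the second one
    left it running with stale optimizer settings."""
    for m in (model, model_2):
        m.compile(loss='mse', optimizer=Adam(lr=lr))
    return model, model_2
```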
- Fix the init `error = reward` by adding a bump `min_priority = abs(10 * SOLVED_MEAN_REWARD)`. The `min_priority` is needed for all problems since they may have negative rewards; we cannot do `error = abs(reward)` because it is sign-sensitive for the priority calculation. Assert `priority` is not `nan`. PR: #106
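A sketch of the reasoning (constants illustrative; `SOLVED_MEAN_REWARD` is per-problem):

```python
import numpy as np

SOLVED_MEAN_REWARD = 195.0                   # e.g. CartPole-v0's solve threshold
min_priority = abs(10 * SOLVED_MEAN_REWARD)  # bump large enough to dominate any reward

def init_priority(reward):
    """Bump the reward instead of taking abs(reward): the bump keeps the
    priority positive even on negative-reward problems while preserving
    the ordering between rewards, which abs() would destroy
    (e.g. it would rank reward=-10 the same as reward=+10)."""
    priority = reward + min_priority
    assert not np.isnan(priority)
    return priority
```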
- `DQN, double-DQN, SARSA, PER` solve `CartPole-v0, CartPole-v1, Acrobot-v1, LunarLander-v2`. Add `fitness_score` as the evaluation metric.