An experimentation framework for Reinforcement Learning using OpenAI Gym, TensorFlow, and Keras.
- Fix `np.exp` overflow of `SoftmaxPolicy`, `BoltzmannPolicy` by casting to `float64` instead of `float32`; assert `np.isfinite`. Add asserts for `*analysis.csv`. PR: #131
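For illustration, the overflow guard amounts to something like the sketch below (the function name and the max-shift trick are illustrative, not necessarily the repo's exact code):

```python
import numpy as np

def softmax_probs(q_values, tau=1.0):
    """Softmax over Q-values, cast to float64 so np.exp overflows far
    later than it would in float32."""
    q = np.asarray(q_values, dtype='float64') / tau
    q = q - q.max()                    # standard shift for extra stability
    probs = np.exp(q) / np.sum(np.exp(q))
    assert np.isfinite(probs).all()    # guard against inf/nan slipping through
    return probs

# e.g. sample an action from the resulting distribution
action = np.random.choice(4, p=softmax_probs([1.0, 2.0, 3.0, 4.0]))
```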
- Add `EpsilonGreedyNoisePolicy`. PR: #131
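Roughly, an epsilon-greedy noise policy looks like the following sketch (the class interface and annealing schedule are assumptions for illustration):

```python
import numpy as np

class EpsilonGreedyNoisePolicy:
    """Pick a random action with probability e, else the greedy one;
    anneal e linearly from init_e down to final_e over anneal_steps."""
    def __init__(self, n_actions, init_e=1.0, final_e=0.1, anneal_steps=10000):
        self.n_actions = n_actions
        self.e = init_e
        self.final_e = final_e
        self.decay = (init_e - final_e) / anneal_steps

    def select_action(self, q_values):
        if np.random.rand() < self.e:
            action = np.random.randint(self.n_actions)  # explore
        else:
            action = int(np.argmax(q_values))           # exploit
        self.e = max(self.final_e, self.e - self.decay)
        return action
```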
- Call `memory.update(errors)` throughout all agents. Default `max_mem_len` to `max_timestep * max_epis/3` if not specified. Use `abs` for the init reward. PR: #118
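The default sizing rule, as a small sketch (parameter names taken from the entry above):

```python
def default_max_mem_len(max_timestep, max_epis, max_mem_len=None):
    # if unspecified, cap the replay memory at a third of the
    # total possible timesteps across all episodes
    if max_mem_len is None:
        max_mem_len = max_timestep * max_epis // 3
    return max_mem_len
```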
- Add `ActorCritic` agent. Discrete policies: `ArgmaxPolicy, SoftmaxPolicy`; continuous: `BoundedPolicy, GaussianPolicy`. Solves `CartPole-v0`, `CartPole-v1`; yet to solve the others. PR: #118
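For the continuous case, a `GaussianPolicy`/`BoundedPolicy`-style sampler is roughly as follows (a sketch under assumed bounds, not the repo's implementation):

```python
import numpy as np

def gaussian_action(mean, std, low, high):
    """Sample from N(mean, std), then clip to the env's action bounds."""
    action = np.random.normal(loc=mean, scale=std)
    return np.clip(action, low, high)

# e.g. a 1-D action in [-2, 2] (Pendulum-v0-style bounds)
a = gaussian_action(mean=0.0, std=0.5, low=-2.0, high=2.0)
```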
- Add `DDPG` agent with custom tensorflow ops. Noise policies: `NoNoisePolicy, LinearNoisePolicy, GaussianWhiteNoisePolicy, OUNoisePolicy`. Solves `Pendulum-v0`. PR: #118
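`OUNoisePolicy` wraps the Ornstein-Uhlenbeck process, the classic exploration noise for DDPG's deterministic actor; generically it looks like this sketch:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that
    mean-reverts to mu, added to the actor's deterministic actions."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.3):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.ones(dim) * mu

    def sample(self):
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

noise = OUNoise(dim=1)
actor_output = np.array([0.0])   # stand-in for the actor network's output
noisy_action = actor_output + noise.sample()
```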
- Use `logger.warn` instead of raising an error when component locks are violated. Drop the separate recompile in `DoubleDQN` as it breaks; instead revert to the single-model recompile from DQN. PR: #120
- Add component locks: we have a lot of components, and not all of them are compatible with one another. When scheduling experiments and designing specs it is hard to keep them all in check. This adds component locks that automatically check all specs at import time, using the locks specified in `rl/spec/component_locks.json`. It follows the minimum description length design principle. When adding new components, be sure to update this file. PR: #121. Solves: #113, #114, #115
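A minimal sketch of how such an import-time check can work (the JSON schema and function name here are assumptions, not the actual `component_locks.json` format):

```python
import json
import logging

logger = logging.getLogger(__name__)

def check_component_locks(spec, locks_path='rl/spec/component_locks.json'):
    """Warn (rather than raise) when a spec combines components that a
    lock declares incompatible. Schema assumed: a list of
    {"components": [...], "incompatible_with": [...]} entries."""
    with open(locks_path) as f:
        locks = json.load(f)
    used = set(spec.values())
    for lock in locks:
        if used & set(lock['components']) and used & set(lock['incompatible_with']):
            logger.warn('component lock violated by spec: %s', spec)
```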
- Add `requirements.txt, environment.yml` for `python, virtualenv, conda` setups; integrate into Grunt. Add `quickstart_dqn` as the example quickstart in the doc. PR: #119
- Add the missing `recompile_model` call to the second model in `DoubleDQN`. PR: #108
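In Keras terms the fix is essentially to compile both networks, along these lines (loss and optimizer are illustrative, not necessarily the repo's choices):

```python
from keras.optimizers import Adam

def recompile_models(model, model_2, lr):
    """Recompile BOTH networks in DoubleDQN; missing the second one
    left it running with stale optimizer settings."""
    for m in (model, model_2):
        m.compile(loss='mse', optimizer=Adam(lr=lr))
    return model, model_2
```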
- Fix the init `error = reward` by adding a bump `min_priority = abs(10 * SOLVED_MEAN_REWARD)`. The `min_priority` is needed for all problems since they may have negative rewards; we cannot do `error = abs(reward)` because it is sign-sensitive for the priority calculation. Assert `priority` is not `nan`. PR: #106
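A sketch of the reasoning (constants illustrative; `SOLVED_MEAN_REWARD` is per-problem):

```python
import numpy as np

SOLVED_MEAN_REWARD = 195.0                   # e.g. CartPole-v0's solve threshold
min_priority = abs(10 * SOLVED_MEAN_REWARD)  # bump large enough to dominate any reward

def init_priority(reward):
    """Bump the reward instead of taking abs(reward): the bump keeps the
    priority positive even on negative-reward problems while preserving
    the ordering between rewards, which abs() would destroy
    (e.g. it would rank reward=-10 the same as reward=+10)."""
    priority = reward + min_priority
    assert not np.isnan(priority)
    return priority
```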
- `DQN, double-DQN, SARSA, PER` solve `CartPole-v0, CartPole-v1, Acrobot-v1, LunarLander-v2`. Add `fitness_score` as the evaluation metric.