Batch PPO Versions

Efficient Batched Reinforcement Learning in TensorFlow

v1.4.0

Features:

  • Split episodes into chunks for training. This reduces memory requirements when training from pixels and in some cases increases data efficiency (see the sketch after this list).
  • Use lambda variable initializers everywhere to support embedding the simulation into a larger graph.
  • Upgrade to newest Gym version, including new environment names and dtypes for spaces.
  • Support regularization losses returned by the network.
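
A minimal sketch of the episode-chunking idea, assuming trajectories are stored as [num_episodes, episode_length, ...] arrays; the helper name and shapes are illustrative, not the library's actual API:

```python
import numpy as np

def split_into_chunks(trajectories, chunk_length):
  """Split [num_episodes, episode_length, ...] trajectories into
  [num_chunks, chunk_length, ...] windows, dropping any remainder.

  Training on shorter chunks keeps fewer time steps in memory per update,
  which matters most when observations are image frames.
  """
  episode_length = trajectories.shape[1]
  usable = (episode_length // chunk_length) * chunk_length
  trimmed = trajectories[:, :usable]
  return trimmed.reshape((-1, chunk_length) + trajectories.shape[2:])

# Example: 8 episodes of 200 frames of 84x84x3 pixels -> 80 chunks of 20 steps.
frames = np.zeros((8, 200, 84, 84, 3), dtype=np.uint8)
assert split_into_chunks(frames, 20).shape == (80, 20, 84, 84, 3)
```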

Improvements:

  • Remove MuJoCo dependency from tests.
  • Speed up smoke tests for faster iteration times.
  • Enable continuous integration.

Bugs:

  • Fix off-by-one bug in FrameHistory environment wrapper.

v1.3.0

Features:

  • Represent policies as tf.distributions objects, so that the algorithms are independent of the action distribution.
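
A hedged sketch of the idea, assuming the TF 1.x tf.distributions module (tf.contrib.distributions in older releases); the networks and loss below are illustrative, not the library's interface:

```python
import tensorflow as tf

def gaussian_policy(observations, action_size):
  # Returns a distribution object; downstream code only relies on the shared
  # Distribution interface: sample(), log_prob(), and entropy().
  hidden = tf.layers.dense(observations, 64, tf.nn.relu)
  mean = tf.layers.dense(hidden, action_size)
  logstd = tf.get_variable(
      'logstd', [action_size], tf.float32, initializer=tf.zeros_initializer())
  return tf.distributions.Normal(loc=mean, scale=tf.exp(logstd))

def categorical_policy(observations, num_actions):
  # A discrete policy exposes exactly the same interface, so the loss code
  # does not need to know which action distribution is in use.
  hidden = tf.layers.dense(observations, 64, tf.nn.relu)
  return tf.distributions.Categorical(
      logits=tf.layers.dense(hidden, num_actions))

def policy_gradient_loss(policy, actions, advantages):
  # Distribution-agnostic: the same loss works with either policy above.
  log_prob = policy.log_prob(actions)
  if log_prob.shape.ndims > 1:
    # Per-dimension log-probabilities (e.g. Normal): sum over action dims.
    log_prob = tf.reduce_sum(log_prob, axis=-1)
  return -tf.reduce_mean(log_prob * advantages)
```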

Improvements:

  • Move reusable components into agents.parts package.
  • Add nesting tools to handle nested tuples, lists, and dicts.
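
A small illustration of what such nesting tools provide; map_nested here is a hypothetical helper written for this example, not necessarily the function exposed by agents.parts:

```python
def map_nested(function, structure):
  """Apply a function to every leaf of a nested tuple/list/dict structure,
  preserving the structure itself."""
  if isinstance(structure, dict):
    return {key: map_nested(function, value)
            for key, value in structure.items()}
  if isinstance(structure, (tuple, list)):
    return type(structure)(map_nested(function, value) for value in structure)
  return function(structure)

# Example: transform every leaf of a nested observation structure.
nested = {'image': [1, 2], 'state': (3, {'velocity': 4})}
assert map_nested(lambda x: 2 * x, nested) == {
    'image': [2, 4], 'state': (6, {'velocity': 8})}
```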

Bugs:

  • Fix PPO not learning on GPU by placing the optimizer on the GPU.
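
A hedged sketch of the kind of device placement this fix refers to, assuming TF 1.x; the variable and loss are stand-ins, not the library's actual graph:

```python
import tensorflow as tf

# Creating the optimizer and its update ops under an explicit GPU device
# scope keeps the training step co-located with the model variables.
with tf.device('/gpu:0'):
  weights = tf.get_variable('weights', [10, 1])
  loss = tf.reduce_sum(tf.square(weights))
  optimizer = tf.train.AdamOptimizer(1e-4)
  train_op = optimizer.minimize(loss)
```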

v1.2.0

Features:

  • Use a single optimizer for PPO to better train shared feature layers (see the sketch after this list).
  • Allow calling methods on environments running in external processes.
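
A hedged sketch of the single-optimizer idea: one combined loss, so gradients from both the policy and value heads shape the shared feature layers. The layer sizes, names, and coefficients are illustrative, not the library's configuration:

```python
import tensorflow as tf

def joint_train_op(observations, actions, advantages, returns):
  # Shared features feed both heads; a single optimizer over the combined
  # loss lets policy and value gradients both update the shared layers.
  features = tf.layers.dense(observations, 64, tf.nn.tanh, name='shared')
  logits = tf.layers.dense(features, 4, name='policy_head')
  value = tf.layers.dense(features, 1, name='value_head')[:, 0]
  policy = tf.distributions.Categorical(logits=logits)
  policy_loss = -tf.reduce_mean(policy.log_prob(actions) * advantages)
  value_loss = tf.reduce_mean(tf.square(returns - value))
  total_loss = policy_loss + 0.5 * value_loss
  return tf.train.AdamOptimizer(1e-4).minimize(total_loss)
```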

Improvements:

  • Improve default and MuJoCo configs.
  • Report both training and evaluation scores.

Bugs:

  • Fix likelihood calculation that halved gradients for the action standard deviation.
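
For reference, a sketch of the diagonal Gaussian log-likelihood; the exact form of the original bug is not documented here, but the 0.5 factor belongs only on the squared term, so applying it to the log-std term as well would halve that term's gradient:

```python
import math
import tensorflow as tf

def diagonal_gaussian_log_prob(action, mean, logstd):
  # -0.5 * ((a - mu) / sigma)^2 - log(sigma) - 0.5 * log(2 * pi), summed over
  # action dimensions. Note the log-std term is not scaled by 0.5.
  std = tf.exp(logstd)
  return tf.reduce_sum(
      -0.5 * tf.square((action - mean) / std) - logstd
      - 0.5 * math.log(2.0 * math.pi), axis=-1)
```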

v1.1.0

Features:

  • Policy networks are now defined as functions mapping sequences of observations to sequences of actions. As a result, feed-forward policies are faster, and memory-based agents are easier to implement. Previously, networks had to be defined as RNNCells. See the sketch after this list.
  • All functions of the agent interface now receive a tensor of agent indices. This adds the flexibility to process observations in smaller batches. Previously, perform() and experience() were defined on data from all environments.
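
A hedged sketch of the sequence-based policy idea and of gathering data by agent index; shapes, sizes, and names are illustrative, not the actual agent interface:

```python
import tensorflow as tf

def feed_forward_policy(observation_sequence):
  # Maps [batch, time, observation_size] directly to [batch, time,
  # action_size]; tf.layers.dense acts on the last axis, so a memoryless
  # policy needs no RNNCell and no explicit loop over time.
  hidden = tf.layers.dense(observation_sequence, 64, tf.nn.relu)
  return tf.layers.dense(hidden, 6)

# Agent indices select the subset of environments a call operates on, for
# example to gather the matching rows of per-environment buffers.
all_observations = tf.zeros([16, 10, 24])  # 16 environments, 10 steps each.
agent_indices = tf.constant([0, 3, 7])
actions = feed_forward_policy(tf.gather(all_observations, agent_indices))
```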

v1.0.0

Initial release.