Batch PPO Versions

Efficient Batched Reinforcement Learning in TensorFlow

v1.4.0

Features:

  • Split episodes into chunks for training. This reduces memory requirements when training from pixels and in some cases increases data efficiency (see the sketch after this list).
  • Use lambda variable initializers everywhere to support embedding the simulation into a larger graph.
  • Upgrade to newest Gym version, including new environment names and dtypes for spaces.
  • Support regularization losses returned by the network.
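
A minimal sketch of the episode-chunking idea, assuming trajectories are stored as [num_episodes, episode_length, ...] arrays; the helper name and shapes are illustrative, not the library's actual API:

```python
import numpy as np

def split_into_chunks(trajectories, chunk_length):
  """Split [num_episodes, episode_length, ...] trajectories into
  [num_chunks, chunk_length, ...] windows, dropping any remainder.

  Training on shorter chunks keeps fewer time steps in memory per update,
  which matters most when observations are image frames.
  """
  episode_length = trajectories.shape[1]
  usable = (episode_length // chunk_length) * chunk_length
  trimmed = trajectories[:, :usable]
  return trimmed.reshape((-1, chunk_length) + trajectories.shape[2:])

# Example: 8 episodes of 200 frames of 84x84x3 pixels -> 80 chunks of 20 steps.
frames = np.zeros((8, 200, 84, 84, 3), dtype=np.uint8)
assert split_into_chunks(frames, 20).shape == (80, 20, 84, 84, 3)
```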

Improvements:

  • Remove MuJoCo dependency from tests.
  • Speed up smoke tests for faster iteration times.
  • Enable continuous integration.

Bugs:

  • Fix off-by-one bug in FrameHistory environment wrapper.

v1.3.0

Features:

  • Represent policies as tf.distributions objects, so that the algorithms are independent of the action distribution.
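
A hedged sketch of the idea, assuming the TF 1.x tf.distributions module (tf.contrib.distributions in older releases); the networks and loss below are illustrative, not the library's interface:

```python
import tensorflow as tf

def gaussian_policy(observations, action_size):
  # Returns a distribution object; downstream code only relies on the shared
  # Distribution interface: sample(), log_prob(), and entropy().
  hidden = tf.layers.dense(observations, 64, tf.nn.relu)
  mean = tf.layers.dense(hidden, action_size)
  logstd = tf.get_variable(
      'logstd', [action_size], tf.float32, initializer=tf.zeros_initializer())
  return tf.distributions.Normal(loc=mean, scale=tf.exp(logstd))

def categorical_policy(observations, num_actions):
  # A discrete policy exposes exactly the same interface, so the loss code
  # does not need to know which action distribution is in use.
  hidden = tf.layers.dense(observations, 64, tf.nn.relu)
  return tf.distributions.Categorical(
      logits=tf.layers.dense(hidden, num_actions))

def policy_gradient_loss(policy, actions, advantages):
  # Distribution-agnostic: the same loss works with either policy above.
  log_prob = policy.log_prob(actions)
  if log_prob.shape.ndims > 1:
    # Per-dimension log-probabilities (e.g. Normal): sum over action dims.
    log_prob = tf.reduce_sum(log_prob, axis=-1)
  return -tf.reduce_mean(log_prob * advantages)
```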

Improvements:

  • Move reusable components into agents.parts package.
  • Add nesting tools to handle nested tuples, lists, and dicts.
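
A small illustration of what such nesting tools provide; map_nested here is a hypothetical helper written for this example, not necessarily the function exposed by agents.parts:

```python
def map_nested(function, structure):
  """Apply a function to every leaf of a nested tuple/list/dict structure,
  preserving the structure itself."""
  if isinstance(structure, dict):
    return {key: map_nested(function, value)
            for key, value in structure.items()}
  if isinstance(structure, (tuple, list)):
    return type(structure)(map_nested(function, value) for value in structure)
  return function(structure)

# Example: transform every leaf of a nested observation structure.
nested = {'image': [1, 2], 'state': (3, {'velocity': 4})}
assert map_nested(lambda x: 2 * x, nested) == {
    'image': [2, 4], 'state': (6, {'velocity': 8})}
```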

Bugs:

  • Fix PPO not learning on GPU by placing the optimizer on the GPU.
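
A hedged sketch of the kind of device placement this fix refers to, assuming TF 1.x; the variable and loss are stand-ins, not the library's actual graph:

```python
import tensorflow as tf

# Creating the optimizer and its update ops under an explicit GPU device
# scope keeps the training step co-located with the model variables.
with tf.device('/gpu:0'):
  weights = tf.get_variable('weights', [10, 1])
  loss = tf.reduce_sum(tf.square(weights))
  optimizer = tf.train.AdamOptimizer(1e-4)
  train_op = optimizer.minimize(loss)
```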

v1.2.0

Features:

  • Use a single optimizer for PPO to better train shared feature layers (see the sketch after this list).
  • Allow calling methods on environments running in external processes.
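
A hedged sketch of the single-optimizer idea: one combined loss, so gradients from both the policy and value heads shape the shared feature layers. The layer sizes, names, and coefficients are illustrative, not the library's configuration:

```python
import tensorflow as tf

def joint_train_op(observations, actions, advantages, returns):
  # Shared features feed both heads; a single optimizer over the combined
  # loss lets policy and value gradients both update the shared layers.
  features = tf.layers.dense(observations, 64, tf.nn.tanh, name='shared')
  logits = tf.layers.dense(features, 4, name='policy_head')
  value = tf.layers.dense(features, 1, name='value_head')[:, 0]
  policy = tf.distributions.Categorical(logits=logits)
  policy_loss = -tf.reduce_mean(policy.log_prob(actions) * advantages)
  value_loss = tf.reduce_mean(tf.square(returns - value))
  total_loss = policy_loss + 0.5 * value_loss
  return tf.train.AdamOptimizer(1e-4).minimize(total_loss)
```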

Improvements:

  • Improve default and MuJoCo configs.
  • Report both training and evaluation scores.

Bugs:

  • Fix likelihood calculation that halved gradients for the action standard deviation.
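
For reference, a sketch of the diagonal Gaussian log-likelihood; the exact form of the original bug is not documented here, but the 0.5 factor belongs only on the squared term, so applying it to the log-std term as well would halve that term's gradient:

```python
import math
import tensorflow as tf

def diagonal_gaussian_log_prob(action, mean, logstd):
  # -0.5 * ((a - mu) / sigma)^2 - log(sigma) - 0.5 * log(2 * pi), summed over
  # action dimensions. Note the log-std term is not scaled by 0.5.
  std = tf.exp(logstd)
  return tf.reduce_sum(
      -0.5 * tf.square((action - mean) / std) - logstd
      - 0.5 * math.log(2.0 * math.pi), axis=-1)
```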

v1.1.0

Features:

  • Policy networks are now defined as functions mapping sequences of observations to sequences of actions. As a result, feed-forward policies are faster, and memory-based agents are easier to implement. Previously, networks had to be defined as RNNCells. See the sketch after this list.
  • All functions of the agent interface now receive a tensor of agent indices. This adds the flexibility to process observations in smaller batches. Previously, perform() and experience() were defined on data from all environments.
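
A hedged sketch of the sequence-based policy idea and of gathering data by agent index; shapes, sizes, and names are illustrative, not the actual agent interface:

```python
import tensorflow as tf

def feed_forward_policy(observation_sequence):
  # Maps [batch, time, observation_size] directly to [batch, time,
  # action_size]; tf.layers.dense acts on the last axis, so a memoryless
  # policy needs no RNNCell and no explicit loop over time.
  hidden = tf.layers.dense(observation_sequence, 64, tf.nn.relu)
  return tf.layers.dense(hidden, 6)

# Agent indices select the subset of environments a call operates on, for
# example to gather the matching rows of per-environment buffers.
all_observations = tf.zeros([16, 10, 24])  # 16 environments, 10 steps each.
agent_indices = tf.constant([0, 3, 7])
actions = feed_forward_policy(tf.gather(all_observations, agent_indices))
```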

v1.0.0

Initial release.