DeepRL Versions

Modularized Implementation of Deep RL Algorithms in PyTorch

v1.4

4 years ago

v1.1

4 years ago

v1.0

5 years ago

The main update is to use TensorBoard to fully replace the plotting system from OpenAI baselines. The latter has turned out to be a bad choice.

Previously, there was an internal version that I used to run experiments on servers, including many helper scripts. I recently found it intractable to maintain two versions at the same time, so from now on they are merged.

I really don't like shell, so my philosophy is to use Python as much as possible. I don't like the common style of passing a loooooong argument list to a script to specify hyper-parameters.
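The idea above can be sketched as follows: each experiment is a small Python function that fills in a config object, instead of a shell command with dozens of flags. The `Config` class and field names here are illustrative, not necessarily the repo's actual API.

```python
class Config:
    """Hypothetical hyper-parameter container: defaults live in code,
    and each experiment overrides only what it needs."""

    def __init__(self, **kwargs):
        # sensible defaults
        self.game = 'BreakoutNoFrameskip-v4'
        self.discount = 0.99
        self.lr = 2.5e-4
        self.max_steps = int(2e7)
        # experiment-specific overrides
        for k, v in kwargs.items():
            setattr(self, k, v)


def dqn_breakout():
    # one experiment == one plain Python function, runnable without shell args
    config = Config(lr=1e-4)
    return config  # in practice this would be handed to an agent/training loop


config = dqn_breakout()
print(config.lr)        # 0.0001 (overridden)
print(config.discount)  # 0.99 (default)
```

Compared to a long `--lr 1e-4 --discount 0.99 ...` command line, the experiment definition is versioned with the code and can use arbitrary Python (loops over seeds, computed values, etc.).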

Currently I cannot upgrade to PyTorch v1.0.1, as many of my ongoing projects are still based on v0.4.0. I will do this as soon as possible.

v0.4

5 years ago

v0.3

5 years ago

After this release, there is no official support for Python 2, although I expect most of the code will still work well in Python 2.

v0.2

5 years ago

After this release, the code is incompatible with PyTorch v0.3.x.

v0.1

6 years ago

I found that the Atari wrapper I currently use is not fully compatible with the one in OpenAI baselines, resulting in degraded performance for most games (except Pong). So I plan to do a major update to fix this issue. (To be more specific, OpenAI baselines track the return of the original episode, which usually has more than one life, whereas I track the return of an episode with only one life.)
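The discrepancy above can be illustrated with a toy stand-in for an Atari game (all names here are made up for the example): a "full episode" return sums reward over every life until the real game over, while an "episodic life" return stops at the first lost life, so the two reporting styles yield different numbers for the same game.

```python
class ToyGame:
    """A fake Atari-like game: 3 lives, each life yields 10 reward before ending."""

    def __init__(self):
        self.lives = 3

    def reset(self):
        self.lives = 3

    def step(self):
        # one "life" plays out, earning 10 reward; game over when lives hit 0
        self.lives -= 1
        reward = 10
        done = self.lives == 0   # true game over (all lives spent)
        life_lost = True         # a life was lost on this step
        return reward, done, life_lost


def full_episode_return(env):
    # baselines-style: accumulate reward until the real game over
    env.reset()
    total, done = 0, False
    while not done:
        r, done, _ = env.step()
        total += r
    return total


def episodic_life_return(env):
    # episodic-life style: the episode ends as soon as a single life is lost
    env.reset()
    total = 0
    while True:
        r, done, life_lost = env.step()
        total += r
        if done or life_lost:
            return total


env = ToyGame()
print(full_episode_return(env))   # 30: summed over all 3 lives
print(episodic_life_return(env))  # 10: only the first life
```

So even with an identical agent, the episodic-life convention reports roughly a per-life return, which looks much lower than the full-game return baselines report.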

Moreover, asynchronous methods are getting deprecated nowadays, so I will remove them and switch to A2C-style algorithms in the next version.

I made this tag in case someone still wants the old code.

To be more specific, the following algorithms are implemented in this release:

  • Deep Q-Learning (DQN)
  • Double DQN
  • Dueling DQN
  • (Async) Advantage Actor Critic (A3C / A2C)
  • Async One-Step Q-Learning
  • Async One-Step Sarsa
  • Async N-Step Q-Learning
  • Continuous A3C
  • Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
  • Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
  • Action Conditional Video Prediction
  • Categorical DQN (C51, Distributional DQN with KL Distance)
  • Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
  • N-Step DQN (similar to A2C)

Most of them are compatible with both Python 2 and Python 3; however, almost all the async methods only work in Python 2.