DeepRL Versions

Modularized Implementation of Deep RL Algorithms in PyTorch

v1.4

4 years ago

v1.1

4 years ago

v1.0

5 years ago

The main update is to use TensorBoard to fully replace the plotting system from OpenAI baselines. The latter has turned out to be a bad choice.

Previously, there was an internal version that I used to run experiments on servers, including many helper scripts. I recently found it intractable to maintain two versions at the same time, so from now on they are merged.

I really don't like shell, so my philosophy is to use Python as much as possible. I don't like the common style of passing a loooooong argument list to a script to specify hyper-parameters.
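The idea above can be sketched as follows: each experiment is a small Python function that fills in a config object, instead of a shell command with dozens of flags. The `Config` class and field names here are illustrative, not necessarily the repo's actual API.

```python
class Config:
    """Hypothetical hyper-parameter container: defaults live in code,
    and each experiment overrides only what it needs."""

    def __init__(self, **kwargs):
        # sensible defaults
        self.game = 'BreakoutNoFrameskip-v4'
        self.discount = 0.99
        self.lr = 2.5e-4
        self.max_steps = int(2e7)
        # experiment-specific overrides
        for k, v in kwargs.items():
            setattr(self, k, v)


def dqn_breakout():
    # one experiment == one plain Python function, runnable without shell args
    config = Config(lr=1e-4)
    return config  # in practice this would be handed to an agent/training loop


config = dqn_breakout()
print(config.lr)        # 0.0001 (overridden)
print(config.discount)  # 0.99 (default)
```

Compared to a long `--lr 1e-4 --discount 0.99 ...` command line, the experiment definition is versioned with the code and can use arbitrary Python (loops over seeds, computed values, etc.).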

Currently I cannot upgrade to PyTorch v1.0.1, as many of my ongoing projects are still based on v0.4.0. I will do this as soon as possible.

v0.4

5 years ago

v0.3

5 years ago

After this release, there is no official support for Python 2, although I expect most of the code will still work well in Python 2.

v0.2

5 years ago

After this release, the code is incompatible with PyTorch v0.3.x.

v0.1

6 years ago

I found that the Atari wrapper I currently use is not fully compatible with the one in OpenAI baselines, resulting in degraded performance for most games (except Pong). So I plan to do a major update to fix this issue. (To be more specific, OpenAI baselines track the return of the original episode, which usually has more than one life, whereas I track the return of an episode with only one life.)
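The discrepancy above can be illustrated with a toy stand-in for an Atari game (all names here are made up for the example): a "full episode" return sums reward over every life until the real game over, while an "episodic life" return stops at the first lost life, so the two reporting styles yield different numbers for the same game.

```python
class ToyGame:
    """A fake Atari-like game: 3 lives, each life yields 10 reward before ending."""

    def __init__(self):
        self.lives = 3

    def reset(self):
        self.lives = 3

    def step(self):
        # one "life" plays out, earning 10 reward; game over when lives hit 0
        self.lives -= 1
        reward = 10
        done = self.lives == 0   # true game over (all lives spent)
        life_lost = True         # a life was lost on this step
        return reward, done, life_lost


def full_episode_return(env):
    # baselines-style: accumulate reward until the real game over
    env.reset()
    total, done = 0, False
    while not done:
        r, done, _ = env.step()
        total += r
    return total


def episodic_life_return(env):
    # episodic-life style: the episode ends as soon as a single life is lost
    env.reset()
    total = 0
    while True:
        r, done, life_lost = env.step()
        total += r
        if done or life_lost:
            return total


env = ToyGame()
print(full_episode_return(env))   # 30: summed over all 3 lives
print(episodic_life_return(env))  # 10: only the first life
```

So even with an identical agent, the episodic-life convention reports roughly a per-life return, which looks much lower than the full-game return baselines report.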

Moreover, asynchronous methods are getting deprecated nowadays, so I will remove them and switch to A2C-style algorithms in the next version.

I made this tag in case someone still wants the old code.

To be more specific, the following algorithms are implemented in this release:

  • Deep Q-Learning (DQN)
  • Double DQN
  • Dueling DQN
  • (Async) Advantage Actor Critic (A3C / A2C)
  • Async One-Step Q-Learning
  • Async One-Step Sarsa
  • Async N-Step Q-Learning
  • Continuous A3C
  • Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
  • Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
  • Action Conditional Video Prediction
  • Categorical DQN (C51, Distributional DQN with KL Distance)
  • Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
  • N-Step DQN (similar to A2C)

Most of them are compatible with both Python 2 and Python 3; however, almost all the async methods only work in Python 2.