A collection of RL algorithms such as policy gradients, DQN, and PPO. The goal of this repo is to become a go-to resource for learning about RL: how to visualize, debug, and solve RL problems. I've additionally included playground.py for learning more about OpenAI gym, etc.
This repo will contain PyTorch implementations of various fundamental RL algorithms.
It's aimed at making it easy to start playing and learning about RL.
The problem I came across investigating other DQN projects is that they either:
This repo will aim to solve these problems.
This was the project that started the revolution in the RL world: the deep Q-network (Mnih et al.), aka "Human-level control through deep RL".
The DQN model learned to play 29 Atari games (out of the 49 it was tested on) at a superhuman or human-comparable level. Here is the schematic of its CNN architecture:
The fascinating part is that it learned only from "high-dimensional" (84x84) images and (usually sparse) rewards. The same architecture was used for all of the 49 games - although the model has to be retrained, from scratch, every single time.
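To make the architecture concrete, here is a minimal sketch of how an 84x84 input shrinks through the three convolutional layers described in the Mnih et al. paper (32 filters of 8x8 with stride 4, 64 of 4x4 with stride 2, 64 of 3x3 with stride 1, no padding). The helper names are made up for illustration, and the layers implemented in this repo may differ from the paper.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output width/height of an unpadded ("valid") convolution."""
    return (size - kernel) // stride + 1


# (out_channels, kernel_size, stride) as reported in Mnih et al. (2015).
# Assumption: the repo's own layers may not match these exactly.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def dqn_feature_shapes(input_size: int = 84):
    """Return the (channels, height, width) shape after each conv layer."""
    shapes = []
    size = input_size
    for channels, kernel, stride in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
        shapes.append((channels, size, size))
    return shapes


shapes = dqn_feature_shapes()
channels, height, width = shapes[-1]
# Number of features fed into the first fully connected layer.
flat_features = channels * height * width
print(shapes, flat_features)
```

The flattened size is what the first fully connected layer (512 units in the paper) has to accept, so it is a handy number to check when wiring the network up.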
Since it takes lots of compute and time to train all of the 49 models, I'll consider the DQN project completed once I succeed in achieving the published results on:
Having said that, the experiments are still in progress, so feel free to contribute!
Important note: please follow the coding guidelines of this repo before you submit a PR so that we can minimize the back-and-forth. I'm a decently busy guy as I assume you are.
As you can see, the model did learn something, although it's far from being really good.
todo
Let's get this thing running! Follow the next steps:
1. Run `git clone https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning`.
2. Navigate into the project directory: `cd path_to_repo`.
3. Run `conda env create` from the project directory (this will create a brand new conda environment).
4. Run `conda activate pytorch-rl-env` (for running scripts from your console, or set up the interpreter in your IDE).

If you're on Windows you'll additionally need to run `pip install -f https://github.com/Kojoley/atari-py/releases atari_py` to install gym's Atari dependencies.
Otherwise `pip install 'gym[atari]'` should do it; if it's not working check out this and this.
That's it! Everything should work out of the box, since the environment.yml file takes care of the dependencies.
The PyTorch pip package comes bundled with some version of CUDA/cuDNN, but it is highly recommended that you install a system-wide CUDA beforehand, mostly because of the GPU drivers. I also recommend using the Miniconda installer to get conda on your system. Follow points 1 and 2 of this setup and use the most up-to-date versions of Miniconda and CUDA/cuDNN for your system.
Coming soon.
You just need to link the Python environment you created in the setup section.
To run with default settings just run `python train_DQN_script.py`.
Settings you'll want to experiment with:
- `--seed` - it may just so happen that I've chosen a bad one (RL is very sensitive)
- `--learning_rate` - DQN originally used RMSProp, but I saw that Adam with 1e-4 worked for Stable Baselines 3
- `--grad_clipping_value` - there was a lot of noise in the gradients so I used this to control it

Less important settings for getting DQN to work:
- `--env_id` - depending on which game you want to train on (I'd focus on the easiest one for now - Breakout)
- `--replay_buffer_size` - hopefully you can train DQN with 1M, as in the original paper; if not, make it smaller
- `--dont_crash_if_no_mem` - add this flag if you want to run with a 1M replay buffer even if you don't have enough RAM

The training script will:
- save model checkpoints to `models/checkpoints/`
- save model binaries to `models/binaries/` <- TODO
- save TensorBoard metrics to `runs/`; to use them, check out the visualization section
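Since results can vary a lot between seeds, it can help to launch the same configuration several times. The following is a minimal sketch that only builds the command lines for such a sweep. It assumes `--seed` and `--learning_rate` each take a single value, which is not confirmed here, and `sweep_commands` is a hypothetical helper that is not part of this repo.

```python
import itertools
import shlex


def sweep_commands(seeds, learning_rates, script="train_DQN_script.py"):
    """Build one command (as an argument list) per (seed, learning rate) pair."""
    commands = []
    for seed, lr in itertools.product(seeds, learning_rates):
        # Assumption: both flags accept a single value on the command line.
        commands.append(
            ["python", script, "--seed", str(seed), "--learning_rate", str(lr)]
        )
    return commands


for cmd in sweep_commands(seeds=[0, 1, 2], learning_rates=[1e-4]):
    # Only prints the commands; pass cmd to subprocess.run(cmd, check=True)
    # if you actually want to launch the training runs one after another.
    print(shlex.join(cmd))
```

Each run takes days on a single GPU, so in practice you would probably start with two or three seeds rather than a large grid.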
You can visualize the metrics during training by running `tensorboard --logdir=runs` from your console and pasting the http://localhost:6006/ URL into your browser.
I'm currently visualizing the Huber loss (and you can see there is something weird going on):
Rewards and steps taken per episode (there is a fair bit of correlation between these 2):
And gradient L2 norms of weights and biases of every CNN/FC layer as well as the complete grad vector:
As well as epsilon (from the epsilon-greedy algorithm) but that plot is not that informative so I'll omit it here.
As you can see, the plots are super noisy, which I expected, but the progress just stagnates from a certain point onwards and that's what I'm trying to debug atm.
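For readers unfamiliar with two of the quantities mentioned above, here is a minimal sketch of the Huber loss on a single TD error and of a linear epsilon schedule. The Huber definition is the standard one with threshold `delta`. The schedule values (1.0 down to 0.1 over the first million steps) follow the original DQN paper and are an assumption about this repo, as are the function names.

```python
def huber_loss(td_error: float, delta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones (less outlier-sensitive)."""
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_error - 0.5 * delta)


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.1,
                   decay_steps: int = 1_000_000) -> float:
    """Linearly anneal epsilon from `start` to `end`, then hold it at `end`."""
    # Assumption: these defaults mirror the DQN paper, not necessarily this repo.
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return start + fraction * (end - start)


for err in (0.5, 3.0):
    print(err, huber_loss(err))
for step in (0, 500_000, 2_000_000):
    print(step, linear_epsilon(step))
```

Because the Huber loss grows only linearly for large TD errors, a single bad transition produces a bounded gradient, which is one reason it is commonly paired with gradient clipping.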
To enter debug mode, add the `--debug` flag to your console command or to your IDE's list of script arguments.
It'll visualize the current state that's being fed into the RL agent. Sometimes the state will have some black frames prepended since there aren't enough frames experienced in the current episode:
But mostly all of the 4 frames will be in there:
And it will start rendering the game frames (Pong and Breakout shown here from left to right):
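The black-frame padding described above can be sketched roughly as follows. This is a simplified illustration using plain nested lists instead of tensors, and `FrameStack` is a hypothetical class, not the one used in this repo.

```python
from collections import deque


class FrameStack:
    """Keep the last `stack_size` frames, padded with black frames after reset."""

    def __init__(self, stack_size: int = 4, height: int = 84, width: int = 84):
        self.stack_size = stack_size
        # A single all-zero (black) frame, reused for padding and never mutated.
        self.blank = [[0] * width for _ in range(height)]
        self.frames = deque(maxlen=stack_size)
        self.reset()

    def reset(self) -> None:
        """Start a new episode: fill the stack with black frames."""
        self.frames.clear()
        for _ in range(self.stack_size):
            self.frames.append(self.blank)

    def push(self, frame):
        """Add the newest frame (dropping the oldest) and return the state."""
        self.frames.append(frame)
        return list(self.frames)


# Tiny 2x2 frames keep the example readable.
stack = FrameStack(stack_size=4, height=2, width=2)
state = stack.push([[1, 1], [1, 1]])
print(state)  # three black frames followed by the first real frame
```

After four real frames have been pushed in an episode, the padding is gone and the state consists only of observed frames.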
You'll need some decent hardware to train the DQN in reasonable time so that you can iterate fast:
With 16 GB RAM and RTX 2080 it takes ~5 days to train DQN on my machine - I'm experiencing some slowdowns which I haven't debugged yet. Here is the FPS (frames-per-second) metric I'm logging:
The shorter, green one is the current experiment I'm running, the red one took over 5 days to train.
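To get a feel for why RAM matters with a 1M replay buffer, here is a back-of-the-envelope estimate. It assumes every frame is stored exactly once as an 84x84 single-channel uint8 image, which may not match how this repo lays out its buffer, and `replay_buffer_bytes` is a hypothetical helper.

```python
def replay_buffer_bytes(num_frames: int = 1_000_000, height: int = 84,
                        width: int = 84, bytes_per_pixel: int = 1) -> int:
    """Raw memory needed to store the frames, ignoring actions, rewards, etc."""
    return num_frames * height * width * bytes_per_pixel


total = replay_buffer_bytes()
print(f"{total / 1e9:.2f} GB ({total / 2**30:.2f} GiB)")

# Storing frames as float32 instead of uint8 multiplies the footprint by 4.
print(f"{replay_buffer_bytes(bytes_per_pixel=4) / 1e9:.2f} GB as float32")
```

Under these assumptions the frames alone take about 7 GB, which fits into 16 GB of RAM only if frames are kept as uint8 and not duplicated across stacked states.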
Here are some videos I made on RL which may help you to better understand how DQN and other RL algorithms work:
And some other ones:
And in this one I tried to film through the process while the project was not nearly as polished as it is now:
I'll soon create a blog on how to get started with RL - so stay tuned for that!
I found these resources useful while developing this project, sorted (approximately) by usefulness:
If you find this code useful, please cite the following:
@misc{Gordić2021PyTorchLearnReinforcementLearning,
author = {Gordić, Aleksa},
title = {pytorch-learn-reinforcement-learning},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning}},
}