Minimal implementation of clipped-objective Proximal Policy Optimization (PPO) in PyTorch
Updates:

- Linearly decaying `action_std` for continuous action spaces, to make training more stable for complex environments.
- Episode logs are saved in `.csv` files.
- `PPO_colab.ipynb` combines all the files to train / test / plot graphs / make gifs on Google Colab in a convenient Jupyter notebook. Open `PPO_colab.ipynb` in Google Colab to get started.

This repository provides a minimal PyTorch implementation of Proximal Policy Optimization (PPO) with clipped objective for OpenAI Gym environments. It is primarily intended for beginners in reinforcement learning who want to understand the PPO algorithm. It can still be used for complex environments, but that may require some hyperparameter tuning or changes to the code. A concise explanation of the PPO algorithm can be found here, and a thorough explanation of all the details for implementing a best-performing PPO can be found here (not all of them are implemented in this repo yet).
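For orientation, the clipped surrogate objective at the heart of PPO can be sketched in a few lines. This is an illustrative NumPy version of the standard formula from the PPO paper, not code copied from this repo; the function name and signature are my own:

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, eps_clip=0.2):
    """Clipped surrogate objective: E[min(r * A, clip(r, 1-eps, 1+eps) * A)].

    r_t = pi_new(a|s) / pi_old(a|s) is the probability ratio; clipping removes
    the incentive to move r_t far outside [1-eps, 1+eps] in a single update.
    """
    ratios = np.exp(logp_new - logp_old)  # r_t(theta), from stored log-probs
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    # Taking the elementwise minimum gives a pessimistic bound; training
    # maximizes this quantity (i.e. minimizes its negative as the loss).
    return np.mean(np.minimum(unclipped, clipped))
```

Note that for negative advantages the `min` keeps the unclipped (more pessimistic) term, which is what makes the objective a lower bound.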
To keep the training procedure simple:

- To train a new network: run `train.py`
- To test a preTrained network: run `test.py`
- To plot graphs using log files: run `plot_graph.py`
- To make gifs using a preTrained network: run `make_gif.py`
- All parameters and hyperparameters are contained within the `.py` files.
- `PPO_colab.ipynb` combines all the files in a Jupyter notebook.
- PreTrained networks are saved in the `PPO_preTrained` directory.
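The linearly decaying `action_std` used for continuous action spaces can be sketched as below; the function name and parameter values here are illustrative assumptions, not the repo's exact API:

```python
def decay_action_std(action_std, decay_rate, min_action_std):
    """One decay step for the exploration std of the Gaussian action distribution.

    Applied every fixed number of timesteps during training: the std shrinks
    linearly and is floored at min_action_std so exploration never vanishes.
    """
    action_std = round(action_std - decay_rate, 4)  # rounding avoids float drift
    return max(action_std, min_action_std)
```

Shrinking the std over training gradually shifts the policy from exploration to exploitation, which is what makes training more stable on complex continuous-control environments.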
Please use this BibTeX if you want to cite this repository in your publications:
@misc{pytorch_minimal_ppo,
author = {Barhate, Nikhil},
title = {Minimal PyTorch Implementation of Proximal Policy Optimization},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/nikhilbarhate99/PPO-PyTorch}},
}
Results (gif previews and training graphs of the trained agents; images not reproduced here):

- PPO Continuous: RoboschoolHalfCheetah-v1
- PPO Continuous: RoboschoolHopper-v1
- PPO Continuous: RoboschoolWalker2d-v1
- PPO Continuous: BipedalWalker-v2
- PPO Discrete: CartPole-v1
- PPO Discrete: LunarLander-v2
Trained and tested on:

- Python 3
- PyTorch
- NumPy
- gym

Training environments:

- Box2D
- Roboschool
- PyBullet

Graphs and gifs:

- pandas
- matplotlib
- Pillow
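As a sketch of how the `.csv` episode logs can be consumed with the pandas dependency above (for example before plotting with matplotlib), here is a hypothetical snippet; the column names are assumptions, not the repo's exact log schema:

```python
import io
import pandas as pd

# Hypothetical log contents, standing in for a .csv file written during training.
log_csv = io.StringIO(
    "episode,timestep,reward\n"
    "1,1000,-110.5\n"
    "2,2000,-90.2\n"
    "3,3000,-75.8\n"
)
log = pd.read_csv(log_csv)
# Smooth the reward curve with a rolling mean, as a plotting script
# typically would before drawing the training graph.
log["reward_smooth"] = log["reward"].rolling(window=2, min_periods=1).mean()
```

With a real log file, `pd.read_csv("path/to/log.csv")` replaces the in-memory buffer, and `log.plot(x="timestep", y="reward_smooth")` hands the smoothed curve to matplotlib.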