PPO

PyTorch implementation of Proximal Policy Optimization

[Image: live agents]

Usage

Example command line usage:

python main.py BreakoutNoFrameskip-v0 --num-workers 8 --render

This runs PPO with 8 parallel training environments and renders them on screen. Run python main.py -h for full usage information.
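As a rough illustration of how the example command could be parsed, here is a minimal sketch using Python's standard argparse module. It is not taken from main.py: the function name build_parser, the argument name env_id, the help strings, and the default worker count are all assumptions, and only the arguments that appear in the command above are modeled. An argparse-based CLI would also provide -h automatically, which is consistent with the usage note, although the source does not say which parser the project uses.

```python
import argparse


def build_parser():
    # Hypothetical parser: models only the arguments shown in the example command.
    parser = argparse.ArgumentParser(description="PPO training (illustrative sketch)")
    # Positional Gym environment id, e.g. BreakoutNoFrameskip-v0
    parser.add_argument("env_id", help="Gym environment id")
    # Number of parallel training environments; the default of 1 is an assumption
    parser.add_argument("--num-workers", type=int, default=1,
                        help="number of parallel training environments")
    # Boolean flag: present means render, absent means do not render
    parser.add_argument("--render", action="store_true",
                        help="render the environments on screen")
    return parser


# Parse the same arguments as the example command line above.
args = build_parser().parse_args(
    ["BreakoutNoFrameskip-v0", "--num-workers", "8", "--render"]
)
print(args.env_id, args.num_workers, args.render)
```

Note that argparse exposes --num-workers as the attribute num_workers, replacing the hyphen with an underscore.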

Performance

Results are comparable to those reported in the original PPO paper. Note that the horizontal axis here counts environment steps, whereas the graphs in the paper count frames. With 4 frames per step, a value on this axis corresponds to 4 times as many frames in the paper.
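To line up a point on this plot with a point on the paper's plots, the two axes can be converted with a single multiplication. The helper names below are hypothetical and exist only for illustration; the factor of 4 is the one stated above.

```python
FRAMES_PER_STEP = 4  # from the text: 4 frames per environment step


def steps_to_frames(steps):
    """Convert environment steps (this README's axis) to frames (the paper's axis)."""
    return steps * FRAMES_PER_STEP


def frames_to_steps(frames):
    """Convert frames (the paper's axis) to environment steps (this README's axis)."""
    return frames // FRAMES_PER_STEP


# Example: 2.5 million environment steps here correspond to 10 million frames.
print(steps_to_frames(2_500_000))
print(frames_to_steps(10_000_000))
```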

Training episode reward versus environment steps for BreakoutNoFrameskip-v3:

[Figure: Breakout training curve]

References

Proximal Policy Optimization Algorithms

OpenAI Baselines

This code uses some environment utilities such as SubprocVecEnv and VecFrameStack from OpenAI's Baselines.
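For readers unfamiliar with frame stacking, the idea behind a utility like VecFrameStack can be sketched in plain Python: keep the k most recent observations and hand them to the agent together, so that motion is visible from a single input. The class below is a simplified, single-environment illustration and is not the Baselines implementation. The class name, its methods, and the choice to fill the buffer with the first observation on reset are assumptions, and Baselines may initialise its stack differently.

```python
from collections import deque


class FrameStack:
    """Keeps the k most recent observations for one environment (illustrative only)."""

    def __init__(self, k):
        self.k = k
        # A deque with maxlen=k drops the oldest frame automatically on append
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Assumed convention: fill the whole stack with the first observation
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return list(self.frames)

    def step(self, obs):
        # Push the newest observation; the oldest one falls out of the stack
        self.frames.append(obs)
        return list(self.frames)


stack = FrameStack(k=4)
print(stack.reset("frame0"))
print(stack.step("frame1"))
```

In practice the observations would be image arrays rather than strings, and the stack would be maintained for every parallel environment at once.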

Repository

lnpalmer/PPO

License

MIT
