📖 Paper: Human-level control through deep reinforcement learning 🕹ī¸

Human-level control through deep reinforcement learning

(Gameplay clips: Atlantis, Boxing, Breakout, Pong)

This repository implements the notable paper: Human-level control through deep reinforcement learning.

This paper is widely known for a famous video clip in which the agent surpasses human play by a large margin. It uses a deep neural network, the Deep Q-Network (DQN), to map complex visual input directly to action values.
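
At its core, DQN regresses the online network's Q-values toward a bootstrapped Bellman target computed from a periodically frozen target network. A minimal sketch of that target, with illustrative names rather than this repo's actual API:

import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    # Bellman targets y = r + gamma * max_a' Q_target(s', a'); just r at terminal states
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)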

Features

  • Implemented in TensorFlow 2 with performance optimizations
  • Simple structure
  • Easy to reproduce

Model Structure

(Diagram: DQN network architecture)
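
The paper's network takes a stack of four 84×84 grayscale frames and applies three convolutional layers followed by two fully connected layers, with one linear output per action. A sketch of that architecture in TensorFlow 2 Keras (layer sizes are the paper's; the repo's actual code may differ in detail):

import tensorflow as tf

def build_dqn(num_actions):
    # Input: four stacked 84x84 grayscale frames, as in the paper
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=(84, 84, 4)),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(num_actions),  # linear Q-value per action
    ])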

Requirements

The default environment is assumed to be CPU-only. To run this repo on a GPU machine, replace tensorflow with tensorflow-gpu in the package list, as shown below.
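
For example, assuming requirements.txt pins TensorFlow (the version shown is illustrative):

# CPU (default)
tensorflow==2.3.0

# GPU machine: swap the package name, keep the version
tensorflow-gpu==2.3.0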

How to install

virtualenv

$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

How to run

You can run an Atari 2600 game with main.py. The environment must be a NoFrameskip variant from the gym package, i.e., the raw emulator with no built-in frame skipping, so the agent can apply the paper's own preprocessing (skipping four frames per action and max-pooling consecutive frames).
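
A quick way to check that an environment name is valid before training, using the classic gym API (the env id here matches Example 1 below):

import gym

# NoFrameskip-v4: one emulator frame per step, no built-in frame skipping
env = gym.make("BreakoutNoFrameskip-v4")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(env.action_space, obs.shape)  # Discrete(4) (210, 160, 3)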

$ python main.py --help
usage: main.py [-h] [--env ENV] [--train] [--play PLAY]
               [--log_interval LOG_INTERVAL]
               [--save_weight_interval SAVE_WEIGHT_INTERVAL]

Atari: DQN

optional arguments:
  -h, --help            show this help message and exit
  --env ENV             Should be NoFrameskip environment
  --train               Train agent with given environment
  --play PLAY           Play with a given weight directory
  --log_interval LOG_INTERVAL
                        Interval of logging stdout
  --save_weight_interval SAVE_WEIGHT_INTERVAL
                        Interval of saving weights

Example 1: Train BreakoutNoFrameskip-v4

$ python main.py --env BreakoutNoFrameskip-v4 --train

Example 2: Play PongNoFrameskip-v4 with trained weights

$ python main.py --env PongNoFrameskip-v4 --play ./log/[LOGDIR]/weights

Example 3: Control log & save interval

$ python main.py --env BreakoutNoFrameskip-v4 --train --log_interval 100 --save_weight_interval 1000

Results

This implementation has been verified to work well for Atlantis, Boxing, Breakout, and Pong. TensorBoard summaries are located in ./archive. TensorBoard shows the following information (Îĩ and reward clipping are sketched after the command below):

  • Average Q value
  • Epsilon (for exploration)
  • Latest 100 avg reward (clipped)
  • Loss
  • Reward (clipped)
  • Test score
  • Total frames
$ tensorboard --logdir=./archive/
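
Two of these metrics come straight from the paper's training recipe: rewards are clipped to {-1, 0, +1}, and Îĩ is annealed linearly from 1.0 to 0.1 over the first million frames. A sketch of both, with illustrative function names (the constants are the paper's):

import numpy as np

def clip_reward(reward):
    # Clip to {-1, 0, +1} so a single learning rate works across games
    return float(np.sign(reward))

def epsilon(frame, eps_start=1.0, eps_end=0.1, anneal_frames=1_000_000):
    # Linear annealing of the exploration rate, then held at eps_end
    return max(eps_end, eps_start - (eps_start - eps_end) * frame / anneal_frames)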

A single RTX 2080 Ti was used for the results below. (Thanks to @JKeun for sharing his compute resources.)

Atlantis

(Training curve: Atlantis)

Boxing

(Training curve: Boxing)

Breakout

(Training curve: Breakout)

Pong

(Training curve: Pong)

BibTeX

@article{mnih2015humanlevel,
  author    = {Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A. and Veness, Joel and Bellemare, Marc G. and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K. and Ostrovski, Georg and Petersen, Stig and Beattie, Charles and Sadik, Amir and Antonoglou, Ioannis and King, Helen and Kumaran, Dharshan and Wierstra, Daan and Legg, Shane and Hassabis, Demis},
  title     = {Human-level control through deep reinforcement learning},
  journal   = {Nature},
  volume    = {518},
  number    = {7540},
  pages     = {529--533},
  month     = feb,
  year      = {2015},
  issn      = {0028-0836},
  publisher = {Nature Publishing Group},
  url       = {http://dx.doi.org/10.1038/nature14236}
}

Author

Jihoon Kim (@jihoonerd)
