Using Asynchronous Deep Reinforcement Learning to play Flappy Bird from pixel input.

Asynchronous Deep ReinFlappyBird

This repository contains an implementation of Asynchronous Advantage Actor-Critic (A3C) that teaches an agent to play Flappy Bird.


Coming soon!

Technical Details

In my tests, these were the training speeds on a CPU (Intel Xeon E5620, 2.40 GHz) and on a GPU (NVIDIA GTX 1070):

Device  A3C-FF        A3C-LSTM
CPU     57 steps/s    TBA
GPU     400 steps/s   300 steps/s
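For a rough sense of scale, the sketch below turns the first speed listed for each device (57 steps/s on the CPU, 400 steps/s on the GPU) into wall-clock time for the default max_time_step of 40 000 000 steps. It assumes the speed stays constant over the whole run, which real training may not do.

```python
MAX_TIME_STEP = 40_000_000  # default max_time_step from the training settings

# First speed listed for each device in the table above (steps per second).
SPEEDS = {"CPU": 57, "GPU": 400}


def hours_to_finish(steps_per_second, total_steps=MAX_TIME_STEP):
    """Wall-clock hours needed to run total_steps at a constant speed."""
    return total_steps / steps_per_second / 3600


for device, sps in SPEEDS.items():
    print(f"{device}: {hours_to_finish(sps):.1f} h")
```

At these speeds a full run takes roughly 195 hours (about 8 days) on the CPU and roughly 28 hours on the GPU.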


Here are some of the available flags you can set when you train an agent. For the full list, see

Agent settings

  • mode / [train, display, visualize] - Which mode to run when you start a session.
  • use_gpu / [True, False] - Whether to use a GPU to speed up training.
  • parallel_agent_size - Number of parallel agents to use during training.
  • action_size - Number of available actions.
  • agent_type / [FF, LSTM] - Which A3C network type to train the agent with: feed-forward (FF) or LSTM.
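The agent flags above could be exposed on a command line roughly as follows. This is a hypothetical sketch that uses Python's standard argparse module purely for illustration; the repository may declare its flags differently, and every default shown is a placeholder rather than the project's actual value.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings (argparse's type=bool would not)."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Hypothetical parser for the agent settings; defaults are placeholders."""
    parser = argparse.ArgumentParser(description="A3C Flappy Bird agent settings (sketch)")
    parser.add_argument("--mode", choices=["train", "display", "visualize"], default="train")
    parser.add_argument("--use_gpu", type=str2bool, default=False)
    parser.add_argument("--parallel_agent_size", type=int, default=8)  # placeholder
    parser.add_argument("--action_size", type=int, default=2)  # placeholder: flap / do nothing
    parser.add_argument("--agent_type", choices=["FF", "LSTM"], default="FF")
    return parser
```

For example, `build_parser().parse_args(["--agent_type", "LSTM", "--use_gpu", "True"])` gives a namespace with agent_type set to "LSTM", use_gpu set to True, and the other settings at their placeholder defaults. The str2bool helper is there because argparse's `type=bool` treats any non-empty string, including "False", as True.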

Training and Optimizer settings

The current settings are based on or borrowed from the implementation by @miyosuda. They have not been tuned for Flappy Bird yet and are used as-is for now. If you find settings that perform better, let me know!

  • max_time_step - 40 000 000 - Maximum training steps.
  • initial_alpha_low - -5 - LogUniform low limit for learning rate (represents x in 10^x).
  • initial_alpha_high - -3 - LogUniform high limit for learning rate (represents x in 10^x).
  • gamma - 0.99 - Discount factor for rewards.
  • entropy_beta - 0.01 - Entropy regularization constant.
  • grad_norm_clip - 40.0 - Gradient norm clipping threshold.
  • rmsp_alpha - 0.99 - Decay parameter for RMSProp.
  • rmsp_epsilon - 0.1 - Epsilon parameter for RMSProp.
  • local_t_max - 5 - Repeat step size: the number of steps each agent takes before it computes an update.
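To make the settings above more concrete, here is a minimal pure-Python sketch of where several of them enter an A3C update. It is not the project's code: the function names are hypothetical, it works on plain floats and lists instead of TensorFlow tensors, and the RMSProp step assumes the plain form without momentum, with epsilon inside the square root as in TensorFlow's RMSProp. Whether the repository clips each gradient tensor separately or all of them jointly is not stated here, so the clipping helper simply rescales one gradient vector.

```python
import math
import random

# Defaults copied from the settings list above.
INITIAL_ALPHA_LOW = -5
INITIAL_ALPHA_HIGH = -3
GAMMA = 0.99
ENTROPY_BETA = 0.01
GRAD_NORM_CLIP = 40.0
RMSP_ALPHA = 0.99
RMSP_EPSILON = 0.1


def sample_initial_alpha(rng=random, low=INITIAL_ALPHA_LOW, high=INITIAL_ALPHA_HIGH):
    """Draw a learning rate log-uniformly between 10**low and 10**high."""
    return 10.0 ** rng.uniform(low, high)


def n_step_returns(rewards, bootstrap_value, gamma=GAMMA):
    """Discounted returns for a rollout of up to local_t_max rewards.

    bootstrap_value is the critic's V(s) for the state after the rollout,
    or 0.0 if the episode ended inside it.
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def actor_loss(pi, action, advantage, beta=ENTROPY_BETA):
    """Policy loss for one step: -log pi(a|s) * advantage - beta * entropy(pi)."""
    entropy = -sum(p * math.log(p) for p in pi if p > 0.0)
    return -math.log(pi[action]) * advantage - beta * entropy


def clip_by_norm(grad, clip_norm=GRAD_NORM_CLIP):
    """Rescale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    return [g * clip_norm / norm for g in grad]


def rmsprop_step(param, grad, mean_square, lr, decay=RMSP_ALPHA, epsilon=RMSP_EPSILON):
    """One scalar RMSProp update; returns (new_param, new_mean_square)."""
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    param = param - lr * grad / math.sqrt(mean_square + epsilon)
    return param, mean_square
```

In a full update, the advantage passed to `actor_loss` for step i would be `returns[i] - V(s_i)`, the n-step return minus the critic's estimate for that state.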

Logging settings

  • log_level / [NONE, FULL] - Log level.
  • average_summary - Number of episodes to average each summary over.
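One way to read average_summary is as the size of a sliding window over recent episode scores, as in this small sketch. The sliding-window behaviour and the EpisodeAverager name are assumptions for illustration; the repository might instead average fixed, non-overlapping blocks of episodes.

```python
from collections import deque


class EpisodeAverager:
    """Hypothetical helper: mean score over the last `window` episodes.

    `window` plays the role of average_summary (sliding window assumed).
    """

    def __init__(self, window):
        # A deque with maxlen drops the oldest score automatically.
        self.scores = deque(maxlen=window)

    def add(self, score):
        self.scores.append(score)

    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0
```

With a window of 3, adding the scores 1, 2, 3 and 4 leaves only the last three in the window, so `mean()` returns 3.0.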

Display settings

  • display_episodes - Number of episodes to display.
  • display_log_level / [NONE, MID, FULL] - Display log level: NONE prints the end summary, MID prints a summary for each episode, and FULL prints the π-values (action probabilities), state value and reward for every state.
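The three display levels can be pictured as a threshold check, as in the hypothetical sketch below. It assumes the levels are ordered NONE < MID < FULL and that a higher level also prints what the lower levels print, which the description above suggests but does not state outright. It also assumes the π-values are a softmax over the policy network's outputs, the usual choice for A3C with discrete actions. The event names and the output format are made up for illustration.

```python
import math

# Assumed ordering of display levels: higher levels print more.
LEVELS = {"NONE": 0, "MID": 1, "FULL": 2}

# Hypothetical event names mapped to the lowest level that prints them.
REQUIRED_LEVEL = {"end": "NONE", "episode": "MID", "state": "FULL"}


def softmax(logits):
    """Turn raw policy outputs into action probabilities (the pi-values)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_print(display_log_level, event):
    """True if `event` is printed at the given display log level."""
    return LEVELS[display_log_level] >= LEVELS[REQUIRED_LEVEL[event]]


def format_state(pi, value, reward):
    """Hypothetical one-line format for a FULL-level state printout."""
    return f"pi={[round(p, 3) for p in pi]} V={value:.3f} r={reward}"
```

For example, `format_state(softmax([0.0, 0.0]), 0.25, 1)` produces `pi=[0.5, 0.5] V=0.250 r=1`.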

Getting started

To start a training session with the default parameters, run:

$ python

To monitor your progress and compare experiments in real time, go to your async-deep-flappybird folder and start TensorBoard:

$ tensorboard --logdir summaries/



A3C - The A3C implementation is a modified version of the one by @miyosuda.

Flappy Bird - The Flappy Bird implementation is based on a version by @yenchenlin, with some minor adjustments.

2016, Babak Toghiani-Rizi

