Tensorflow implementation of the asynchronous advantage actor-critic (a3c) reinforcement learning algorithm for continuous action space
Tensorflow implementation of the asynchronous advantage actor-critic (A3C) reinforcement learning algorithm (paper) for continuous action space. Code is mostly based on Morvan Zhou (github).
Pendulum environment before training:
After 1500 episodes: