Reinforcement learning algorithms implemented for TensorFlow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]
Run an agent script, for example `python3 TF2_DDPG_LSTM.py`.

View the logged metrics in TensorBoard with `tensorboard --logdir=DDPG/logs`.

To tune hyperparameters, run `python3 hyperparam_tune.py`.

Agents were tested using the CartPole environment.
Name | On/off policy | Model | Action space support |
---|---|---|---|
DQN | off-policy | Dense, LSTM | discrete |
DDPG | off-policy | Dense, LSTM | discrete, continuous |
AE-DDPG | off-policy | Dense | discrete, continuous |
SAC:bug: | off-policy | Dense | continuous |
PPO | on-policy | Dense | discrete, continuous |
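
The on/off-policy column mainly affects how training data is handled: off-policy agents such as DQN and DDPG can learn from stored past transitions, while on-policy PPO trains on freshly collected rollouts. As a rough illustration only, an experience replay buffer could be sketched as below. The class name `ReplayBuffer` and its methods are hypothetical and are not taken from this repo.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples.

    Hypothetical helper for illustration; not the repo's implementation.
    """

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        batch = random.sample(list(self.buffer), batch_size)
        # Transpose the list of transitions into one tuple per field.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

An off-policy training loop would typically call `add` after every environment step and `sample` once enough transitions have accumulated.
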
Name | On/off policy | Model | Action space support |
---|---|---|---|
Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
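
Primal-dual methods usually handle a constraint by learning a Lagrange multiplier alongside the policy. How this repo implements that step is not described here, so the following is only a generic sketch of projected dual ascent. The names `dual_update`, `penalized_reward`, `cost_limit`, and `lr` are assumptions made for illustration.

```python
def dual_update(lam, avg_cost, cost_limit, lr=0.01):
    """One projected gradient-ascent step on the Lagrange multiplier.

    Hypothetical helper: the multiplier grows while the average cost
    exceeds the limit and shrinks otherwise, and is clipped at zero.
    """
    lam = lam + lr * (avg_cost - cost_limit)
    return max(0.0, lam)


def penalized_reward(reward, cost, lam):
    """Reward seen by the primal (policy) update: reward minus weighted cost."""
    return reward - lam * cost
```

In such a scheme the policy would be trained on `penalized_reward`, while `dual_update` is called periodically with the measured average cost.
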
The models used to generate the demos are included in the repo, along with Q-value, reward, and/or loss graphs.
DQN Basic, time step = 4, 500 reward | DQN LSTM, time step = 4, 500 reward |
---|---|
DDPG Basic, 500 reward | DDPG LSTM, time step = 5, 500 reward |
AE-DDPG Basic, 500 reward | PPO Basic, 500 reward |
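
The "time step" values in the demo captions presumably refer to how many consecutive observations are fed to the network at once. That reading is an assumption, and the repo's scripts remain the reference. A minimal sketch of such a sliding window, using the hypothetical helper name `ObservationWindow`:

```python
from collections import deque


class ObservationWindow:
    """Keeps the last `time_steps` observations, zero-padded at episode start.

    Hypothetical helper for illustration; not the repo's implementation.
    """

    def __init__(self, time_steps, obs_size):
        self.time_steps = time_steps
        self.obs_size = obs_size
        self.frames = deque(maxlen=time_steps)
        self.reset()

    def reset(self):
        # Fill the window with zero observations so its shape is always fixed.
        self.frames.clear()
        for _ in range(self.time_steps):
            self.frames.append([0.0] * self.obs_size)

    def push(self, obs):
        # Append the newest observation; the oldest one falls out automatically.
        self.frames.append(list(obs))
        # Return a copy shaped (time_steps, obs_size), oldest first.
        return [list(f) for f in self.frames]
```

For CartPole, whose observation has four values, `ObservationWindow(4, 4)` would yield a 4 x 4 input per step, which could then be converted to a tensor for a Dense or LSTM model.
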