start learning rate = 0.0006
end learning rate = 0
learning frame = 1e6
gradient clip norm = 40
trajectory length = 20
batch size = 32
reward clipping = [-1, 1]
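As a rough illustration of how the settings above might fit together, here is a minimal NumPy sketch. It assumes the learning rate is annealed linearly from the start value to the end value over the listed number of frames (the list gives only the endpoints, not the shape of the schedule), and that gradient clipping is by global norm. The function names `learning_rate`, `clip_rewards` and `clip_by_global_norm` are hypothetical helpers, not taken from this repository.

```python
import numpy as np

START_LR = 0.0006        # start learning rate
END_LR = 0.0             # end learning rate
LEARNING_FRAMES = 1e6    # learning frame
CLIP_NORM = 40.0         # gradient clip norm


def learning_rate(frame):
    """Learning rate at a given frame, assuming a linear decay (assumption)."""
    fraction = min(max(frame / LEARNING_FRAMES, 0.0), 1.0)
    return START_LR + fraction * (END_LR - START_LR)


def clip_rewards(rewards):
    """Clip every reward into the range [-1, 1]."""
    return np.clip(np.asarray(rewards, dtype=float), -1.0, 1.0)


def clip_by_global_norm(grads, clip_norm=CLIP_NORM):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    # The scale is 1 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm
```

With these assumptions, `learning_rate(0)` returns the start value and any frame at or beyond 1e6 returns 0. In TensorFlow 1.x the same clipping step is usually done with `tf.clip_by_global_norm`.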
tensorflow==1.14.0
gym[atari]
numpy
tensorboardX
opencv-python
Breakout | Pong | Seaquest | Space-Invader |
Boxing | Star-Gunner | Kung-Fu | Demon |
abs_one | soft_asymmetric |
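The two reward-clipping modes are not defined in this section. The sketch below assumes they follow the convention used in DeepMind's open-source IMPALA code (`scalable_agent`): `abs_one` hard-clips rewards to [-1, 1], while `soft_asymmetric` squashes rewards with `tanh` and scales negative rewards down by a factor of 0.3. Treat it as an illustration of that assumption rather than this repository's exact code.

```python
import numpy as np


def abs_one(rewards):
    """Hard clip: every reward is limited to the range [-1, 1]."""
    return np.clip(np.asarray(rewards, dtype=float), -1.0, 1.0)


def soft_asymmetric(rewards):
    """Soft clip (assumed definition): tanh squashing, negatives scaled by 0.3."""
    rewards = np.asarray(rewards, dtype=float)
    squeezed = np.tanh(rewards / 5.0)
    # Positive rewards saturate near +5, negative rewards near -1.5.
    return np.where(rewards < 0, 0.3 * squeezed, squeezed) * 5.0
```

Under this definition `soft_asymmetric` keeps some information about reward magnitude, whereas `abs_one` maps every reward above 1 to exactly 1.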