Advantage Actor-Critic (A2C) [1]
Parallel Advantage Actor-Critic (PAAC) [2]
Noisy Networks for Exploration [3]
Proximal Policy Optimization (PPO) [4]
Curiosity-driven Exploration by Self-supervised Prediction (ICM) [5] (WIP)
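The core of PPO [4] is its clipped surrogate objective. It limits how far the probability ratio between the new and old policy can move in a single update. Below is a minimal pure-Python sketch of that loss over a batch of sampled actions. It is not the code in mario_ppo.py. The function name ppo_clip_loss and the default clip_eps=0.2 are assumptions for illustration.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over the batch.

    Hypothetical helper: clip_eps=0.2 is an assumed default, not a value
    taken from mario_ppo.py.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The elementwise minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    # Negate so that an optimizer can minimize it.
    return -total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from pushing the ratio past the clip range in the direction the advantage favors. That is what keeps each update close to the old policy.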
Modify the parameters in mario_a2c.py as you like, then run:
python3 mario_a2c.py
or
python3 mario_ppo.py
Set the is_load_model and is_render parameters in mario_a2c.py as you like (to load a saved model and to render the game, respectively), then run:
python3 mario_a2c.py
or
python3 mario_ppo.py
This result uses only A2C (PAAC).
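A2C trains the critic toward discounted returns and weights the policy gradient by the advantage A_t = R_t - V(s_t). PAAC [2] computes these for many environments stepped in parallel and then applies one synchronous update. The sketch below shows the per-environment computation in plain Python. It assumes n-step bootstrapped returns (not GAE) and gamma=0.99. The function name nstep_advantages is hypothetical, and this is not necessarily how mario_a2c.py does it.

```python
def nstep_advantages(rewards, values, dones, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one environment's rollout.

    Hypothetical helper; gamma=0.99 is an assumed default.
    rewards[t], values[t] = V(s_t), dones[t] = episode ended after step t,
    bootstrap_value = critic's V(s_{t+n}) for the state after the rollout.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # do not bootstrap across an episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage: how much better the observed return was than the critic expected.
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages
```

In a PAAC-style loop this would run once per parallel environment, and the resulting advantages from all environments would be concatenated into a single batch for the actor and critic losses.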
This result uses only ICM, with no extrinsic reward (curiosity-driven).
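In ICM [5], the intrinsic reward is the forward model's prediction error in feature space, r_t^i = (eta / 2) * ||phi_hat(s_{t+1}) - phi(s_{t+1})||^2. Training on curiosity alone means this is the only reward the agent receives. The minimal sketch below takes the predicted and actual feature vectors as given, and the encoder and forward model are omitted. The function names and the default eta=1.0 are hypothetical placeholders, not values from this repository.

```python
def icm_intrinsic_reward(pred_next_features, next_features, eta=1.0):
    """Curiosity reward: (eta / 2) * squared L2 error of the forward model.

    eta=1.0 is a hypothetical placeholder scaling factor.
    """
    sq_err = sum((p - f) ** 2 for p, f in zip(pred_next_features, next_features))
    return 0.5 * eta * sq_err


def training_reward(intrinsic, extrinsic, use_extrinsic=False):
    """Reward fed to the policy; hypothetical helper.

    With use_extrinsic=False (the curiosity-only setting described above),
    the game's own reward is ignored entirely.
    """
    return intrinsic + (extrinsic if use_extrinsic else 0.0)
```

Because the reward is large only where the forward model predicts poorly, the agent is pushed toward states it has not yet learned to predict, even when the game itself gives no reward.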
[1] Actor-Critic Algorithms
[2] Efficient Parallel Methods for Deep Reinforcement Learning
[3] Noisy Networks for Exploration
[4] Proximal Policy Optimization Algorithms
[5] Curiosity-driven Exploration by Self-supervised Prediction