A gallery for reinforcement learning, including frameworks, tutorials, papers, implementations, applications, etc.
Technique | Benefit | Mentioned Key Algorithm |
---|---|---|
Target network | Stabilize the training process | DQN, 2015 |
Memory (replay) buffer | Break the correlation between consecutive samples | DQN, 2015 |
KL-constrained update | Optimize update step size | TRPO, 2015 |
Advantage function | Reduce gradient variance, stabilizing learning | A3C, 2016 |
Importance sampling | Improve data efficiency | PER, 2016 |
Entropy regularization | Better exploration | Soft Q-Learning, 2017 |
Boltzmann policy | Energy-based (softmax over Q-values) form of the policy | Soft Q-Learning, 2017 |
Target policy smoothing | Keep the policy from exploiting incorrect sharp peaks in the Q-function | TD3, 2018 |
Clipped double-Q learning | Reduce overestimation in the Q-function | TD3, 2018 |
Reparameterized policy | Lower-variance gradient estimate | SAC, 2018 |
PS: The "Mentioned Key Algorithm" may not be the first algorithm to use the technique, but its paper explains the technique in detail.
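
As a rough illustration of two rows in the table, target policy smoothing and clipped double-Q learning, here is a minimal NumPy sketch of how a TD3-style target value could be computed. The function name `td3_target`, its arguments, and the default hyperparameters are assumptions made for this example, not code taken from any framework listed in this gallery.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=None):
    """Hypothetical helper: TD3-style bootstrapped target for a batch."""
    rng = np.random.default_rng() if rng is None else rng

    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise so the critic cannot rely on a narrow peak in Q.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -act_limit, act_limit)

    # Clipped double-Q learning: use the smaller of the two target
    # critics to counter overestimation.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))

    # No bootstrapping on terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * q_min


# Toy usage with constant stand-in networks (illustrative only).
states = np.zeros((2, 3))
reward = np.array([1.0, 1.0])
done = np.array([0.0, 1.0])
target = td3_target(
    reward, done, states,
    actor_target=lambda s: np.zeros((len(s), 1)),
    q1_target=lambda s, a: np.full(len(s), 2.0),
    q2_target=lambda s, a: np.full(len(s), 1.0),
)
print(target)  # [1.99 1.  ]
```

In the toy usage the two critics disagree (2.0 vs 1.0), so the target bootstraps from the smaller value; the second transition is terminal, so its target is just the reward.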