Policy Optimization with Penalized Point Probability Distance: an Altern...
Implementation of a Deep Reinforcement Learning algorithm, Proximal Poli...
Model-based Policy Gradients