A Study Guide for Reinforcement Learning
Introduction to Reinforcement Learning by Joelle Pineau, McGill University:
Applications of RL.
When to use RL?
RL vs supervised learning
What is an MDP (Markov Decision Process)?
Components of an RL agent:
                 +-----------------+
                 |      Agent      |
                 +-----------------+
            ^   ^                |
    state   |   |  reward        |  action
    s(t)    |   |  r(t)          |  a(t)
            |   |                v
            |   |  r(t+1)  +----------------------+
            |   +----------|                      |
            |              |     Environment      |
            +--------------|                      |
               s(t+1)      +----------------------+
Sutton and Barto (1998)
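The interaction loop in the diagram can be sketched in a few lines of Python. The toy environment here (a short 1-D chain with a goal state) and the random policy are illustrative assumptions, not part of the original notes:

```python
import random

random.seed(0)

def env_step(state, action):
    """Environment: given s(t) and a(t), return (s(t+1), r(t+1))."""
    next_state = max(0, min(4, state + action))  # chain of states 0..4
    reward = 1.0 if next_state == 4 else 0.0     # reward only at the goal
    return next_state, reward

state = 0
for t in range(10):
    action = random.choice([-1, 1])            # agent emits a(t)
    state, reward = env_step(state, action)    # environment returns s(t+1), r(t+1)
```

The agent observes only the state and reward the environment emits; under the Markov property, s(t+1) and r(t+1) depend only on (s(t), a(t)).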
Explanation of the Markov Property:
Why maximize utility?
What is the policy & what to do with it?
Value functions:
Optimal policies and optimal value functions.
Key challenges in RL:
The RL lingo.
In large state spaces: Need approximation:
Deep Q-network (DQN) and tips.
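Before the function-approximation setting of DQN, the value-function and optimal-policy ideas above can be illustrated with tabular Q-learning on a toy chain MDP. The environment, hyperparameters, and epsilon-greedy exploration scheme below are illustrative assumptions:

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, actions -1/+1, reward 1.0 for reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, 1]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def env_step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy behaviour policy over the current Q estimates.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = env_step(s, a)
        # Q-learning update: bootstrap from the greedy value of s2.
        target = r + gamma * max(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The greedy policy recovered from Q should move right in every state.
policy = {s: max(ACTIONS, key=lambda a_: Q[(s, a_)]) for s in range(GOAL)}
```

A DQN replaces the table `Q` with a neural network and adds stabilizers (experience replay, a target network), but the update target has the same r + gamma * max-over-actions form.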
Deep Reinforcement Learning by Pieter Abbeel, EE & CS, UC Berkeley
Why Policy Optimization?
Cross Entropy Method (CEM) / Finite Differences / Fixing Random Seed
Likelihood Ratio (LR) Policy Gradient
Natural Gradient / Trust Regions (-> TRPO)
Actor-Critic (-> GAE, A3C)
Path Derivatives (PD) (-> DPG, DDPG, SVG)
Stochastic Computation Graphs (generalizes LR / PD)
Guided Policy Search (GPS)
Inverse Reinforcement Learning
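The likelihood-ratio policy gradient from the list above can be sketched with REINFORCE on a two-armed bandit. The bandit payoffs, softmax parameterization, and step size are illustrative assumptions (and no baseline is used, which the talk's variance-reduction material would normally add):

```python
import math
import random

random.seed(0)

# Two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2.
def pull(arm):
    return 1.0 if arm == 1 else 0.2

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

theta = [0.0, 0.0]  # one logit per arm
lr = 0.1

for step in range(2000):
    probs = softmax(theta)
    arm = random.choices([0, 1], weights=probs)[0]
    r = pull(arm)
    # Likelihood-ratio gradient for a softmax policy:
    # grad_theta log pi(arm) = one_hot(arm) - probs
    for a in range(2):
        grad_logp = (1.0 if a == arm else 0.0) - probs[a]
        theta[a] += lr * r * grad_logp

p_best = softmax(theta)[1]  # probability of the better arm after training
```

The same score-function estimator underlies TRPO, A3C, and the stochastic-computation-graph view; those methods change how the gradient step is constrained or how the reward signal is estimated, not the likelihood-ratio identity itself.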
Explanations with implementations, by Arthur Juliani, of some topics from the Deep Reinforcement Learning talk.
Reinforcement Learning by David Silver.
CS 294: Deep Reinforcement Learning, Spring 2017 by John Schulman and Pieter Abbeel.