Self-studying Sutton & Barto the hard way
To reproduce a figure, say figure 2.2, do:
cd chapter2
python figures.py 2.2
To reproduce the results of an exercise, say exercise 2.5, do:
cd chapter2
python figures.py ex2.5
Exercise 2.5: Difficulties that sample-average methods have for nonstationary problems
Exercise 2.11: Figure analogous to Figure 2.6 for the nonstationary case
Exercise 4.7: Modified Jack's car rental problem (value function, policy)
Exercise 4.9: Gambler’s problem with p_h = 0.25 (value function, policy) and p_h = 0.55 (value function, policy)
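For a feel of what Exercise 4.9 computes, here is a minimal value-iteration sketch for the gambler's problem. It is not the code from this repo: the function name gambler_value_iteration and its parameters (goal, theta, max_sweeps) are assumptions made for illustration. It uses the usual trick of fixing the value of the goal state at 1 instead of giving a +1 reward on the winning transition.

```python
import numpy as np

def gambler_value_iteration(p_h, goal=100, theta=1e-9, max_sweeps=10_000):
    """Illustrative value iteration for the gambler's problem (undiscounted)."""
    # V[s] estimates the probability of reaching `goal` from capital s.
    # The terminal states stay fixed: V[0] = 0 (ruin), V[goal] = 1 (win).
    V = np.zeros(goal + 1)
    V[goal] = 1.0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, goal):
            # Legal stakes: at least 1, at most what is owned or still needed.
            stakes = np.arange(1, min(s, goal - s) + 1)
            # Expected value of each stake: win with p_h, lose with 1 - p_h.
            returns = p_h * V[s + stakes] + (1 - p_h) * V[s - stakes]
            best = returns.max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break
    return V

V = gambler_value_iteration(0.25)
print(V[25], V[50], V[75])
```

For p_h below 0.5, staking everything from capital 50 wins with probability p_h, so V[50] should come out very close to 0.25. The p_h = 0.55 case runs the same way but needs many more sweeps to converge.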
numpy
matplotlib
seaborn
All of the code and answers are mine, except for the mountain car tile coding (URL in the book).
This README is inspired by ShangtongZhang's repo.
s = env.reset()                  # initial state
s_p, r, d, info = env.step(a)    # next state, reward, done flag, extra info
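The two calls above fit into an episode loop as sketched below. This is only an illustration under stated assumptions: CorridorEnv is a hypothetical toy environment written for this example and not one of the repo's environments, run_episode is a made-up helper, and the fourth return value of step is assumed to be a dictionary of extra information (named info here to avoid shadowing the dict builtin).

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to cell `length`."""

    def __init__(self, length=5):
        self.length = length
        self.s = 0

    def reset(self):
        # Start every episode in the leftmost cell and return that state.
        self.s = 0
        return self.s

    def step(self, a):
        # a is -1 (left) or +1 (right); the left wall clips the position.
        self.s = max(0, self.s + a)
        d = self.s == self.length  # done once the rightmost cell is reached
        r = -1.0                   # every step costs 1
        return self.s, r, d, {}


def run_episode(env, policy, max_steps=1000):
    """Run one episode with `policy` (a function state -> action); return the total reward."""
    s = env.reset()
    total = 0.0
    for _ in range(max_steps):
        a = policy(s)
        s_p, r, d, info = env.step(a)
        total += r
        s = s_p
        if d:
            break
    return total


# Always moving right reaches the goal in `length` steps.
print(run_episode(CorridorEnv(length=5), lambda s: 1))
```

The max_steps cap is there so that a policy which never reaches the goal still terminates.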
The entire thing (plots, exercises, and Anki cards, including reviewing them) took about 400 hours of focused work.