Rl Book Challenge Save

self-studying the Sutton & Barto the hard way

Project README

In this repo

Python replication of all the plots from Reinforcement Learning: An Introduction
Solution for all of the exercises
Anki flashcards summary of the book

1. Replicate all the figures

To reproduce a figure, say figure 2.2, do:

cd chapter2
python figures.py 2.2

Chapter 2

Chapter 4

Figure 4.2: Jack’s car rental problem (value function, policy)
Figure 4.3: The solution to the gambler’s problem (value function, policy)

Chapter 5

Figure 5.1: Approximate state-value functions for the blackjack policy
Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
Figure 5.3: Weighted importance sampling
Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates
Figure 5.5: A couple of right turns for the racetrack task (1, 2, 3)

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Figure 10.1: The cost-to-go function for Mountain Car task in one run (428 steps; 12, 104, 1000, 9000 episodes)
Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task
Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa
Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

Chapter 12

Chapter 13

2. Solution to all of the exercises (text answers)

To reproduce the results of an exercise, say exercise 2.5 do:

cd chapter2
python figures.py ex2.5

Chapter 2

Chapter 4

Exercise 4.7: Modified Jack's car rental problem (value function, policy)
Exercise 4.9: Gambler’s problem with ph = 0.25 (value function, policy) and ph = 0.55 (value function, policy)

Chapter 5

Exercise 5.14: Modified MC Control on the racetrack (1, 2)

Chapter 6

Chapter 7

Chapter 8

Chapter 11

Exercise11.3: One-step semi-gradient Q-learning to Baird’s counterexample

3. Anki flashcards (cf. this blog)

Appendix

Dependencies

numpy
matplotlib
seaborn

Credits

All of the code and answers are mine, except for mountain car's tile coding (url in the book).

This README is inspired from ShangtongZhang's repo.

Design choices

All of the chapters are self-contained.
The environments use a gym-like API with methods:

s = env.reset()
s_p, r, d, dict = env.step(a)

How long did it take

The entire thing (plots, exercises, anki cards (including reviewing)) took about 400h of focused work.

Open Source Agenda is not affiliated with "Rl Book Challenge" Project. README Source: mtrazzi/rl-book-challenge

Stars

183

Open Issues

Last Commit

2 years ago

Repository

mtrazzi/rl-book-challenge

License

MIT

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/rl-book-challenge"><img src="https://www.opensourceagenda.com/projects/rl-book-challenge/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022