Wumo Reinforcement Learning An Introduction Save

Kotlin implementation of algorithms, examples, and exercises from the Sutton and Barto: Reinforcement Learning (2nd Edition)

Project README

Reinforcement Learning: An Introduction

Kotlin implementation of algorithms, examples, and exercises from the Sutton and Barto: Reinforcement Learning (2nd Edition). The purpose of this project is to help understanding RL algorithms and experimenting easily.

Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8)

Features:

Algorithms and problems are separated. So you can experiment with various combination of <algorithm, problem> or <algorithm,function approximator, problem>
Implementation is very close to the pseudo code in the book. So reading source code will help you understand the original algorithm.

Implemented algorithms:

Model-based (Dynamic Programming):

Policy Iteration (Action-Value Iteration) (p.65)
Value Iteration (p.67)

Monte Carlo (episode backup):

Temporal Difference (one-step backup):

Tabular TD(0) (p.98)
Sarsa (p.106)
Q-learning (p.107)
Expected Sarsa (p.109)
Double Q-Learning (p.111)

n-step Temporal Difference (unify MC and TD):

Dyna (Integrate Planning, Acting, and Learning):

Random-sample one-step tabular Q-planning (p.133)
Tabular Dyna-Q (p.135)
Tabular Dyna-Q+ (p.138)
Prioritized Sweeping (p.140)
Prioritized Sweeping Stochastic Environment (p.141)

On-policy Prediction with Function Approximation

On-policy Control with Function Approximation

Episodic semi-gradient Sarsa (p.198)
Episodic semi-gradient n-step Sarsa (p.200)
Differential semi-gradient Sarsa (p.203)
Differential semi-gradient n-step Sarsa (p.206)

Off-policy Methods with Approximation

Semi-gradient off-policy TD(0) (p.210)
Semi-gradient Expected Sarsa (p.210)
n-step semi-gradient off-policy Sarsa (p.211)
n-step semi-gradient off-policy Q(σ) (p.211)

Eligibility Traces

Policy Gradient Methods

REINFORCE, A Monte-Carlo Policy-Gradient Method (episodic) (p.271)
REINFORCE with Baseline (episodic) (p.273)
One-step Actor-Critic (episodic) (p.274)
Actor-Critic with Eligibility Traces (episodic) (p.275)
Actor-Critic with Eligibility Traces (continuing) (p.277)

Implemented problems:

Grid world (p.61)
Jack's Car Rental and exercise 4.4 (p.65)
Gambler's Problem (p.68)
Blackjack (p.76)
Random Walk (p.102)
Windy Gridworld and King's Moves (p.106)
Cliff Walking (p.108)
Maximization Bias Example (p.110)
19-state Random Walk (p.118)
Dyna Maze (p.136)
Rod Maneuvering (p.141)
1000-state Random Walk (p.166)
Mountain Car (p.198)
Access-Control Queuing Task (p.204)

Build

Built with Maven

Test cases

Try Testcases

Figure 7.2

Figure 7.2: Performance of n-step TD methods as acc function of α, for various values of n, on acc 19-state random walk task

Figure 10.1

Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run

Figure 10.4

Figure 10.4: Effect of the α and n on early performance of n-step semi-gradient Sarsa and tile-coding function approximation on the Mountain Car task

Figure 12.3

Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm .

Figure 12.6

Figure 12.6: 19-state Random walk results: Performance of TD(λ) .

Figure 12.8

Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms

Figure 12.10

Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces

Figure 12.11

Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.

Open Source Agenda is not affiliated with "Wumo Reinforcement Learning An Introduction" Project. README Source: wumo/Reinforcement-Learning-An-Introduction

Stars

Open Issues

Last Commit

3 years ago

Repository

wumo/Reinforcement-Learning-An-Introduction

License

MIT

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/wumo-reinforcement-learning-an-introduction"><img src="https://www.opensourceagenda.com/projects/wumo-reinforcement-learning-an-introduction/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022