Training (hopefully) safe agents in gridworlds
Emphasizing extensibility, modularity, and accessibility.
- safe_grid_agents/common: Core codebase. Includes abstract base classes for a variety of agents, their associated warmup/learn/eval functions, and a utilities file.
- main.py: Python executable for composing training jobs.
- safe_grid_agents/parsing: Helpers that construct a flexible CLI for main.py.
- safe_grid_agents/ssrl: Agents that implement semi-supervised reinforcement learning and their associated warmup functions.

When installing with pip, make sure to use the --process-dependency-links flag:
pip install . --process-dependency-links
URL-based dependencies are available for audit at the following repositories and forks:
- safe-grid-gym
- ai-safety-gridworlds

If you plan on developing this library, make sure to add the -e flag to the above pip install command.
This repo requires tensorboardX for monitoring and visualizing agent learning, as well as PyTorch for the implementation of certain agents. Currently, tensorboardX does not function properly without TensorFlow installed. Since the installation process for these packages can vary from system to system, we exclude them from our build process. There are multiple tutorials online for installing both. For example, on OS X without CUDA support I'd go with:
# Replace `tensorflow` with `tensorflow-gpu` if you have a GPU.
pip install torch torchvision tensorflow
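Because these packages are installed outside the build process, it can help to confirm they are importable before launching a run. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repo.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages mentioned above (torch is PyTorch's import name).
    missing = missing_packages(["torch", "tensorflow", "tensorboardX"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All optional dependencies are importable.")
```

The check only tests whether each package can be located; it does not verify versions or GPU support.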
You can use the CLI to main.py to modularly drop agents into arbitrary safety gridworlds. For example, python main.py boat tabular-q --lr .5 will train a TabularQAgent on the BoatRaceEnvironment with a learning rate of 0.5.
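To make the role of the learning rate concrete, here is a minimal sketch of the textbook tabular Q-learning update. It is not the repository's TabularQAgent; the function name `q_update`, the state labels, and the discount of 0.99 are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, lr=0.5, discount=0.99):
    """Apply one standard Q-learning update and return the new Q-value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    # Move the current estimate a fraction `lr` of the way toward the target.
    q[(state, action)] += lr * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = [0, 1]
first = q_update(q, "s0", 0, reward=1.0, next_state="s1", actions=actions)
second = q_update(q, "s0", 0, reward=1.0, next_state="s1", actions=actions)
print(first, second)
```

With a learning rate of 0.5, each update closes half of the remaining gap between the current estimate and the target.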
There are a number of customizable parameters to modify training runs. These parameters are split into three groups:
- Core arguments: args that are shared across all agents/environments. Found in parsing/core_parser_configs.yaml.
- Environment arguments: found in parsing/env_parser_configs.yaml.
- Agent arguments: found in parsing/agent_parser_configs.yaml.

The generalized form for the CLI is
python main.py <core_args> env <env_args> agent <agent_args>
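One way to get a command line of this shape is to nest argparse subparsers: core arguments on the top-level parser, an environment subcommand below it, and an agent subcommand below that. The sketch below is an assumption about how such a CLI could be built, not the code in safe_grid_agents/parsing (which reads its arguments from YAML); the `--seed` argument and the choice of environments are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser of the form: <core_args> env <env_args> agent <agent_args>."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Hypothetical core argument shared by every env/agent combination.
    parser.add_argument("--seed", type=int, default=0)

    env_subparsers = parser.add_subparsers(dest="env", required=True)
    for env_name in ("boat", "sokoban"):
        env_parser = env_subparsers.add_parser(env_name)
        # Each environment gets its own set of agent subcommands.
        agent_subparsers = env_parser.add_subparsers(dest="agent", required=True)
        tabular = agent_subparsers.add_parser("tabular-q")
        tabular.add_argument("--lr", type=float, default=0.1)
        tabular.add_argument("--discount", type=float, default=0.99)
    return parser


args = build_parser().parse_args(["--seed", "1", "boat", "tabular-q", "--lr", ".5"])
print(args)
```

Nesting the agent parsers under each environment parser is what enforces the ordering of the three argument groups.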
We support using Ray Tune to configure hyperparameters. Look at TUNE_DEFAULT_CONFIG in main.py to see which are currently supported. If you specify a tunable parameter on the CLI with the -t or --tune flag, it will be set automatically. For example, the following command automatically sets the learning rate lr and the discount rate discount:
# `-t` and `--tune` are equivalent, and can be used interchangeably.
python3 main.py -t lr --tune discount boat tabular-q
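The sketch below illustrates the general idea of a repeatable tune flag that swaps fixed values for sampled ones. It deliberately does not use Ray Tune, and the contents of `TUNE_DEFAULT_CONFIG` here, the sampling ranges, and the helper `sample_trial` are all hypothetical; check main.py for the real configuration.

```python
import argparse
import random

# Hypothetical stand-in for TUNE_DEFAULT_CONFIG: one sampler per tunable name.
TUNE_DEFAULT_CONFIG = {
    "lr": lambda rng: 10 ** rng.uniform(-4, 0),      # log-uniform in [1e-4, 1]
    "discount": lambda rng: rng.uniform(0.9, 1.0),
}


def build_parser():
    parser = argparse.ArgumentParser()
    # The flag may be repeated; each use names one parameter to tune.
    parser.add_argument("-t", "--tune", action="append", default=None,
                        choices=sorted(TUNE_DEFAULT_CONFIG))
    parser.add_argument("--lr", type=float, default=0.5)
    parser.add_argument("--discount", type=float, default=0.99)
    return parser


def sample_trial(args, rng):
    """Return one trial config: fixed CLI values, overridden where tuned."""
    config = {"lr": args.lr, "discount": args.discount}
    for name in args.tune or []:
        config[name] = TUNE_DEFAULT_CONFIG[name](rng)
    return config


args = build_parser().parse_args(["-t", "lr", "--tune", "discount"])
trial = sample_trial(args, random.Random(0))
print(trial)
```

Parameters that are not named with the flag keep the value given on the command line or their default.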
You can use the --log-dir/-L flag to the main.py script to specify a directory for saving training and evaluation metrics across runs. I suggest a pattern similar to
logs/sokoban/deep-q/lr5e-4
# that is, <logdir>/<env_alias>/<agent_alias>/<uniqueid_or_hparams>
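If you launch many runs from a script, a small helper can build directory names that follow this pattern. This is a sketch under assumptions: `run_log_dir` and its arguments are hypothetical names, and hyperparameter values are passed as pre-formatted strings.

```python
from pathlib import PurePosixPath


def run_log_dir(root, env_alias, agent_alias, hparams):
    """Build <logdir>/<env_alias>/<agent_alias>/<hparams> as a path string."""
    # Sort keys so the same hyperparameters always give the same directory name.
    run_id = "-".join(f"{key}{value}" for key, value in sorted(hparams.items()))
    return str(PurePosixPath(root) / env_alias / agent_alias / run_id)


log_dir = run_log_dir("logs", "sokoban", "deep-q", {"lr": "5e-4"})
print(log_dir)
```

The resulting string can then be passed to main.py through --log-dir.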
If no log-dir is specified for main.py, logging defaults to the runs/
directory, which can be helpful to separate debugging runs from training
runs.
Given a log directory <logs>, simply run tensorboard --logdir <logs> to visualize an agent's learning.
We use black for auto-formatting
according to a consistent style guide. To auto format, run black .
from inside the repo folder. To make this more convenient, you can
install plugins for your preferred text editor that auto-format on every
save.
Steps to take when adding a new agent.
1. Decide where the agent belongs. Most agents go in common, but if you're adding a new SSRL agent, add it to ssrl. We'll refer to this folder as <top>.
2. Create a new file in <top> for it (using an informative abbreviation). You should also create an abstract base class establishing the distinguishing functionality of your agent class in <top>/base.py. For example, the SSRL base class requires:
   - a query_H method for each agent.
   - a learn_C method to learn the probability of the state being corrupt.
3. Add any warmup function to <top>/warmup.py, and make sure it's importable from common/warmup.py. The noop default warmup function works for agents that don't require any special functionality.
4. Add a learn function to <top>/learn.py. See common/learn.py for an example distinguishing DQN from a tabular Q-learning agent.
5. If needed, add a function to <top>/eval.py describing the evaluation feedback loop. The default_eval function in common/eval.py should cover most cases, so you may not need to add anything for evaluation.
6. Add the agent's CLI arguments to parsing/agent_parser_configs.yaml. Follow the existing pattern and check for previously implemented YAML anchors that cover the arguments you need (e.g. learnrate, epsilon-anneal, etc.). These configs should be organized by where they appear in the folder structure of the repository.
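As a rough illustration of the base-class and warmup steps, the sketch below defines an abstract class requiring query_H and learn_C, one toy subclass, and a no-op warmup. Everything except the method names query_H and learn_C is an assumption: the class names, the method signatures, the meaning given to query_H (asking a supervisor for feedback), and the warmup signature are hypothetical and may not match the repository.

```python
import abc


class BaseSSRLAgent(abc.ABC):
    """Hypothetical abstract base class for SSRL-style agents."""

    @abc.abstractmethod
    def query_H(self, state, action):
        """Assumed here to ask a supervisor H for feedback on (state, action)."""

    @abc.abstractmethod
    def learn_C(self, state, corrupt):
        """Update and return the estimated probability that `state` is corrupt."""


class CountingSSRLAgent(BaseSSRLAgent):
    """Toy agent that estimates corruption by simple frequency counts."""

    def __init__(self, supervisor):
        self.supervisor = supervisor
        self.visits = {}
        self.corrupt_counts = {}

    def query_H(self, state, action):
        return self.supervisor(state, action)

    def learn_C(self, state, corrupt):
        self.visits[state] = self.visits.get(state, 0) + 1
        self.corrupt_counts[state] = self.corrupt_counts.get(state, 0) + int(corrupt)
        return self.corrupt_counts[state] / self.visits[state]


def noop(agent, env, history, args):
    """Warmup that does nothing (hypothetical signature)."""
    return agent, env, history, args


agent = CountingSSRLAgent(supervisor=lambda state, action: 1.0)
agent.learn_C("s0", corrupt=True)
print(agent.learn_C("s0", corrupt=False))
```

Declaring the two methods as abstract means a subclass that forgets either one fails at instantiation rather than partway through training.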