TensorFlow implementation of Generative Adversarial Imitation Learning (and behavior cloning)
Disclaimer: some code is borrowed from @openai/baselines.
I separate the code into two parts: (1) sampling expert data, and (2) imitation learning with GAIL/BC.
Ensure that $GAILTF is set to the path of your gail-tf repository, and that $ENV_ID is any valid OpenAI Gym environment (e.g. Hopper-v1, HalfCheetah-v1, etc.):
export GAILTF=/path/to/your/gail-tf
export ENV_ID="Hopper-v1"
export BASELINES_PATH=$GAILTF/gailtf/baselines/ppo1 # use gailtf/baselines/trpo_mpi for TRPO
export SAMPLE_STOCHASTIC="False" # use True for stochastic sampling
export STOCHASTIC_POLICY="False" # use True for a stochastic policy
export PYTHONPATH=$GAILTF:$PYTHONPATH # as mentioned below
cd $GAILTF
python3 $BASELINES_PATH/run_mujoco.py --env_id $ENV_ID
The trained model will be saved in ./checkpoint, and its precise name will vary based on your optimization method and environment ID. Choose the last checkpoint in the series:
export PATH_TO_CKPT=./checkpoint/trpo.Hopper.0.00/trpo.Hopper.00-900
python3 $BASELINES_PATH/run_mujoco.py --env_id $ENV_ID --task sample_trajectory --sample_stochastic $SAMPLE_STOCHASTIC --load_model_path $PATH_TO_CKPT
This will generate a pickle file that stores the expert trajectories in ./XXX.pkl (e.g. deterministic.ppo.Hopper.0.00.pkl).
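If you want to sanity-check the expert data before training, here is a minimal Python sketch; the pickle's exact structure (dict vs. list, key names) is not documented here, so inspect it before relying on any field:

import pickle

# Filename taken from the sampling step above; adjust to your own run
with open("stochastic.trpo.Hopper.0.00.pkl", "rb") as f:
    expert_data = pickle.load(f)

print(type(expert_data))  # inspect the top-level structure first
if isinstance(expert_data, dict):
    print(list(expert_data.keys()))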
export PICKLE_PATH=./stochastic.trpo.Hopper.0.00.pkl
python3 main.py --env_id $ENV_ID --expert_path $PICKLE_PATH
Usage:
--env_id: The environment ID
--num_cpu: Number of CPUs available during sampling
--expert_path: Path to the pickle file generated in the previous section
--traj_limitation: Limit on the number of expert trajectories used
--g_step: Number of policy optimization steps in each iteration
--d_step: Number of discriminator optimization steps in each iteration
--num_timesteps: Number of timesteps to train (limits interaction with the environment)
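For example, a full training command combining these flags might look like the following (the flag values are illustrative, not tuned recommendations):

python3 main.py --env_id $ENV_ID --expert_path $PICKLE_PATH --traj_limitation 10 --g_step 3 --d_step 1 --num_timesteps 5000000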
To view the summary plots in TensorBoard, run:
tensorboard --logdir $GAILTF/log
To evaluate the learned GAIL policy:
python3 main.py --env_id $ENV_ID --task evaluate --stochastic_policy $STOCHASTIC_POLICY --load_model_path $PATH_TO_CKPT --expert_path $PICKLE_PATH
To train a policy with behavior cloning:
python3 main.py --env_id $ENV_ID --algo bc --expert_path $PICKLE_PATH
To evaluate the BC policy:
python3 main.py --env_id $ENV_ID --algo bc --task evaluate --stochastic_policy $STOCHASTIC_POLICY --load_model_path $PATH_TO_CKPT --expert_path $PICKLE_PATH
Note: the following hyper-parameter settings are the best I've tested (via a simple grid search, using 1500 expert trajectories); they are listed here for reference.
The different curves below correspond to different expert dataset sizes (1000, 100, 10, 5).
python3 main.py --env_id Hopper-v1 --expert_path baselines/ppo1/deterministic.ppo.Hopper.0.00.pkl --g_step 3 --adversary_entcoeff 0
python3 main.py --env_id Walker2d-v1 --expert_path baselines/ppo1/deterministic.ppo.Walker2d.0.00.pkl --g_step 3 --adversary_entcoeff 1e-3
For HalfCheetah-v1 and Ant-v1, pretraining with behavior cloning is needed:
python3 main.py --env_id HalfCheetah-v1 --expert_path baselines/ppo1/deterministic.ppo.HalfCheetah.0.00.pkl --pretrained True --BC_max_iter 10000 --g_step 3 --adversary_entcoeff 1e-3
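Ant-v1 follows the same recipe; the expert pickle filename below assumes the same naming scheme as the other environments:

python3 main.py --env_id Ant-v1 --expert_path baselines/ppo1/deterministic.ppo.Ant.0.00.pkl --pretrained True --BC_max_iter 10000 --g_step 3 --adversary_entcoeff 1e-3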
You can find more details here, the GAIL policy here, and the BC policy here.
We don't have a pip package yet, so you'll need to add this repo to your PYTHONPATH manually:
export PYTHONPATH=/path/to/your/repo/with/gailtf:$PYTHONPATH
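A quick way to confirm the path is picked up (assuming the package directory is named gailtf, as in this repo's layout):

python3 -c "import gailtf; print(gailtf.__file__)"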
If you encounter the error "Cannot compile MPI programs. Check your configuration!!!", or the system complains about a missing mpi.h, install the Open MPI development headers:
sudo apt install libopenmpi-dev
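After installing, you can verify that the MPI compiler wrapper is available:

mpicc --version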