Implementation of the Tsallis Actor-Critic Method
This repository provides an implementation of the Tsallis actor-critic (TAC) method built on Spinning Up, an educational resource produced by OpenAI. TAC generalizes the standard Shannon-Gibbs entropy maximization in RL to the Tsallis entropy.
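To make the generalization concrete, here is a small standalone sketch that computes the Tsallis entropy of a discrete distribution, S_q(p) = (1 - Σ_i p_i^q) / (q - 1), written as the expectation E_p[-ln_q p] of the q-logarithm ln_q(x) = (x^(q-1) - 1) / (q - 1). At q = 1 the q-logarithm becomes the natural logarithm, and S_q reduces to the Shannon-Gibbs entropy. This is an illustration only, not the code in tf_tsallis_statistics.py. The function names q_log and tsallis_entropy are hypothetical, and TAC itself works with continuous (e.g. Gaussian) policies rather than discrete distributions.

```python
import math
from typing import Sequence


def q_log(x: float, q: float) -> float:
    """q-logarithm ln_q(x) = (x**(q - 1) - 1) / (q - 1); equals ln(x) when q == 1."""
    if x <= 0:
        raise ValueError("q_log is defined only for x > 0")
    if q == 1.0:
        return math.log(x)
    return (x ** (q - 1.0) - 1.0) / (q - 1.0)


def tsallis_entropy(probs: Sequence[float], q: float) -> float:
    """Tsallis entropy S_q(p) = sum_i p_i * (-ln_q p_i) = (1 - sum_i p_i**q) / (q - 1).

    Outcomes with p_i == 0 contribute 0 in the limit, so they are skipped.
    """
    return sum(-p * q_log(p, q) for p in probs if p > 0)


if __name__ == "__main__":
    p = [0.5, 0.5]
    for q in (1.0, 1.5, 2.0):
        print(f"q={q}: S_q = {tsallis_entropy(p, q):.4f}")
```

For p = (0.5, 0.5), q = 2 gives S_2 = 1 - 0.5 = 0.5, which is smaller than the Shannon value ln 2 ≈ 0.693. The entropic index q therefore controls how strongly the entropy term rewards spreading probability mass.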
Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), 2020.
sudo apt-get update && sudo apt-get install libopenmpi-dev
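virtualenv tacenv --python=python3.5 [--system-site-packages]
You can replace "tacenv" with any name. If your machine already has the tensorflow-gpu package installed, I recommend the optional --system-site-packages flag so that the environment can use tensorflow-gpu. Activate the environment (source tacenv/bin/activate) before installing the packages below.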
pip install gym[mujoco,robotics]
cd tsallis_actor_critic_mujoco
pip install -e .
cd tsallis_actor_critic_mujoco/custom_gym/
pip install -e .
If you want to add a customized environment, see https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym
cd tsallis_actor_critic_mujoco
cd spinup/algos/tac
ls
The following files should be listed:
tac
├── core.py
├── tac.py
├── tf_tsallis_statistics.py
├── Example_Tsallis_MDPs.ipynb
└── Example_Tsallis_statistics.ipynb
cd tsallis_actor_critic_mujoco
python -m spinup.run tac --env HalfCheetah-v2
cd tsallis_actor_critic_mujoco
python -m spinup.run tac --env HalfCheetah-v2 --exp_name half_tac_alpha_cst_q_1.5_cst_gaussian_q_log --epochs 200 --lr 1e-3 --q 1.5 --pdf_type gaussian --log_type q-log --alpha_schedule constant --q_schedule constant --seed 0 10 20 30 40 50 60 70 80 90
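Passing several values to a flag (here, ten values for --seed) makes spinup.run launch one experiment per value; when several flags take multiple values, Spinning Up runs every combination. The sketch below only illustrates that Cartesian-product expansion with itertools.product. expand_grid is a hypothetical helper, not part of Spinning Up, which handles this expansion internally.

```python
import itertools
from typing import Dict, List, Sequence


def expand_grid(flags: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Return one {flag: value} dict per combination of the given flag values."""
    names = list(flags)
    value_lists = (flags[n] for n in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


if __name__ == "__main__":
    # Flag values taken from the HalfCheetah command above; only --seed has several values.
    grid = expand_grid({
        "q": [1.5],
        "pdf_type": ["gaussian"],
        "log_type": ["q-log"],
        "seed": [0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
    })
    print(len(grid), "runs")
    for run in grid[:2]:
        print(run)
```

Giving two values for --q in addition to the ten seeds would, by the same logic, produce 2 × 10 = 20 runs.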
Results are saved in the data folder. Each experiment is named according to the following convention:
[env]_[algorithm]_alpha_[alpha_schedule]_q_[entropic_index]_[q_schedule]_[distribution]_[entropy_type]
This convention helps you keep track of the parameter settings used for each run. To use it, pass the resulting name with --exp_name:
python -m spinup.run tac --env HalfCheetah-v2 --exp_name [experiment_name]
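The naming convention can also be filled in programmatically so that the name always matches the flags. In the sketch below, make_exp_name is a hypothetical helper; the caller passes the environment name already abbreviated, and the abbreviations ("half" for HalfCheetah-v2, "cst" for the constant schedule, "q_log" for q-log) are inferred from the example command above rather than taken from the repository code.

```python
# Schedule abbreviations inferred from the example exp_name (assumption); extend as needed.
SCHEDULE_ABBR = {"constant": "cst"}


def make_exp_name(env: str, algo: str, alpha_schedule: str, q: float,
                  q_schedule: str, pdf_type: str, log_type: str) -> str:
    """Build [env]_[algorithm]_alpha_[alpha_schedule]_q_[entropic_index]_[q_schedule]_[distribution]_[entropy_type]."""
    parts = [
        env,
        algo,
        "alpha",
        SCHEDULE_ABBR.get(alpha_schedule, alpha_schedule),
        "q",
        str(q),
        SCHEDULE_ABBR.get(q_schedule, q_schedule),
        pdf_type,
        log_type.replace("-", "_"),  # "q-log" -> "q_log"
    ]
    return "_".join(parts)


if __name__ == "__main__":
    name = make_exp_name("half", "tac", "constant", 1.5, "constant", "gaussian", "q-log")
    print(f"python -m spinup.run tac --env HalfCheetah-v2 --exp_name {name}")
```

With the settings of the HalfCheetah example, this reproduces the name half_tac_alpha_cst_q_1.5_cst_gaussian_q_log.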
cd tsallis_actor_critic_mujoco
./shell_scripts/tsallis_half_cheetah.sh
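To run multiple experiments at once, we use a simple approach: chain the commands with &, so that the shell starts every program except the last in the background and all of them run concurrently:
program_1 & program_2 & ... & program_n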
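If you prefer to drive experiments from a Python script, the same idea can be sketched with subprocess.Popen: start every command so they run concurrently, then wait for all of them. run_parallel is a hypothetical helper, and which commands you pass (for example, the spinup.run invocations above) is up to you.

```python
import subprocess
from typing import List, Sequence


def run_parallel(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command at once (like `cmd1 & cmd2 & ...`) and return their exit codes."""
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    # Block until each process has finished; exit codes are returned in launch order.
    return [p.wait() for p in procs]
```

For example, run_parallel([["python", "-m", "spinup.run", "tac", "--env", "HalfCheetah-v2", "--seed", str(s)] for s in (0, 10, 20)]) would start three runs at once. Unlike the shell one-liner, the script also waits for the last job and reports every exit code.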