A Simple, Distributed and Asynchronous Multi-Agent Reinforcement Learning Framework for Google Research Football AI.
This repo provides a simple, distributed, and asynchronous multi-agent reinforcement learning framework for the Google Research Football environment. Currently, it is dedicated to Google Research Football, with the cooperative part implemented in IPPO/MAPPO and the competitive part implemented in PSRO/Simple League. In the future, we will also release code for other related algorithms and environments.

Our code is based on Light-MALib, which is a simplified version of MALib with restricted algorithms and environments but certain enhancements, such as distributed asynchronous training, league-like multi-population training, and detailed TensorBoard logging. If you are also interested in other multi-agent learning algorithms and environments, you may refer to MALib for more details.
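The competitive PSRO/Simple League part tracks the relative strength of agents in a population with an Elo rating (logged during PBT). As a rough, illustrative sketch of how such a rating updates after a match (the standard Elo formula, not necessarily the repo's exact implementation):

```python
def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Standard Elo update. score_a is 1.0 if A wins, 0.5 for a draw,
    0.0 if A loses. k controls the update step size."""
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, after an even match (both agents rated 1000) won by A with `k=32`, A gains 16 points and B loses 16.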
Citation
Song, Y., Jiang, H., Tian, Z. et al. An Empirical Study on Google Research Football Multi-agent Scenarios. Mach. Intell. Res. (2024). https://doi.org/10.1007/s11633-023-1426-8
@article{song2024empirical,
title={An Empirical Study on Google Research Football Multi-agent Scenarios},
author={Song, Yan and Jiang, He and Tian, Zheng and Zhang, Haifeng and Zhang, Yingping and Zhu, Jiangcheng and Dai, Zonghong and Zhang, Weinan and Wang, Jun},
journal={Machine Intelligence Research},
pages={1--22},
year={2024},
publisher={Springer}
}
For experiments on the academy scenarios, please see our new repository: GRF_MARL
You can use any tool to manage your Python environment. Here, we use conda as an example.

1. Run `conda create -n light-malib python==3.9` to create a new conda env. Run `conda activate light-malib` when you want to use it, or add this line to your `.bashrc` file to enable it every time you log into the bash.
2. In the folder containing the `setup.py` file, run `pip install -r requirement.txt` to install the dependencies of Light-MALib.
3. In the folder containing the `setup.py` file, run `pip install .` or `pip install -e .` to install Light-MALib.
4. Run `python -c "import gfootball;print(gfootball.__file__)"` or use other methods to locate where the `gfootball` package is, for example, `/home/username/miniconda3/envs/light-malib/lib/python3.8/site-packages/gfootball/`.
5. Copy the `.py` files under the `scenarios` folder in our repo to the `scenarios` folder in the `gfootball` package.
6. Run `ray start --head` on the master machine, then connect the other machines to the master following the hints from the command-line output.
7. Run `python light_malib/main_pbt.py --config <config_file_path>` to run a training experiment. An example is given by `train_light_malib.sh`.
8. Run `python light_malib/scripts/play_gr_football.py` to run a competition between two models.

Beats the 1.0 hard bot in multi-agent 11v11 full-game scenarios within 10 hours using IPPO, taking advantage of glitches in the built-in logic.
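Locating the `gfootball` package and copying the repo's extra scenario files into it can also be scripted. Below is a minimal sketch using only the standard library; the directory layout matches the description above, but the paths are illustrative, so adapt them to your own setup:

```python
import glob
import os
import shutil
import subprocess
import sys

def locate_gfootball():
    """Return the installed gfootball package directory by asking the
    current Python interpreter, mirroring the command shown above."""
    output = subprocess.check_output(
        [sys.executable, "-c", "import gfootball;print(gfootball.__file__)"]
    )
    return os.path.dirname(output.decode().strip())

def copy_scenarios(repo_scenarios_dir, gfootball_dir):
    """Copy every .py scenario file from the repo into the package's
    scenarios folder so gfootball can load them by name."""
    dst = os.path.join(gfootball_dir, "scenarios")
    copied = []
    for path in sorted(glob.glob(os.path.join(repo_scenarios_dir, "*.py"))):
        shutil.copy(path, dst)
        copied.append(os.path.basename(path))
    return copied
```

Usage would be `copy_scenarios("scenarios", locate_gfootball())` from the repo root, assuming gfootball is already installed in the active environment.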
Currently, we provide the following tools to support research on Football AI.

At this stage, we release some of our trained models for use as initializations or opponents. Model files are available on Google Drive and Baidu Wangpan.
DataServer:
- `alive_usage_mean/std`: mean/std usage of data samples in the buffer
- `mean_wait_time`: total read waiting time divided by the number of reads
- `sample_per_minute_read`: number of samples read per minute
- `sample_per_minute_write`: number of samples written per minute

PSRO:
- `Elo`: Elo rating during PBT
- `Payoff Table`: plot of the payoff table

Rollout:
- `bad_pass`, `bad_shot`, `get_intercepted`, `get_tackled`, `good_pass`, `good_shot`, `interception`, `num_pass`, `num_shot`, `tackle`, `total_move`, `total_pass`, `total_possession`, `total_shot`: detailed football statistics
- `goal_diff`: goal difference of the training agent (positive indicates more goals)
- `lose/win`: expected lose/win rate during rollout
- `score`: expected score during rollout; a single game scores 0 for a loss, 1 for a win, and 0.5 for a draw

RolloutTimer:
- `batch`: timer for getting a rollout batch
- `env_core_step`: timer for simulator stepping
- `env_step`: total timer for an environment step
- `feature`: timer for feature encoding
- `inference`: timer for policy inference
- `policy_update`: timer for pulling policies from the remote
- `reward`: timer for reward calculation
- `rollout`: total timer for one rollout
- `sample`: timer for policy sampling
- `stats`: timer for collecting statistics

Training:
- `Old_V_max/min/mean/std`: value estimate at rollout time
- `V_max/min/mean/std`: current value estimate
- `advantage_max/min/mean/std`: advantage value
- `approx_kl`: approximate KL divergence between the old and new action distributions
- `clip_ratio`: proportion of clipped entries
- `delta_max/min/mean/std`: TD error
- `entropy`: entropy value
- `imp_weights_max/min/mean/std`: importance weights
- `kl_diff`: variation of `approx_kl`
- `lower_clip_ratio`: proportion of up-clipped entries
- `upper_clip_ratio`: proportion of down-clipped entries
- `policy_loss`: policy loss
- `training_epoch`: number of training epochs at each iteration
- `value_loss`: value loss

TrainingTimer:
- `compute_return`: timer for GAE computation
- `data_copy`: timer for data copying when processing data
- `data_generator`: timer for generating data
- `loss`: total timer for loss computation
- `move_to_gpu`: timer for sending data to the GPU
- `optimize`: total timer for an optimization step
- `push_policy`: timer for pushing trained policies to the remote
- `train_step`: total timer for a training step
- `trainer_data`: timer for getting data from the `local_queue`
- `trainer_optimize`: timer for an optimization step in the trainer

Under construction, stay tuned :)
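To make a few of the logged quantities above concrete, here is a self-contained, plain-Python sketch of how `approx_kl`, the clip proportions, and the GAE returns behind `compute_return` are typically computed in PPO-style training. This is illustrative only; the repo's exact definitions and hyperparameters may differ:

```python
import math

def ppo_ratio_stats(old_log_probs, new_log_probs, clip_eps=0.2):
    """Diagnostics in the spirit of approx_kl / clip_ratio /
    lower_clip_ratio / upper_clip_ratio (names match the logged tags;
    the repo's exact definitions may differ)."""
    n = len(old_log_probs)
    ratios = [math.exp(new - old) for old, new in zip(old_log_probs, new_log_probs)]
    # A simple estimator of KL(old || new): mean of (old - new) log-probs.
    approx_kl = sum(o - w for o, w in zip(old_log_probs, new_log_probs)) / n
    upper = sum(r > 1.0 + clip_eps for r in ratios) / n  # clipped from above
    lower = sum(r < 1.0 - clip_eps for r in ratios) / n  # clipped from below
    return approx_kl, lower, upper, lower + upper

def gae_returns(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation; advantage plus value gives the
    return target used by the value loss (hyperparameters illustrative)."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `clip_eps=0.2`, a batch whose importance ratios are `[1.5, 0.5, 1.0, 1.1]` yields an upper clip proportion of 0.25, a lower clip proportion of 0.25, and a total clip proportion of 0.5.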
If you have any questions about this repo, feel free to open an issue. You can also contact the current maintainers, YanSong97 and DiligentPanda, by email.
Interested in our project? Or do you have a great passion for decision intelligence?

Welcome! Why not take a look at https://digitalbrain.cn/talents?

With leading scientists, engineers, and field experts, we are going to provide Better Decisions for a Better World!

Digital Brain Laboratory, Shanghai, is co-founded by the founding partner and chairman of CMC Capital, Mr. Ruigang Li, and the world-renowned scientist in the field of decision intelligence, Prof. Jun Wang.