An asynchronous/parallel implementation of the AlphaGo Zero algorithm for Gomoku
This repo is based on junxiaosong/AlphaZero_Gomoku; sincere thanks to that project.
What I did:
Strength
References
Blog
Install tensorflow/tensorlayer/pygame:

```
pip install tensorflow
pip install tensorlayer
pip install pygame
```
mpi4py install click here
mpi4py on windows click here
To play against the trained model (`-np` sets the number of MPI processes):

```
python human_play.py
mpiexec -np 3 python -u human_play_mpi.py
```

To train the model:

```
python train.py
mpiexec -np 43 python -u train_mpi.py
```
There is almost no difference from AlphaGo Zero except the use of APV-MCTS (asynchronous policy and value MCTS). Slides can be found in the demo/slides directory.
Most settings are the same as in AlphaGo Zero; details are as follows:
Network Structure
The current model uses 19 residual blocks; more blocks give more accurate predictions at the cost of slower training and inference.
The number of filters in each convolutional layer is shown in the picture below.
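As a schematic only (not the repo's actual code), the tower can be sketched as a layer list. The filter count of 128 here is a placeholder assumption; the real numbers are given in the picture.

```python
def resnet_spec(blocks=19, filters=128):
    """Schematic layer list: conv stem, `blocks` residual blocks, two heads.

    `filters=128` is a placeholder; the actual filter counts are shown in
    the picture above.
    """
    spec = [f"conv3x3x{filters}+bn+relu"]                 # convolutional stem
    spec += [f"resblock({filters})"] * blocks             # 19 residual blocks
    spec += ["policy_head(softmax)", "value_head(tanh)"]  # AlphaGo Zero heads
    return spec
```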
Feature Planes
game_board.py
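The input features are defined in game_board.py. A common AlphaGo Zero style encoding uses four binary planes (own stones, opponent stones, last move, colour to play); the sketch below assumes that encoding and an 11x11 board, and the repo's exact planes may differ.

```python
def feature_planes(moves, size=11):
    """Four binary planes from the perspective of the player to move.

    moves: list of (row, col) in play order, players alternating,
    first player moves first. size=11 is an assumed board size.
    """
    to_play = len(moves) % 2                   # 0 -> first player to move
    own  = [[0] * size for _ in range(size)]   # stones of the player to move
    opp  = [[0] * size for _ in range(size)]   # opponent stones
    last = [[0] * size for _ in range(size)]   # location of the last move
    for i, (r, c) in enumerate(moves):
        (own if i % 2 == to_play else opp)[r][c] = 1
    if moves:
        r, c = moves[-1]
        last[r][c] = 1
    # constant plane: all ones if the first player is to move
    colour = [[1 if to_play == 0 else 0] * size for _ in range(size)]
    return [own, opp, last, colour]
```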
Dirichlet Noise
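With the values in the parameter table (α = 0.3, noise weight ε = 0.25), mixing Dirichlet noise into the root priors can be sketched with the standard library alone:

```python
import random

def dirichlet(alpha, n):
    """Symmetric Dirichlet(alpha) sample via normalized gamma draws."""
    g = [random.gammavariate(alpha, 1.0) for _ in range(n)]
    s = sum(g)
    return [x / s for x in g]

def noisy_root_priors(priors, alpha=0.3, eps=0.25):
    """AlphaGo Zero root exploration: P' = (1 - eps) * P + eps * Dir(alpha)."""
    noise = dirichlet(alpha, len(priors))
    return [(1 - eps) * p + eps * x for p, x in zip(priors, noise)]
```

The noise is applied only at the root of each search, so exploration is encouraged without distorting the rest of the tree.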
Parameters in Detail
I try to keep the original parameters from the AlphaGo Zero paper so as to test their generalization. I also take training time and machine configuration into account.
| Parameter | Gomoku | AlphaGo Zero |
|---|---|---|
| MPI processes | 43 | - |
| c_puct | 5 | 5 |
| n_playout | 400 | 1600 |
| residual blocks | 19 | 19/39 |
| buffer size | 500,000 (data) | 500,000 (games) |
| batch size | 512 | 2048 |
| learning rate | 0.001 | annealed |
| optimizer | Adam | SGD with momentum |
| Dirichlet noise | 0.3 | 0.03 |
| weight of noise | 0.25 | 0.25 |
| first n moves | 12 | 30 |
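c_puct and n_playout feed into MCTS selection; the standard PUCT rule they control can be sketched generically as follows (a sketch, not the repo's code):

```python
import math

def puct(q, prior, n_parent, n_child, c_puct=5.0):
    """PUCT score: Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)

def select_child(children, n_parent, c_puct=5.0):
    """Pick the move with the highest PUCT score.

    children: dict mapping move -> (Q, prior, visit_count).
    """
    return max(children,
               key=lambda m: puct(children[m][0], children[m][1],
                                  n_parent, children[m][2], c_puct))
```

A larger c_puct weights the prior-driven exploration term more heavily; n_playout is simply how many times this selection-expansion-backup loop runs per move.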
Training details