Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, accepted by AAAI-2020.
This repo also serves as Part B of our paper. Part A is available at this link.
Update
An improved project is released at this link.
A Chinese note at Zhihu
More powerful backbone with higher-res outputs
A bottom-up approach for the problem of multi-person pose estimation.
Install packages:
Python=3.6, PyTorch>1.0, NVIDIA Apex, and any other required packages.
Download the COCO dataset.
Download the pre-trained models (for the default configuration, download the model snapshot from epoch 52 provided below).
Download Link: BaiduCloud
Alternatively, download the pre-trained model for the default configuration, without the optimizer checkpoint, via GoogleDrive.
Change the paths in the code according to your environment.
python demo_image.py
The speed of our system is tested on the MS-COCO test-dev dataset.
The corresponding code is currently pure Python without multiprocessing.
python evaluate.py
Results on MSCOCO 2017 test-dev subset (focal L2 loss with gamma=2):
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.685
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.867
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.749
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.719
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.728
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.892
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.688
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.784
Before training, prepare the training data using "SimplePose/data/coco_masks_hdf5.py".
Using multiple GPUs is recommended to speed up training, but other training options are also supported.
Most of the code is already provided; you can train the model with:
python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py
Note: loss_model_parrel.py is for train.py and train_parallel.py, while loss_model.py is for train_distributed.py and train_distributed_SWA.py. They differ in how the batch is divided. Please refer to the code for the different options.
For distributed training, the effective batch size = batch_size_in_config × GPU_Num (more precisely, world_size). For the other training scripts, the effective batch size = batch_size_in_config. The difference comes from the different mechanisms of data parallel training and distributed training.
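The batch size rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from this repo: the function name `effective_batch_size` and its parameters are hypothetical, and `world_size` is assumed to equal the number of processes started by the launcher (for example, 4 with `--nproc_per_node=4` on a single machine).

```python
def effective_batch_size(batch_size_in_config: int,
                         world_size: int,
                         distributed: bool) -> int:
    """Return the number of samples used per optimizer step.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    if batch_size_in_config <= 0 or world_size <= 0:
        raise ValueError("batch size and world size must be positive")
    if distributed:
        # Distributed training: every process loads its own batch of
        # batch_size_in_config samples, so the totals add up.
        return batch_size_in_config * world_size
    # Data parallel training: one batch of batch_size_in_config samples
    # is split across the GPUs, so the total is unchanged.
    return batch_size_in_config


if __name__ == "__main__":
    # Example values only; use the batch size from your own config.
    print(effective_batch_size(8, 4, distributed=True))
    print(effective_batch_size(8, 4, distributed=False))
```

When switching between the data parallel and distributed scripts, keep this difference in mind so that the effective batch size stays comparable.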
Faster Version: Chun-Ming Su has rebuilt the post-processing of this repo in C++ and improved its speed. The improved system can run at up to 7-8 FPS using a single scale with flipping on a 2080 Ti GPU. Many thanks to Chun-Ming Su.
Please kindly cite this paper in your publications if it helps your research.
@inproceedings{li2020simple,
title={Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation.},
author={Li, Jia and Su, Wen and Wang, Zengfu},
booktitle={AAAI},
pages={11354--11361},
year={2020}
}