# MS-G3D

PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", CVPR 2020 Oral.
[PDF][Demo][Abstract/Supp]
**Disk usage warning:** after preprocessing, the datasets take up around 38 GB, 77 GB, and 63 GB for NTU RGB+D 60, NTU RGB+D 120, and Kinetics 400, respectively. The raw/intermediate files may take up even more space.
## Data Preparation

There are 3 datasets to download:

- NTU RGB+D 60
- NTU RGB+D 120
- Kinetics Skeleton 400

### NTU RGB+D 60 and 120

1. Request the dataset here: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp
2. Download the skeleton-only datasets:
   - `nturgbd_skeletons_s001_to_s017.zip` (NTU RGB+D 60)
   - `nturgbd_skeletons_s018_to_s032.zip` (NTU RGB+D 120, on top of NTU RGB+D 60)
3. Download the missing-skeleton lookup files from the authors' GitHub repo:
   - NTU RGB+D 60 Missing Skeletons:
     `wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD_samples_with_missing_skeletons.txt`
   - NTU RGB+D 120 Missing Skeletons:
     `wget https://raw.githubusercontent.com/shahroudy/NTURGB-D/master/Matlab/NTU_RGBD120_samples_with_missing_skeletons.txt`
   - Remember to remove the first few lines of text in these files! (See the sketch below for one way to do it.)
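For reference, one way to script that cleanup is a small filter like the following; it assumes the sample names follow the usual `SxxxCxxxPxxxRxxxAxxx` pattern, so double-check against your copies of the files:

```python
import re

# Keep only lines that look like NTU sample names (e.g. "S001C002P005R002A008"),
# dropping the descriptive header lines at the top of each file.
pattern = re.compile(r'^S\d{3}C\d{3}P\d{3}R\d{3}A\d{3}$')

for path in ['NTU_RGBD_samples_with_missing_skeletons.txt',
             'NTU_RGBD120_samples_with_missing_skeletons.txt']:
    with open(path) as f:
        samples = [line.strip() for line in f if pattern.match(line.strip())]
    with open(path, 'w') as f:
        f.write('\n'.join(samples) + '\n')
```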
### Kinetics Skeleton 400

1. `wget` the dataset from Google Drive.
2. Put all downloaded data into the following directory structure:
```
- data/
  - kinetics_raw/
    - kinetics_train/
      ...
    - kinetics_val/
      ...
    - kinetics_train_label.json
    - kinetics_val_label.json
  - nturgbd_raw/
    - nturgb+d_skeletons/     # from `nturgbd_skeletons_s001_to_s017.zip`
      ...
    - nturgb+d_skeletons120/  # from `nturgbd_skeletons_s018_to_s032.zip`
      ...
    - NTU_RGBD_samples_with_missing_skeletons.txt
    - NTU_RGBD120_samples_with_missing_skeletons.txt
```
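Before running the preprocessing scripts, a quick sanity check like the sketch below can confirm the layout matches the tree above (the paths are assumptions mirroring that tree; run it from the repo root):

```python
import os

# Expected paths relative to the repo root, mirroring the tree above.
expected = [
    'data/kinetics_raw/kinetics_train',
    'data/kinetics_raw/kinetics_val',
    'data/kinetics_raw/kinetics_train_label.json',
    'data/kinetics_raw/kinetics_val_label.json',
    'data/nturgbd_raw/nturgb+d_skeletons',
    'data/nturgbd_raw/nturgb+d_skeletons120',
    'data/nturgbd_raw/NTU_RGBD_samples_with_missing_skeletons.txt',
    'data/nturgbd_raw/NTU_RGBD120_samples_with_missing_skeletons.txt',
]

for path in expected:
    print(('OK       ' if os.path.exists(path) else 'MISSING  ') + path)
```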
### Generating the Data

NTU RGB+D 60 and 120:

```bash
cd data_gen
python3 ntu_gendata.py
python3 ntu120_gendata.py
```

Kinetics:

```bash
python3 kinetics_gendata.py
```
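As a quick sanity check after generation, you can inspect one of the produced splits. In this family of repos the joint data is typically a `.npy` array of shape `(N, C, T, V, M)` (e.g. `(N, 3, 300, 25, 2)` for NTU) plus a pickled label file; the exact paths below are assumptions, so adjust them to what the scripts actually wrote:

```python
import pickle
import numpy as np

# Assumed output locations (run from the repo root); check the
# data_gen/*.py scripts for the real paths.
data = np.load('data/ntu/xsub/train_data_joint.npy', mmap_mode='r')
with open('data/ntu/xsub/train_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)

# Expect something like (N, 3, 300, 25, 2): N samples, 3D coordinates,
# 300 frames, 25 joints, up to 2 bodies.
print(data.shape, len(labels))
```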
Generate the bone data with:

```bash
python gen_bone_data.py --dataset ntu
python gen_bone_data.py --dataset ntu120
python gen_bone_data.py --dataset kinetics
```
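For reference, bone features in this line of work are just vectors between each joint and its parent in the skeleton graph. A minimal sketch of the idea, with illustrative 1-indexed `(child, parent)` pairs for NTU's 25-joint skeleton (the authoritative list lives in `gen_bone_data.py`):

```python
import numpy as np

# Illustrative (child, parent) joint pairs for the NTU 25-joint skeleton,
# 1-indexed; see gen_bone_data.py for the actual pairs used per dataset.
PAIRS = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6),
         (8, 7), (9, 21), (10, 9), (11, 10), (12, 11), (13, 1),
         (14, 13), (15, 14), (16, 15), (17, 1), (18, 17), (19, 18),
         (20, 19), (22, 23), (23, 8), (24, 25), (25, 12)]

def joints_to_bones(joint):
    """joint: (N, C, T, V, M) array; bone[v] = joint[v] - joint[parent(v)]."""
    bone = np.zeros_like(joint)
    for child, parent in PAIRS:
        bone[:, :, :, child - 1, :] = (joint[:, :, :, child - 1, :]
                                       - joint[:, :, :, parent - 1, :])
    return bone
```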
## Pretrained Models

Download the pretrained models for producing the final results on NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400: [Dropbox][GoogleDrive][WeiYun]

Put the folder of pretrained models at the repo root:

```
- MS-G3D/
  - pretrained-models/
  - main.py
  - ...
```
Evaluate them with:

```bash
bash eval_pretrained.sh
```
## Training

To train a model, run:

```bash
python3 main.py
  --config <config file>
  --work-dir <place to keep things (weights, checkpoints, logs)>
  --device <GPU IDs to use>
  --half   # mixed-precision training with NVIDIA Apex (default O1), for GPUs with ~11GB memory
  [--base-lr <base learning rate>]
  [--batch-size <batch size>]
  [--weight-decay <weight decay>]
  [--forward-batch-size <batch size during forward pass, useful if using only 1 GPU>]  # see the sketch below
  [--eval-start <which epoch to start evaluating the model>]
```
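For context, `--forward-batch-size` corresponds to gradient accumulation: the optimizer batch is split into smaller forward/backward chunks, so the effective batch size stays the same while peak GPU memory drops. A toy sketch of the idea (not the repo's exact code):

```python
import torch
import torch.nn as nn

# Toy model; in the repo these sizes come from --batch-size / --forward-batch-size.
model = nn.Linear(10, 5)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size, forward_batch_size = 32, 16
x = torch.randn(batch_size, 10)
y = torch.randint(0, 5, (batch_size,))

splits = batch_size // forward_batch_size
optimizer.zero_grad()
for xb, yb in zip(x.chunk(splits), y.chunk(splits)):
    loss = criterion(model(xb), yb) / splits  # scale so grads match a full batch
    loss.backward()                           # gradients accumulate across chunks
optimizer.step()                              # one optimizer step per full batch
```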
## Testing

To test a trained model, run:

```bash
python3 main.py
  --config <config file>
  --work-dir <place to keep things>
  --device <GPU IDs to use>
  --weights <path to model weights>
  [--test-batch-size <...>]
```
To ensemble the results of joint and bone models, run:

```bash
python3 ensemble.py
  --dataset <dataset to ensemble, e.g. ntu120/xsub>
  --joint-dir <work_dir of your test command for joint model>
  --bone-dir <work_dir of your test command for bone model>
```
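Under the hood, this kind of two-stream ensemble is plain score-level fusion: sum the per-class scores from the joint and bone runs and take the argmax. A sketch assuming each test run saved a pickled `{sample_name: score_vector}` dict (the file and label paths below are guesses; check `ensemble.py` for the real ones):

```python
import pickle
import numpy as np

# Assumed paths; check ensemble.py and your work dirs for the actual names.
with open('data/ntu120/xsub/val_label.pkl', 'rb') as f:
    names, labels = pickle.load(f)
with open('work_dir_joint/epoch1_test_score.pkl', 'rb') as f:
    joint = pickle.load(f)   # {sample_name: per-class score vector}
with open('work_dir_bone/epoch1_test_score.pkl', 'rb') as f:
    bone = pickle.load(f)

correct = 0
for name, label in zip(names, labels):
    fused = np.asarray(joint[name]) + np.asarray(bone[name])  # score-level fusion
    correct += int(fused.argmax() == int(label))
print(f'Ensemble top-1: {correct / len(labels):.4f}')
```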
Use the corresponding config files from `./config` to train/test on different datasets.
## Examples

```bash
# Train a joint model on NTU RGB+D 120 cross-subject with the default settings
python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml

# The same, but with a custom batch size on two GPUs
python3 main.py --config ./config/nturgbd120-cross-subject/train_joint.yaml --batch-size 32 --forward-batch-size 32 --device 0 1

# Test a bone model on NTU RGB+D 120 cross-setup
python3 main.py --config ./config/nturgbd120-cross-setup/test_bone.yaml

# Train on a single GPU, accumulating gradients over two forward passes per batch
python3 main.py --config <...> --batch-size 32 --forward-batch-size 16 --device 0
```
## Resume training from checkpoint

```bash
python3 main.py
  ...  # same params as before
  --start-epoch <0-indexed epoch>
  --weights <weights in work_dir>
  --checkpoint <checkpoint in work_dir>
```
## Notes

It's recommended to linearly scale up the base learning rate when training with more than 2 GPUs (https://arxiv.org/pdf/1706.02677.pdf, Section 2.1), keeping 16 samples per worker (GPU) during the forward pass; e.g.

```bash
--base-lr 0.05 --device 0 --batch-size 32 --forward-batch-size 16
--base-lr 0.05 --device 0 1 --batch-size 32 --forward-batch-size 32
--base-lr 0.1 --device 0 1 2 3 --batch-size 64 --forward-batch-size 64
```
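Spelled out, the examples above are consistent with a reference learning rate of 0.025 at a total batch size of 16, scaled linearly with the batch:

```python
# Linear LR scaling (Goyal et al., 2017): lr = ref_lr * batch / ref_batch.
# Reference point inferred from the examples above: lr 0.025 at batch size 16.
def scaled_lr(batch_size, ref_lr=0.025, ref_batch=16):
    return ref_lr * batch_size / ref_batch

print(scaled_lr(32))  # 0.05 -> matches --base-lr 0.05 at --batch-size 32
print(scaled_lr(64))  # 0.1  -> matches --base-lr 0.1  at --batch-size 64
```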
Unfortunately, different PyTorch/CUDA versions and GPU setups can cause different levels of memory usage, so you may experience out-of-memory (OOM) errors on some machines but not others.

`--half` with `--amp-opt-level 1` (the default) is relatively more stable. If OOM occurs, try using Apex O2 by setting `--amp-opt-level 2`. However, note that `nn.DataParallel` may not work with O2.
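For reference, enabling Apex mixed precision in a standalone script boils down to wrapping the model and optimizer with `amp.initialize` and backpropagating through a scaled loss; `main.py` does this for you via `--half`/`--amp-opt-level`, so the sketch below is only illustrative (and assumes Apex is installed and a GPU is available):

```python
import torch
import torch.nn as nn
from apex import amp  # https://github.com/NVIDIA/apex

model = nn.Linear(10, 5).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# 'O1' = conservative mixed precision (the default here); 'O2' is more
# aggressive but, as noted above, may not work with nn.DataParallel.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x = torch.randn(4, 10).cuda()
y = torch.randint(0, 5, (4,)).cuda()
loss = nn.functional.cross_entropy(model(x), y)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # backward on the scaled loss for fp16 stability
optimizer.step()
```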
Default hyperparameters are stored in the config files; you can tune them and add extra training techniques to boost performance.

The best joint-bone fusion result may not come from the best single-stream models. For example, we provide 3 pretrained models for the NTU RGB+D 60 XSub joint stream, where the best fusion performance comes from the slightly underperforming model (~89.3%) rather than the reported one (~89.4%) or the slightly better retrained one (~89.6%).
## Acknowledgements

This repo is based on [2s-AGCN](https://github.com/lshiwjx/2s-AGCN). Thanks to the original authors for their work!
## Citation

Please cite this work if you find it useful:

```
@inproceedings{liu2020disentangling,
  title={Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={143--152},
  year={2020}
}
```
## Contact

Please email kenziyuliu AT outlook.com for further questions.