The official PyTorch implementation of "3D Human Action Representation Learning via Cross-View Consistency Pursuit" (CVPR 2021). The arXiv version of our paper is coming soon.
We have only tested our code in the following environment:
# Install python environment
$ conda create -n crossclr python=3.8.2
$ conda activate crossclr
# Install PyTorch
$ pip install torch==1.4.0
# Download our code
$ git clone https://github.com/LinguoLi/CrosSCLR.git
$ cd CrosSCLR
# Install torchlight
$ cd torchlight
$ python setup.py install
$ cd ..
# Install other python libraries
$ pip install -r requirements.txt
We use NTU RGB+D and NTU RGB+D 120 as our datasets.
Please refer to the official dataset release page for information about accessing the "NTU RGB+D" and "NTU RGB+D 120" datasets.
Only the 3D skeleton modality is required for our experiments; you can also obtain it via NTURGB-D.
Please put the raw data in the directory <path to nturgbd+d_skeletons> and build the NTU RGB+D database as follows:
# generate raw database for NTU-RGB+D
$ python tools/ntu_gendata.py --data_path <path to nturgbd+d_skeletons>
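For orientation, each raw `.skeleton` file is a plain-text record: a frame count, then per frame a body count, and per body an info line, a joint count, and one line of values per joint. A minimal reading sketch (a hypothetical helper, not the actual `tools/ntu_gendata.py` implementation):

```python
# Hypothetical sketch of reading one NTU RGB+D .skeleton file;
# see tools/ntu_gendata.py for the real parsing logic.
def read_skeleton(path):
    with open(path) as f:
        num_frames = int(f.readline())
        frames = []
        for _ in range(num_frames):
            num_bodies = int(f.readline())
            bodies = []
            for _ in range(num_bodies):
                f.readline()                    # body info line (tracking id, flags, ...)
                num_joints = int(f.readline())  # 25 joints in NTU RGB+D
                joints = []
                for _ in range(num_joints):
                    vals = f.readline().split()
                    # keep only the 3D coordinates (x, y, z)
                    joints.append(tuple(float(v) for v in vals[:3]))
                bodies.append(joints)
            frames.append(bodies)
    return frames
```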
# preprocess the above data for our method (for limited computing power, we resize the data to 50 frames)
$ python feeder/preprocess_ntu.py
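The resize-to-50-frames step can be understood as temporal re-sampling of each skeleton sequence. A minimal sketch with linear interpolation (a hypothetical helper; see `feeder/preprocess_ntu.py` for the actual preprocessing):

```python
import numpy as np

# Hypothetical sketch: linearly re-sample a skeleton sequence of shape
# (T, V, C) -- frames, joints, coordinates -- to a fixed 50 frames.
def resize_sequence(seq, target_len=50):
    t = seq.shape[0]
    # fractional positions in the original sequence for each output frame
    idx = np.linspace(0, t - 1, target_len)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    w = (idx - lo)[:, None, None]
    # blend the two neighbouring frames for each output position
    return (1 - w) * seq[lo] + w * seq[hi]
```

The same formula works for both downsampling long sequences and upsampling short ones.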
You can change the pre-training settings in the .yaml files in the config/ folder.
# train on NTU-RGB+D xview
$ python main.py pretrain_crossclr_3views --config config/CrosSCLR/crossclr_3views_xview.yaml
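A pre-training config typically sets the work directory, optimization hyperparameters, and data paths. An illustrative fragment (all field names here are hypothetical; check the actual config/CrosSCLR/crossclr_3views_xview.yaml for the real options):

```yaml
# Illustrative fragment only -- field names are hypothetical.
work_dir: ./work_dir/crossclr_3views_xview
batch_size: 128
base_lr: 0.1
num_epoch: 300
```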
Put the downloaded pre-trained weights in the weights/ folder. You can change the evaluation settings in the .yaml files in the config/linear_eval folder.
# evaluate pre-trained model on NTU-RGB+D xview
$ python main.py linear_evaluation --config config/linear_eval/linear_eval_crossclr_3views_xview.yaml --weights <path to weights>
# evaluate the provided pre-trained model
$ python main.py linear_evaluation --config config/linear_eval/linear_eval_crossclr_3views_xview.yaml --weights weights/crossclr_3views_xview_frame50_channel16_cross150_epoch300.pt
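Linear evaluation freezes the pre-trained encoder and trains only a linear classifier on its features. As a conceptual sketch, a closed-form ridge-regression classifier on one-hot labels stands in below for the SGD-trained linear layer used in the repository (a hypothetical simplification, not the actual protocol code):

```python
import numpy as np

# Hypothetical sketch of a linear probe: fit a linear classifier on frozen
# encoder features via ridge regression on one-hot labels, then predict
# the class with the highest score.
def linear_probe(train_feats, train_labels, test_feats, num_classes, reg=1e-3):
    y = np.eye(num_classes)[train_labels]                 # one-hot targets
    d = train_feats.shape[1]
    w = np.linalg.solve(train_feats.T @ train_feats + reg * np.eye(d),
                        train_feats.T @ y)                # closed-form fit
    return (test_feats @ w).argmax(axis=1)                # predicted classes
```

The key point is that the encoder's parameters never change; only the linear map `w` is learned, so accuracy reflects the quality of the learned representation.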
The Top-1 accuracy (%) of our methods under linear evaluation on the two datasets is shown below:
Model | NTU 60 xsub (%) | NTU 60 xview (%) | NTU 120 xsub (%) | NTU 120 xset (%) |
---|---|---|---|---|
SkeletonCLR | 68.3 | 76.4 | - | - |
2s-CrosSCLR | 74.5 | 82.1 | - | - |
3s-CrosSCLR | 77.8 | 83.4 | 67.9 | 66.7 |
The t-SNE visualization of the embeddings during SkeletonCLR and CrosSCLR pre-training.
Please cite our paper if you find this repository useful in your research:
@inproceedings{li2021crossclr,
  title     = {3D Human Action Representation Learning via Cross-View Consistency Pursuit},
  author    = {Li, Linguo and Wang, Minsi and Ni, Bingbing and Wang, Hang and Yang, Jiancheng and Zhang, Wenjun},
  booktitle = {CVPR},
  year      = {2021}
}