[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
[Mar 24, 2022] We present VideoMAE, a new SOTA on Kinetics, Something-Something, and AVA.
[Dec 1, 2021] We updated the TDN-ResNet101 model on SSV2 in the model zoo.
[Mar 5, 2021] TDN has been accepted by CVPR 2021.
[Dec 26, 2020] We have released the PyTorch code of TDN.
We release the PyTorch code of TDN (Temporal Difference Networks). This code is based on the TSN and TSM codebases. The core code implementing the Temporal Difference Module is in ops/base_module.py and ops/tdn_net.py.
TL;DR: We generalize the idea of RGB difference to devise an efficient temporal difference module (TDM) for motion modeling in videos, and provide an alternative to 3D convolutions through a principled and detailed module design.
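To make the starting point concrete, the plain RGB difference that TDM generalizes can be sketched as a subtraction of consecutive frames. This is only an illustration of the baseline idea, not the module in ops/base_module.py; the function name `rgb_difference` and the flat-list frame layout are assumptions made for the example.

```python
from typing import List, Sequence

# One frame flattened to a list of pixel values (assumed layout for this sketch).
Frame = Sequence[float]


def rgb_difference(frames: Sequence[Frame]) -> List[List[float]]:
    """Return frame[t + 1] - frame[t] for each pair of consecutive frames."""
    if len(frames) < 2:
        return []
    diffs = []
    for prev, nxt in zip(frames, frames[1:]):
        if len(prev) != len(nxt):
            raise ValueError("all frames must have the same number of values")
        diffs.append([b - a for a, b in zip(prev, nxt)])
    return diffs


# Three tiny 2-pixel "frames": the first pixel is static, the second brightens.
clip = [[10.0, 0.0], [10.0, 5.0], [10.0, 15.0]]
print(rgb_difference(clip))
```

Static regions cancel out in the difference, so only the changing pixel carries a non-zero signal, which is why frame differences are a cheap proxy for motion.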
The code is built with the following libraries:
We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.
The processing of Something-Something-V1 & V2 can be summarized into 3 steps:
dataset_root/frames/video_1 num_frames label_1
dataset_root/frames/video_2 num_frames label_2
dataset_root/frames/video_3 num_frames label_3
...
dataset_root/frames/video_N num_frames label_N
ops/dataset_configs.py.
The processing of Kinetics400 can be summarized into 3 steps:
dataset_root/video_1.mp4 label_1
dataset_root/video_2.mp4 label_2
dataset_root/video_3.mp4 label_3
...
dataset_root/video_N.mp4 label_N
ops/dataset_configs.py.
Note: We use decord to decode the Kinetics videos on the fly.
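The two annotation layouts shown above (frame folders with a frame count, and raw video paths) could be generated with a small script. The following is a minimal sketch, not part of this codebase: the helper names, the `labels` mapping, and the assumption that every file inside a frame folder is one extracted frame are all hypothetical.

```python
import os
from typing import Dict, List


def frame_annotation_lines(frames_root: str, labels: Dict[str, int]) -> List[str]:
    """Build '<frame_dir> <num_frames> <label>' lines for frame-folder datasets."""
    lines = []
    for video_name in sorted(labels):
        video_dir = os.path.join(frames_root, video_name)
        # Count regular files only; each one is assumed to be an extracted frame.
        num_frames = sum(1 for entry in os.scandir(video_dir) if entry.is_file())
        lines.append(f"{video_dir} {num_frames} {labels[video_name]}")
    return lines


def video_annotation_lines(video_root: str, labels: Dict[str, int]) -> List[str]:
    """Build '<video_path> <label>' lines for videos decoded on the fly."""
    return [
        f"{os.path.join(video_root, name)} {labels[name]}"
        for name in sorted(labels)
    ]


if __name__ == "__main__":
    # Hypothetical label mapping: file name -> integer class id.
    demo = video_annotation_lines("dataset_root", {"video_1.mp4": 0, "video_2.mp4": 3})
    print("\n".join(demo))
```

Writing the returned lines to a text file, one per line, gives a list in the same shape as the examples above.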
Here we provide some off-the-shelf pretrained models. Accuracy may differ slightly from the paper, since the raw Kinetics videos downloaded by users may not be identical to ours.
Model | Frames x Crops x Clips | Top-1 | Top-5 | checkpoint |
---|---|---|---|---|
TDN-ResNet50 | 8x1x1 | 52.3% | 80.6% | link |
TDN-ResNet50 | 16x1x1 | 53.9% | 82.1% | link |
Model | Frames x Crops x Clips | Top-1 | Top-5 | checkpoint |
---|---|---|---|---|
TDN-ResNet50 | 8x1x1 | 64.0% | 88.8% | link |
TDN-ResNet50 | 16x1x1 | 65.3% | 89.7% | link |
TDN-ResNet101 | 8x1x1 | 65.8% | 90.2% | link |
TDN-ResNet101 | 8x3x1 | 67.1% | 90.5% | - |
TDN-ResNet101 | 16x1x1 | 66.9% | 90.9% | link |
TDN-ResNet101 | 16x3x1 | 68.2% | 91.6% | - |
TDN-ResNet101 | (8+16)x1x1 | 68.2% | 91.6% | - |
TDN-ResNet101 | (8+16)x3x1 | 69.6% | 92.2% | - |
Model | Frames x Crops x Clips | Top-1 (30 view) | Top-5 (30 view) | checkpoint |
---|---|---|---|---|
TDN-ResNet50 | 8x3x10 | 76.6% | 92.8% | link |
TDN-ResNet50 | 16x3x10 | 77.5% | 93.2% | link |
TDN-ResNet101 | 8x3x10 | 77.5% | 93.6% | link |
TDN-ResNet101 | 16x3x10 | 78.5% | 93.9% | link |
CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
--archs='resnet50' --weights <your_checkpoint_path> --test_segments=8 \
--test_crops=1 --batch_size=16 --gpus 0 --output_dir <your_pkl_path> -j 4 --clip_index=0
python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir <your_pkl_path>
CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py kinetics \
--archs='resnet50' --weights <your_checkpoint_path> --test_segments=8 \
--test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir <your_pkl_path> \
-j 4 --clip_index <your_clip_index>
python pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir <your_pkl_path>
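The "Frames x Crops x Clips" column implies that a video's final prediction combines several views, for example 3 crops x 10 clips = 30 views. Below is a minimal sketch of one common way to fuse them, assuming the per-view class scores are simply averaged. It is not taken from pkl_to_results.py, whose exact procedure may differ, and `fuse_views` and `topk_hit` are hypothetical helper names.

```python
from typing import List, Sequence


def fuse_views(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over all views (crops x clips) of one video."""
    if not view_scores:
        raise ValueError("need at least one view")
    num_classes = len(view_scores[0])
    if any(len(view) != num_classes for view in view_scores):
        raise ValueError("every view must score the same number of classes")
    num_views = len(view_scores)
    return [
        sum(view[c] for view in view_scores) / num_views
        for c in range(num_classes)
    ]


def topk_hit(scores: Sequence[float], label: int, k: int) -> bool:
    """Return True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


# Two views of one video over three classes; the views disagree on the top class.
views = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
fused = fuse_views(views)
print(topk_hit(fused, label=1, k=1))
```

Averaging lets a confident view outweigh an uncertain one, which is one reason multi-view testing tends to score higher than single-view testing in the tables above.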
This implementation supports multi-GPU DistributedDataParallel training, which is faster and simpler.
python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
main.py something RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.01 \
--lr_scheduler step --lr_steps 30 45 55 --epochs 60 --batch-size 8 \
--wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb
python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
main.py kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
--lr_scheduler step --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
--wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb
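The `--lr_scheduler step` and `--lr_steps` flags in the commands above describe a piecewise-constant learning-rate schedule. The sketch below shows the resulting rate per epoch, assuming the rate is multiplied by 0.1 at each listed epoch. That decay factor is an assumption and is not stated in this README, and `step_lr` is a hypothetical helper rather than a function from main.py.

```python
from typing import Sequence


def step_lr(
    base_lr: float,
    epoch: int,
    lr_steps: Sequence[int],
    gamma: float = 0.1,  # assumed decay factor, not confirmed by this README
) -> float:
    """Learning rate at a 0-indexed `epoch` under a step schedule."""
    # Count how many decay milestones have been reached so far.
    num_decays = sum(1 for step in lr_steps if epoch >= step)
    return base_lr * (gamma ** num_decays)


# Schedule from the Something-Something command: --lr 0.01 --lr_steps 30 45 55
for epoch in (0, 29, 30, 45, 55):
    print(epoch, step_lr(0.01, epoch, [30, 45, 55]))
```

Under this assumption the Kinetics command (`--lr 0.02 --lr_steps 50 75 90`) follows the same shape with later milestones, matching its longer 100-epoch run.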
We especially thank the contributors of the TSN and TSM codebases for providing helpful code.
This repository is released under the Apache-2.0 license, as found in the LICENSE file.
If you find our work useful, please feel free to cite our paper 😆:
@InProceedings{Wang_2021_CVPR,
author = {Wang, Limin and Tong, Zhan and Ji, Bin and Wu, Gangshan},
title = {TDN: Temporal Difference Networks for Efficient Action Recognition},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {1895-1904}
}