Official PyTorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
Pilhyeon Lee (Yonsei Univ.), Hyeran Byun (Yonsei Univ.)
Paper: https://arxiv.org/abs/2108.05029
Abstract: We tackle the problem of localizing temporal intervals of actions with only a single frame label for each action instance for training. Owing to label sparsity, existing work fails to learn action completeness, resulting in fragmentary action predictions. In this paper, we propose a novel framework, where dense pseudo-labels are generated to provide completeness guidance for the model. Concretely, we first select pseudo background points to supplement point-level action labels. Then, by taking the points as seeds, we search for the optimal sequence that is likely to contain complete action instances while agreeing with the seeds. To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively. Experimental results demonstrate that our completeness guidance indeed helps the model to locate complete action instances, leading to large performance gains especially under high IoU thresholds. Moreover, we demonstrate the superiority of our method over existing state-of-the-art methods on four benchmarks: THUMOS'14, GTEA, BEOID, and ActivityNet. Notably, our method even performs comparably to recent fully-supervised methods, at a 6 times cheaper annotation cost.
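The abstract reports gains under high IoU thresholds. For readers unfamiliar with the metric, the snippet below is a minimal sketch of the standard temporal IoU between a predicted interval and a ground-truth interval. The function name `temporal_iou` and the `(start, end)` tuple convention are chosen here for illustration and are not taken from this repository.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) intervals given in the same unit, e.g. seconds."""
    # Overlap length; zero when the intervals do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A fragmentary prediction that covers only half of a 10-second action.
    print(temporal_iou((2.0, 7.0), (0.0, 10.0)))  # 0.5
```

A prediction like this one counts as correct at an IoU threshold of 0.5 but not at 0.7, which is why incomplete predictions are penalized most at high thresholds.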
You can set up the environment by running the command below.
$ pip3 install -r requirements.txt
Prepare the THUMOS'14 dataset.
Extract features with two-stream I3D networks.
Place the features inside the dataset folder.
├── dataset
│   └── THUMOS14
│       ├── gt.json
│       ├── split_train.txt
│       ├── split_test.txt
│       ├── fps_dict.json
│       ├── point_gaussian
│       │   └── point_labels.csv
│       └── features
│           ├── train
│           │   ├── rgb
│           │   │   ├── video_validation_0000051.npy
│           │   │   ├── video_validation_0000052.npy
│           │   │   └── ...
│           │   └── flow
│           │       ├── video_validation_0000051.npy
│           │       ├── video_validation_0000052.npy
│           │       └── ...
│           └── test
│               ├── rgb
│               │   ├── video_test_0000004.npy
│               │   ├── video_test_0000006.npy
│               │   └── ...
│               └── flow
│                   ├── video_test_0000004.npy
│                   ├── video_test_0000006.npy
│                   └── ...
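Before training, it may help to confirm that the folder matches the layout above. The following is a minimal sketch of such a check and is not part of this repository: the helper name `check_layout` is hypothetical, and the expectation that every video has both an rgb and a flow file is an assumption inferred from the tree.

```python
from pathlib import Path

# Metadata files listed in the tree above, relative to dataset/THUMOS14.
EXPECTED_FILES = [
    "gt.json",
    "split_train.txt",
    "split_test.txt",
    "fps_dict.json",
    "point_gaussian/point_labels.csv",
]


def _npy_names(directory):
    """File names of all .npy files in a directory (empty set if it is missing)."""
    if not directory.is_dir():
        return set()
    return {p.name for p in directory.glob("*.npy")}


def check_layout(root):
    """Return a list of problems found; an empty list means the layout looks complete."""
    root = Path(root)
    problems = []
    for rel in EXPECTED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for subset in ("train", "test"):
        rgb_dir = root / "features" / subset / "rgb"
        flow_dir = root / "features" / subset / "flow"
        for d in (rgb_dir, flow_dir):
            if not d.is_dir():
                problems.append(f"missing directory: {d.relative_to(root).as_posix()}")
        # Assumption: each video should have a feature file in both streams.
        unpaired = _npy_names(rgb_dir) ^ _npy_names(flow_dir)
        for name in sorted(unpaired):
            problems.append(f"{subset}: {name} is present in only one of rgb/flow")
    return problems


if __name__ == "__main__":
    for line in check_layout("dataset/THUMOS14") or ["layout looks complete"]:
        print(line)
```

Run from the repository root, the script prints one line per missing or unpaired item, or a single confirmation line when nothing is missing.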
You can easily train and evaluate the model by running the script below.
If you want to try other training options, please refer to options.py.
$ bash run.sh
The pre-trained model can be found here. You can evaluate the model by running the command below.
$ bash run_eval.sh
We note that this repo was built upon our previous models.
We referenced the repos below for the code.
In addition, we referenced part of the code in the following repo for the greedy algorithm implementation.
If you find this code useful, please cite our paper.
@inproceedings{lee2021completeness,
  title={Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization},
  author={Pilhyeon Lee and Hyeran Byun},
  booktitle={IEEE/CVF International Conference on Computer Vision},
  year={2021},
}
If you have any questions or comments, please contact the first author of the paper, Pilhyeon Lee ([email protected]).