TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization
This is the official implementation of the paper TemporalMaxer.
We release the training and testing code for the THUMOS, EPIC-Kitchens 100 (verb, noun), and MultiTHUMOS datasets.
Recent studies have emphasized the importance of applying long-term temporal context modeling (TCM) blocks, such as complex self-attention mechanisms, to the extracted video clip features. In this paper, we present the simplest method yet for this task and argue that the extracted video clip features are already informative enough to achieve outstanding performance without sophisticated architectures. To this end, we introduce TemporalMaxer, which minimizes long-term temporal context modeling while maximizing the information in the extracted video clip features with a basic, parameter-free, locally operating max-pooling block. By keeping only the most critical information from adjacent, local clip embeddings, this block yields a more efficient TAL model. We demonstrate that TemporalMaxer outperforms state-of-the-art methods that rely on long-term TCM, such as self-attention, on various TAL datasets while requiring significantly fewer parameters and less computation.
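To illustrate the core idea, the block described above amounts to a parameter-free 1-D max pooling over the temporal axis of the clip features. The sketch below is a plain-NumPy illustration, not the repository's code; the kernel size, stride, and padding choices here are assumptions for demonstration only.

```python
import numpy as np

def temporal_max_pool(feats, kernel=3, stride=2):
    """Parameter-free 1-D max pooling over time.

    feats: (T, C) array of clip features; returns a downsampled (T', C) array.
    Each output step keeps only the strongest response in its local window,
    mirroring the "pick only the most critical information" behavior.
    Kernel/stride/padding values are illustrative assumptions.
    """
    T, _ = feats.shape
    pad = kernel // 2
    # Pad with -inf so padding never wins the max.
    padded = np.pad(feats, ((pad, pad), (0, 0)), constant_values=-np.inf)
    out = [padded[t:t + kernel].max(axis=0) for t in range(0, T, stride)]
    return np.stack(out)
```

In a real model this would be a single `MaxPool1d`-style layer between feature levels; the point is that it has zero learnable parameters, unlike a self-attention TCM block.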
conda create -n TemporalMaxer python=3.9
conda activate TemporalMaxer
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
python -m pip install -r requirements.txt
pip install -e ./
Part of the NMS is implemented in C++. The code can be compiled with:
cd ./libs/utils; python setup.py install; cd ../..
The code should be recompiled every time you update PyTorch.
Download the EPIC-Kitchens 100 features and annotations (md5: 6cbf312eb5025c0abbbf5d0eaa61e556) and unpack them into the `data` folder in the current code directory. The folder structure should look like:
├── configs
│ ├── temporalmaxer_epic_slowfast_noun.yaml
│ ├── temporalmaxer_epic_slowfast_verb.yaml
│ └── ........
├── data
│ ├── epic_kitchens
│ │ ├── annotations
│ │ └── features
├── eval.py
├── figures
├── libs
........
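Before unpacking, you can check that a download is intact by comparing its MD5 against the checksum listed above. The helper below is a generic sketch (the archive filename in the comment is a placeholder, not the actual file name):

```python
import hashlib

def md5_of(path, chunk=1 << 20):
    """Compute the MD5 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Placeholder filename; compare against the md5 listed above:
# assert md5_of("epic_kitchens_features.tar.gz") == "6cbf312eb5025c0abbbf5d0eaa61e556"
```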
VERB
# training
./scripts/epic_verb/train.sh
# testing
./scripts/epic_verb/test.sh
NOUN
# training
./scripts/epic_noun/train.sh
# testing
./scripts/epic_noun/test.sh
Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | Avg |
---|---|---|---|---|---|---|
TemporalMaxer (verb) | 27.8 | 26.6 | 25.3 | 23.1 | 19.9 | 24.5 |
TemporalMaxer (noun) | 26.3 | 25.2 | 23.5 | 21.3 | 17.6 | 22.8 |
Download the THUMOS features and annotations (md5: 1f71c37dba55d549e4b02841d0dcf603) and unpack them into the `data` folder in the current code directory. The folder structure should look like:
├── configs
│ ├── ........
│ └── temporalmaxer_thumos_i3d.yaml
├── data
│ ├── thumos
│ │ ├── annotations
│ │ └── i3d_features
├── eval.py
├── figures
├── libs
........
# training
./scripts/thumos/train.sh
# testing
./scripts/thumos/test.sh
Method | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | Avg |
---|---|---|---|---|---|---|
TemporalMaxer | 82.8 | 78.9 | 71.8 | 60.5 | 44.7 | 67.7 |
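The Avg column is the arithmetic mean of the per-threshold mAP values, which can be checked directly from the THUMOS row above:

```python
# Reproduce the Avg column from the per-tIoU mAP values (THUMOS row).
map_at_tiou = [82.8, 78.9, 71.8, 60.5, 44.7]  # tIoU 0.3 ... 0.7
avg = sum(map_at_tiou) / len(map_at_tiou)
print(round(avg, 1))  # 67.7, matching the table
```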
Download the MultiTHUMOS features and annotations (md5: 5b2477678abd612440b6099e349442ad) and unpack them into the `data` folder in the current code directory. The folder structure should look like:
├── configs
│ ├── ........
│ ├── temporalmaxer_multithumos_i3d.yaml
│ └── ........
├── data
│ ├── thumos
│ │ ├── annotations
│ │ └── i3d_features
│ └── multithumos
│ │ ├── ........
│ │ ├── multithumos.json
│ │ └── ........
├── eval.py
├── figures
├── libs
........
# training
./scripts/multithumos/train.sh
# testing
./scripts/multithumos/test.sh
Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | mAP@0.8 | mAP@0.9 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
TemporalMaxer | 49.1 | 47.5 | 44.3 | 39.4 | 33.4 | 26.5 | 17.4 | 9.1 | 2.24 | 29.9 |
Please cite the paper in your publications if it helps your research:
@article{tang2023temporalmaxer,
title={TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization},
author={Tang, Tuan N and Kim, Kwonyoung and Sohn, Kwanghoon},
journal={arXiv preprint arXiv:2303.09055},
year={2023}
}