
Self-Supervised Temporal-Discriminative Representation Learning

The source code for our paper

"Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition" (arXiv 2020): https://arxiv.org/abs/2008.02129

Overview

Without any labels available, our method learns to focus on motion regions!

[Figure: example]

Our self-supervised VTDL significantly outperforms existing self-supervised learning methods in video action recognition, and it even achieves better results than fully-supervised methods on UCF101 and HMDB51 when a small-scale video dataset (only thousands of videos) is used for pre-training!

[Figure: sample_acc.png]

Requirements

  • Python 3
  • PyTorch 1.1+
  • PIL

Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of Kinetics-400
  • experiments
    • logs: detailed experiment records
    • TemporalDis
      • hmdb51
      • ucf101
      • kinetics
    • gradients
    • visualization
  • src
    • data: data loading
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • TC: detailed implementation of spatio-temporal consistency
    • utils
    • feature_extract.py
    • main.py
    • trainer.py
    • option.py

Dataset

See dataset.md. Prepare the dataset as a txt file, where each row follows the format below. The train/val splits of HMDB51/UCF101/Kinetics-400 can be downloaded from Google Drive.

Each row includes:

video_path class frames_num
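
For reference, here is a minimal sketch of parsing such a list file; parse_list_file is a hypothetical helper for illustration, not the repo's actual loader in src/data:

from typing import List, Tuple

def parse_list_file(path: str) -> List[Tuple[str, int, int]]:
    # Each non-empty row: "video_path class frames_num"
    items = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # rsplit keeps video paths that contain spaces intact
            video_path, class_id, frames_num = line.rsplit(" ", 2)
            items.append((video_path, int(class_id), int(frames_num)))
    return items

# e.g. parse_list_file("../datasets/lists/hmdb51/hmdb51_rgb_train_split_1.txt")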

VTDL

Network Architecture

The network architectures are defined in src/model/[backbone].py.

Method | #logits_channel
-------|----------------
C3D    | 512
R2P1D  | 2048
I3D    | 1024
R3D    | 2048
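
The #logits_channel column gives the feature width that a classifier head on each backbone would consume. As a hedged illustration (ActionHead and LOGITS_CHANNELS are made-up names for this sketch, not the repo's code; see src/model for the actual architectures):

import torch.nn as nn

LOGITS_CHANNELS = {"c3d": 512, "r2p1d": 2048, "i3d": 1024, "r3d": 2048}

class ActionHead(nn.Module):
    """Linear classifier over pooled backbone features."""
    def __init__(self, arch: str, num_classes: int, dropout: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(LOGITS_CHANNELS[arch], num_classes)

    def forward(self, feat):  # feat: [batch, logits_channel]
        return self.fc(self.dropout(feat))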

Step1: self-supervised learning

HMDB51

bash scripts/TemporalDisc/hmdb51.sh

UCF101

bash scripts/TemporalDisc/ucf101.sh

Kinetics-400

bash scripts/TemporalDisc/kinetics.sh

Notice: more training options and the ablation studies can be found in the scripts.

Step2: Transfer to action recognition

HMDB51

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/hmdb51/hmdb51_rgb_train_split_1.txt \
--val_list ../datasets/lists/hmdb51/hmdb51_rgb_val_split_1.txt \
--dataset hmdb51 \
--arch i3d \
--mode rgb \
--lr 0.001 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/hmdb51_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/hmdb51/models/04-16-2328_aug_CJ/ckpt_epoch_48.pth
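
Below is a minimal sketch of how the --weights checkpoint might be restored before fine-tuning. The "state_dict" key and the fc-head filter are assumptions; see trainer.py for the repo's actual loading logic.

import torch

def load_pretrained(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)  # some checkpoints nest the weights
    # Drop any self-supervised head weights, keep the backbone
    state = {k: v for k, v in state.items() if not k.startswith("fc")}
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("missing keys:", missing, "unexpected keys:", unexpected)
    return model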

UCF101

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/ucf101/ucf101_rgb_train_split_1.txt \
--val_list ../datasets/lists/ucf101/ucf101_rgb_val_split_1.txt \
--dataset ucf101 \
--arch i3d \
--mode rgb \
--lr 0.0005 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/ucf101_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/ucf101/models/04-18-2208_aug_CJ/ckpt_epoch_45.pth

Notice: more training options and the ablation studies can be found in the scripts.

Results

Step2: Transfer

With the same experimental settings, the results (top-1 accuracy, %) are reported below:

Method                    | UCF101 | HMDB51
--------------------------|--------|-------
Baseline                  | 60.3   | 22.6
+ BA                      | 63.3   | 26.2
+ Temporal Discriminative | 72.7   | 41.2
+ TCA                     | 82.3   | 52.9

trained models/logs/performance

We provide the trained models, logs, and performance records on Google Drive.

Baseline + BA

[Figure: BA_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

Baseline + BA + Temporal Discriminative

[Figure: wo_TCA_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

Baseline + BA + Temporal Discriminative + TCA

(a). Pretrain

Loss curve: [Figure: loss.png]

Ins Prob: [Figure: prob.png]

  • pretrained_weight

This pretrained model achieves 52.7% on HMDB51.

(b). Finetune

[Figure: VTDL_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

The results above are reported with a single video clip. At test time, we average the predictions over ten clips to obtain the final prediction, which leads to around a 2-3% improvement.

python test.py
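
For reference, here is a sketch of the ten-clip averaging described above; sample_clip is a placeholder for the repo's own clip sampling, not an actual function in this codebase:

import torch
import torch.nn.functional as F

@torch.no_grad()
def ten_clip_predict(model, video, sample_clip, num_clips=10):
    model.eval()
    probs = []
    for i in range(num_clips):
        clip = sample_clip(video, i, num_clips)      # hypothetical: [1, C, T, H, W]
        probs.append(F.softmax(model(clip), dim=1))  # per-clip class probabilities
    return torch.stack(probs).mean(dim=0)            # averaged final prediction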

Feature Extractor

As STCR can be easily extended to other video representation tasks, we offer a script to perform feature extraction.

python feature_extractor.py

The features will be saved as a single NumPy file of shape [video_nums, features_dim].
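
Loading the saved features back (assuming the file was written with np.save; the filename below is just an example):

import numpy as np

features = np.load("features.npy")  # shape: [video_nums, features_dim]
print(features.shape)

# e.g. cosine similarity for video retrieval
feats = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = feats @ feats.T        # [video_nums, video_nums]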

Citation

Please cite our paper if you find this code useful for your research.

@Article{wang2020self,
  author  = {Jinpeng Wang and Yiqi Lin and Andy J. Ma and Pong C. Yuen},
  title   = {Self-supervised Temporal Discriminative Learning for Video Representation Learning},
  journal = {arXiv preprint arXiv:2008.02129},
  year    = {2020},
}

Others

The project is partly based on Unsupervised Embedding Learning and MoCo.
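
Both are instance-discrimination methods; for orientation only, here is a generic InfoNCE loss sketch in that spirit. This is the standard contrastive objective, not the paper's exact temporal-discriminative loss:

import torch
import torch.nn.functional as F

def info_nce(query, pos_key, neg_keys, temperature=0.07):
    # query, pos_key: [N, D]; neg_keys: [K, D]; all rows L2-normalized
    l_pos = (query * pos_key).sum(dim=1, keepdim=True)  # [N, 1] positive logits
    l_neg = query @ neg_keys.t()                        # [N, K] negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)              # positives sit at index 0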
