
Self-Supervised Temporal-Discriminative Representation Learning

The source code for our paper

"Self-Supervised Temporal-Discriminative Representation Learning for Video Action Recognition" (arXiv 2020): https://arxiv.org/abs/2008.02129

Overview

Without any labels available, our method learns to focus on motion regions!

[Figure: example]

Our self-supervised VTDL significantly outperforms existing self-supervised learning methods in video action recognition, and it even achieves better results than fully-supervised methods on UCF101 and HMDB51 when a small-scale video dataset (only thousands of videos) is used for pre-training!

[Figure: sample_acc.png]

Requirements

  • Python 3
  • PyTorch 1.1+
  • PIL

Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of Kinetics-400
  • experiments
    • logs: detailed experiment records
    • TemporalDis
      • hmdb51
      • ucf101
      • kinetics
    • gradients
    • visualization
  • src
    • data: data loading
    • loss: the losses evaluated in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • TC: detailed implementation of spatio-temporal consistency
    • utils
    • feature_extract.py
    • main.py
    • trainer.py
    • option.py

Dataset

See dataset.md. Prepare the dataset as a txt file, where each row follows the format below. The train/val splits of HMDB51/UCF101/Kinetics-400 can be downloaded from Google Drive.

Each row includes:

video_path class frames_num
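
For reference, here is a minimal sketch of parsing such a list file; parse_list_file is a hypothetical helper for illustration, not the repo's actual loader in src/data:

from typing import List, Tuple

def parse_list_file(path: str) -> List[Tuple[str, int, int]]:
    # Each non-empty row: "video_path class frames_num"
    items = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # rsplit keeps video paths that contain spaces intact
            video_path, class_id, frames_num = line.rsplit(" ", 2)
            items.append((video_path, int(class_id), int(frames_num)))
    return items

# e.g. parse_list_file("../datasets/lists/hmdb51/hmdb51_rgb_train_split_1.txt")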

VTDL

Network Architecture

The network architectures are defined in src/model/[backbone].py.

Method | #logits_channel
-------|----------------
C3D    | 512
R2P1D  | 2048
I3D    | 1024
R3D    | 2048
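
The #logits_channel column gives the feature width that a classifier head on each backbone would consume. As a hedged illustration (ActionHead and LOGITS_CHANNELS are made-up names for this sketch, not the repo's code; see src/model for the actual architectures):

import torch.nn as nn

LOGITS_CHANNELS = {"c3d": 512, "r2p1d": 2048, "i3d": 1024, "r3d": 2048}

class ActionHead(nn.Module):
    """Linear classifier over pooled backbone features."""
    def __init__(self, arch: str, num_classes: int, dropout: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(LOGITS_CHANNELS[arch], num_classes)

    def forward(self, feat):  # feat: [batch, logits_channel]
        return self.fc(self.dropout(feat))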

Step1: self-supervised learning

HMDB51

bash scripts/TemporalDisc/hmdb51.sh

UCF101

bash scripts/TemporalDisc/ucf101.sh

Kinetics-400

bash scripts/TemporalDisc/kinetics.sh

Notice: more training options and the ablation studies can be found in the scripts.

Step2: Transfer to action recognition

HMDB51

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/hmdb51/hmdb51_rgb_train_split_1.txt \
--val_list ../datasets/lists/hmdb51/hmdb51_rgb_val_split_1.txt \
--dataset hmdb51 \
--arch i3d \
--mode rgb \
--lr 0.001 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/hmdb51_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/hmdb51/models/04-16-2328_aug_CJ/ckpt_epoch_48.pth
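
Below is a minimal sketch of how the --weights checkpoint might be restored before fine-tuning. The "state_dict" key and the fc-head filter are assumptions; see trainer.py for the repo's actual loading logic.

import torch

def load_pretrained(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)  # some checkpoints nest the weights
    # Drop any self-supervised head weights, keep the backbone
    state = {k: v for k, v in state.items() if not k.startswith("fc")}
    missing, unexpected = model.load_state_dict(state, strict=False)
    print("missing keys:", missing, "unexpected keys:", unexpected)
    return model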

UCF101

#!/usr/bin/env bash
python main.py \
--method ft \
--train_list ../datasets/lists/ucf101/ucf101_rgb_train_split_1.txt \
--val_list ../datasets/lists/ucf101/ucf101_rgb_val_split_1.txt \
--dataset ucf101 \
--arch i3d \
--mode rgb \
--lr 0.0005 \
--lr_steps 10 20 25 30 35 40 \
--epochs 45 \
--batch_size 4 \
--data_length 64 \
--workers 8 \
--dropout 0.5 \
--gpus 2 \
--logs_path ../experiments/logs/ucf101_i3d_ft \
--print-freq 100 \
--weights ../experiments/TemporalDis/ucf101/models/04-18-2208_aug_CJ/ckpt_epoch_45.pth

Notice: more training options and the ablation studies can be found in the scripts.

Results

Step2: Transfer

With the same experimental settings, the results (top-1 accuracy, %) are reported below:

Method                    | UCF101 | HMDB51
--------------------------|--------|-------
Baseline                  | 60.3   | 22.6
+ BA                      | 63.3   | 26.2
+ Temporal Discriminative | 72.7   | 41.2
+ TCA                     | 82.3   | 52.9

trained models/logs/performance

We provide the trained models, logs, and performance records on Google Drive.

Baseline + BA

[Figure: BA_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

Baseline + BA + Temporal Discriminative

[Figure: wo_TCA_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

Baseline + BA + Temporal Discriminative + TCA

(a). Pretrain

Loss curve: [Figure: loss.png]

Ins Prob: [Figure: prob.png]

  • pretrained_weight

This pretrained model achieves 52.7% on HMDB51.

(b). Finetune

[Figure: VTDL_fine_tune_performance.png]

  • performance
  • trained_model
  • logs

The results above are reported with a single video clip. At test time, we average the predictions over ten clips to obtain the final prediction, which leads to around a 2-3% improvement.

python test.py
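
For reference, here is a sketch of the ten-clip averaging described above; sample_clip is a placeholder for the repo's own clip sampling, not an actual function in this codebase:

import torch
import torch.nn.functional as F

@torch.no_grad()
def ten_clip_predict(model, video, sample_clip, num_clips=10):
    model.eval()
    probs = []
    for i in range(num_clips):
        clip = sample_clip(video, i, num_clips)      # hypothetical: [1, C, T, H, W]
        probs.append(F.softmax(model(clip), dim=1))  # per-clip class probabilities
    return torch.stack(probs).mean(dim=0)            # averaged final prediction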

Feature Extractor

As STCR can be easily extended to other video representation tasks, we offer a script to perform feature extraction.

python feature_extractor.py

The features will be saved as a single NumPy file of shape [video_nums, features_dim].
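
Loading the saved features back (assuming the file was written with np.save; the filename below is just an example):

import numpy as np

features = np.load("features.npy")  # shape: [video_nums, features_dim]
print(features.shape)

# e.g. cosine similarity for video retrieval
feats = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = feats @ feats.T        # [video_nums, video_nums]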

Citation

Please cite our paper if you find this code useful for your research.

@Article{wang2020self,
  author  = {Jinpeng Wang and Yiqi Lin and Andy J. Ma and Pong C. Yuen},
  title   = {Self-supervised Temporal Discriminative Learning for Video Representation Learning},
  journal = {arXiv preprint arXiv:2008.02129},
  year    = {2020},
}

Others

The project is partly based on Unsupervised Embedding Learning and MoCo.
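
Both are instance-discrimination methods; for orientation only, here is a generic InfoNCE loss sketch in that spirit. This is the standard contrastive objective, not the paper's exact temporal-discriminative loss:

import torch
import torch.nn.functional as F

def info_nce(query, pos_key, neg_keys, temperature=0.07):
    # query, pos_key: [N, D]; neg_keys: [K, D]; all rows L2-normalized
    l_pos = (query * pos_key).sum(dim=1, keepdim=True)  # [N, 1] positive logits
    l_neg = query @ neg_keys.t()                        # [N, K] negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)              # positives sit at index 0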
