[ICCV 2021, Oral 3%] Official repository of XVFI
This is the official repository of XVFI (eXtreme Video Frame Interpolation)
[ArXiv_ver.] [ICCV2021_ver.] [Supp.] [Demo(YouTube)] [Oral12mins(YouTube)] [Flowframes(GUI)] [Poster]
Last Update: 20211130 - We provide extended input sequences for X-TEST. Please refer to X4K1000FPS
We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.
The 4K@30fps input frames are interpolated into 4K@240fps frames. All results are encoded at 30fps so they play as x8 slow motion, and are spatially down-scaled due to file size limits. All methods are trained on X-TRAIN.
Some examples from the X4K1000FPS dataset, which consists of 4K-resolution frames captured at 1000 fps. Our dataset contains various scenes with extreme motions. (Displayed as spatiotemporally subsampled .gif files)
We provide our X4K1000FPS dataset which consists of X-TEST and X-TRAIN. Please refer to our main/suppl. paper for the details of the dataset. You can download the dataset from this dropbox link.
X-TEST
consists of 15 video clips, each containing 33 consecutive 4K frames captured at 1000 fps. It follows the directory format below:
├──── YOUR_DIR/
    ├──── test/
        ├──── Type1/
            ├──── TEST01/
                ├──── 0000.png
                ├──── ...
                └──── 0032.png
            ├──── TEST02/
                ├──── 0000.png
                ├──── ...
                └──── 0032.png
            ├──── ...
        ├──── ...
Extended version of X-TEST
See issue #9.
As described in our paper, we assume that the number of input frames for VFI is fixed to 2 in X-TEST. However, for VFI methods that require more than 2 input frames, we provide an extended version of X-TEST which contains 8 input frames (at a temporal distance of 32 frames) for each test sequence. The middle two adjacent frames among the 8 frames are the same as the input frames in the original X-TEST. To sort the .png files properly by their file names, we added 1000 to the frame indices (e.g., '0000.png' and '0032.png' in the original version of X-TEST correspond to '1000.png' and '1032.png', respectively, in the extended version of X-TEST). Please note that the extended version consists of input frames only, without the ground-truth intermediate frames ('1001.png'~'1031.png'). In addition, for the sequence 'TEST11_078_f4977', '1064.png', '1096.png' and '1128.png' are replicated frames since '1064.png' is the last frame of the raw video file.
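The renaming scheme above can be sketched as a small helper (the function name is ours; only the +1000 offset and the 32-frame spacing come from the description above):

```python
# Sketch of the extended X-TEST naming: 1000 is added to each frame index
# so that .png files sort correctly by name. The helper name is ours.

def extended_name(original_index: int) -> str:
    """Map an original X-TEST frame index to its extended-version filename."""
    return f"{original_index + 1000:04d}.png"

# 8 input frames spaced 32 frames apart; the middle two adjacent ones
# ('1000.png', '1032.png') are the original inputs '0000.png' and '0032.png'.
extended_inputs = [extended_name(32 * i) for i in range(-3, 5)]
print(extended_inputs)
# ['0904.png', '0936.png', '0968.png', '1000.png', '1032.png',
#  '1064.png', '1096.png', '1128.png']
```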
The extended version of X-TEST can be downloaded from the link.
X-TRAIN
consists of 4,408 clips from 110 scenes of various types. Each clip contains 65 consecutive 1000-fps frames, and each frame is a 768x768 patch cropped from a 4K frame. It follows the directory format below:
├──── YOUR_DIR/
    ├──── train/
        ├──── 002/
            ├──── occ008.320/
                ├──── 0000.png
                ├──── ...
                └──── 0064.png
            ├──── occ008.322/
                ├──── 0000.png
                ├──── ...
                └──── 0064.png
            ├──── ...
        ├──── ...
After downloading the files from the link, decompress encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files by running mp4_decoding.py. Please follow the instructions written in mp4_decoding.py.
Our code is implemented using PyTorch 1.7 and was tested under the following settings:
Caution: since the "align_corners" option of "nn.functional.interpolate" and "nn.functional.grid_sample" in PyTorch 1.7 affects the results, we recommend following our settings. In particular, using other PyTorch versions may yield different performance.
XVFI
└── checkpoint_dir
    └── XVFInet_X4K1000FPS_exp1
        ├── XVFInet_X4K1000FPS_exp1_latest.pt
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8
==> It would yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8
==> It would yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).
XVFI
└── vimeo_triplet
    ├── sequences
    ├── readme.txt
    ├── tri_testlist.txt
    └── tri_trainlist.txt
XVFI
└── checkpoint_dir
    └── XVFInet_Vimeo_exp1
        ├── XVFInet_Vimeo_exp1_latest.pt
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2
==> It would yield PSNR = 35.07 on Vimeo90K.
XVFI
└── custom_path
    ├── scene1
    │   ├── 'xxx.png'
    │   ├── ...
    │   └── 'xxx.png'
    ├── ...
    └── sceneN
        ├── 'xxxxx.png'
        ├── ...
        └── 'xxxxx.png'
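The layout above can also be prepared programmatically; a minimal sketch (scene names and frame counts are placeholders, and the .png files here are empty stand-ins for your actual consecutive input frames):

```python
import os
import tempfile

def make_custom_layout(root, scenes):
    """Create root/sceneX/ folders containing placeholder .png files.
    `scenes` maps scene names to the number of consecutive input frames."""
    for scene, n_frames in scenes.items():
        scene_dir = os.path.join(root, scene)
        os.makedirs(scene_dir, exist_ok=True)
        for i in range(n_frames):
            # Empty placeholders; replace with real frames in practice.
            open(os.path.join(scene_dir, f"{i:04d}.png"), "wb").close()

root = tempfile.mkdtemp()  # use './custom_path' for a real run
make_custom_layout(root, {"scene1": 2, "scene2": 2})
```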
Download the pre-trained weights trained on X-TRAIN or Vimeo90K as described above.
Run main.py with the following options in parse_args (e.g., x8 multi-frame interpolation):
# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'
# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'
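For context, --multiple 8 asks the model to synthesize 7 intermediate frames between each pair of consecutive inputs; the target time positions can be sketched as follows (the helper name is ours):

```python
from fractions import Fraction

def target_timesteps(multiple):
    """Time positions t in (0, 1) at which intermediate frames are predicted
    between two consecutive input frames for an x`multiple` interpolation."""
    return [Fraction(i, multiple) for i in range(1, multiple)]

# x8 interpolation -> 7 intermediate frames at t = 1/8, 2/8, ..., 7/8
steps = target_timesteps(8)
print([str(t) for t in steps])
```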
XVFI
└── X4K1000FPS
    ├── train
    │   ├── 002
    │   ├── ...
    │   └── 172
    ├── val
    │   ├── Type1
    │   ├── Type2
    │   └── Type3
    └── test
        ├── Type1
        ├── Type2
        └── Type3
python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5
XVFI
└── vimeo_triplet
├── sequences
readme.txt
tri_testlist.txt
tri_trainlist.txt
python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1
Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", In ICCV, 2021. (* equal contribution)
BibTeX
@inproceedings{sim2021xvfi,
title={XVFI: eXtreme Video Frame Interpolation},
author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
year={2021}
}
If you have any questions, please send an email to either
[Hyeonjun Sim] - [email protected] or
[Jihyong Oh] - [email protected].
The source code and datasets may be freely used for research and education purposes only. Any commercial use requires formal permission first.