
Scalable Neural Video Representations with Learnable Positional Features (NVP)

Official PyTorch implementation of "Scalable Neural Video Representations with Learnable Positional Features" (NeurIPS 2022) by Subin Kim*¹, Sihyun Yu*¹, Jaeho Lee², and Jinwoo Shin¹.

¹KAIST, ²POSTECH

TL;DR: We propose a novel neural representation for videos that gets the best of both worlds: it achieves high-quality encoding and compute-/parameter-efficiency simultaneously.

Project Page | Paper

1. Requirements

Environments

Required packages are listed in environment.yaml. Also, you should install the following packages:

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
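If the conda environment has not been created yet, a typical workflow is to build it from environment.yaml first and then run the two commands above inside it. A minimal sketch (the environment name nvp is an assumption; check the name field in environment.yaml):

conda env create -f environment.yaml

conda activate nvp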

Dataset

Download the UVG-HD dataset from the following link:

Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg. Here, INPUT is the input file name, and OUTPUT is a directory to save decompressed RGB frames.

ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i INPUT.yuv OUTPUT/f%05d.png
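For example, assuming the Jockey sequence was downloaded as Jockey_1920x1080_120fps_420_8bit_YUV.yuv (the exact filename depends on the UVG-HD download) and the frames should be written to ~/data/Jockey:

mkdir -p ~/data/Jockey

ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i Jockey_1920x1080_120fps_420_8bit_YUV.yuv ~/data/Jockey/f%05d.png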

2. Training

Run the following script with a single GPU. A filled-in example is given after the option list below.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json 
  • Option --logging_root denotes the path to save the experiment log.
  • Option --experiment_name denotes the subdirectory, created under --logging_root, in which the log files (results, checkpoints, configuration, etc.) are saved.
  • Option --dataset denotes the path of RGB sequences (e.g., ~/data/Jockey).
  • Option --num_frames denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for other videos in UVG-HD).
  • To reconstruct videos with 300 frames, please change the value of t_resolution in the configuration file to 300.
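
For example, to encode the Jockey sequence extracted above (the experiment name jockey_nvp_s is arbitrary and used here only for illustration):

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name jockey_nvp_s --dataset ~/data/Jockey --num_frames 600 --config ./config/config_nvp_s.json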

3. Evaluation

Evaluation without compression of parameters (i.e., quantization only). A filled-in example follows the option list below.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json   
  • Option --save denotes whether to save the reconstructed frames.
  • One can specify the option --s_interp for video super-resolution results; it denotes the super-resolution scale (e.g., 8).
  • One can specify the option --t_interp for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
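
For example, continuing the hypothetical jockey_nvp_s experiment above (whether --save takes a value should be checked against the argument parser in eval.py; it is used here as a plain flag):

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name jockey_nvp_s --dataset ~/data/Jockey --num_frames 600 --config ./logs_nvp/jockey_nvp_s/config_nvp_s.json --save --s_interp 8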

Evaluation with compression of parameters using well-known image and video codecs.

  1. Save the quantized parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json  
    
  2. Compress the saved sparse positional image-/video-like features using codecs.

    • Execute compression.ipynb.
    • Please change the logging_root and experiment_name in compression.ipynb appropriately.
    • One can change qscale, crf, and framerate, which change the compression ratio of the sparse positional features (a rough sketch of the corresponding codec calls is given after this list).
      • qscale ranges from 1 to 31, where larger values mean lower quality (2~5 recommended).
      • crf ranges from 0 to 51, where larger values mean lower quality (20~25 recommended).
      • framerate (25 or 40 recommended).
  3. Evaluation with the compressed parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES>  --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
    
    • Option --save denotes whether to save the reconstructed frames.
    • Please set the options --qscale, --crf, and --framerate to the same values used in compression.ipynb.
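
As a rough illustration of what the three knobs in compression.ipynb control (the notebook handles the actual file names and codec calls itself; the commands below are only a sketch with hypothetical file names), qscale maps to the JPEG/MJPEG quality scale for the image-like features, while crf and framerate correspond to a standard video encode of the video-like features:

# JPEG-style compression of an image-like feature; -q:v is the 1-31 quality scale (lower = better)
ffmpeg -i feature_image.png -q:v 3 feature_image.jpg

# HEVC compression of a video-like feature sequence at a given CRF and framerate
ffmpeg -framerate 25 -i feature_video_%05d.png -c:v libx265 -crf 21 feature_video.mp4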

4. Results

Reconstructed video results of NVP on UVG-HD and other 4K/long/temporally dynamic videos are available on the project page.

Our model achieves the following performance on UVG-HD with a single NVIDIA V100 32GB GPU:

| Encoding Time | BPP | PSNR (↑) | FLIP (↓) | LPIPS (↓) |
|---|---|---|---|---|
| ~5 minutes | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100 |
| ~10 minutes | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098 |
| ~1 hour | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106 |
| ~8 hours | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083 |
  • The reported values are averaged over the Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, and Yachtride videos in UVG-HD and measured using the LPIPS and FLIP repositories.

One can download the pretrained checkpoints from the following link.

Citation

@inproceedings{
    kim2022scalable,
    title={Scalable Neural Video Representations with Learnable Positional Features},
    author={Kim, Subin and Yu, Sihyun and Lee, Jaeho and Shin, Jinwoo},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022},
}

References

We used code from the following repositories: SIREN, Modulation, tiny-cuda-nn.
