
Scalable Neural Video Representations with Learnable Positional Features (NVP)

Official PyTorch implementation of "Scalable Neural Video Representations with Learnable Positional Features" (NeurIPS 2022) by Subin Kim*¹, Sihyun Yu*¹, Jaeho Lee², and Jinwoo Shin¹.

¹KAIST, ²POSTECH

TL;DR: We propose a novel neural representation for videos that gets the best of both worlds: it achieves high-quality encoding and compute-/parameter-efficiency simultaneously.

Project Page | Paper

1. Requirements

Environments

Required packages are listed in environment.yaml. Also, you should install the following packages:

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
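If the conda environment has not been created yet, a typical workflow is to build it from environment.yaml first and then run the two commands above inside it. A minimal sketch (the environment name nvp is an assumption; check the name field in environment.yaml):

conda env create -f environment.yaml

conda activate nvp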

Dataset

Download the UVG-HD dataset from the following link:

Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg. Here, INPUT is the input file name, and OUTPUT is a directory to save decompressed RGB frames.

ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i INPUT.yuv OUTPUT/f%05d.png
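For example, assuming the Jockey sequence was downloaded as Jockey_1920x1080_120fps_420_8bit_YUV.yuv (the exact filename depends on the UVG-HD download) and the frames should be written to ~/data/Jockey:

mkdir -p ~/data/Jockey

ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p -i Jockey_1920x1080_120fps_420_8bit_YUV.yuv ~/data/Jockey/f%05d.png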

2. Training

Run the following script with a single GPU. A filled-in example is given after the option list below.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json 
  • Option --logging_root denotes the path to save the experiment log.
  • Option --experiment_name denotes the subdirectory, created under --logging_root, in which the log files (results, checkpoints, configuration, etc.) are saved.
  • Option --dataset denotes the path of RGB sequences (e.g., ~/data/Jockey).
  • Option --num_frames denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for other videos in UVG-HD).
  • To reconstruct videos with 300 frames, please change the value of t_resolution in the configuration file to 300.
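
For example, to encode the Jockey sequence extracted above (the experiment name jockey_nvp_s is arbitrary and used here only for illustration):

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name jockey_nvp_s --dataset ~/data/Jockey --num_frames 600 --config ./config/config_nvp_s.json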

3. Evaluation

Evaluation without compression of parameters (i.e., quantization only). A filled-in example follows the option list below.

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json   
  • Option --save denotes whether to save the reconstructed frames.
  • One can specify the option --s_interp for video super-resolution results; it denotes the super-resolution scale (e.g., 8).
  • One can specify the option --t_interp for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
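
For example, continuing the hypothetical jockey_nvp_s experiment above (whether --save takes a value should be checked against the argument parser in eval.py; it is used here as a plain flag):

CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name jockey_nvp_s --dataset ~/data/Jockey --num_frames 600 --config ./logs_nvp/jockey_nvp_s/config_nvp_s.json --save --s_interp 8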

Evaluation with compression of parameters using well-known image and video codecs.

  1. Save the quantized parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json  
    
  2. Compress the saved sparse positional image-/video-like features using codecs.

    • Execute compression.ipynb.
    • Please change the logging_root and experiment_name in compression.ipynb appropriately.
    • One can change qscale, crf, and framerate, which change the compression ratio of the sparse positional features (a rough sketch of the corresponding codec calls is given after this list).
      • qscale ranges from 1 to 31, where larger values mean lower quality (2~5 recommended).
      • crf ranges from 0 to 51, where larger values mean lower quality (20~25 recommended).
      • framerate (25 or 40 recommended).
  3. Evaluation with the compressed parameters.

    CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES>  --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
    
    • Option --save denotes whether to save the reconstructed frames.
    • Please set the options --qscale, --crf, and --framerate to the same values used in compression.ipynb.
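
As a rough illustration of what the three knobs in compression.ipynb control (the notebook handles the actual file names and codec calls itself; the commands below are only a sketch with hypothetical file names), qscale maps to the JPEG/MJPEG quality scale for the image-like features, while crf and framerate correspond to a standard video encode of the video-like features:

# JPEG-style compression of an image-like feature; -q:v is the 1-31 quality scale (lower = better)
ffmpeg -i feature_image.png -q:v 3 feature_image.jpg

# HEVC compression of a video-like feature sequence at a given CRF and framerate
ffmpeg -framerate 25 -i feature_video_%05d.png -c:v libx265 -crf 21 feature_video.mp4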

4. Results

Reconstructed video results of NVP on UVG-HD and other 4K/long/temporally dynamic videos are available on the project page.

Our model achieves the following performance on UVG-HD with a single NVIDIA V100 32GB GPU:

| Encoding Time | BPP | PSNR (↑) | FLIP (↓) | LPIPS (↓) |
|---|---|---|---|---|
| ~5 minutes | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100 |
| ~10 minutes | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098 |
| ~1 hour | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106 |
| ~8 hours | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083 |
  • The reported values are averaged over the Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, and Yachtride videos in UVG-HD and measured using the LPIPS and FLIP repositories.

One can download the pretrained checkpoints from the following link.

Citation

@inproceedings{
    kim2022scalable,
    title={Scalable Neural Video Representations with Learnable Positional Features},
    author={Kim, Subin and Yu, Sihyun and Lee, Jaeho and Shin, Jinwoo},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022},
}

References

We used code from the following repositories: SIREN, Modulation, tiny-cuda-nn.
