Source code for Delving Deeper into the Decoder for Video Captioning
This repository contains the source code for the paper Delving Deeper into the Decoder for Video Captioning, which has been accepted by ECAI 2020. The encoder-decoder framework is the most popular paradigm for the video captioning task, yet non-negligible problems remain in the decoder of a video captioning model. We propose three methods to improve the performance of the model.
Experiments on the MSVD and MSR-VTT datasets demonstrate that our model achieves the best results under the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 11.7% on MSVD and 5% on MSR-VTT over previous state-of-the-art models.
If you need more information about how to generate training, validation and testing data for the datasets, please refer to Semantics-AssistedVideoCaptioning.
Setup:

    cd path_to_directory_of_model
    mkdir saves

`run_model.sh` is used for training or testing models:

- Specify the GPU you want to use by modifying the `CUDA_VISIBLE_DEVICES` value.
- `name` will be used in the name of the saved model during training.
- Specify the needed data paths by modifying the `corpus`, `ecores`, `tag` and `ref` values.
- `test` refers to the path of the saved model which is to be tested. Do not give a value to `test` if you want to train a model.

Run `bash run_model.sh`
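As a rough illustration, the variables in `run_model.sh` might be set along the following lines. This is only a sketch based on the parameter names listed above; every path below is a placeholder, not an actual file shipped with the repository.

```shell
# Hypothetical sketch of the settings in run_model.sh.
# All paths are placeholders -- substitute the files you generated
# for your dataset (see Semantics-AssistedVideoCaptioning).
export CUDA_VISIBLE_DEVICES=0        # index of the GPU to use

name=msvd_model                      # used in the name of the saved model
corpus=path/to/corpus_file           # caption corpus
ecores=path/to/visual_features       # encoded visual features
tag=path/to/tag_file                 # tag data
ref=path/to/reference_file           # reference captions for evaluation

test=                                # leave empty to train; set to a saved
                                     # model path to test that model
```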
for training or testing.

Citation:

@article{chen2020delving,
title={Delving Deeper into the Decoder for Video Captioning},
author={Haoran Chen and Jianmin Li and Xiaolin Hu},
journal={CoRR},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2001.05614},
eprint={2001.05614},
year={2020}
}