This repository contains my implementation of a video captioning system. The system takes a video as input and generates an English caption describing the event in the video.
I took inspiration from Sequence to Sequence -- Video to Text (S2VT), a video captioning work proposed by researchers at the University of Texas at Austin.
To run my code and reproduce the results, first install the packages listed below. I used Python 2.7 for the whole project.
Packages:
The architecture of S2VT, as given in the original paper, is shown below.
The diagram below shows how the system generates a caption for a given video.
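As a rough companion to the diagram, here is a minimal sketch of the two-stage flow described in the S2VT paper: the model first reads the frame features with padding as the word input, then receives padding as the frame input and emits words one at a time until an end-of-sentence token. This is not the code from this repository. `generate_caption`, `step_fn`, `toy_step`, the special tokens, and the canned sentence are all hypothetical names and data chosen for illustration, and greedy decoding is an assumption.

```python
# Illustrative sketch only: every name here is hypothetical, not from this repo.
BOS, EOS, PAD = "<bos>", "<eos>", "<pad>"


def generate_caption(frame_features, step_fn, init_state, max_words=20):
    """Greedy two-stage caption generation.

    step_fn(frame_input, word_input, state) -> (next_word, new_state)
    stands in for one time step of the stacked LSTM.
    """
    state = init_state

    # Encoding stage: feed one frame feature per step, with padding as the word input.
    for feature in frame_features:
        _, state = step_fn(feature, PAD, state)

    # Decoding stage: no frame input; feed the previously emitted word back in.
    words = []
    previous = BOS
    for _ in range(max_words):
        previous, state = step_fn(None, previous, state)
        if previous == EOS:
            break
        words.append(previous)
    return " ".join(words)


def toy_step(frame_input, word_input, state):
    """Stand-in for a trained model: counts frames, then replays a canned sentence."""
    canned = ["a", "man", "is", "playing", "a", "guitar", EOS]
    frames_seen, words_emitted = state
    if frame_input is not None:
        return PAD, (frames_seen + 1, words_emitted)
    return canned[words_emitted], (frames_seen, words_emitted + 1)


caption = generate_caption([[0.1], [0.2], [0.3]], toy_step, (0, 0))
print(caption)
```

In a real system, `step_fn` would wrap the trained network and pick the most probable word at each step. The `max_words` cap is there so that decoding stops even if the end-of-sentence token never appears.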
Below are a few screenshots of captions generated for videos from the validation set.
Although S2VT was trained on MSVD, M-VAD and MPII-MD, I trained my system only on MSVD, which can be downloaded here.
A demo of my system can be found here.