TensorFlow implementation of the paper: Sequence to Sequence - Video to Text
This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.
I modified the code from jazzsaxmafia and fixed some problems in the original implementation.
$ python extract_feats.py
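S2VT-style pipelines typically sample a fixed number of frames per video (often 80) before extracting CNN features. As a sketch of that sampling step, assuming evenly spaced indices and a cap of 80 frames (the helper name and the cap are illustrative, not taken from extract_feats.py):

```python
def sample_frame_indices(num_frames, num_samples=80):
    """Pick `num_samples` evenly spaced frame indices from a video
    with `num_frames` frames; short videos keep every frame."""
    if num_frames <= num_samples:
        return list(range(num_frames))
    step = num_frames / float(num_samples)
    return [int(i * step) for i in range(num_samples)]

print(sample_frame_indices(200, 5))  # → [0, 40, 80, 120, 160]
```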
After this operation, you should split the features into two parts:
- train_features
- test_features
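One way to do the split is to move the extracted feature files into the two directories above. A minimal sketch, assuming one feature file per video and a 90/10 split (the ratio and function name are illustrative; the repository does not prescribe them):

```python
import os
import random
import shutil

def split_features(src_dir, train_dir, test_dir, train_ratio=0.9, seed=42):
    """Randomly move feature files from `src_dir` into train/test
    directories at the given ratio; returns (n_train, n_test)."""
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)  # deterministic shuffle
    n_train = int(len(files) * train_ratio)
    for i, name in enumerate(files):
        dst = train_dir if i < n_train else test_dir
        shutil.move(os.path.join(src_dir, name), os.path.join(dst, name))
    return n_train, len(files) - n_train
```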
$ CUDA_VISIBLE_DEVICES=0 ipython
Once in the IPython environment, run:
>>> import model_rgb
>>> model_rgb.train()
You should change the training parameters and directory paths in model_rgb.py.
>>> import model_rgb
>>> model_rgb.test()
After testing, a text file "S2VT_results.txt" will be generated.
We evaluate the generated captions with the coco-caption tools. You can run the shell script get_coco_tools.sh to download them:
$ ./get_coco_tools.sh
After this, generate the reference JSON file from the ground-truth CSV file:
$ python create_reference.py
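The coco-caption tools expect references in the COCO captions format: an `images` list plus an `annotations` list of `{"image_id", "id", "caption"}` entries. A sketch of building that file, assuming the ground-truth CSV has "VideoID" and "Description" columns (an assumption about the CSV layout, not verified against create_reference.py):

```python
import csv
import json

def build_reference_json(csv_path, out_path):
    """Build a COCO-caption-style reference file from a ground-truth CSV.
    The "VideoID" / "Description" column names are assumptions; adjust
    them to match the actual CSV header."""
    images, annotations = [], []
    seen = set()
    with open(csv_path, newline="") as f:
        for ann_id, row in enumerate(csv.DictReader(f)):
            vid = row["VideoID"]
            if vid not in seen:
                seen.add(vid)
                images.append({"id": vid})
            annotations.append({"image_id": vid, "id": ann_id,
                                "caption": row["Description"]})
    reference = {"type": "captions", "images": images,
                 "annotations": annotations, "licenses": [], "info": {}}
    with open(out_path, "w") as f:
        json.dump(reference, f)
    return reference
```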
Then, generate the results JSON file from the S2VT_results.txt file:
$ python create_result_json.py
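The result side of the coco-caption tools takes a flat list of `{"image_id", "caption"}` entries. A sketch of the conversion, assuming each line of S2VT_results.txt is `video_id<TAB>generated caption` (the tab-separated layout is an assumption; adjust the split to match the actual file):

```python
import json

def build_result_json(txt_path, out_path):
    """Convert an "id<TAB>caption" results text file into the COCO
    result format: [{"image_id": ..., "caption": ...}, ...]."""
    results = []
    with open(txt_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            vid, caption = line.split("\t", 1)
            results.append({"image_id": vid, "caption": caption})
    with open(out_path, "w") as f:
        json.dump(results, f)
    return results
```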
Finally, you can evaluate the generation results:
$ python eval.py
Model | METEOR |
---|---|
S2VT (ICCV 2015) | |
- RGB (VGG) | 29.2 |
- Optical Flow (AlexNet) | 24.3 |
Our model | |
- RGB (VGG) | 28.1 |
- Optical Flow (AlexNet) | 23.3 |