Implementation of Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
TensorFlow implementation of Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space (NIPS). This implementation includes a VGG16-LSTM baseline with beam search, a Normal-prior CVAE, a GMM-prior CVAE and AG-CVAE.
Training:
You will first need to download the ImageNet weights for VGG16: https://yadi.sk/d/V6Rfzfei3TdKCH
Specify your mscoco directory in utils/parameters.py and launch:
python main.py --gpu 'your gpu'
This trains the Normal-prior CVAE model without fine-tuning. The best result achieved with cluster vectors and without fine-tuning is CIDEr ~0.8. Better results should be possible with some fine-tuning. To train a model with fine-tuning, specify the --fine_tune parameter.
Note: the train/validation split can be changed by setting the gen_val_captions parameter. The default is 4000, which leaves ~120000 in the training set.
Note 2: You will need to run the preprocess.py script first to obtain the images HDF5 file. This speeds up image loading when fine-tuning the model.
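The train/validation split controlled by gen_val_captions can be illustrated with a short sketch. It assumes the pool is the COCO 2014 train and val images combined (82783 + 40504), which is consistent with the ~120000 figure but is not confirmed by the code. The function name split_image_ids, and the choice to hold out the first images in the list, are hypothetical and not taken from the repository.

```python
def split_image_ids(image_ids, gen_val_captions=4000):
    """Hold out `gen_val_captions` items for validation; train on the rest."""
    if not 0 <= gen_val_captions <= len(image_ids):
        raise ValueError("gen_val_captions must be between 0 and len(image_ids)")
    # Assumption: the held-out set is taken from the front of the list.
    val_ids = image_ids[:gen_val_captions]
    train_ids = image_ids[gen_val_captions:]
    return train_ids, val_ids


# Assumed pool: COCO 2014 train (82783) + val (40504) images.
all_ids = list(range(82783 + 40504))
train_ids, val_ids = split_image_ids(all_ids, gen_val_captions=4000)
print(len(train_ids), len(val_ids))
```

Under these assumptions the training set has 119287 items, which matches the approximate size quoted in the note.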
Parameters can be set directly in the utils/parameters.py file or specified through command-line parameters. For example, to train the AG-CVAE model, which uses cluster vectors as input to the encoder and decoder, you can call:
python main.py --gpu 0 --embed_dim 256 --dec_hid 512 --epochs 50 --temperature 0.6 --gen_name ag --dec_drop 0.7 --dec_lstm_drop 0.7 --lr 0.001 --checkpoint ag_cv_test1 --coco_dir "/home/username/mscoco/coco/" --optimizer Adam --sample_gen greedy --c_v --prior AG
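As a rough sketch of how command-line flags can override defaults, the snippet below parses a subset of the flags shown in the command above with argparse. It is not the parser used by main.py: the function build_parser, every default value, and the mode name "training" are assumptions made for illustration, and only the flag names come from this README.

```python
import argparse


def build_parser():
    # Hypothetical parser; defaults are placeholders, not the values
    # stored in utils/parameters.py.
    p = argparse.ArgumentParser(description="Flag parsing sketch")
    p.add_argument("--gpu", default="0")
    p.add_argument("--mode", default="training")  # assumed default name
    p.add_argument("--embed_dim", type=int, default=256)
    p.add_argument("--dec_hid", type=int, default=512)
    p.add_argument("--epochs", type=int, default=50)
    p.add_argument("--temperature", type=float, default=1.0)
    p.add_argument("--lr", type=float, default=0.001)
    p.add_argument("--checkpoint", default="last_run")
    p.add_argument("--prior", default="Normal")
    # Boolean switches: present on the command line means True.
    p.add_argument("--c_v", action="store_true")
    p.add_argument("--fine_tune", action="store_true")
    return p


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(
    ["--gpu", "0", "--temperature", "0.6", "--c_v", "--prior", "AG"]
)
print(vars(args))
```

Flags left off the command line keep their defaults, so in this sketch --fine_tune stays False unless it is passed explicitly.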
Two options:
After some training just launch:
python main.py --gpu 'your gpu' --mode inference
If you used fine-tuning, just add --fine_tune to the parameters:
python main.py --gpu 'your gpu' --mode inference --fine_tune
It will produce a JSON file ready to use with the mscoco evaluation tool.
For a list of required parameters:
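For reference, the COCO caption evaluation tool reads results as a JSON list of objects with an image_id and a caption field. The sketch below writes a file in that shape. The helper write_coco_results, the output file name, and the caption text are made up for illustration; the file that main.py actually writes may be named and located differently.

```python
import json
import os
import tempfile


def write_coco_results(captions, path):
    """Write {image_id: caption} pairs as a COCO-style results list."""
    results = [
        {"image_id": int(image_id), "caption": caption}
        for image_id, caption in sorted(captions.items())
    ]
    with open(path, "w") as f:
        json.dump(results, f)
    return results


# Placeholder caption; the id matches COCO_val2014_000000233527.jpg.
captions = {233527: "a placeholder caption"}
out_path = os.path.join(tempfile.gettempdir(), "captions_results_example.json")
results = write_coco_results(captions, out_path)
```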
python gen_caption.py -h
For example:
python -i gen_caption.py --img_path ./images/COCO_val2014_000000233527.jpg --checkpoint ./checkpoints/gaussian_nocv.ckpt --params_path ./pickles/params_Normal_False_gaussian_nocv_False
Where:
A trained CVAE checkpoint (without cluster vectors) and its parameters file can be downloaded at: https://yadi.sk/d/TCyXUmKk3SPVtc