AaronCCWong Show Attend And Tell

A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

For a trained model to load into the decoder, use

Some training statistics

BLEU scores for VGG19 (orange) and ResNet152 (red), both trained with teacher forcing.

[Figure: BLEU score graphs for BLEU-1, BLEU-2, BLEU-3, and BLEU-4]
[Figure: Top-K accuracy graphs for training Top-1, training Top-5, validation Top-1, and validation Top-5]
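For readers unfamiliar with the two metrics plotted here, the following is a minimal pure-Python sketch of sentence-level BLEU-n (clipped n-gram precision with a brevity penalty) and top-k accuracy. It is an illustration only and not necessarily how this repository computes its statistics; the function names `ngrams`, `modified_precision`, `bleu`, and `top_k_accuracy` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Credit each candidate n-gram at most as often as it appears in the
    # single reference where it is most frequent.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Sentence-level BLEU with uniform weights over 1..max_n and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Use the reference length closest to the candidate length (ties: shorter).
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


def top_k_accuracy(scores, targets, k):
    """Fraction of positions whose target index is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)
```

Here `bleu(candidate, references, 4)` corresponds to BLEU-4, and `top_k_accuracy(scores, targets, 5)` to Top-5, where each row of `scores` holds one score per vocabulary word for a single predicted position.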

To Train

This code was written for Python 3 and may not work with Python 2. Download the COCO dataset training and validation images and put them in data/coco/imgs/train2014 and data/coco/imgs/val2014, respectively. Put the COCO dataset split JSON file from Deep Visual-Semantic Alignments in data/coco/. It should be named dataset.json.
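Before running the preprocessing, it can help to confirm that everything is where the instructions above say it should be. The following is a minimal sketch and not part of the repository; `EXPECTED` and `missing_paths` are hypothetical names, and only the three paths themselves come from the instructions.

```python
from pathlib import Path

# Locations named in the setup instructions.
EXPECTED = [
    "data/coco/imgs/train2014",
    "data/coco/imgs/val2014",
    "data/coco/dataset.json",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]


# Report anything that still needs to be downloaded or moved into place.
for path in missing_paths():
    print("missing:", path)
```

Run it from the repository root; no output means all three locations exist.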

Run the preprocessing to create the needed JSON files:

python generate_json_data.py
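This README does not describe what generate_json_data.py writes. As a rough illustration of this kind of preprocessing, the sketch below builds a word-to-index dictionary and groups captions by split. It assumes the split file has a top-level "images" list whose entries carry "split", "filepath", "filename", and "sentences" (each with a "tokens" list); the function names and the special tokens are hypothetical, not taken from the repository.

```python
from collections import Counter


def build_word_dict(dataset, min_count=1):
    """Map each sufficiently frequent caption token to an integer index."""
    counts = Counter(
        token
        for image in dataset["images"]
        for sentence in image["sentences"]
        for token in sentence["tokens"]
    )
    # Hypothetical special tokens; the repository may use different ones.
    word_dict = {"<start>": 0, "<eos>": 1, "<unk>": 2, "<pad>": 3}
    for word in sorted(w for w, c in counts.items() if c >= min_count):
        word_dict[word] = len(word_dict)
    return word_dict


def group_by_split(dataset):
    """Map each split name to a list of (image path, token list) pairs."""
    groups = {}
    for image in dataset["images"]:
        path = f'{image["filepath"]}/{image["filename"]}'
        for sentence in image["sentences"]:
            groups.setdefault(image["split"], []).append((path, sentence["tokens"]))
    return groups
```

To try it on the real file, load it with `json.load(open("data/coco/dataset.json"))` and pass the result to both functions.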

Start the training by running:

python train.py

The models will be saved in model/ and the training statistics will be saved in runs/. To see the training statistics, use:

tensorboard --logdir runs

To Generate Captions

python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>
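As a sketch of how a script could accept the two flags shown above, the snippet below uses the standard argparse module. It is an assumption about the interface and not the repository's actual generate_caption.py; `build_parser` is a hypothetical name, and the file names passed in are placeholders.

```python
import argparse


def build_parser():
    """Build a parser for the two flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Generate a caption for one image")
    parser.add_argument("--img-path", required=True, help="path to the input image")
    parser.add_argument("--model", required=True, help="path to saved model parameters")
    return parser


# Placeholder arguments for illustration; a real script would call parse_args()
# with no arguments so the values come from the command line.
args = build_parser().parse_args(["--img-path", "example.jpg", "--model", "example.pth"])

# argparse exposes --img-path as args.img_path (the dash becomes an underscore).
print(f"image: {args.img_path}, model: {args.model}")
```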

Todo

  • Create image encoder class
  • Create decoder class
  • Create dataset loader
  • Write main function for training and validation
  • Implement attention model
  • Implement decoder feed forward function
  • Write training function
  • Write validation function
  • Add BLEU evaluation
  • Update code to use GPU only when available, otherwise use CPU
  • Add performance statistics
  • Allow encoder to use resnet-152 and densenet-161

Captioned Examples

Correctly Captioned Images

Correctly Captioned Image 1

Correctly Captioned Image 2

Incorrectly Captioned Images

Incorrectly Captioned Image 1

Incorrectly Captioned Image 2

References

Show, Attend and Tell

Original Theano Implementation

Neural Machine Translation by Jointly Learning to Align and Translate

Karpathy's data splits

README Source: AaronCCWong/Show-Attend-and-Tell