CNN+LSTM, Attention-based, and MUTAN-based models for Visual Question Answering
This repository contains an AI system for Visual Question Answering: given an image and a natural-language question about that image, the system answers the question in natural language based on the image scene. The system can be configured to use one of three underlying models: CNN+LSTM, attention-based, or MUTAN-based.
First, download the datasets from http://visualqa.org/download.html: all items under Balanced Real Images except the Complementary Pairs List.
python main.py --config <config_file_path>
The system reads its arguments from the config file passed with --config. Sample config files are provided in config/.
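As a rough illustration of how a script invoked with `--config` might read its settings, the sketch below parses that flag with `argparse` and loads the file it points to. The JSON format and the `load_config` helper are assumptions made for this example only; the sample files in config/ define the actual format and the available options.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; --config is the only flag this README shows."""
    parser = argparse.ArgumentParser(description="Visual Question Answering")
    parser.add_argument("--config", required=True, help="path to the config file")
    return parser.parse_args(argv)


def load_config(path):
    """Hypothetical helper: read settings from a file.

    JSON is assumed here purely for illustration; the real format may differ.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

An entry point built this way would call `parse_args()` and then `load_config(args.config)`, so every setting lives in one file rather than in a long list of command-line flags.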
To speed up training, you can preprocess the images in the dataset once and store their embeddings by setting emb_dir and the preprocess flag.
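The idea behind this kind of preprocessing is to run the image encoder once per image and reuse the stored result in every epoch. A minimal sketch of that caching pattern follows. Here `embed_image` is a stand-in for the real CNN, and the function names, the one-file-per-image layout, and the pickle format are all assumptions for illustration, not the repository's implementation; only the name `emb_dir` comes from the text above.

```python
import os
import pickle


def embed_image(image_path):
    """Stand-in for a CNN forward pass; returns a fake fixed-length embedding."""
    return [float(len(image_path)), 0.0, 1.0]


def embedding_path(emb_dir, image_path):
    """Map an image file to the location of its cached embedding (assumed layout)."""
    name = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(emb_dir, name + ".pkl")


def preprocess_images(image_paths, emb_dir):
    """Compute each embedding once and store it under emb_dir."""
    os.makedirs(emb_dir, exist_ok=True)
    for image_path in image_paths:
        with open(embedding_path(emb_dir, image_path), "wb") as f:
            pickle.dump(embed_image(image_path), f)


def get_embedding(image_path, emb_dir=None):
    """Return the cached embedding when one exists, otherwise compute it."""
    if emb_dir is not None:
        cached = embedding_path(emb_dir, image_path)
        if os.path.exists(cached):
            with open(cached, "rb") as f:
                return pickle.load(f)
    return embed_image(image_path)
```

With a cache like this, the expensive encoder runs once per image during preprocessing, and training only pays the cost of reading a small file per sample.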