
CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering


Visual-Question-Answering

This repository contains an AI system for the task of Visual Question Answering: given an image and a natural-language question about that image, the system answers the question in natural language based on the image scene. The system can be configured to use one of three underlying models:

  1. VQA: This is the baseline model from the paper VQA: Visual Question Answering. It encodes the image with a CNN and the question with an LSTM, then combines the two embeddings for the VQA task. It uses a pretrained VGG16 to obtain the image embedding (which may be further normalised) and a 1- or 2-layered LSTM for the question embedding (see the sketch after this list).
  2. SAN: This is an attention-based model described in the paper Stacked Attention Networks for Image Question Answering. It incorporates attention over the input image.
  3. MUTAN: This is a variant of the VQA model in which, instead of a simple pointwise product, the image and question embeddings are combined using the Multimodal Tucker fusion technique described in the paper MUTAN: Multimodal Tucker Fusion for Visual Question Answering.
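
The following PyTorch-style sketch illustrates the baseline VQA model: a VGG16 image embedding and an LSTM question embedding are fused with a pointwise product and fed to a classifier over a fixed set of candidate answers. The class, argument, and layer names below are illustrative only and are not the classes actually defined in this repository.

```python
# Minimal sketch of the baseline VQA model (illustrative only; the actual
# class and argument names in this repository may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class BaselineVQA(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=1024, num_answers=1000):
        super().__init__()
        # Pretrained VGG16; the 4096-d fc7 activations serve as the image embedding.
        vgg = models.vgg16(pretrained=True)
        self.cnn = nn.Sequential(vgg.features, nn.Flatten(),
                                 *list(vgg.classifier.children())[:-1])
        self.img_fc = nn.Linear(4096, hidden_dim)
        # Question encoder: word embeddings followed by an LSTM.
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=1, batch_first=True)
        # Classifier over the candidate answers.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, image, question_tokens):
        img_feat = self.cnn(image)                      # (B, 4096)
        img_feat = F.normalize(img_feat, dim=-1)        # optional further normalisation
        img_emb = torch.tanh(self.img_fc(img_feat))     # (B, hidden_dim)
        _, (h, _) = self.lstm(self.word_emb(question_tokens))
        q_emb = torch.tanh(h[-1])                       # (B, hidden_dim)
        fused = img_emb * q_emb                         # pointwise-product fusion
        return self.classifier(fused)                   # answer logits
```

The MUTAN variant keeps this overall structure but replaces the pointwise-product fusion with the Multimodal Tucker fusion of the two embeddings, while the SAN variant attends over the image rather than relying on a single fused vector.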

Usage

First, download the datasets from http://visualqa.org/download.html: all items under Balanced Real Images except the Complementary Pairs List.

```
python main.py --config <config_file_path>
```

The system reads all of its arguments from the config file passed to it as input. Sample config files are provided in config/.

To speed up training, the images in the dataset can be preprocessed and their embeddings stored by setting emb_dir and the preprocess flag; a sketch of this preprocessing step is given below.
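
Conceptually, this preprocessing runs every image through the CNN once and caches the resulting embedding on disk so it does not have to be recomputed every epoch. A minimal sketch, assuming one .npy file per image under emb_dir; the file naming, format, and function names here are illustrative and may differ from the repository's actual preprocessing code.

```python
# Illustrative sketch of precomputing VGG16 image embeddings and caching them
# under emb_dir; the repository's actual preprocessing code may differ.
import os
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

def precompute_embeddings(image_paths, emb_dir, device="cuda"):
    os.makedirs(emb_dir, exist_ok=True)
    vgg = models.vgg16(pretrained=True).to(device).eval()
    # Keep everything up to the fc7 layer so each image maps to a 4096-d vector.
    fc7 = nn.Sequential(vgg.features, nn.Flatten(),
                        *list(vgg.classifier.children())[:-1])
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    with torch.no_grad():
        for path in image_paths:
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            emb = fc7(img).squeeze(0).cpu().numpy()
            name = os.path.splitext(os.path.basename(path))[0]
            # Cache the embedding so training can load it instead of re-running the CNN.
            np.save(os.path.join(emb_dir, name + ".npy"), emb)
```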
