Visdial Gnn Save

PyTorch code for Reasoning Visual Dialogs with Structural and Partial Observations

Project README

Reasoning Visual Dialogs with Structural and Partial Observations

Pytorch Implementation for the paper:

Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng^*, Wenguan Wang^*, Siyuan Qi^*, Song-Chun Zhu (* equal contributions)
In CVPR 2019 (Oral)

Getting Started

This codebase is tested using Ubuntu 16.04, Python 3.5 and a single NVIDIA TITAN Xp GPU. Similar configurations are preferred.

Installation

Clone this repo:

git clone https://github.com/zilongzheng/visdial-gnn.git
cd visdial-gnn

Install requirements
- Pytorch 0.4.1
- For other Python dependencies, run:
```
pip install -r requirements.txt
```

Train/Evaluate VisDial v1.0

We use pre-extracted image features as specified here for VisDial v1.0.
We use preprocessed dialog data as specified here
To reproduce our results, you can download preprocessed data and save it to $PROJECT_DIR/data/v1.0/ by

bash ./scripts/download_data_v1.sh faster_rcnn

To train a discriminative model, run:

#!./scripts/train_v1_faster_rcnn.sh
python train.py --dataroot ./data/v1.0/

To evaluate the model using val split, run:

python evaluate.py --dataroot ./data/v1.0/ --split val --ckpt /path/to/checkpoint

Train/Evaluate VisDial v0.9

We use pre-extracted image features from VGG-16 and VGG-19 as specified here
To download preprocessed data (e.g. vgg19) and save it to $PROJECT_DIR/data/v0.9/, run

bash ./scripts/download_data_v09.sh vgg19

To train a discriminative model using vgg19 pretrained image features, run

#!./scripts/train_v09_vgg19.sh
python train.py --dataroot ./data/v0.9/ \
                --version 0.9 \
                --img_train data_img_vgg19_pool5.h5 \
                --visdial_data visdial_data.h5 \
                --visdial_params visdial_params.json \
                --img_feat_size 512

To evaluate the model using val split, run:

python evaluate.py --dataroot ./data/v0.9/ \
                   --version 0.9 \
                   --split val \
                   --ckpt /path/to/checkpoint \
                   --img_val data_img_vgg19_pool5.h5 \
                   --visdial_data visdial_data.h5 \
                   --visdial_params visdial_params.json \
                   --img_feat_size 512

Citation

If you use this code for your research, please cite our paper.

@inproceedings{zheng2019reasoning,
    title={Reasoning Visual Dialogs with Structural and Partial Observations},
    author={Zheng, Zilong and Wang, Wenguan and Qi, Siyuan and Zhu, Song-Chun},
    booktitle={Computer Vision and Pattern Recognition (CVPR), 2019 IEEE Conference on},
    year={2019}
}

Acknowledgments

We use Visual Dialog Challenge Starter Code and GPNN as referenced util code.

Open Source Agenda is not affiliated with "Visdial Gnn" Project. README Source: zilongzheng/visdial-gnn

Stars

Open Issues

Last Commit

2 years ago

Repository

zilongzheng/visdial-gnn

License

MIT

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/visdial-gnn"><img src="https://www.opensourceagenda.com/projects/visdial-gnn/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022