Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Code for our paper "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (arXiv version), published in TPAMI 2020.
A summary is available in our blog post.
Semantic Object Accuracy (SOA) is a score we introduce to evaluate the quality of generative text-to-image models. For this, we provide captions from the MS-COCO data set from which the evaluated model should generate images. We then use a pre-trained object detector to check whether the generated images contain the object that was specified in the caption.
E.g., when an image is generated from the caption "a car is driving down the street", we check whether the generated image actually contains a car. For more details, see Section 4 of our paper.
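The detector check and score aggregation can be sketched as follows. This is a minimal illustration, not the reference implementation from this repository (`SOA/calculate_soa.py`); the data layout (`images_per_class`, `detections`) and the plain recall averaging are simplifying assumptions:

```python
def soa_scores(images_per_class, detections):
    """Sketch of the SOA aggregation.

    images_per_class: maps a class label (e.g. 'car') to the ids of images
        generated from captions that mention this class.
    detections: maps an image id to the set of class labels that a
        pre-trained object detector (YOLOv3 in the paper) found in it.
    Returns (SOA-C, SOA-I): class-averaged and image-averaged recall.
    """
    per_class = []
    hits, total = 0, 0
    for label, image_ids in images_per_class.items():
        found = sum(1 for i in image_ids if label in detections.get(i, set()))
        per_class.append(found / len(image_ids))
        hits += found
        total += len(image_ids)
    soa_c = sum(per_class) / len(per_class)  # every class weighted equally
    soa_i = hits / total                     # every image weighted equally
    return soa_c, soa_i

# Toy example: the detector finds the 'car' in one of two car images
# and the 'dog' in the single dog image.
imgs = {"car": ["img1", "img2"], "dog": ["img3"]}
dets = {"img1": {"car", "person"}, "img2": set(), "img3": {"dog"}}
soa_c, soa_i = soa_scores(imgs, dets)  # soa_c = 0.75, soa_i = 2/3
```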
We also perform a user study in which humans rate images generated by several state-of-the-art models trained on the MS-COCO dataset. We then compare the ranking obtained through our user study with the rankings obtained from different quantitative evaluation metrics. We show that popular metrics such as the Inception Score do not correlate with how humans rate the generated images, whereas SOA correlates strongly with human judgement.
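The agreement between a metric's model ranking and the human ranking can be quantified with Spearman's rank correlation, for example. The rankings below are made-up toy numbers for illustration, not the actual results of our user study:

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    assert len(rank_a) == len(rank_b)
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five models (1 = best); toy data only.
human = [1, 2, 3, 4, 5]
soa_like = [1, 2, 3, 5, 4]  # mostly agrees with the human ranking
is_like = [5, 3, 1, 2, 4]   # largely unrelated to the human ranking
print(spearman(human, soa_like))  # ~0.9: strong correlation
print(spearman(human, is_like))   # ~-0.3: no agreement
```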
Contents:
How to calculate the SOA scores for a model:
1. Go to `SOA`. The captions are in `SOA/captions`. Each file is named `label_XX_XX.pkl`, describing for which label the captions in the file are. Load the captions with:

   ```python
   import pickle
   with open("label_XX_XX.pkl", "rb") as f:
       captions = pickle.load(f)
   ```

   Each file contains a list of dictionaries of the form `[{'image_id': XX, 'id': XX, 'idx': [XX, XX], 'caption': u'XX'}, ...]`, where `'idx': [XX, XX]` gives the indices of the validation captions in the commonly used captions file from AttnGAN.
2. Use your model to generate images from the specified captions.
3. Once you have generated images for each label you can calculate the SOA scores: install the dependencies from `SOA/requirements.txt` (we use Python 3.5.2), download the YOLOv3 weights file `SOA/yolov3.weights`, and run

   ```
   python calculate_soa.py --images path/to/folder/created-in-step-2ii --output path/to/folder/where-results-are-saved --gpu 0
   ```

4. If you also want to calculate IoU values, check the detailed instructions here.
5. Calculating the SOA scores takes about 30-45 minutes (tested with an NVIDIA GTX 1080 Ti), depending on your hardware (not including the time it takes to generate the images).
6. More detailed information (if needed) is available here.
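The image-generation step could be organized roughly as follows; `generate_image` is a placeholder for your own model, and the folder-per-label layout is an assumption, so check how `calculate_soa.py` expects the generated images to be organized:

```python
import glob
import os
import pickle


def generate_for_soa(captions_dir, output_dir, generate_image):
    """Loop over the label_XX_XX.pkl files and generate one image per caption.

    generate_image(caption) is a placeholder for your own text-to-image
    model; it is assumed to return an object with a .save(path) method
    (e.g. a PIL image). The one-folder-per-label layout is an assumption.
    """
    for pkl_path in sorted(glob.glob(os.path.join(captions_dir, "label_*.pkl"))):
        label = os.path.splitext(os.path.basename(pkl_path))[0]
        label_dir = os.path.join(output_dir, label)
        os.makedirs(label_dir, exist_ok=True)
        with open(pkl_path, "rb") as f:
            captions = pickle.load(f)
        for entry in captions:
            image = generate_image(entry["caption"])  # your model goes here
            image.save(os.path.join(label_dir, "{}.png".format(entry["id"])))
```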
How to train and evaluate the OP-GAN model:

1. Go to `OP-GAN`.
2. Dependencies: add the project folder to your PYTHONPATH and install the required dependencies:

   ```
   conda env create -f environment.yml
   ```

3. Data: download the data into `data/` and extract it into `data/train/` and `data/test/`; download the pretrained models into `models/` and extract them.
4. Training: run `sh train.sh gpu-ids`, where you choose which GPUs to train on, e.g. `sh train.sh 0,1,2,3`. The batch size is set in `code/cfg/dataset_train.yml`; if you train on more or fewer GPUs or have more VRAM, adjust the batch sizes as needed. Make sure `code/cfg/cfg_file_train.yml` points to the correct path. Results are stored in `output/`.
5. Evaluating: edit `code/cfg/dataset_eval.yml` and adapt the path of `NET_G` to point to the model you want to use (the default path is to the pretrained model linked below). Run `sh sample.sh gpu-ids` to generate images using the specified model, e.g. `sh sample.sh 0`.
6. Pretrained models are in `models`.
If you find our model useful in your research, please consider citing:
```
@article{hinz2019semantic,
  title   = {Semantic Object Accuracy for Generative Text-to-Image Synthesis},
  author  = {Tobias Hinz and Stefan Heinrich and Stefan Wermter},
  journal = {arXiv preprint arXiv:1910.13321},
  year    = {2019},
}
```