Synthesizing and manipulating 2048x1024 images with conditional GANs
PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic image-to-image translation. It can be used for turning semantic label maps into photorealistic images or synthesizing portraits from face label maps.
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Ting-Chun Wang¹, Ming-Yu Liu¹, Jun-Yan Zhu², Andrew Tao¹, Jan Kautz¹, Bryan Catanzaro¹
¹NVIDIA Corporation, ²UC Berkeley
In CVPR 2018.
- Interactive editing results
- Additional streetview results
pip install dominate
git clone https://github.com/NVIDIA/pix2pixHD
cd pix2pixHD
Example images are provided in the datasets folder.
Put the pre-trained model under ./checkpoints/label2city_1024p/.
Test the model (bash ./scripts/test_1024p.sh):
#!./scripts/test_1024p.sh
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none
The test results will be saved to an HTML file here: ./results/label2city_1024p/test_latest/index.html.
More example scripts can be found in the scripts directory.
Put the training data under the datasets folder in the same way the example images are provided.
Train a model (bash ./scripts/train_512p.sh):
#!./scripts/train_512p.sh
python train.py --name label2city_512p
To view training results, check the intermediate results in ./checkpoints/label2city_512p/web/index.html.
If you have TensorFlow installed, you can see TensorBoard logs in ./checkpoints/label2city_512p/logs by adding --tf_log to the training scripts.
Train a model using multiple GPUs (bash ./scripts/train_512p_multigpu.sh):
#!./scripts/train_512p_multigpu.sh
python train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7
Note: this is not tested; we trained our model using a single GPU only. Please use at your own discretion.
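As a rough guide to how --batchSize and --gpu_ids interact in the multi-GPU command, the sketch below computes the number of samples each device would receive, assuming the batch is split evenly across the listed GPUs. The helper name per_gpu_batch_size is hypothetical and is not part of the repository.

```python
def per_gpu_batch_size(batch_size: int, gpu_ids: str) -> int:
    """Return the samples per GPU, assuming an even split across devices.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Parse a comma-separated id string such as "0,1,2,3".
    ids = [int(tok) for tok in gpu_ids.split(",") if tok.strip()]
    if not ids:
        raise ValueError("gpu_ids must list at least one device")
    if batch_size % len(ids) != 0:
        raise ValueError("batch_size should be a multiple of the number of GPUs")
    return batch_size // len(ids)


# Values taken from the multi-GPU command above.
print(per_gpu_batch_size(8, "0,1,2,3,4,5,6,7"))  # prints 1
```

With the values from the command, each of the eight GPUs would process one image per step under this assumption.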
Mixed-precision (AMP) training can be enabled by adding --fp16. For example:
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16
In our test case, it trains about 80% faster with AMP on a Volta machine.
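Training at full resolution requires a GPU with 24G memory (bash ./scripts/train_1024p_24G.sh), or 16G memory if using mixed precision (AMP).
If only 12G of GPU memory is available, use the 12G script (bash ./scripts/train_1024p_12G.sh), which will crop the images during training. Performance is not guaranteed using this script.
When training with label maps, specify --label_nc N (where N is the number of labels) during both training and testing.
If the input is not a label map, specify --label_nc 0, which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
If you don't have instance maps, specify --no_instance.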
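The folder naming rule tied to --label_nc can be summarized in a few lines. This is a minimal sketch of the convention as described in the text, not the repository's loader; the function name dataset_folders and the example value 35 are assumptions made for illustration.

```python
def dataset_folders(phase: str, label_nc: int) -> tuple:
    """Return (input_folder, target_folder) names for a given phase.

    Hypothetical helper that mirrors the naming convention described
    in the text; not part of the repository.
    """
    if label_nc == 0:
        # RGB images are used directly as input: translate A -> B.
        return phase + "_A", phase + "_B"
    # Label maps are used as input.
    return phase + "_label", phase + "_img"


print(dataset_folders("train", 0))   # prints ('train_A', 'train_B')
print(dataset_folders("train", 35))  # prints ('train_label', 'train_img')
```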
The default preprocessing setting is scale_width, which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which will do nothing other than making sure the image dimensions are divisible by 32.
See options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
To disable instance maps, specify the flag --no_instance.
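To make the --resize_or_crop modes concrete, the sketch below computes the image size each mode would produce. It only illustrates the behaviour described in the text and is not the repository's transform code: the function name output_size is hypothetical, fine_size=512 is an assumed value, and rounding to the nearest multiple of 32 in the none mode is an assumption about how divisibility is enforced.

```python
def output_size(width, height, mode, load_size=1024, fine_size=512, multiple=32):
    """Return the (width, height) an image would have after preprocessing.

    Hypothetical helper for illustration; fine_size=512 and the
    round-to-nearest behaviour for 'none' are assumptions.
    """
    modes = ("scale_width", "scale_width_and_crop", "crop", "none")
    if mode not in modes:
        raise ValueError("unknown mode: " + mode)

    if mode in ("scale_width", "scale_width_and_crop"):
        # Resize so the width equals load_size, keeping the aspect ratio.
        height = int(load_size * height / width)
        width = load_size

    if mode in ("crop", "scale_width_and_crop"):
        # A random crop of size (fine_size, fine_size); only the size is modeled.
        width, height = min(width, fine_size), min(height, fine_size)

    if mode == "none":
        # Only adjust the size so both dimensions are divisible by `multiple`.
        width = int(round(width / multiple) * multiple)
        height = int(round(height / multiple) * multiple)

    return width, height


print(output_size(2048, 1024, "scale_width"))           # prints (1024, 512)
print(output_size(2048, 1024, "scale_width_and_crop"))  # prints (512, 512)
print(output_size(1000, 500, "none"))                   # prints (992, 512)
```

The crop position is random in practice, so the sketch models only the resulting size, not which region is kept.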
If you find this useful for your research, please cite the following.
@inproceedings{wang2018pix2pixHD,
title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},
author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
This code borrows heavily from pytorch-CycleGAN-and-pix2pix.