ReVersion: Diffusion-Based Relation Inversion from Images
This repository contains the implementation of the following paper:
ReVersion: Diffusion-Based Relation Inversion from Images
Ziqi Huang*, Tianxing Wu*, Yuming Jiang, Kelvin C.K. Chan, Ziwei Liu
From MMLab@NTU affiliated with S-Lab, Nanyang Technological University
We propose a new task, Relation Inversion: Given a few exemplar images, where a relation co-exists in every image, we aim to find a relation prompt <R> to capture this interaction, and apply the relation to new entities to synthesize new scenes. The above images are generated by our ReVersion framework.
Clone Repo
git clone https://github.com/ziqihuangg/ReVersion
cd ReVersion
Create Conda Environment and Install Dependencies
conda create -n reversion
conda activate reversion
conda install python=3.8 pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
pip install "diffusers[torch]"
pip install -r requirements.txt
Given a set of exemplar images and coarse descriptions of their entities, you can optimize a relation prompt <R> to capture the relation that co-exists in these images. This process is called Relation Inversion.
Prepare the exemplar images (e.g., 0.jpg - 9.jpg) and coarse descriptions (text.json), and put them inside a folder. Feel free to use our ReVersion benchmark, or you can also prepare your own images. An example from our ReVersion benchmark is as follows:
./reversion_benchmark_v1
├── painted_on
│   ├── 0.jpg
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── 3.jpg
│   ├── 4.jpg
│   ├── 5.jpg
│   ├── 6.jpg
│   ├── 7.jpg
│   ├── 8.jpg
│   ├── 9.jpg
│   └── text.json
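Before training, it can help to confirm that a folder matches the layout above. The following is a minimal sketch, not part of the repository: `check_exemplar_dir` is a hypothetical helper, and it assumes ten exemplar images named 0.jpg through 9.jpg plus text.json, as in the painted_on example. It only checks that the files exist and does not validate the contents of text.json.

```python
from pathlib import Path


def check_exemplar_dir(folder, num_images=10):
    """Return the expected file names that are missing from an exemplar folder.

    Hypothetical helper: assumes images are named 0.jpg .. (num_images-1).jpg
    and that the coarse descriptions live in text.json, as in the example tree.
    """
    folder = Path(folder)
    expected = [f"{i}.jpg" for i in range(num_images)] + ["text.json"]
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = check_exemplar_dir("./reversion_benchmark_v1/painted_on")
    print("missing files:", missing or "none")
```

If you prepare your own images with a different count, pass `num_images` accordingly.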
Taking the relation painted_on as an example, you can start training with this script:
accelerate launch \
--config_file="./configs/single_gpu.yml" \
train.py \
--seed="2023" \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_data_dir="./reversion_benchmark_v1/painted_on" \
--placeholder_token="<R>" \
--initializer_token="and" \
--train_batch_size="2" \
--gradient_accumulation_steps="4" \
--max_train_steps="3000" \
--learning_rate='2.5e-04' --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps="0" \
--output_dir="./experiments/painted_on" \
--save_steps="1000" \
--importance_sampling \
--denoise_loss_weight="1.0" \
--steer_loss_weight="0.01" \
--num_positives="4" \
--temperature="0.07" \
--only_save_embeds
Here, train_data_dir is the path to the exemplar images and coarse descriptions, and output_dir is the path where the inverted relation and the experiment logs are saved. To generate relation-specific images, follow the next section, Generation.
Note that the only_save_embeds option saves only the relation prompt <R>, without saving the entire Stable Diffusion model. You can decide whether to turn it on.
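To make the batch-related options in the training script concrete, here is a small arithmetic sketch. It assumes that train.py follows the convention of the diffusers textual inversion example, where --scale_lr multiplies the base learning rate by the batch size, the gradient accumulation steps, and the number of processes. That behavior is an assumption here, so check train.py before relying on it. The function names are hypothetical.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of samples that contribute to one optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def scaled_learning_rate(base_lr, train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Learning rate after scaling.

    ASSUMPTION: --scale_lr multiplies the base rate by the effective batch size,
    as in the diffusers textual inversion example script.
    """
    return base_lr * effective_batch_size(
        train_batch_size, gradient_accumulation_steps, num_processes
    )


# Values taken from the training script above, on a single GPU.
print("effective batch size:", effective_batch_size(2, 4))
print("scaled learning rate:", scaled_learning_rate(2.5e-4, 2, 4))
```

Under this assumption, changing train_batch_size or gradient_accumulation_steps also changes the learning rate actually used, which is worth keeping in mind when adapting the script to a different GPU memory budget.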
We can use the learned relation prompt <R> to generate relation-specific images with new objects, backgrounds, and style.
You can obtain a learned <R> from Relation Inversion using your customized data. You can also download the models from here, where we provide several pre-trained relation prompts for you to play with.
Put the models (i.e., learned relation prompt <R>) under ./experiments/ as follows:
./experiments/
├── painted_on
│   ├── checkpoint-500
│   ...
│   └── model_index.json
├── carved_by
│   ├── checkpoint-500
│   ...
│   └── model_index.json
├── inside
│   ├── checkpoint-500
│   ...
│   └── model_index.json
...
Taking the relation painted_on as an example, you can either use the following script to generate images from a single prompt, e.g., "cat <R> stone":
python inference.py \
--model_id ./experiments/painted_on \
--prompt "cat <R> stone" \
--placeholder_string "<R>" \
--num_samples 10 \
--guidance_scale 7.5 \
--only_load_embeds
Or write a list of prompts in ./templates/templates.py with the key name $your_template_name, and generate images for every prompt in the list $your_template_name:
your_template_name='painted_on_examples'
python inference.py \
--model_id ./experiments/painted_on \
--template_name $your_template_name \
--placeholder_string "<R>" \
--num_samples 10 \
--guidance_scale 7.5 \
--only_load_embeds
Where model_id is the model directory, num_samples is the number of images to generate for each prompt, and guidance_scale is the classifier-free guidance scale.
We provide several example templates for each relation in ./templates/templates.py, such as painted_on_examples, carved_by_examples, etc.
Note that if you saved the entire model during the inversion process, that is, without the only_save_embeds flag turned on, then you should turn off the only_load_embeds flag during inference.
The only_load_embeds option loads only the relation prompt <R> from the experiment folder, and automatically loads the rest of the Stable Diffusion model (including the other text tokens' embeddings) from the default cache location that contains the pre-trained Stable Diffusion model.
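The pairing between only_save_embeds at training time and only_load_embeds at inference time can be encoded in a small launcher. The sketch below is not part of the repository: `build_inference_cmd` is a hypothetical helper that assembles the inference.py command shown earlier, using only the flags that appear in this README, and adds --only_load_embeds only when the experiment was saved with embeddings only.

```python
import shlex


def build_inference_cmd(model_id, prompt, embeds_only,
                        num_samples=10, guidance_scale=7.5, placeholder="<R>"):
    """Assemble the inference.py command line as a list of arguments.

    Hypothetical helper. `embeds_only` should be True if training used
    --only_save_embeds, and False if the entire model was saved.
    """
    cmd = [
        "python", "inference.py",
        "--model_id", model_id,
        "--prompt", prompt,
        "--placeholder_string", placeholder,
        "--num_samples", str(num_samples),
        "--guidance_scale", str(guidance_scale),
    ]
    if embeds_only:
        # Load only <R>; the rest of Stable Diffusion comes from the cache.
        cmd.append("--only_load_embeds")
    return cmd


cmd = build_inference_cmd("./experiments/painted_on", "cat <R> stone", embeds_only=True)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Building the command as a list keeps the prompt, including the angle brackets in <R>, safe from shell interpretation when it is passed to subprocess.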
We also provide a Gradio Demo to test our method using a UI. This demo supports relation-specific text-to-image generation on the fly. Running the following command will launch the demo:
python app_gradio.py
Alternatively, you can try the online demo here.
You can also combine the relation prompt <R> with diverse prompts to generate images with diverse backgrounds and styles. For example, your prompt could be "michael jackson <R> wall, in the desert", "cat <R> stone, on the beach", etc.
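Prompts of this shape can also be generated programmatically. The following is a minimal sketch under stated assumptions: `compose_prompts` is a hypothetical helper, and the pattern "subject <R> object, context" is inferred from the examples above rather than required by the code. The resulting strings could be passed one at a time through --prompt.

```python
from itertools import product

PLACEHOLDER = "<R>"  # must match --placeholder_string


def compose_prompts(subjects, objects, contexts=("",)):
    """Build 'subject <R> object[, context]' prompts for every combination.

    Hypothetical helper; an empty context yields a prompt without a suffix.
    """
    prompts = []
    for subj, obj, ctx in product(subjects, objects, contexts):
        prompt = f"{subj} {PLACEHOLDER} {obj}"
        if ctx:
            prompt += f", {ctx}"
        prompts.append(prompt)
    return prompts


prompts = compose_prompts(
    subjects=["cat", "michael jackson"],
    objects=["stone", "wall"],
    contexts=["on the beach", "in the desert"],
)
for p in prompts:
    print(p)
```

Each generated prompt contains the placeholder exactly once, between the two entities it relates.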
The ReVersion Benchmark consists of diverse relations and entities, along with a set of well-defined text descriptions.
If you find our repo useful for your research, please consider citing our paper:
@article{huang2023reversion,
title={{ReVersion}: Diffusion-Based Relation Inversion from Images},
author={Huang, Ziqi and Wu, Tianxing and Jiang, Yuming and Chan, Kelvin C.K. and Liu, Ziwei},
journal={arXiv preprint arXiv:2303.13495},
year={2023}
}
The codebase is maintained by Ziqi Huang and Tianxing Wu.
This project is built using the following open source repositories: