A Recipe for Watermarking Diffusion Models
Yunqing Zhao¹, Tianyu Pang², Chao Du², Xiao Yang³, Ngai-Man Cheung¹, Min Lin²

¹Singapore University of Technology and Design, ²Sea AI Lab, ³Tsinghua University
arXiv Pre-print, 2023
A suitable conda environment named `string2img` can be created and activated with:

```shell
conda env create -f string2img.yaml
conda activate string2img
```

This `string2img` environment will help you embed the predefined binary watermark string into the training data.
A suitable conda environment named `edm` can be created and activated with:

```shell
conda env create -f edm.yaml -n edm
conda activate edm
```

This `edm` environment will help you train the unconditional/class-conditional diffusion models (from scratch).
First, activate the `edm` conda environment:

```shell
conda activate edm
cd ./edm
```

Then we can start processing the data.
We follow EDM and evaluate our models on four datasets. Datasets are stored in the same format as in StyleGAN: uncompressed ZIP archives containing uncompressed PNG files and a metadata file `dataset.json` for labels. Custom datasets can be created from a folder containing images; see `python dataset_tool.py --help` for more information. Below are examples for CIFAR-10 and FFHQ; the Animal Faces-HQ dataset (AFHQv2) and ImageNet Object Localization Challenge (ImageNet) are handled similarly.
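For concreteness, the archive layout described above can be sketched in a few lines of Python. The file names and label values below are illustrative stand-ins, not real CIFAR-10 entries:

```python
import io
import json
import zipfile

# Build a toy dataset archive in the StyleGAN format: uncompressed PNG files
# plus a dataset.json holding [filename, label] pairs.
labels = [["00000/img00000000.png", 6], ["00000/img00000001.png", 9]]
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:  # ZIP_STORED = uncompressed
    for fname, _ in labels:
        zf.writestr(fname, b"\x89PNG...")  # placeholder bytes standing in for real PNG data
    zf.writestr("dataset.json", json.dumps({"labels": labels}))

# Reading it back, as a training loader would:
with zipfile.ZipFile(buf) as zf:
    meta = json.loads(zf.read("dataset.json"))
print(meta["labels"][0])  # ['00000/img00000000.png', 6]
```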
CIFAR-10: Download the CIFAR-10 python version and convert it to a ZIP archive:

```shell
python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz \
    --dest=datasets/cifar10-32x32.zip
python fid.py ref --data=datasets/cifar10-32x32.zip --dest=fid-refs/cifar10-32x32.npz
```
FFHQ: Download the Flickr-Faces-HQ dataset as 1024x1024 images and convert it to a ZIP archive at 64x64 resolution (remark: an archive version is available that can be easily downloaded via `wget`):

```shell
python dataset_tool.py --source=downloads/ffhq/images1024x1024 \
    --dest=datasets/ffhq-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/ffhq-64x64.zip --dest=fid-refs/ffhq-64x64.npz
```
To embed the watermark in the training data, we first uncompress the zipped dataset:

```shell
mkdir datasets/uncompressed
mkdir datasets/uncompressed/cifar10
cd datasets/uncompressed/cifar10
unzip ../../cifar10-32x32.zip
```
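Before moving on, it can be worth sanity-checking the uncompressed folder. A small sketch, assuming the directory produced by the commands above (adjust the path if you extracted elsewhere):

```python
from pathlib import Path

# Count the PNG files and confirm dataset.json came along with the archive.
root = Path("datasets/uncompressed/cifar10")
num_images = sum(1 for _ in root.rglob("*.png"))
has_labels = (root / "dataset.json").exists()
print(num_images, has_labels)  # expect 50000 images for the CIFAR-10 training set
```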
First, we need to activate the `string2img` environment (using CIFAR-10 as an example):

```shell
conda activate string2img
cd ../../../../string2img
```
Then, we can start training:

```shell
CUDA_VISIBLE_DEVICES=0 python train_cifar10.py \
    --data_dir ../edm/datasets/uncompressed/cifar10 \
    --image_resolution 32 \
    --output_dir ./_output/cifar10 \
    --bit_length 64 \
    --batch_size 64 \
    --num_epochs 100
```
Typically this finishes in a few hours. You will then obtain a pretrained watermark encoder/decoder with your specified bit length. Next, embed the predefined watermark into the training data:
```shell
CUDA_VISIBLE_DEVICES=0 python embed_watermark_cifar10.py \
    --encoder_name ./_output/cifar10/checkpoints/*encoder.pth \
    --image_resolution 32 \
    --identical_string \
    --batch_size 128 \
    --bit_length 64
```
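The actual embedding uses the learned encoder trained above. Purely to illustrate the idea of hiding one identical bit string in every image when `--identical_string` is set, here is a classic least-significant-bit sketch (not the method used in this project):

```python
import random

BIT_LENGTH = 64
random.seed(0)
# The predefined binary watermark string, shared by all training images.
watermark = [random.randint(0, 1) for _ in range(BIT_LENGTH)]

def embed_lsb(pixels, bits):
    """Hide one bit in the least-significant bit of each of the first len(bits) pixels."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_lsb(pixels, n_bits):
    return [p & 1 for p in pixels[:n_bits]]

image = [random.randint(0, 255) for _ in range(32 * 32)]  # flat toy "image"
stego = embed_lsb(image, watermark)           # each pixel changes by at most 1
recovered = extract_lsb(stego, BIT_LENGTH)
print(recovered == watermark)  # True
```

A learned encoder plays the same role but spreads the bits across the image so they survive the diffusion model's training and sampling.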
Activate the `edm` environment:

```shell
conda activate edm
cd ../edm
```

and start training (using conditional training on CIFAR-10 as an example):

```shell
torchrun --standalone --nproc_per_node=8 train.py \
    --outdir=training-runs \
    --data=datasets/cifar10-32x32.zip \
    --cond=1 \
    --arch=ddpmpp
```
We first generate 50,000 random images and then compare them against the dataset reference statistics (i.e., the `*.npz` file) using `fid.py` (CIFAR-10 as an example):
```shell
# Generate 50000 images and save them as cifar10_tmp/*/*.png
torchrun --standalone --nproc_per_node=1 generate.py --outdir=cifar10_tmp --seeds=0-49999 --subdirs \
    --network=*.pkl

# Calculate FID
torchrun --standalone --nproc_per_node=8 fid.py calc --images=cifar10_tmp \
    --ref=./fid-refs/cifar10-32x32.npz
```
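`fid.py` computes the standard Fréchet Inception Distance between generated and reference feature statistics. For intuition, with one-dimensional Gaussian statistics the distance reduces to a closed form (the numbers below are toy values, not real Inception statistics):

```python
import math

def fid_1d(mu1, var1, mu2, var2):
    """Fréchet distance between two 1-D Gaussians:
    (mu1 - mu2)^2 + var1 + var2 - 2*sqrt(var1*var2)."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)

# Identical statistics give FID 0; the distance grows as the generated
# statistics drift from the reference statistics stored in the .npz file.
print(fid_1d(0.0, 1.0, 0.0, 1.0))  # 0.0
print(fid_1d(0.5, 1.0, 0.0, 1.0))  # 0.25
```

The real metric applies the matrix version of this formula to the mean and covariance of 2048-dimensional Inception features.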
Activate the `string2img` environment:

```shell
conda activate string2img
cd ../string2img
```

then run the watermark detector (using CIFAR-10 as an example):
```shell
CUDA_VISIBLE_DEVICES=0 python detect_watermark_cifar10.py
```

The detection accuracy will be printed (remember to specify the predefined binary watermark string in the script).
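Conceptually, the printed accuracy is just the fraction of decoded bits that agree with the predefined string. A minimal sketch (the 8-bit strings below are made-up stand-ins for the 64-bit watermark):

```python
def bit_accuracy(gt_bits, decoded_bits):
    """Fraction of positions where the decoded watermark matches the ground truth."""
    assert len(gt_bits) == len(decoded_bits)
    matches = sum(g == d for g, d in zip(gt_bits, decoded_bits))
    return matches / len(gt_bits)

gt      = [1, 0, 1, 1, 0, 0, 1, 0]  # predefined watermark string
decoded = [1, 0, 1, 1, 0, 1, 1, 0]  # one bit flipped by the decoder
print(bit_accuracy(gt, decoded))  # 0.875
```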
A suitable conda environment named `ldm` can be created and activated with:

```shell
conda env create -f ldm.yaml
conda activate ldm
```

This `ldm` environment will help you obtain the watermarked text-to-image diffusion models.
```shell
cd sd_watermark
```

First, follow the HuggingFace link to download the checkpoints (we use `sd-v1-4-full-ema.ckpt`).
Then, specify the target image and start the watermarking process:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --base configs/stable-diffusion/v1-finetune_unfrozen_watermark.yaml \
    --train --gpus 0,1,2,3 \
    --actual_resume ../_model_pool/sd-v1-4-full-ema.ckpt \
    --data_root ../_target_samples/watermark/*/*.png \
    --w_reg_weight 1e-7 \
    --name watermark_toy_V_ft_w_reg_l1_1.0e-7
```
Here you can tune the regularization coefficient (`--w_reg_weight`) to reach a good trade-off. During training, you can optionally visualize images generated from different prompts to check that the predefined watermark is properly embedded while generation quality remains good.
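The exact objective is defined in `v1-finetune_unfrozen_watermark.yaml`. Judging by the run name (`w_reg_l1`) and the `--w_reg_weight` flag, a plausible sketch of the trade-off is a task loss plus an L1 penalty on the deviation from the original checkpoint weights; all numbers below are hypothetical:

```python
def total_loss(task_loss, theta, theta_init, w_reg_weight):
    """Fine-tuning objective sketch: the watermark task loss plus an L1 penalty
    that keeps the fine-tuned weights close to the original checkpoint."""
    l1_penalty = sum(abs(t - t0) for t, t0 in zip(theta, theta_init))
    return task_loss + w_reg_weight * l1_penalty

theta_init = [0.2, -0.5, 1.0]        # weights from the resumed checkpoint
theta      = [0.25, -0.45, 0.9]      # weights after some watermark fine-tuning
print(total_loss(0.1, theta, theta_init, 1e-7))
```

A larger coefficient preserves generation quality but embeds the watermark more slowly; a smaller one embeds faster at the risk of degrading the model.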
If you find this project useful in your research, please consider citing our paper:
```bibtex
@article{zhao2023recipe,
  title={A Recipe for Watermarking Diffusion Models},
  author={Zhao, Yunqing and Pang, Tianyu and Du, Chao and Yang, Xiao and Cheung, Ngai-Man and Lin, Min},
  journal={arXiv preprint arXiv:2303.10137},
  year={2023}
}
```
We use the base implementation from EDM for training diffusion models for unconditional/class-conditional generation. We appreciate the wonderful base implementation from Yu et al. for adding fingerprints to Generative Adversarial Networks. We thank the authors of Stable Diffusion and DreamBooth (implemented by XavierXiao) for sharing their code/checkpoints of text-to-image diffusion models.