
Multimodal Garment Designer

Human-Centric Latent Diffusion Models for Fashion Image Editing

Alberto Baldrati*, Davide Morelli*, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

* Equal contribution.

This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing".

Overview

Abstract:
Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs.

Citation

If you make use of our work, please cite our paper:

@article{baldrati2023multimodal,
  title={Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing},
  author={Baldrati, Alberto and Morelli, Davide and Cartella, Giuseppe and Cornia, Marcella and Bertini, Marco and Cucchiara, Rita},
  journal={arXiv preprint arXiv:2304.02051},
  year={2023}
}

Inference

To run the inference, use the following command:

python eval.py --dataset_path <path> --batch_size <int> --mixed_precision fp16 --output_dir <path> --save_name <string> --num_workers_test <int> --sketch_cond_rate 0.2 --dataset <dresscode|vitonhd> --start_cond_rate 0.0
  • dataset_path is the path to the dataset (change it according to the dataset parameter)
  • dataset is the name of the dataset to use (dresscode | vitonhd)
  • output_dir is the path to the output directory
  • save_name is the name of the output subfolder where the generated images are saved
  • start_cond_rate is the fraction in [0.0, 1.0] of denoising steps used as an offset before sketch conditioning starts (see the sketch after this list)
  • sketch_cond_rate is the fraction in [0.0, 1.0] of denoising steps during which sketch conditioning is applied
  • test_order is the test setting (paired | unpaired)
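
For intuition on the two conditioning rates, here is a rough, purely illustrative sketch of the arithmetic (it assumes both rates are interpreted as fractions of the total denoising steps; see eval.py and the pipeline code for the exact behavior):

total_steps = 50           # DDIM denoising steps used at inference
start_cond_rate = 0.0      # --start_cond_rate: offset before sketch conditioning begins
sketch_cond_rate = 0.2     # --sketch_cond_rate: fraction of steps with sketch conditioning

start_step = int(start_cond_rate * total_steps)               # 0
end_step = start_step + int(sketch_cond_rate * total_steps)   # 10
print(f"sketch conditioning active on steps {start_step}..{end_step - 1} of {total_steps}")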

Note that we provide a few sample images to test MGD by simply cloning this repo (i.e., assets/data). To execute the code, set:

  • Dress Code Multimodal dataset
    • dataset_path to assets/data/dresscode
    • dataset to dresscode
  • Viton-HD Multimodal dataset
    • dataset_path to assets/data/vitonhd
    • dataset to vitonhd

It is possible to run the inference on the whole Dress Code Multimodal or Viton-HD Multimodal dataset by simply changing dataset_path and dataset according to the downloaded and prepared datasets (see the sections below).

Pre-trained models

The model and checkpoints are available via torch.hub.

Load the MGD denoising UNet model using the following code:

unet = torch.hub.load(
    dataset=<dataset>, 
    repo_or_dir='aimagelab/multimodal-garment-designer', 
    source='github', 
    model='mgd', 
    pretrained=True
    )
  • dataset is the name of the dataset the checkpoint was trained on (dresscode | vitonhd)
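
For example, to load the checkpoint trained on Dress Code Multimodal:

import torch

# The 'dataset' keyword is forwarded by torch.hub to the 'mgd' entrypoint
# and selects which pretrained checkpoint is downloaded.
unet = torch.hub.load(
    repo_or_dir='aimagelab/multimodal-garment-designer',
    source='github',
    model='mgd',
    pretrained=True,
    dataset='dresscode',
    )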

Use the denoising network with our custom diffusers pipeline as follows:

from pipes.sketch_posemap_inpaint_pipe import StableDiffusionSketchPosemapInpaintPipeline
from diffusers import AutoencoderKL, DDIMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

pretrained_model_name_or_path = "runwayml/stable-diffusion-inpainting"

text_encoder = CLIPTextModel.from_pretrained(
    pretrained_model_name_or_path, 
    subfolder="text_encoder"
    )

vae = AutoencoderKL.from_pretrained(
    pretrained_model_name_or_path, 
    subfolder="vae"
    )

tokenizer = CLIPTokenizer.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="tokenizer",
    )

val_scheduler = DDIMScheduler.from_pretrained(
    pretrained_model_name_or_path,
    subfolder="scheduler"
    )
val_scheduler.set_timesteps(50)

val_pipe = StableDiffusionSketchPosemapInpaintPipeline(
    text_encoder=text_encoder,
    vae=vae,
    unet=unet,
    tokenizer=tokenizer,
    scheduler=val_scheduler,
    )

For a complete usage example, see the file eval.py in the main repo.
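
To run the pipeline on a GPU, a minimal placement sketch is shown below (it assumes the custom pipeline exposes the standard diffusers DiffusionPipeline interface; eval.py shows how the components are actually placed and called):

import torch

# Move the pipeline to the GPU; half precision can be enabled separately and
# mirrors the --mixed_precision fp16 flag used by eval.py.
device = "cuda" if torch.cuda.is_available() else "cpu"
val_pipe = val_pipe.to(device)

# The individual components are plain nn.Modules, so switch them to eval mode for inference.
for module in (unet, vae, text_encoder):
    module.eval()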

Datasets

We do not hold rights on the original Dress Code and Viton-HD datasets. Please refer to the original papers for more information.

Start by downloading the original Dress Code and Viton-HD datasets from the respective project pages.

Then download the Dress Code Multimodal and Viton-HD Multimodal additional data annotations:

  • Dress Code Multimodal [link]
  • Viton-HD Multimodal [link]

Dress Code Multimodal Data Preparation

Once the data is downloaded, organize the dataset folder as follows:

Dress Code
| fine_captions.json
| coarse_captions.json
| test_pairs_paired.txt
| test_pairs_unpaired.txt
| train_pairs.txt
| test_stitch_map
|---- [category]
|-------- images
|-------- keypoints
|-------- skeletons
|-------- dense
|-------- im_sketch
|-------- im_sketch_unpaired
...
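
As a quick, purely illustrative sanity check (not part of the repo), you can verify that the prepared Dress Code Multimodal folder matches the tree above before launching eval.py; the helper and the category names below are assumptions:

from pathlib import Path

def check_dresscode_layout(root, categories=("dresses", "lower_body", "upper_body")):
    # Hypothetical helper: raises if the folder does not match the layout above.
    root = Path(root)
    for name in ("fine_captions.json", "coarse_captions.json",
                 "test_pairs_paired.txt", "test_pairs_unpaired.txt", "train_pairs.txt"):
        assert (root / name).exists(), f"missing {name}"
    for category in categories:
        for sub in ("images", "keypoints", "skeletons", "dense",
                    "im_sketch", "im_sketch_unpaired"):
            assert (root / category / sub).is_dir(), f"missing {category}/{sub}"

check_dresscode_layout("path/to/DressCode")  # replace with your dataset_path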

Viton-HD Multimodal Data Preparation

Once the data is downloaded, organize the dataset folder as follows:

Viton-HD
| captions.json
|---- Train
|-------- image
|-------- cloth
|-------- image-parse-v3
|-------- openpose_json
|-------- im_sketch
|-------- im_sketch_unpaired
...
|---- Test
...
|-------- im_sketch
|-------- im_sketch_unpaired
...
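
Similarly, a short illustrative check (not part of the repo) that the Viton-HD Multimodal folder and captions are in place; the exact JSON schema is not documented here, so only the presence of the files is verified:

import json
from pathlib import Path

root = Path("path/to/Viton-HD")  # replace with your dataset_path
assert (root / "captions.json").exists(), "missing captions.json"
for split in ("Train", "Test"):
    for sub in ("im_sketch", "im_sketch_unpaired"):
        assert (root / split / sub).is_dir(), f"missing {split}/{sub}"

with open(root / "captions.json") as f:
    captions = json.load(f)
print(f"loaded {len(captions)} caption entries")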

TODO

  • training code

Acknowledgements

This work has partially been supported by the PNRR project “Future Artificial Intelligence Research (FAIR)”, by the PRIN project “CREATIVE: CRoss-modal understanding and gEnerATIon of Visual and tExtual content” (CUP B87G22000460001), both co-funded by the Italian Ministry of University and Research, and by the European Commission under European Horizon 2020 Programme, grant number 101004545 - ReInHerit.

LICENSE

All material is available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes you've made.
