Diffusers Versions Save

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

v0.27.2

1 month ago

All commits

[scheduler] fix a bug in add_noise by @yiyixuxu in https://github.com/huggingface/diffusers/pull/7386
[LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in https://github.com/huggingface/diffusers/pull/7388
Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in https://github.com/huggingface/diffusers/pull/7381

v0.27.1

1 month ago

All commits

Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
[LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds by @sayakpaul in #7338
Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

v0.27.0

1 month ago

Stable Cascade

We are adding support for a new text-to-image model building on Würstchen called Stable Cascade, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that they are built upon three distinct models and allow for hierarchical compression of image patients, achieving remarkable outputs.

from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(image_embeddings=image_emb, prompt=prompt).images[0]
image

📜 Check out the docs here to know more about the model.

Note: You will need a torch>=2.2.0 to use the torch.bfloat16 data type with the Stable Cascade pipeline.

Playground v2.5

PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image

Loading from the original single-file checkpoint is also supported:

from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image  = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")

You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic"  \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

To know more, follow the instructions here.

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.

To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗

If you’re interested in extending this formulation to other training scripts, we refer you to this PR.

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. It was proposed in Trajectory Consistency Distillation.

This release comes with the support of a TCDScheduler that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows a usage:

import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3, 
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

tcd_image

📜 Check out the docs here to know more about TCD.

Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept a ip_adapter_image_embeds argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embedding to the disk. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible in Diffusers and should work out-of-box.

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.

📜 To know about the exact usage of both of the above, refer to our official guide.

We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the set_adapters method which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the add_weighted_adapter method from the PEFT library, unlocking more efficient merging method like TIES, DARE, linear, and even combinations of these merging methods like dare_ties.

📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.

LEDITS++

We are adding support to the real image editing technique called LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method, requiring no fine-tuning nor any optimization. To edit real images, the LEDITS++ pipelines first invert the image DPM-solver++ scheduler that facilitates editing with as little as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined such that it both reflects the direction of the edit (if we want to push away from/towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions which, for multiple edits especially, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows a usage:

import torch
import PIL
import requests
from io import BytesIO
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id, 
    vae=vae, 
    torch_dtype=torch.float16
).to(device)

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image = image,
    num_inversion_steps=50,
    skip=0.2
)

edited_image = pipe(
    editing_prompt=["tennis ball","tomato"],
    reverse_editing_direction=[True,False],
    edit_guidance_scale=[5.0,10.0],
    edit_threshold=[0.9,0.85],)

📜 Check out the docs here to learn more about LEDITS++.

Thanks to @manuelbrack for contributing this in #6074.

All commits

Fix flaky IP Adapter test by @DN6 in #6960
Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
Allow passing config_file argument to ControlNetModel when using from_single_file by @DN6 in #6959
[PEFT / docs] Add a note about torch.compile by @younesbelkada in #6864
[Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
fix: controlnet inpaint single file. by @sayakpaul in #6975
[docs] IP-Adapter by @stevhliu in #6897
fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
[advanced sdxl lora script] - fix #6967 bug when using prior preservation loss by @linoytsaban in #6968
[IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading by @sayakpaul in #6946
Fix diffusers import prompt2prompt by @ihkap11 in #6927
add: peft to the benchmark workflow by @sayakpaul in #6989
Fix procecss process by @co63oc in #6591
Standardize model card for textual inversion sdxl by @Stepheni12 in #6963
Update textual_inversion.py by @Bhavay-2001 in #6952
[docs] Fix callout by @stevhliu in #6998
[docs] Video generation by @stevhliu in #6701
start depcrecation cycle for lora_attention_proc 👋 by @sayakpaul in #7007
Add documentation for strength parameter in Controlnet_img2img pipelines by @tlpss in #6951
Fixed typos in dosctrings of init() and in forward() of Unet3DConditionModel by @MK-2012 in #6663
[SVD] fix a bug when passing image as tensor by @yiyixuxu in #6999
Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 by @zyinghua in #7008
[IP2P] Make text encoder truly optional in InstructPi2Pix by @sayakpaul in #6995
IP-Adapter attention masking by @fabiorigano in #6847
Fix Pixart Slow Tests by @DN6 in #6962
[from_single_file] pass torch_dtype to set_module_tensor_to_device by @yiyixuxu in #6994
[Refactor] FreeInit for AnimateDiff based pipelines by @DN6 in #6874
[Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU by @ustcuna in #6683
Add section on AnimateLCM to docs by @DN6 in #7024
IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline by @rootonchair in #6941
Supper IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline by @tontan2545 in #7031
Fix alt text and image links in AnimateLCM docs by @DN6 in #7029
Update ControlNet Inpaint single file test by @DN6 in #7022
Fix load_model_dict_into_meta for ControlNet from_single_file by @DN6 in #7034
Remove disable_full_determinism from StableVideoDiffusion xformers test. by @DN6 in #7039
update header by @pravdomil in #6596
fix doc example for fom_single_file by @yiyixuxu in #7015
Fix typos in text_to_image examples by @standardAI in #7050
Update checkpoint_merger pipeline to pass the "variant" argument by @lstein in #6670
allow explicit tokenizer & text_encoder in unload_textual_inversion by @H3zi in #6977
re-add unet refactor PR by @yiyixuxu in #7044
IPAdapterTesterMixin by @a-r-r-o-w in #6862
[Refactor] save_model_card function in text_to_image examples by @standardAI in #7051
Fix typos by @standardAI in #7068
Fix docstring of community pipeline imagic by @chongdashu in #7062
Change images to image. The variable images is not used anywhere by @bimsarapathiraja in #7074
fix: TensorRTStableDiffusionPipeline cannot set guidance_scale by @caiyueliang in #7065
[Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline by @standardAI in #7071
Fix truthy-ness condition in pipelines that use denoising_start by @a-r-r-o-w in #6912
Fix head_to_batch_dim for IPAdapterAttnProcessor by @fabiorigano in #7077
[docs] Minor updates by @stevhliu in #7063
Modularize Dreambooth LoRA SD inferencing during and after training by @rootonchair in #6654
Modularize Dreambooth LoRA SDXL inferencing during and after training by @rootonchair in #6655
[Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet by @a-r-r-o-w in #7086
Pass use_linear_projection parameter to mid block in UNetMotionModel by @Stepheni12 in #7035
Resize image before crop by @jiqing-feng in #7095
Small change to download in dance diffusion convert script by @DN6 in #7070
Fix EMA in train_text_to_image_sdxl.py by @standardAI in #7048
Make LoRACompatibleConv padding_mode work. by @jinghuan-Chen in #6031
[Easy] edit issue and PR templates by @sayakpaul in #7092
FIX [PEFT / Core] Copy the state dict when passing it to load_lora_weights by @younesbelkada in #7058
[Core] pass revision in the loading_kwargs. by @sayakpaul in #7019
[Examples] Multiple enhancements to the ControlNet training scripts by @sayakpaul in #7096
move to uv in the Dockerfiles. by @sayakpaul in #7094
Add tests to check configs when using single file loading by @DN6 in #7099
denormalize latents with the mean and std if available by @patil-suraj in #7111
[Dockerfile] remove uv from docker jax tpu by @sayakpaul in #7115
Add EDMEulerScheduler by @patil-suraj in #7109
add DPM scheduler with EDM formulation by @patil-suraj in #7120
[Docs] Fix typos by @standardAI in #7118
DPMSolverMultistep add rescale_betas_zero_snr by @Beinsezii in #7097
[Tests] make test steps dependent on certain things and general cleanup of the workflows by @sayakpaul in #7026
fix kwarg in the SDXL LoRA DreamBooth by @sayakpaul in #7124
[Diffusers CI] Switch slow test runners by @DN6 in #7123
[stalebot] don't close the issue if the stale label is removed by @yiyixuxu in #7106
refactor: move model helper function in pipeline to a mixin class by @ultranity in #6571
[docs] unet type hints by @a-r-r-o-w in #7134
use uv for installing stuff in the workflows. by @sayakpaul in #7116
limit documentation workflow runs for relevant changes. by @sayakpaul in #7125
add: support for notifying the maintainers about the docker ci status. by @sayakpaul in #7113
Fix setting fp16 dtype in AnimateDiff convert script. by @DN6 in #7127
[Docs] Fix typos by @standardAI in #7131
[ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder by @yiyixuxu in #7016
[CI] fix path filtering in the documentation workflows by @sayakpaul in #7153
[Urgent][Docker CI] pin uv version for now and a minor change in the Slack notification by @sayakpaul in #7155
Fix LCM benchmark test by @sayakpaul in #7158
[CI] Remove max parallel flag on slow test runners by @DN6 in #7162
Fix vae_encodings_fn hash in train_text_to_image_sdxl.py by @lhoestq in #7171
fix: loading problem for sdxl lora dreambooth by @sayakpaul in #7166
Map speedup by @kopyl in #6745
[stalebot] fix a bug by @yiyixuxu in #7156
Support EDM-style training in DreamBooth LoRA SDXL script by @sayakpaul in #7126
Fix PixArt 256px inference by @lawrence-cj in #6789
[ip-adapter] fix problem using embeds with the plus version of ip adapters by @asomoza in #7189
feat: add ip adapter benchmark by @sayakpaul in #6936
[Docs] more elaborate example for peft torch.compile by @sayakpaul in #7161
adding callback_on_step_end for StableDiffusionLDM3DPipeline by @rootonchair in #7149
Update requirements.txt to remove huggingface-cli by @sayakpaul in #7202
[advanced dreambooth lora sdxl] add DoRA training feature by @linoytsaban in #7072
FIx torch and cuda version in ONNX tests by @DN6 in #7164
[training scripts] add tags of diffusers-training by @linoytsaban in #7206
fix a bug in from_config by @yiyixuxu in #7192
Fix: UNet2DModel::init type hints; fixes issue #4806 by @fpgaminer in #7175
Fix typos by @standardAI in #7181
Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler by @thiagocrepaldi in #7151
[docs] Improve SVD pipeline docs by @a-r-r-o-w in #7087
[Docs] Update callback.md code example by @rootonchair in #7150
[Core] errors should be caught as soon as possible. by @sayakpaul in #7203
[Community] PromptDiffusion Pipeline by @iczaw in #6752
add TCD Scheduler by @mhh0318 in #7174
SDXL Turbo support and example launch by @bram-w in #6473
[bug] Fix float/int guidance scale not working in StableVideoDiffusionPipeline by @JinayJain in #7143
[Pipiline] Wuerstchen v3 aka Stable Cascasde pipeline by @kashif in #6487
Update train_dreambooth_lora_sdxl_advanced.py by @landmann in #7227
[Core] move out the utilities from pipeline_utils.py by @sayakpaul in #7234
Refactor Prompt2Prompt: Inherit from DiffusionPipeline by @ihkap11 in #7211
add DoRA training feature to sdxl dreambooth lora script by @linoytsaban in #7235
fix: remove duplicated code in TemporalBasicTransformerBlock. by @AsakusaRinne in #7212
[Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. by @sayakpaul in #7242
fix: support for loading playground v2.5 single file checkpoint. by @sayakpaul in #7230
Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 by @DN6 in #7244
Remove the line. Using it create wrong output by @bimsarapathiraja in #7075
[docs] Merge LoRAs by @stevhliu in #7213
use self.device by @pravdomil in #6595
[docs] Community tips by @stevhliu in #7137
[Core] throw error when patch inputs and layernorm are provided for Transformers2D by @sayakpaul in #7200
[Tests] fix: VAE tiling tests when setting the right device by @sayakpaul in #7246
[Utils] Improve " # Copied from ..." statements in the pipelines by @sayakpaul in #6917
[Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script by @sayakpaul in #7258
Make mid block optional for flax UNet by @mar-muel in #7083
Solve missing clip_sample implementation in FlaxDDIMScheduler. by @hi-sushanta in #7017
[Tests] fix config checking tests by @sayakpaul in #7247
[docs] IP-Adapter image embedding by @stevhliu in #7226
Adds denoising_end parameter to ControlNetPipeline for SDXL by @UmerHA in #6175
Add npu support by @MengqingCao in #7144
[Community Pipeline] Skip Marigold depth_colored with color_map=None by @qqii in #7170
update the signature of from_single_file by @yiyixuxu in #7216
[UNet_Spatio_Temporal_Condition] fix default num_attention_heads in unet_spatio_temporal_condition by @Wang-Xiaodong1899 in #7205
[docs/nits] Fix return values based on return_dict and minor doc updates by @a-r-r-o-w in #7105
[Chore] remove tf mention by @sayakpaul in #7245
Fix gmflow_dir by @pravdomil in #6583
Support latents_mean and latents_std by @haofanwang in #7132
Inline InputPadder by @pravdomil in #6582
[Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications by @sayakpaul in #7129
instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance by @erliding in #7006
Change export_to_video default by @DN6 in #6990
[Chore] switch to logger.warning by @sayakpaul in #7289
[LoRA] use the PyTorch classes wherever needed and start depcrecation cycles by @sayakpaul in #7204
Add single file support for Stable Cascade by @DN6 in #7274
Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline by @DN6 in #7287
Fix loading Img2Img refiner components in from_single_file by @DN6 in #7282
[Chore] clean residue from copy-pasting in the UNet single file loader by @sayakpaul in #7295
Update Cascade documentation by @DN6 in #7257
Update Stable Cascade Conversion Scripts by @DN6 in #7271
[Pipeline] Add LEDITS++ pipelines by @manuelbrack in #6074
[PyPI publishing] feat: automate the process of pypi publication to some extent. by @sayakpaul in #7270
add: support for notifying maintainers about the nightly test status by @sayakpaul in #7117
Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training by @Rbrq03 in #7302
Add Intro page of TCD by @mhh0318 in #7259
Fix typos in UNet2DConditionModel documentation by @alexanderbonnet in #7291
Change step_offset scheduler docstrings by @Beinsezii in #7128
update get_order_list if statement by @kghamilton89 in #7309
add: pytest log installation by @sayakpaul in #7313
[Tests] Fix incorrect constant in VAE scaling test. by @DN6 in #7301
log loss per image by @noskill in #7278
add edm schedulers in doc by @patil-suraj in #7319
[Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) by @linoytsaban in #7182
Update Cascade Tests by @DN6 in #7324
Release: v0.27.0 by @DN6 (direct commit on v0.27.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@ihkap11
- Fix diffusers import prompt2prompt (#6927)
- Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
@ustcuna
- [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
@rootonchair
- IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
- Modularize Dreambooth LoRA SD inferencing during and after training (#6654)
- Modularize Dreambooth LoRA SDXL inferencing during and after training (#6655)
- adding callback_on_step_end for StableDiffusionLDM3DPipeline (#7149)
- [Docs] Update callback.md code example (#7150)
@standardAI
- Fix typos in text_to_image examples (#7050)
- [Refactor] save_model_card function in text_to_image examples (#7051)
- Fix typos (#7068)
- [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline (#7071)
- Fix EMA in train_text_to_image_sdxl.py (#7048)
- [Docs] Fix typos (#7118)
- [Docs] Fix typos (#7131)
- Fix typos (#7181)
@a-r-r-o-w
- IPAdapterTesterMixin (#6862)
- Fix truthy-ness condition in pipelines that use denoising_start (#6912)
- [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet (#7086)
- [docs] unet type hints (#7134)
- [docs] Improve SVD pipeline docs (#7087)
- [docs/nits] Fix return values based on return_dict and minor doc updates (#7105)
@ultranity
- refactor: move model helper function in pipeline to a mixin class (#6571)
@iczaw
- [Community] PromptDiffusion Pipeline (#6752)
@mhh0318
- add TCD Scheduler (#7174)
- Add Intro page of TCD (#7259)
@manuelbrack
- [Pipeline] Add LEDITS++ pipelines (#6074)

v0.26.3

3 months ago

All commits

Fix configuring VAE from single file mixin by @DN6 in #6950
[DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2and lower_order_final=True by @yiyixuxu in #6953

v0.26.2

3 months ago

In v0.26.0, we introduced a bug 🐛 in the BasicTransformerBlock by removing some boolean flags. This caused many popular libraries tomesd to break. We have fixed that in this release. Thanks to @vladmandic for bringing this to our attention.

All commits

add self.use_ada_layer_norm_* params back to BasicTransformerBlock by @yiyixuxu in #6841

v0.26.1

3 months ago

In the v0.26.0 release, we slipped in the torchvision library as a required library, which shouldn't have been the case. This is now fixed.

All commits

add is_torchvision_available by @yiyixuxu in #6800

v0.26.0

3 months ago

This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for multiple IP-Adapters’ inference with multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")

masterpiece, bestquality, sunset.
library

📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator, that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")

masterpiece, bestquality, sunset.
cat in a field

📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from transformers import CLIPVisionModelWithProjection
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"])
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images =  [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50
    generator=generator,
).images[0]

Reference style images:

Reference face Image	Output Image

📜 Check out the docs here.

Single-file checkpoint loading

from_single_file() utility has been refactored for better readability and to follow similar semantics as from_pretrained() . Support for loading single file checkpoints and configs from URLs has also been added.

DPM scheduler fix

We introduced a fix for DPM schedulers, so now you can use it with SDXL to generate high-quality images in fewer steps than the Euler scheduler.

Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.

All commits

[docs] Fix missing API function by @stevhliu in #6604
Fix failing tests due to Posix Path by @DN6 in #6627
Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
[Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
fix SDXL-kdiffusion tests by @yiyixuxu in #6647
add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
add Sa-Solver by @lawrence-cj in #5975
Add tearDown method to LoRA tests. by @DN6 in #6660
[Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
Update README by @standardAI in #6669
[Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
Standardise outputs for video pipelines by @DN6 in #6626
fix dpm related slow test failure by @yiyixuxu in #6680
[Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
[Refactor] Update from single file by @DN6 in #6428
[WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
Add InstantID Pipeline by @haofanwang in #6673
[Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
[Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
AnimateDiff Video to Video by @a-r-r-o-w in #6328
[docs] UViT2D by @stevhliu in #6643
Correct sigmas cpu settings by @patrickvonplaten in #6708
[docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
fix community README by @a-r-r-o-w in #6645
fix custom diffusion training with concept list by @AIshutin in #6710
Add IP Adapters to slow tests by @DN6 in #6714
Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
Add Community Example Consistency Training Script by @dg845 in #6717
Add UFOGenScheduler to Community Examples by @dg845 in #6650
[Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
changed to posix unet by @gzguevara in #6719
Change os.path to pathlib Path by @Stepheni12 in #6737
correct hflip arg by @sayakpaul in #6743
Add unload_textual_inversion method by @fabiorigano in #6656
[Core] move transformer scripts to transformers modules by @sayakpaul in #6747
Update lora.md with a more accurate description of rank by @xhedit in #6724
Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
udpate ip-adapter slow tests by @yiyixuxu in #6760
Update export to video to support new tensor_to_vid function in video pipelines by @DN6 in #6715
[DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. by @woshiyyya in #6704
Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
add note about serialization by @sayakpaul in #6764
Update train_diffusion_dpo.py by @viettmab in #6754
Pin torch < 2.2.0 in test runners by @DN6 in #6780
[Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
Fix setting scaling factor in VAE config by @DN6 in #6779
Add PIA Model/Pipeline by @DN6 in #6698
[docs] Add missing parameter by @stevhliu in #6775
[IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
[sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
[Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
Release: v0.26.0 by @<NOT FOUND> (direct commit on v0.26.0-release)
fix torchvision import by @patrickvonplaten in #6796

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@a-r-r-o-w
- [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
- AnimateDiff Video to Video (#6328)
- [docs] AnimateDiff Video-to-Video (#6712)
- fix community README (#6645)
@ultranity
- refactor: extract init/forward function in UNet2DConditionModel (#6478)
@lawrence-cj
- add Sa-Solver (#5975)
@ayushtues
- [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
@haofanwang
- Add InstantID Pipeline (#6673)
- [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
@brandostrong
- SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
@dg845
- Add Community Example Consistency Training Script (#6717)
- Add UFOGenScheduler to Community Examples (#6650)
- Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)

v0.25.1

3 months ago

Make sure diffusers can correctly be used in offline mode again: https://github.com/huggingface/diffusers/pull/1767#issuecomment-1896194917

Respect offline mode when loading pipeline by @Wauplin in #6456
Fix offline mode import by @Wauplin in #6467

v0.25.0

4 months ago

aMUSEd

collage_full

aMUSEd is a lightweight text to image model based off of the MUSE architecture. aMUSEd is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once. aMUSEd is currently a research release.

aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.

Text-to-image generation

import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")

Image-to-image generation

import torch
from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image

pipe = AmusedImg2ImgPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "apple watercolor"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/image2image_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)

image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")

Inpainting

import torch
from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image
from PIL import Image

pipe = AmusedInpaintPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a man with glasses"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)
mask = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_mask.png"
    )
    .resize((512, 512))
    .convert("L")
)    

image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save(f"inpainting_512.png")

📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused

🛠️ Models:

mused-256: https://huggingface.co/amused/amused-256 (603M params)
amused-512: https://huggingface.co/amused/amused-512 (608M params)

Faster SDXL

We’re excited to present an array of optimization techniques that can be used to accelerate the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.

SDXL_Batch_Size__1_Steps__30

These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. Starting from default fp32 precision, we can achieve a 3x speed improvement by applying different PyTorch optimization techniques. We encourage you to check out the detailed docs provided below.

Note: Compared to the default way most people use Diffusers which is fp16 + SDPA, applying all the optimization explained in the blog below yields a 30% speed-up.

📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion 🌠 PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-3/

Interruptible pipelines

Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

This callback function should take the following arguments: pipe, i, t, and callback_kwargs (this must be returned). Set the pipeline's _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50

def interrupt_callback(pipe, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipe._interrupt = True

    return callback_kwargs

pipe(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)

📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback

`peft` in our LoRA training examples

We incorporated peft in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training hasn't been easier, thanks to peft!

More memory-friendly version of LCM LoRA SDXL training

We incorporated best practices from peft to make LCM LoRA training for SDXL more memory-friendly. As such, you don't have to initialize two UNets (teacher and student) anymore. This version also integrates with the datasets library for quick experimentation. Check out this section for more details.

All commits

[docs] Fix video link by @stevhliu in #5986
Fix LLMGroundedDiffusionPipeline super class arguments by @KristianMischke in #5993
Remove a duplicated line? by @sweetcocoa in #6010
[examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script by @linoytsaban in #5935
[advanced_dreambooth_lora_sdxl_tranining_script] readme fix by @linoytsaban in #6019
[docs] Fix SVD video by @stevhliu in #6004
[Easy] minor edits to setup.py by @sayakpaul in #5996
[From Single File] Allow Text Encoder to be passed by @patrickvonplaten in #6020
[Community Pipeline] Regional Prompting Pipeline by @hako-mikan in #6015
[logging] Fix assertion bug by @standardAI in #6012
[Docs] Update a link by @standardAI in #6014
added attention_head_dim, attention_type, resolution_idx by @charchit7 in #6011
fix style by @patrickvonplaten (direct commit on v0.25.0)
[Kandinsky 3.0] Follow-up TODOs by @yiyixuxu in #5944
[schedulers] create self.sigmas during init by @yiyixuxu in #6006
Post Release: v0.24.0 by @patrickvonplaten in #5985
LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft by @TonyLianLong in #6023
adapt PixArtAlphaPipeline for pixart-lcm model by @lawrence-cj in #5974
[PixArt Tests] remove fast tests from slow suite by @sayakpaul in #5945
[LoRA serialization] fix: duplicate unet prefix problem. by @sayakpaul in #5991
[advanced dreambooth lora sdxl training script] improve help tags by @linoytsaban in #6035
fix StableDiffusionTensorRT super args error by @gujingit in #6009
Update value_guided_sampling.py by @Parth38 in #6027
Update Tests Fetcher by @DN6 in #5950
Add variant argument to dreambooth lora sdxl advanced by @levi in #6021
[Feature] Support IP-Adapter Plus by @okotaku in #5915
[Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ by @RuoyiDu in #6022
[advanced dreambooth lora training script][bug_fix] change token_abstraction type to str by @linoytsaban in #6040
[docs] Add Kandinsky 3 by @stevhliu in #5988
[docs] #Copied from mechanism by @stevhliu in #6007
Move kandinsky convert script by @DN6 in #6047
Pin Ruff Version by @DN6 in #6059
Ldm unet convert fix by @DN6 in #6038
Fix demofusion by @radames in #6049
[From single file] remove depr warning by @patrickvonplaten in #6043
[advanced_dreambooth_lora_sdxl_tranining_script] save embeddings locally fix by @apolinario in #6058
Device agnostic testing by @arsalanu in #5612
[feat] allow SDXL pipeline to run with fused QKV projections by @sayakpaul in #6030
fix by @DN6 (direct commit on v0.25.0)
Use CC12M for LCM WDS training example by @pcuenca in #5908
Disable Tests Fetcher by @DN6 in #6060
[Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step by @apolinario in #6076
[Euler Discrete] Fix sigma by @patrickvonplaten in #6078
Harmonize HF environment variables + deprecate use_auth_token by @Wauplin in #6066
[docs] SDXL Turbo by @stevhliu in #6065
Add ControlNet-XS support by @UmerHA in #5827
Fix typing inconsistency in Euler discrete scheduler by @iabaldwin in #6052
[PEFT] Adapt example scripts to use PEFT by @younesbelkada in #5388
Fix clearing backend cache from device agnostic testing by @DN6 in #6075
[Community] AnimateDiff + Controlnet Pipeline by @a-r-r-o-w in #5928
EulerDiscreteScheduler add rescale_betas_zero_snr by @Beinsezii in #6024
Add support for IPAdapterFull by @fabiorigano in #5911
Fix a bug in add_noise function by @yiyixuxu in #6085
[Advanced Diffusion Script] Add Widget default text by @apolinario in #6100
[Advanced Training Script] Fix pipe example by @apolinario in #6106
IP-Adapter for StableDiffusionControlNetImg2ImgPipeline by @charchit7 in #5901
IP adapter support for most pipelines by @a-r-r-o-w in #5900
Correct type annotation for VaeImageProcessor.numpy_to_pil by @edwardwli in #6111
[Docs] Fix typos by @standardAI in #6122
[feat: Benchmarking Workflow] add stuff for a benchmarking workflow by @sayakpaul in #5839
[Community] Add SDE Drag pipeline by @Monohydroxides in #6105
[docs] IP-Adapter API doc by @stevhliu in #6140
Add missing subclass docs, Fix broken example in SD_safe by @a-r-r-o-w in #6116
[advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used by @linoytsaban in #6171
[docs] Add missing \ in lora.md by @pierd in #6174
[Sigmas] Keep sigmas on CPU by @patrickvonplaten in #6173
LoRA test fixes by @DN6 in #6163
Add PEFT to training deps by @DN6 in #6148
Clean Up Comments in LCM(-LoRA) Distillation Scripts. by @dg845 in #6145
Compile test fix by @DN6 in #6104
[LoRA] add an error message when dealing with _best_guess_weight_name ofline by @sayakpaul in #6184
[Core] feat: enable fused attention projections for other SD and SDXL pipelines by @sayakpaul in #6179
[Benchmarks] fix: lcm benchmarking reporting by @sayakpaul in #6198
[Refactor autoencoders] feat: introduce autoencoders module by @sayakpaul in #6129
Fix the test script in examples/text_to_image/README.md by @krahets in #6209
Nit fix to training params by @osanseviero in #6200
[Training] remove depcreated method from lora scripts. by @sayakpaul in #6207
Fix SDXL Inpainting from single file with Refiner Model by @DN6 in #6147
Fix possible re-conversion issues after extracting from safetensors by @d8ahazard in #6097
Fix t2i. blog url by @abinthomasonline in #6205
[Text-to-Video] Clean up pipeline by @patrickvonplaten in #6213
[Torch Compile] Fix torch compile for svd vae by @patrickvonplaten in #6217
Deprecate Pipelines by @DN6 in #6169
Update README.md by @TilmannR in #6191
Support img2img and inpaint in lpw-xl by @a-r-r-o-w in #6114
Update train_text_to_image_lora.py by @haofanwang in #6144
[SVD] Fix guidance scale by @patrickvonplaten in #6002
Slow Test for Pipelines minor fixes by @DN6 in #6221
Add converter method for ip adapters by @fabiorigano in #6150
offload the optional module image_encoder by @yiyixuxu in #6151
fix: init for vae during pixart tests by @sayakpaul in #6215
[T2I LoRA training] fix: unscale fp16 gradient problem by @sayakpaul in #6119
ControlNetXS fixes. by @DN6 in #6228
add peft dependency to fast push tests by @sayakpaul in #6229
[refactor embeddings]pixart-alpha by @yiyixuxu in #6212
[Docs] Fix a code example in the ControlNet Inpainting documentation by @raven38 in #6236
[docs] Batched seeds by @stevhliu in #6237
[Fix] Fix Regional Prompting Pipeline by @hako-mikan in #6188
EulerAncestral add rescale_betas_zero_snr by @Beinsezii in #6187
[Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. by @sayakpaul in #6128
Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill by @dependabot[bot] in #6255
fix: unscale fp16 gradient problem & potential error by @lvzii in #6086)
[Refactor] move diffedit out of stable_diffusion by @sayakpaul in #6260
move attend and excite out of stable_diffusion by @sayakpaul (direct commit on v0.25.0)
Revert "move attend and excite out of stable_diffusion" by @sayakpaul (direct commit on v0.25.0)
[Training] remove depcreated method from lora scripts again by @Yimi81 in #6266
[Refactor] move k diffusion out of stable_diffusion by @sayakpaul in #6267
[Refactor] move gligen out of stable diffusion. by @sayakpaul in #6265
[Refactor] move sag out of stable_diffusion by @sayakpaul in #6264
TST Fix LoRA test that fails with PEFT >= 0.7.0 by @BenjaminBossan in #6216
[Refactor] move attend and excite out of stable_diffusion. by @sayakpaul in #6261
[Refactor] move panorama out of stable_diffusion by @sayakpaul in #6262
[Deprecated pipelines] remove pix2pix zero from init by @sayakpaul in #6268
[Refactor] move ldm3d out of stable_diffusion. by @sayakpaul in #6263
open muse by @williamberman in #5437
Remove ONNX inpaint legacy by @DN6 in #6269
Remove peft tests from old lora backend tests by @DN6 in #6273
Allow diffusers to load with Flax, w/o PyTorch by @pcuenca in #6272
[Community Pipeline] Add Marigold Monocular Depth Estimation by @markkua in #6249
Fix Prodigy optimizer in SDXL Dreambooth script by @apolinario in #6290
[LoRA PEFT] fix LoRA loading so that correct alphas are parsed by @sayakpaul in #6225
LoRA Unfusion test fix by @DN6 in #6291
Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. by @celestialphineas in #6286
fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py by @mwkldeveloper in #6259
fix: t2i apdater paper link by @sayakpaul in #6314
fix: lora peft dummy components by @sayakpaul in #6308
[Tests] Speed up example tests by @sayakpaul in #6319
fix: cannot set guidance_scale by @Jannchie in #6326
Change LCM-LoRA README Script Example Learning Rates to 1e-4 by @dg845 in #6304
[Peft] fix saving / loading when unet is not "unet" by @kashif in #6046
[Wuerstchen] fix fp16 training and correct lora args by @kashif in #6245
[docs] fix: animatediff docs by @sayakpaul in #6339
[Training] Add datasets version of LCM LoRA SDXL by @sayakpaul in #5778
[Peft / Lora] Add adapter_names in fuse_lora by @younesbelkada in #5823
[Diffusion fast] add doc for diffusion fast by @sayakpaul in #6311
Add rescale_betas_zero_snr Argument to DDPMScheduler by @dg845 in #6305
Interruptable Pipelines by @DN6 in #5867
Update Animatediff docs by @DN6 in #6341
Add AnimateDiff conversion scripts by @DN6 in #6340
amused other pipelines docs by @williamberman in #6343
[Docs] fix: video rendering on svd. by @sayakpaul in #6330
[SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run by @priprapre in #6270
adding auto1111 features to inpainting pipeline by @yiyixuxu in #6072
Remove unused parameters and fixed FutureWarning by @Justin900429 in #6317
amused update links to new repo by @williamberman in #6344
[LoRA] make LoRAs trained with peft loadable when peft isn't installed by @sayakpaul in #6306
Move ControlNetXS into Community Folder by @DN6 in #6316
fix: use retrieve_latents by @Jannchie in #6337
Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. by @dg845 in #6279
Fix "push_to_hub only create repo in consistency model lora SDXL training script" by @aandyw in #6102
Fix chunking in SVD by @DN6 in #6350
Add PEFT to advanced training script by @apolinario in #6294
Release: v0.25.0 by @sayakpaul (direct commit on v0.25.0)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@hako-mikan
- [Community Pipeline] Regional Prompting Pipeline (#6015)
- [Fix] Fix Regional Prompting Pipeline (#6188)
@TonyLianLong
- LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
@okotaku
- [Feature] Support IP-Adapter Plus (#5915)
@RuoyiDu
- [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
@UmerHA
- Add ControlNet-XS support (#5827)
@a-r-r-o-w
- [Community] AnimateDiff + Controlnet Pipeline (#5928)
- IP adapter support for most pipelines (#5900)
- Add missing subclass docs, Fix broken example in SD_safe (#6116)
- Support img2img and inpaint in lpw-xl (#6114)
@Monohydroxides
- [Community] Add SDE Drag pipeline (#6105)
@dg845
- Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
- Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
- Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
- Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
@markkua
- [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)

v0.24.0

5 months ago

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Diffusion Video

Stable Video Diffusion is a powerful image-to-video generation model that can generate high resolution (576x1024) 2-4 seconds videos conditioned on the input image.

Image to Video Generation

There are two variants of SVD. SVD and SVD-XT. The SVD checkpoint is trained to generate 14 frames and the SVD-XT checkpoint is further finetuned to generate 25 frames.

You need to condition the generation on an initial image, as follows:

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)

Since generating videos is more memory intensive, we can use the decode_chunk_size argument to control how many frames are decoded at once. This will reduce the memory usage. It's recommended to tweak this value based on your GPU memory. Setting decode_chunk_size=1 will decode one frame at a time and will use the least amount of memory, but the video might have some flickering.

Additionally, we also use model cpu offloading to reduce the memory usage.

rocket_generated

SDXL Turbo

SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. Also, it does not use classifier-free guidance, further increasing its speed. On a good consumer GPU, you can now generate an image in just 100ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the height and width parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so.

Make sure to set guidance_scale to 0.0 to disable, as the model was trained without it. A single inference step is enough to generate high quality images. Increasing the number of steps to 2, 3 or 4 should improve image quality.

from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image

Image-to-image

For image-to-image generation, make sure that num_inference_steps * strength is larger or equal to 1. The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in our example below.

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

Image-to-image generation sample using SDXL Turbo

IP Adapters

IP Adapters have shown to be remarkably powerful at images conditioned on other images.

Thanks to @okotaku, we have added IP adapters to the most important pipelines allowing you to combine them for a variety of different workflows, e.g. they work with Img2Img2, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id =  "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]

yiyi_test_2_out

ControlNet

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")

ip_image	condition	output

For more information:

:point_right: https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters#ip-adapter

Kandinsky 3.0

Kandinsky has released the 3rd version, which has much improved text-to-image alignment thanks to using Flan-T5 as the text encoder.

Text-to-Image

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
        
prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

Image-to-Image

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
        
prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]

Check it out:

:point_right: https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky3#kandinsky-3

All commits

LCM-LoRA docs by @patil-suraj in #5782
[Docs] Update and make improvements by @standardAI in #5819
[docs] Fix title by @stevhliu in #5831
Improve setup.py and add dependency check by @patrickvonplaten in #5826
[Docs] add: japanese sdxl as a reference by @sayakpaul in #5844
Set usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #5790
fix memory consistency decoder test by @williamberman in #5828
[PEFT] Unpin peft by @patrickvonplaten in #5850
Speed up the peft lora unload by @pacman100 in #5741
[Tests/LoRA/PEFT] Test also on PEFT / transformers / accelerate latest by @younesbelkada in #5820
UnboundLocalError in SDXLInpaint.prepare_latents() by @a-r-r-o-w in #5648
[ControlNet] fix import in single file loading by @sayakpaul in #5834
[Styling] stylify using ruff by @kashif in #5841
[Community] [WIP] LCM Interpolation Pipeline by @a-r-r-o-w in #5767
[JAX] Replace uses of jax.devices("cpu") with jax.local_devices(backend="cpu") by @hvaara in #5864
[test / peft] Fix silent behaviour on PR tests by @younesbelkada in #5852
fix an issue that ipex occupy too much memory, it will not impact per… by @linlifan in #5625
Update LCMScheduler Inference Timesteps to be More Evenly Spaced by @dg845 in #5836
Revert "[Docs] Update and make improvements" by @standardAI in #5858
[docs] Loader APIs by @stevhliu in #5813
Update README.md by @co63oc in #5855
Add tests fetcher by @DN6 in #5848
Addition of new callbacks to controlnets by @a-r-r-o-w in #5812
[docs] MusicLDM by @stevhliu in #5854
Add features to the Dreambooth LoRA SDXL training script by @linoytsaban in #5508
[feat] IP Adapters (author @okotaku ) by @yiyixuxu in #5713
[Lora] Seperate logic by @patrickvonplaten in #5809
ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline by @affromero in #5869
Adds an advanced version of the SD-XL DreamBooth LoRA training script supporting pivotal tuning by @linoytsaban in #5883
[bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5906
[bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5914
[Docs] add: 8bit inference with pixart alpha by @sayakpaul in #5814
[@cene555][Kandinsky 3.0] Add Kandinsky 3.0 by @patrickvonplaten in #5913
[Examples] Allow downloading variant model files by @patrickvonplaten in #5531
[Fix: pixart-alpha] random 512px resolution bug by @lawrence-cj in #5842
[Core] add support for gradient checkpointing in transformer_2d by @sayakpaul in #5943
Deprecate KarrasVeScheduler and ScoreSdeVpScheduler by @a-r-r-o-w in #5269
Add Custom Timesteps Support to LCMScheduler and Supported Pipelines by @dg845 in #5874
set the model to train state before accelerator prepare by @sywangyi in #5099
Avoid computing min() that is expensive when do_normalize is False in the image processor by @ivanprado in #5896
Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim by @dg845 in #5893
add LoRA weights load and fuse support for IPEX pipeline by @linlifan in #5920
Replace multiple variables with one variable. by @hi-sushanta in #5715
fix: error on device for lpw_stable_diffusion_xl pipeline if pipe.enable_sequential_cpu_offload() enabled by @VicGrygorchyk in #5885
[Vae] Make sure all vae's work with latent diffusion models by @patrickvonplaten in #5880
[Tests] Make sure that we don't run tests multiple times by @patrickvonplaten in #5949
[Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems by @tongdaxu in #5939
[From_pretrained] Fix warning by @patrickvonplaten in #5948
[load_textual_inversion]: allow multiple tokens by @yiyixuxu in #5837
[docs] Fix space by @stevhliu in #5898
fix: minor typo in docstring by @soumik12345 in #5961
[ldm3d] Ldm3d upscaler to community pipeline by @estelleafl in #5870
[docs] Update pipeline list by @stevhliu in #5952
[Tests] Refactor test_examples.py for better readability by @sayakpaul in #5946
added doc for Kandinsky3.0 by @charchit7 in #5937
[bug fix] Inpainting for MultiAdapter by @affromero in #5922
Rename output_dir argument by @linhqyy in #5916
[LoRA refactor] move several state dict conversion utils out of lora.py by @sayakpaul in #5955
Support of ip-adapter to the StableDiffusionControlNetInpaintPipeline by @juancopi81 in #5887
[docs] LCM training by @stevhliu in #5796
Controlnet ssd 1b support by @MarkoKostiv in #5779
[Pipeline] Add TextToVideoZeroSDXLPipeline by @vahramtadevosyan in #4695
[Wuerstchen] Adapt lora training example scripts to use PEFT by @kashif in #5959
Fixed custom module importing on Windows by @PENGUINLIONG in #5891
Add SVD by @patil-suraj in #5895
[SDXL Turbo] Add some docs by @patrickvonplaten in #5982
Fix SVD doc by @patil-suraj in #5983

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@a-r-r-o-w
- UnboundLocalError in SDXLInpaint.prepare_latents() (#5648)
- [Community] [WIP] LCM Interpolation Pipeline (#5767)
- Addition of new callbacks to controlnets (#5812)
- Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
@dg845
- Update LCMScheduler Inference Timesteps to be More Evenly Spaced (#5836)
- Add Custom Timesteps Support to LCMScheduler and Supported Pipelines (#5874)
- Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim (#5893)
@affromero
- ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline (#5869)
- [bug fix] Inpainting for MultiAdapter (#5922)
@tongdaxu
- [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems (#5939)
@estelleafl
- [ldm3d] Ldm3d upscaler to community pipeline (#5870)
@vahramtadevosyan
- [Pipeline] Add TextToVideoZeroSDXLPipeline (#4695)

Diffusers Versions Save

v0.27.2

All commits

v0.27.1

All commits

v0.27.0

Stable Cascade

Playground v2.5

EDM-style training support

New schedulers with the EDM formulation

Trajectory Consistency Distillation

IP-Adapter image embeddings and masking

Guide on merging LoRAs

LEDITS++

All commits

Significant community contributions

v0.26.3

All commits

v0.26.2

All commits

v0.26.1

All commits

v0.26.0

I2VGenXL

PIA

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

Single-file checkpoint loading

DPM scheduler fix

All commits

Significant community contributions

v0.25.1

v0.25.0

aMUSEd

Faster SDXL

Interruptible pipelines

peft in our LoRA training examples

More memory-friendly version of LCM LoRA SDXL training

All commits

Significant community contributions

v0.24.0

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Diffusion Video

Image to Video Generation

SDXL Turbo

Text-to-Image

Image-to-image

IP Adapters

LCM-LoRA

ControlNet

Kandinsky 3.0

Text-to-Image

Image-to-Image

All commits

Significant community contributions

`peft` in our LoRA training examples