Releases · huggingface/diffusers

04 Jun 21:36

yiyixuxu

v0.28.2

de9528e

v0.28.2: fix `from_single_file` clip model checkpoint key error 🐞

Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319

Contributors

DN6

Assets 2

04 Jun 10:37

sayakpaul

v0.28.1

0091f08

v0.28.1: HunyuanDiT and Transformer2D model class variants

This patch release primarily introduces the Hunyuan DiT pipeline from the Tencent team.

Hunyuan DiT

Hunyuan DiT is a transformer-based diffusion pipeline, introduced in the Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding paper by the Tencent Hunyuan.

import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")

# You may also use English prompt as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]

🧠 This pipeline has support for multi-linguality.

📜 Refer to the official docs here to learn more about it.

Thanks to @gnobitab, for contributing Hunyuan DiT in #8240.

All commits

Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
[Core] Introduce class variants for Transformer2DModel by @sayakpaul in #7647
resolve comflicts by @toshas (direct commit on v0.28.1-patch)
Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
[Transformer2DModel] Handle norm_type safely while remapping by @sayakpaul in #8370
Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@gnobitab
- Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
- Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)

Contributors

gnobitab, toshas, and sayakpaul

Assets 2

27 May 12:08

sayakpaul

v0.28.0

7828d4e

v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more

Diffusion models are known for their abilities in the space of generative modeling. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normals’ estimation!

Starting this release, we will also highlight the changes and features from the library that make it easy to integrate community checkpoints, features, and so on. Read on!

Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normals’ estimation.

(Image taken from the official repository)

The code snippet below shows how to use this pipeline for depth estimation:

import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")

Check out the API documentation here. We also have a detailed guide about the pipeline here.

Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.

🌀 Massive Refactor of `from_single_file` 🌀

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

Some of the changes introduced in this refactor:

When loading a single file checkpoint, we will attempt to use the keys present in the checkpoint to infer a model repository on the Hugging Face Hub that we can use to configure the pipeline. For example, if you are using a single file checkpoint based on SD 1.5, we would use the configuration files in the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
Suppose this inferred configuration isn’t appropriate for your checkpoint. In that case, you can override it using the config argument and pass in either a path to a local model repo or a repo id on the Hugging Face Hub.

pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)

Deprecation of model configuration arguments for the from_single_file method in Pipelines such as num_in_channels, scheduler_type , image_size and upcast_attention . This is an anti-pattern that we have supported in previous versions of the library when we assumed that it would only be relevant to Stable Diffusion based models. However, given that there is a demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore, we will be deprecating support for this behavior in from_single_file as well.

PixArt Sigma

PixArt Simga is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!

(Taken from the project website.)

import torch
from diffusers import PixArtSigmaPipeline

# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]

📃 Refer to the documentation here to learn more about PixArt Sigma.

Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.

AnimateDiff SDXL

@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
).enable_model_cpu_offload()

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)

frames = output.frames[0]
export_to_gif(frames, "animation.gif")

📜 Refer to the documentation to learn more.

Block-wise LoRA

@UmerHA contributed the support to control the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint one is using, this granular control can significantly impact the quality of the generated outputs. Following code block shows how this feature can be used while performing inference:

...

adapter_weight_scales = { "unet": { "down": 0, "mid": 1, "up": 0} }
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
		prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]

✍️ Refer to our documentation for more details and a full-fledged example.

InstantStyle

More granular control of scale could be extended to IP-Adapters too. @DannHuang contributed to the support of InstantStyle, aka granular control of IP-Adapter scales, in #7668. The following code block shows how this feature could be used when performing inference with IP-Adapters:

...

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

This way, one can generate images following only the style or layout from the image prompt, with significantly improved diversity. This is achieved by only activating IP-Adapters to specific parts of the model.

Check out the documentation here.

ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. Based on the observation, the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

Custom Timesteps

We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the AYS timesteps schedule to achieve very nice results with only 10 denoising steps.

from diffusers.schedulers import AysSchedules
sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG16...

Contributors

clinty, dulacp, and 73 other contributors

Assets 2

20 Mar 01:57

sayakpaul

v0.27.2

b69fd99

v0.27.2: Fix scheduler `add_noise` 🐞, embeddings in StableCascade, `scale` when using LoRA

All commits

[scheduler] fix a bug in add_noise by @yiyixuxu in #7386
[LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in #7388
Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in #7381

Contributors

DN6, yiyixuxu, and sayakpaul

Assets 2

19 Mar 03:59

sayakpaul

v0.27.1

d7634cc

v0.27.1: Clear `scale` argument confusion for LoRA

All commits

Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
[LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds by @sayakpaul in #7338
Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

Contributors

DN6 and sayakpaul

Assets 2

14 Mar 16:00

sayakpaul

v0.27.0

cfa7c0a

v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, IP-Adapter image embeds, and more

Stable Cascade

We are adding support for a new text-to-image model building on Würstchen called Stable Cascade, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that they are built upon three distinct models and allow for hierarchical compression of image patients, achieving remarkable outputs.

from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(image_embeddings=image_emb, prompt=prompt).images[0]
image

📜 Check out the docs here to know more about the model.

Note: You will need a torch>=2.2.0 to use the torch.bfloat16 data type with the Stable Cascade pipeline.

Playground v2.5

PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image

Loading from the original single-file checkpoint is also supported:

from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image  = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")

You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic"  \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub

To know more, follow the instructions here.

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.

To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗

If you’re interested in extending this formulation to other training scripts, we refer you to this PR.

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. It was proposed in Trajectory Consistency Distillation.

This release comes with the support of a TCDScheduler that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows a usage:

import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3, 
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]

📜 Check out the docs here to know more about TCD.

Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept a ip_adapter_image_embeds argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once and save the embedding to the disk. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible in Diffusers and should work out-of-box.

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.

📜 To know about the exact usage of both of the above, refer to our official guide.

We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the set_adapters method which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the add_weighted_adapter method from the PEFT library, unlocking more efficient merging method like TIES, DARE, linear, and even combinations of these merging methods like dare_ties.

📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.

LEDITS++

We are adding support to the real image editing technique called LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method, requiring no fine-tuning nor any optimization.
To edit real images, the LEDITS++ pipelines first invert the image DPM-solver++ scheduler that facilitates editing with as little as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined such that it both reflects the direction of the edit (if we want to push away from/towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions which, for multiple edits especially, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows a usage:

import torch
import PIL
import requests
from io import BytesIO
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id, 
    vae=vae, 
    torch_dtype=torch.float16
).to(device)

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image = image,
    num_inversion_steps=50,
    skip=0.2
)

edited_image = pipe(
    editing_prompt...

Contributors

kashif, lstein, and 55 other contributors

Assets 2

13 Feb 09:31

yiyixuxu

v0.26.3

66f94ea

v0.26.3: Patch release to fix DPMSolverSinglestepScheduler and configuring VAE from single file mixin

All commits

Fix configuring VAE from single file mixin by @DN6 in #6950
[DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2and lower_order_final=True by @yiyixuxu in #6953

Contributors

DN6 and yiyixuxu

Assets 2

06 Feb 02:13

sayakpaul

v0.26.2

7d52558

v0.26.2: Patch fix for adding `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock`

In v0.26.0, we introduced a bug 🐛 in the BasicTransformerBlock by removing some boolean flags. This caused many popular libraries tomesd to break. We have fixed that in this release. Thanks to @vladmandic for bringing this to our attention.

All commits

add self.use_ada_layer_norm_* params back to BasicTransformerBlock by @yiyixuxu in #6841

Contributors

yiyixuxu and vladmandic

Assets 2

02 Feb 09:23

sayakpaul

v0.26.1

08e6558

v0.26.1: Patch release to fix `torchvision` dependency

In the v0.26.0 release, we slipped in the torchvision library as a required library, which shouldn't have been the case. This is now fixed.

All commits

add is_torchvision_available by @yiyixuxu in #6800

Contributors

yiyixuxu

Assets 2

01 Feb 02:03

sayakpaul

v0.26.0

674d43f

v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images

This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for multiple IP-Adapters’ inference with multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")

masterpiece, bestquality, sunset.

📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator, that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

import torch
from diffusers import (
    EulerDiscreteScheduler,
    MotionAdapter,
    PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")

masterpiece, bestquality, sunset.

📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

import torch
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from transformers import CLIPVisionModelWithProjection
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"])
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images =  [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50
    generator=generator,
).images[0]

Reference style images:

Reference face Image	Output Image

📜 Check out the docs here.

Single-file checkpoint loading

from_single_file() utility has been refactored for better readability and to follow similar semantics as from_pretrained() . Support for loading single file checkpoints and configs from URLs has also been added.

DPM scheduler fix

We introduced a fix for DPM schedulers, so now you can use it with SDXL to generate high-quality images in fewer steps than the Euler scheduler.

Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.

All commits

[docs] Fix missing API function by @stevhliu in #6604
Fix failing tests due to Posix Path by @DN6 in #6627
Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
[Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
fix SDXL-kdiffusion tests by @yiyixuxu in #6647
add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
add Sa-Solver by @lawrence-cj in #5975
Add tearDown method to LoRA tests. by @DN6 in #6660
[Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
Update README by @StandardAI in #6669
[Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
Standardise outputs for video pipelines by @DN6 in #6626
fix dpm related slow test failure by @yiyixuxu in #6680
[Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
[Refactor] Update from single file by @DN6 in #6428
[WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
Add InstantID Pipeline by @haofanwang in #6673
[Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
[Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
AnimateDiff Video to Video by @a-r-r-o-w in #6328
[docs] UViT2D by @stevhliu in #6643
Correct sigmas cpu settings by @patrickvonplaten in #6708
[docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
fix community README by @a-r-r-o-w in #6645
fix custom diffusion training with concept list by @AIshutin in #6710
Add IP Adapters to slow tests by @DN6 in #6714
Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
Add Community Example Consistency Training Script by @dg845 in #6717
Add UFOGenScheduler to Community Examples by @dg845 in #6650
[Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
changed to posix unet by @gzguevara in #6719
Change os.path to pathlib Path by @Stepheni12 in #6737
correct hflip arg by @sayakpaul in #6743
Add unload_textual_inversion method by @fabiorigano in #6656
[Core] move transformer scripts to transformers modules by @sayakpaul in #6747
Update lora.md with a more accurate description of rank by @xhedit in #6724
Fix mixed preci...