v0.3.0: New API, Stable Diffusion pipelines, low-memory inference, MPS backend, ONNX
📚 Shiny new docs!
Thanks to the community efforts for [Docs] and [Type Hints], we've started populating the Diffusers documentation pages with lots of helpful guides, links, and API references.
📝 New API & breaking changes
New API
Pipeline, Model, and Scheduler outputs can now be accessed as dataclasses, dicts, or tuples:
```python
image = pipe("The red cat is sitting on a chair")["sample"][0]
```
is now replaced by:
```python
image = pipe("The red cat is sitting on a chair").images[0]
# or
image = pipe("The red cat is sitting on a chair")["images"][0]
# or
image = pipe("The red cat is sitting on a chair")[0][0]
```
Similarly:
```python
sample = unet(...).sample
```
and
```python
prev_sample = scheduler.step(...).prev_sample
```
are now possible!
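A single object can support attribute, key, and index access at the same time. Below is a minimal, self-contained sketch of that pattern. `ToyBaseOutput` and `ToyPipelineOutput` are hypothetical names, not the Diffusers classes. Skipping fields that are `None` is a design assumption of this sketch, chosen so that index `0` always refers to the first populated field.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Tuple


class ToyBaseOutput:
    """Hypothetical base class: subclasses are dataclasses whose fields can be
    read by attribute, by string key, or by integer index."""

    def _populated(self) -> List[Tuple[str, Any]]:
        # (name, value) pairs in field order, skipping fields left as None.
        return [(f.name, getattr(self, f.name)) for f in fields(self) if getattr(self, f.name) is not None]

    def to_tuple(self) -> Tuple[Any, ...]:
        return tuple(value for _, value in self._populated())

    def __getitem__(self, key):
        if isinstance(key, str):
            return dict(self._populated())[key]  # dict-style access
        return self.to_tuple()[key]  # tuple-style access


@dataclass
class ToyPipelineOutput(ToyBaseOutput):
    images: List[Any]
    nsfw_content_detected: Optional[List[bool]] = None


out = ToyPipelineOutput(images=["cat.png"])
print(out.images[0], out["images"][0], out[0][0])
```

Integer indexing returns the whole first field, which here is the list of images. That is why the tuple-style access needs a second `[0]` to get a single image.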
🚨🚨🚨 Breaking change 🚨🚨🚨
This release introduces breaking changes for the following public-facing methods:
- `VQModel.encode`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `latents = model.encode(...)` to `latents = model.encode(...)[0]` or `latents = model.encode(...).latents`.
- `VQModel.decode`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `sample = model.decode(...)` to `sample = model.decode(...)[0]` or `sample = model.decode(...).sample`.
- `VQModel.forward`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `sample = model(...)` to `sample = model(...)[0]` or `sample = model(...).sample`.
- `AutoencoderKL.encode`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `latent_dist = model.encode(...)` to `latent_dist = model.encode(...)[0]` or `latent_dist = model.encode(...).latent_dist`.
- `AutoencoderKL.decode`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `sample = model.decode(...)` to `sample = model.decode(...)[0]` or `sample = model.decode(...).sample`.
- `AutoencoderKL.forward`: now returns a dict/dataclass instead of a single tensor, since in the future it will very likely need to return more than one tensor. Please change `sample = model(...)` to `sample = model(...)[0]` or `sample = model(...).sample`.
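One pitfall when migrating: if code written for the new API runs against an older version, `result[0]` on a bare tensor silently returns the first batch item instead of failing. Below is a minimal sketch of a version-tolerant helper. `ToyDecoderOutput` and `unwrap_sample` are hypothetical names, and the check assumes that the old bare-tensor return value has no `sample` attribute.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyDecoderOutput:
    """Hypothetical stand-in for the dataclass that `decode`/`forward` now return."""
    sample: Any


def unwrap_sample(result: Any) -> Any:
    """Return the decoded sample from either a new-style output or a bare tensor.

    New-style outputs expose `.sample`; older versions returned the tensor itself.
    Checking the attribute avoids `result[0]`, which on a bare tensor would
    return the first batch item rather than the whole sample.
    """
    return result.sample if hasattr(result, "sample") else result
```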
🎨 New Stable Diffusion pipelines
A couple of new pipelines have been added to Diffusers! We invite you to experiment with them and to use them as inspiration for your own new tasks. These are the new pipelines:
- Image-to-image generation. In addition to using a text prompt, this pipeline lets you include an example image to be used as the initial state of the process. 🤗 Diffuse the Rest is a cool demo about it!
- Inpainting (experimental). You can provide an image and a mask and ask Stable Diffusion to replace the masked area.
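In image-to-image generation, a `strength` value decides how much of the example image survives. A higher strength adds more noise to the initial image and runs more denoising steps on it. The pure-Python sketch below makes that trade-off concrete. It assumes a simplified schedule in which `strength` selects the fraction of the denoising steps to run and ignores any scheduler-specific offset. `img2img_steps` is a hypothetical helper, not part of Diffusers.

```python
from typing import Tuple


def img2img_steps(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    """Return (t_start, steps_run) for a simplified img2img schedule.

    strength=1.0 noises the init image completely and runs every step;
    strength=0.0 leaves it untouched and runs no denoising steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0.0, 1.0], got {strength}")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the noisiest part of the schedule and start denoising from there.
    t_start = num_inference_steps - steps_run
    return t_start, steps_run


print(img2img_steps(50, 0.75))
```

With 50 steps and a strength of 0.75, the sketch skips the first 13 steps and runs the remaining 37 starting from the partially noised example image.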
For more details about how they work, please visit our new API documentation.
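One common way to implement this kind of mask-guided inpainting works at every denoising step. The model's latents are kept inside the mask, and a copy of the original latents, re-noised to the current step, is kept outside it. The NumPy sketch below shows only that blending step. The function names, the scalar `alpha_bar` noise level, and the convention that a mask value of 1 marks the area to repaint are all assumptions of the sketch. It is not the pipeline's exact code.

```python
import numpy as np


def add_noise(x0: np.ndarray, noise: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Forward-diffuse clean latents x0 to the noise level given by alpha_bar (DDPM-style)."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


def inpaint_blend(
    model_latents: np.ndarray,
    original_latents: np.ndarray,
    noise: np.ndarray,
    alpha_bar: float,
    repaint_mask: np.ndarray,
) -> np.ndarray:
    """Keep the model's latents where repaint_mask == 1 and the re-noised
    original latents where repaint_mask == 0 (mask broadcasts over channels)."""
    noised_original = add_noise(original_latents, noise, alpha_bar)
    return repaint_mask * model_latents + (1.0 - repaint_mask) * noised_original
```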
This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:
| Pipeline | Tasks | Colab | Demo |
|---|---|---|---|
| pipeline_stable_diffusion.py | Text-to-Image Generation | | 🤗 Stable Diffusion |
| pipeline_stable_diffusion_img2img.py | Image-to-Image Text-Guided Generation | | 🤗 Diffuse the Rest |
| pipeline_stable_diffusion_inpaint.py | Experimental – Text-Guided Image Inpainting | | Coming soon |
🍬 Less memory usage for smaller GPUs
Diffusion models can now take up significantly less VRAM (3.2 GB for Stable Diffusion) at the cost of 10% in speed, thanks to the optimizations discussed in basujindal/stable-diffusion#117.
To make use of the attention optimization, just enable it with `.enable_attention_slicing()` after loading the pipeline:
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
# Compute attention in slices to reduce peak VRAM usage
pipe.enable_attention_slicing()
```
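Attention slicing trades a little speed for memory. It computes attention in chunks instead of materialising every attention score matrix at once. The NumPy sketch below illustrates the idea, assuming that slicing happens along the combined batch×heads dimension. The helper names are hypothetical, and this is not the Diffusers implementation.

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention over inputs shaped (batch*heads, seq_len, dim)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def sliced_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, slice_size: int) -> np.ndarray:
    """Same result as `attention`, but only `slice_size` score matrices exist at once."""
    out = np.empty((q.shape[0], q.shape[1], v.shape[-1]), dtype=np.result_type(q, v))
    for start in range(0, q.shape[0], slice_size):
        end = start + slice_size
        out[start:end] = attention(q[start:end], k[start:end], v[start:end])
    return out


def score_matrix_bytes(num_matrices: int, seq_len: int, bytes_per_element: int = 2) -> int:
    """Memory for `num_matrices` score matrices of size seq_len x seq_len (2 bytes/element = fp16)."""
    return num_matrices * seq_len * seq_len * bytes_per_element
```

As a rough illustration, assume 4096 query tokens (a 64×64 latent grid) and 16 score matrices (for example, two guidance batches × 8 heads). Under those assumptions, `score_matrix_bytes(16, 4096)` comes to 512 MiB in fp16, while slices of 4 hold only 128 MiB at a time.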
This will allow many more users to play with Stable Diffusion on their own computers! We can't wait to see what new ideas and results the community will create!
🐈‍⬛ Textual Inversion
Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.
GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion
Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb
Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb
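Textual Inversion learns a new token embedding for your concept while the rest of the model stays frozen. The NumPy sketch below shows that core idea. It appends a placeholder-token row initialised from an existing token, then restricts gradient updates to that single row. The helper names and the plain SGD update are illustrative assumptions; the training script linked above is the reference.

```python
import numpy as np


def add_placeholder_embedding(embeddings: np.ndarray, initializer_id: int):
    """Append a new embedding row copied from an existing token; return (table, new_id)."""
    new_row = embeddings[initializer_id : initializer_id + 1].copy()
    return np.concatenate([embeddings, new_row], axis=0), embeddings.shape[0]


def masked_sgd_step(embeddings: np.ndarray, grads: np.ndarray, trainable_id: int, lr: float) -> np.ndarray:
    """Apply a gradient step to one row only; every other embedding stays frozen."""
    mask = np.zeros((embeddings.shape[0], 1), dtype=embeddings.dtype)
    mask[trainable_id] = 1.0
    return embeddings - lr * grads * mask
```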
🍎 MPS backend for Apple Silicon
🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch `mps` device. You need to install PyTorch Preview (Nightly) on a Mac with an M1 or M2 chip, and then use the pipeline as usual:
```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
We are seeing great speedups (31s vs. 214s on an M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.
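If the same script should also run on machines without Apple silicon, a small device picker can choose between `mps`, CUDA, and CPU. `pick_device` is a hypothetical helper. The `getattr` guard assumes that PyTorch releases older than 1.12 have no `torch.backends.mps` module.

```python
import torch


def pick_device() -> str:
    """Prefer Apple's `mps` backend, then CUDA, then fall back to the CPU."""
    # torch.backends.mps is missing in PyTorch < 1.12, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```

The result can then be passed to the pipeline, e.g. `pipe = pipe.to(pick_device())`.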
🏭 Experimental ONNX exporter and pipeline for Stable Diffusion
We introduce a new (and experimental) Stable Diffusion pipeline compatible with ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX, and it brings a significant speedup on CPUs.
You need to use `StableDiffusionOnnxPipeline` instead of `StableDiffusionPipeline`. You also need to download the weights from the `onnx` branch of the repository and indicate the runtime provider you want to use (CPU, in the following example):
```python
from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider="CPUExecutionProvider",
    use_auth_token=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
To convert your own checkpoint, run the conversion script locally:
```shell
python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
```
After that, it can be loaded from the local path:
```python
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")
```
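The `provider` argument selects the ONNX Runtime execution provider. A small helper can pick the first preferred provider that is actually installed. `choose_provider` and its default preference order are hypothetical. Whether a GPU provider works with this experimental pipeline is an assumption you should verify on your setup.

```python
from typing import Sequence

DEFAULT_PREFERENCE = ("CUDAExecutionProvider", "CPUExecutionProvider")


def choose_provider(available: Sequence[str], preferred: Sequence[str] = DEFAULT_PREFERENCE) -> str:
    """Return the first preferred ONNX Runtime provider that is actually available."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"None of {list(preferred)} is available (got {list(available)})")
```

In practice, `available` would come from `onnxruntime.get_available_providers()`. The result can then be passed as `provider=` to `StableDiffusionOnnxPipeline.from_pretrained`.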
Improvements and bugfixes
- Mark in painting experimental by @patrickvonplaten in #430
- Add config docs by @patrickvonplaten in #429
- [Docs] Models by @kashif in #416
- [Docs] Using diffusers by @patrickvonplaten in #428
- [Outputs] Improve syntax by @patrickvonplaten in #423
- Initial ONNX doc (TODO: Installation) by @pcuenca in #426
- [Tests] Correct image folder tests by @patrickvonplaten in #427
- [MPS] Make sure it doesn't break torch < 1.12 by @patrickvonplaten in #425
- [ONNX] Stable Diffusion exporter and pipeline by @anton-l in #399
- [Tests] Make image-based SD tests reproducible with fixed datasets by @anton-l in #424
- [Docs] Outputs.mdx by @patrickvonplaten in #422
- [Docs] Fix scheduler docs by @patrickvonplaten in #421
- [Docs] DiffusionPipeline by @patrickvonplaten in #418
- Improve unconditional diffusers example by @satpalsr in #414
- Improve latent diff example by @satpalsr in #413
- Inference support for `mps` device by @pcuenca in #355
- [Docs] Minor fixes in optimization section by @patrickvonplaten in #420
- [Docs] Pipelines for inference by @satpalsr in #417
- [Docs] Training docs by @patrickvonplaten in #415
- Docs: fp16 page by @pcuenca in #404
- Add typing to scheduling_sde_ve: init, set_timesteps, and set_sigmas function definitions by @danielpatrickhug in #412
- Docs fix some typos by @natolambert in #408
- [docs sprint] schedulers docs, will update by @natolambert in #376
- Docs: fix undefined in toctree by @natolambert in #406
- Attention slicing by @patrickvonplaten in #407
- Rename variables from single letter to meaningful name fix by @rashmimarganiatgithub in #395
- Docs: Stable Diffusion pipeline by @pcuenca in #386
- Small changes to Philosophy by @pcuenca in #403
- karras-ve docs by @kashif in #401
- Score sde ve doc by @kashif in #400
- [Docs] Finish Intro Section by @patrickvonplaten in #402
- [Docs] Quicktour by @patrickvonplaten in #397
- ddim docs by @kashif in #396
- Docs: optimization / special hardware by @pcuenca in #390
- added pndm docs by @kashif in #391
- Update text_inversion.mdx by @johnowhitaker in #393
- [Docs] Logging by @patrickvonplaten in #394
- [Pipeline Docs] ddpm docs for sprint by @kashif in #382
- [Pipeline Docs] Unconditional Latent Diffusion by @satpalsr in #388
- Docs: Conceptual section by @pcuenca in #392
- [Pipeline Docs] Latent Diffusion by @patrickvonplaten in #377
- [textual-inversion] fix saving embeds by @patil-suraj in #387
- [Docs] Let's go by @patrickvonplaten in #385
- Add colab links to textual inversion by @apolinario in #375
- Efficient Attention by @patrickvonplaten in #366
- Use `expand` instead of ones to broadcast tensor by @pcuenca in #373
- [Tests] Fix SD slow tests by @anton-l in #364
- [Type Hint] VAE models by @daspartho in #365
- [Type hint] scheduling lms discrete by @santiviquez in #360
- [Type hint] scheduling karras ve by @santiviquez in #359
- type hints: models/vae.py by @shepherd1530 in #346
- [Type Hints] DDIM pipelines by @sidthekidder in #345
- [ModelOutputs] Replace dict outputs with Dict/Dataclass and allow to return tuples by @patrickvonplaten in #334
- package `version` on main should have `.dev0` suffix by @mishig25 in #354
- [textual_inversion] use tokenizer.add_tokens to add placeholder_token by @patil-suraj in #357
- [Type hint] scheduling ddim by @santiviquez in #343
- [Type Hints] VAE models by @daspartho in #344
- [Type Hint] DDPM schedulers by @daspartho in #349
- [Type hint] PNDM schedulers by @daspartho in #335
- Fix typo in unet_blocks.py by @da03 in #353
- [Commands] Add env command by @patrickvonplaten in #352
- Add transformers and scipy to dependency table by @patrickvonplaten in #348
- [Type Hint] Unet Models by @sidthekidder in #330
- [Img2Img2] Re-add K LMS scheduler by @patrickvonplaten in #340
- Use ONNX / Core ML compatible method to broadcast by @pcuenca in #310
- [Type hint] PNDM pipeline by @daspartho in #327
- [Type hint] Latent Diffusion Uncond pipeline by @santiviquez in #333
- Add contributions to README and re-order a bit by @patrickvonplaten in #316
- [CI] try to fix GPU OOMs between tests and excessive tqdm logging by @anton-l in #323
- README: stable diffusion version v1-3 -> v1-4 by @pcuenca in #331
- Textual inversion by @patil-suraj in #266
- [Type hint] Score SDE VE pipeline by @santiviquez in #325
- [CI] Cancel pending jobs for PRs on new commits by @anton-l in #324
- [train_unconditional] fix gradient accumulation. by @patil-suraj in #308
- Fix nondeterministic tests for GPU runs by @anton-l in #314
- Improve README to show how to use SD without an access token by @patrickvonplaten in #315
- Fix flake8 F401 imported but unused by @anton-l in #317
- Allow downloading of revisions for models. by @okalldal in #303
- Fix more links by @python273 in #312
- Changed variable name from "h" to "hidden_states" by @JC-swEng in #285
- Fix stable-diffusion-seeds.ipynb link by @python273 in #309
- [Tests] Add fast pipeline tests by @patrickvonplaten in #302
- Improve README by @patrickvonplaten in #301
- [Refactor] Remove set_seed by @patrickvonplaten in #289
- [Stable Diffusion] Hotfix by @patrickvonplaten in #299
- Check dummy file by @patrickvonplaten in #297
- Add missing auth tokens for two SD tests by @anton-l in #296
- Fix GPU tests (token + single-process) by @anton-l in #294
- [PNDM Scheduler] format timesteps attrs to np arrays by @NouamaneTazi in #273
- Fix link by @python273 in #286
- [Type hint] Karras VE pipeline by @patrickvonplaten in #288
- Add datasets + transformers + scipy to test deps by @anton-l in #279
- Easily understandable error if inference steps not set before using scheduler by @samedii in #263
- [Docs] Add some guides by @patrickvonplaten in #276
- [README] Add readme for SD by @patrickvonplaten in #274
- Refactor Pipelines / Community pipelines and add better explanations. by @patrickvonplaten in #257
- Refactor progress bar by @hysts in #242
- Support K-LMS in img2img by @anton-l in #270
- [BugFix]: Fixed add_noise in LMSDiscreteScheduler by @nicolas-dufour in #253
- [Tests] Make sure tests are on GPU by @patrickvonplaten in #269
- Adds missing torch imports to inpainting and image_to_image example by @PulkitMishra in #265
- Fix typo in README.md by @webel in #260
- Fix inpainting script by @patil-suraj in #258
- Initialize CI for code quality and testing by @anton-l in #256
- add inpainting example script by @nagolinc in #241
- Update README.md with examples by @natolambert in #252
- Reproducible images by supplying latents to pipeline by @pcuenca in #247
- Style the `scripts` directory by @anton-l in #250
- Pin black==22.3 to keep a stable --preview flag by @anton-l in #249
- [Clean up] Clean unused code by @patrickvonplaten in #245
- added test workflow and fixed failing test by @kashif in #237
- split tests_modeling_utils by @kashif in #223
- [example/image2image] raise error if strength is not in desired range by @patil-suraj in #238
- Add image2image example script. by @patil-suraj in #231
- Remove dead code in `resnet.py` by @ydshieh in #218
Significant community contributions
The following contributors have made significant changes to the library over the last release: