
Apply applicable quantization_config to model components when loading a model #10327

Open
vladmandic opened this issue Dec 20, 2024 · 2 comments

@vladmandic (Contributor) commented Dec 20, 2024

With the new improvements to quantization_config, the memory requirements of models such as SD35 and FLUX.1 are much lower.
However, the user must manually load each model component they want quantized and then assemble the pipeline.

For example:

from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel

# diffusers and transformers each ship their own BitsAndBytesConfig, so each library's models need a matching config
quantization_config = DiffusersBitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
text_encoder_3 = T5EncoderModel.from_pretrained(repo_id, subfolder="text_encoder_3", quantization_config=TransformersBitsAndBytesConfig(...))
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, transformer=transformer, text_encoder_3=text_encoder_3)

The ask is to allow the pipeline loader itself to process quantization_config and automatically apply it to the applicable modules when it is present.
That would allow much simpler use, without the user needing to know the exact internal components of each model:

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, quantization_config=quantization_config)

This is a generic ask that should work for pretty much all models, although the primary use case is the most popular models such as SD35 and FLUX.1.

@yiyixuxu @sayakpaul @DN6 @asomoza

@sayakpaul (Member) commented:

Yeah, this is planned. I thought we had created an issue to track it, but clearly it slipped through the cracks.

We should also have something like exclude_modules to let users specify the names of the models not to quantize (typically the CLIP text encoder, the VAE, or any model that doesn't have enough linear layers to benefit from the classic quantization techniques).
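
A rough sketch of how that could look from the user's side (exclude_modules is a proposed argument here, not an existing diffusers parameter, and the component names are illustrative):

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(
    repo_id,
    quantization_config=quantization_config,
    # proposed: components listed here would be loaded without quantization
    exclude_modules=["text_encoder", "text_encoder_2", "vae"],
)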

@vladmandic (Contributor, Author) commented:

> We should also have something like exclude_modules to let users specify the names of the models not to quantize

Yup! And it can have a default value with exactly the ones you've mentioned.
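
As a sketch of what that built-in default could look like (names are purely illustrative, not an existing diffusers API), the loader could fall back to a default exclusion list when none is given:

DEFAULT_EXCLUDE_MODULES = ["text_encoder", "text_encoder_2", "vae"]  # illustrative default

def resolve_component_quant_config(component_name, quantization_config, exclude_modules=None):
    # return the config to use for this pipeline component, or None if it should stay unquantized
    exclude = DEFAULT_EXCLUDE_MODULES if exclude_modules is None else exclude_modules
    return quantization_config if component_name not in exclude else None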
