Add `optimum.quanto` as supported load-time `quantization_config` #10328

vladmandic · 2024-12-20T21:54:41Z

Recent additions to diffusers added `BitsAndBytesConfig` and `TorchAoConfig` options that can be passed as `quantization_config` when loading model components with `from_pretrained`.

For example:

```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```

The ask is to also support Hugging Face's own Optimum Quanto.
Right now it's possible to use it, but only as post-load, on-demand quantization; there is no option to use it like BnB or TorchAO to apply quantization automatically during the load itself.

@yiyixuxu @sayakpaul @DN6 @asomoza


sayakpaul · 2024-12-21T01:45:26Z

This should be relatively easy to add. @DN6 @a-r-r-o-w would either of you like to take it up? Since we have three new backends and a guide to adding a new quantization backend, I think this could be opened for community contributions, too.

a-r-r-o-w · 2024-12-21T11:04:53Z

I have a few things on my plate on the optimization and performance side that I would like to get out first in the next 1-2 weeks. I'll be happy to look into it if no one has picked it up by then 🔜

sayakpaul added the quantization label Dec 21, 2024

sayakpaul assigned DN6 Dec 25, 2024
