[Flux] Support for Nunchaku and SVDQuant ? #2516

Giribot · 2024-12-29T09:55:31Z


Hello!

Nunchaku is a new inference engine designed for 4-bit diffusion models, as demonstrated in SVDQuant.

SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On the 12B FLUX.1-dev model, it achieves a 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it offers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU, and is 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it demonstrates significantly better visual quality than other W4A4 or even W4A8 baselines. In the project's reported numbers, "E2E" means the end-to-end latency, including the text encoder and VAE decoder.
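To make the memory claim concrete, here is a rough back-of-the-envelope sketch of why the BF16 model needs CPU offloading on a 16GB GPU while the quantized one may not. It uses only the figures quoted above (12B parameters, 3.6× reduction, 16GB). It assumes decimal gigabytes and counts the transformer weights only, ignoring activations, the text encoder, and the VAE, so treat the results as estimates rather than measurements. The helper name `weights_gb` is mine, not part of Nunchaku.

```python
# Rough weight-memory estimate; all inputs are the figures quoted in the post.
PARAMS = 12e9        # FLUX.1-dev parameter count (12B)
BYTES_BF16 = 2       # BF16 stores each weight in 2 bytes
REDUCTION = 3.6      # memory reduction quoted for SVDQuant
GPU_GB = 16          # laptop 4090 memory


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (hypothetical helper, weights only)."""
    return params * bytes_per_param / 1e9


bf16_gb = weights_gb(PARAMS, BYTES_BF16)   # BF16 weights alone
svdquant_gb = bf16_gb / REDUCTION          # after the quoted 3.6x reduction

print(f"BF16 weights:     {bf16_gb:.1f} GB (fits in {GPU_GB} GB: {bf16_gb < GPU_GB})")
print(f"SVDQuant weights: {svdquant_gb:.1f} GB (fits in {GPU_GB} GB: {svdquant_gb < GPU_GB})")
```

Under these assumptions the BF16 weights alone exceed the GPU memory, while the quantized weights leave room for the rest of the pipeline, which is consistent with the claim that the speedup comes from eliminating CPU offloading.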

Here:
https://github.com/mit-han-lab/nunchaku

Thanks!

Replies: 0 comments