Sana activations explode / clamping issue #10336
Comments
Update: switching to the […]
I went through the same rabbit hole, see #10241 for details.
Good job finding the exact spot.
This problem is due to the fact that we add value clamping during training with mixed precision (here), so the model never saw values outside the range (-65504, 65504). When you run FP32 or BF16 inference with the FP16-trained model, the self-attention output will not be clamped (refer to here), and that's why it won't give you the desired results. We provide the FP32 model only for reference, in case someone needs it for fine-tuning or something similar. If this is confusing, should we just remove the FP32 version of the safetensors in our FP16-trained models? Cc: @vladmandic @Nerogar
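A minimal sketch of the dtype-conditional clamp being described, assuming the general structure of the Sana attention processor (names and control flow are paraphrased, not the exact diffusers source):

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def finish_attention_output(hidden_states: torch.Tensor) -> torch.Tensor:
    # The clamp only runs on the fp16 path, so it acts as a saturation that
    # the FP16-trained model implicitly relied on during training.
    if hidden_states.dtype == torch.float16:
        hidden_states = hidden_states.clip(-FP16_MAX, FP16_MAX)
    # fp32 / bf16 inference skips the clamp and propagates values around 1e6.
    return hidden_states
```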
I don't think this is caused by the precision; at least, I don't have proof of it. If you have any insight, please let me know. I'm curious about it. @Nerogar
To be honest, I don't really see the point in having the fp16 weights at all. If I load […] To me it looks like those weights are just broken and there is no point in using them.
We set BF16 as the default checkpoint, and the original fp16 models will serve as a reference in case someone needs to compare.
Describe the bug
I'm using the pretrained weights from Efficient-Large-Model/Sana_1600M_1024px_diffusers. I don't know if this is an issue with these weights, or if the implementation is broken. Things I've observed so far:

The attention output here is very different between the fp16 and fp32 versions.

The hidden_states are in the +/-5*10^5 range here (sometimes even higher, I've seen values as high as 1.3*10^6). Using fp16 calculations, they become inf, which is clamped down to (-65504, 65504) (or about 6*10^4, more than an order of magnitude less). Using fp32 calculations, this clamping is not done, which means the output of that attention block is also different.

Enabling this clamping even for fp32 calculations fixes the issue, but this seems like a hack (see the sketch after the example list below). That clamping operation looks like a safeguard, not like an essential part of the attention calculations. Adding print(f"hidden_states: {hidden_states}") just before and after the clamping operation shows the issue pretty well; you can see it in the examples below.

Here are some examples (all using the same prompt/seed/cfg/sampler/etc.):
fp16 weights (with clamping)
fp32 weights (without clamping)
fp32 weights (with clamping)
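For reference, a small standalone snippet (illustrative magnitudes only, not values from an actual run) showing why the fp16 and fp32 paths end up more than an order of magnitude apart, and what clamping "even for fp32" amounts to:

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

# Magnitudes similar to the reported hidden_states (illustrative only).
x = torch.tensor([5.0e5, -1.3e6, 123.0])

print(x.to(torch.float16))                             # [inf, -inf, 123.] before any clamp
print(x.to(torch.float16).clamp(-FP16_MAX, FP16_MAX))  # saturates to +/-65504
print(x.clamp(-FP16_MAX, FP16_MAX))                     # same clamp applied on the fp32 path
```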
(tagging @lawrence-cj as the original author)
Reproduction
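A minimal sketch of the kind of comparison described above, assuming the public diffusers SanaPipeline API (the model ID comes from the report; the prompt, seed, and dtype handling are illustrative, not the reporter's exact script):

```python
import torch
from diffusers import SanaPipeline

model_id = "Efficient-Large-Model/Sana_1600M_1024px_diffusers"
prompt = "a photo of an astronaut riding a horse"  # illustrative prompt

# fp16 compute: the attention output gets clamped to +/-65504.
pipe = SanaPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
image_fp16 = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]

# Same weights with fp32 compute: the clamp is skipped and the output degrades.
pipe = SanaPipeline.from_pretrained(model_id, torch_dtype=torch.float32).to("cuda")
image_fp32 = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]
```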
Logs
No response
System Info
Who can help?
No response