Why is padding="max_length" needed for FLUX and SD3? #10177
ilya-lavrenov started this conversation in General
From this code block in `diffusers/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py` (lines 235 to 242 at 22d3a82), we can see that the T5 tokenizer's output always has `max_sequence_length` tokens: the actual output is padded to that size. Why is this required? With only the actual number of tokenized tokens, you could save T5 encoder inference time, and then Transformer model inference time as well.
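For context, here is a standalone sketch of the padding behaviour in question, contrasting the fixed-size call with a variable-length alternative. The checkpoint name and the dynamic-padding variant are assumptions for illustration, not the exact pipeline code:

```python
from transformers import T5TokenizerFast

# Assumed checkpoint purely for illustration; SD3/FLUX ship their own T5 tokenizer.
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

prompt = "a photo of an astronaut riding a horse on mars"
max_sequence_length = 256  # SD3's default

# What the pipeline does: always pad (and truncate) to max_sequence_length.
padded = tokenizer(
    prompt,
    padding="max_length",
    max_length=max_sequence_length,
    truncation=True,
    return_tensors="pt",
)
print(padded.input_ids.shape)  # torch.Size([1, 256]) regardless of prompt length

# The alternative suggested here: keep only the tokens the prompt actually needs.
dynamic = tokenizer(
    prompt,
    padding=True,  # pad only to the longest prompt in the batch
    max_length=max_sequence_length,
    truncation=True,
    return_tensors="pt",
)
print(dynamic.input_ids.shape)  # e.g. torch.Size([1, 12]) for a short prompt
```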
According to our experiments, we have the following breakdown, where we can see that using the actual number of tokens is 1.6x faster, while the output image is slightly different but still corresponds to the text prompt:
[Comparison at sequence lengths 64, 128, 256, and 512; images not reproduced here]
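For reference, a minimal sketch of how the padded vs. actual-length T5 encoder cost could be compared. The checkpoint name, device, and timing approach are assumptions, not the setup behind the numbers above:

```python
import time

import torch
from transformers import T5EncoderModel, T5TokenizerFast

# Assumed checkpoint/device for illustration only.
model_id = "google/t5-v1_1-xxl"
device = "cuda"

tokenizer = T5TokenizerFast.from_pretrained(model_id)
text_encoder = T5EncoderModel.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

prompt = "a photo of an astronaut riding a horse on mars"

def encode(padding, max_length=256):
    # Tokenize with either fixed-size or dynamic padding, then time the encoder pass.
    inputs = tokenizer(
        prompt,
        padding=padding,
        max_length=max_length,
        truncation=True,
        return_tensors="pt",
    ).to(device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        embeds = text_encoder(inputs.input_ids).last_hidden_state
    torch.cuda.synchronize()
    return embeds.shape, time.perf_counter() - start

print(encode("max_length"))  # fixed-length sequence of 256 tokens
print(encode(True))          # only the real prompt tokens
```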