
Implement framewise encoding/decoding in LTX Video VAE #10333

Open
a-r-r-o-w opened this issue Dec 21, 2024 · 1 comment

@a-r-r-o-w (Member)

Currently, we do not implement framewise encoding/decoding in the LTX Video VAE. Adding it is an opportunity to reduce memory usage, which would benefit both inference and training.

LoRA finetuning LTX Video on 49x512x768 videos can be done in under 6 GB if prompts and latents are pre-computed, but the pre-computation itself requires about 12 GB of memory because of the VAE encode/decode. Framewise processing could reduce this considerably and lower the bar for entry into video model finetuning. Our friends with potatoes need you!
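The core idea can be sketched as follows. This is a minimal illustration, not the diffusers API: `encode_frame` is a hypothetical stand-in (simple 2x spatial downsampling) for the real learned VAE, and the function names are invented for this sketch. A real implementation for a causal 3D VAE would also need to handle the temporal receptive field across chunk boundaries (e.g. by overlapping chunks or caching intermediate states).

```python
import numpy as np

def encode_frame(frame):
    # Toy stand-in for a per-frame VAE encode: 2x spatial downsampling
    # by block-averaging. The actual LTX Video VAE is a learned model.
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def encode_framewise(video, chunk_size=8):
    # Encode a few frames at a time so peak memory scales with
    # chunk_size rather than the total frame count. Note: a causal 3D
    # VAE would need overlap or cached states between chunks; a purely
    # per-frame encoder like this toy one does not.
    latents = []
    for start in range(0, video.shape[0], chunk_size):
        chunk = video[start:start + chunk_size]
        latents.append(np.stack([encode_frame(f) for f in chunk]))
    return np.concatenate(latents, axis=0)

# A 49x512x768 video, matching the finetuning setup described above.
video = np.random.rand(49, 512, 768).astype(np.float32)
latents = encode_framewise(video, chunk_size=8)
print(latents.shape)  # (49, 256, 384)
```

With chunked processing, only `chunk_size` frames of activations are live at once, which is where the memory saving comes from.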

As always, contributions are welcome 🤗 Happy new year!

@rootonchair (Contributor)

Hi @a-r-r-o-w, this is interesting and I would like to take it.
