Lower Hunyuan Video LoRA memory requirements #135
Comments
What are the memory requirements for Hunyuan currently? I'm OOMing with 48 GB.
Could you give #129 a try? I believe with FP8 it should fit in 24 GB based on rough calculations, but I will continue to try and improve it.
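For context, here is a back-of-envelope version of that rough calculation. The parameter counts and LoRA size below are my own assumptions for illustration (roughly 13B transformer parameters, a few tens of millions of LoRA parameters), not numbers from this thread:

```python
# Back-of-envelope memory estimate for FP8 base weights + LoRA training.
# All counts below are assumptions for illustration, not measured values.
params = 13e9                       # assumed HunyuanVideo transformer parameter count
fp8_weights_gb = params * 1 / 1e9   # 1 byte per parameter in FP8
bf16_weights_gb = params * 2 / 1e9  # 2 bytes per parameter in BF16, for comparison

lora_params = 40e6                  # order-of-magnitude guess for a LoRA adapter
# BF16 LoRA weights + FP32 master weights + two FP32 Adam states per parameter
lora_train_gb = lora_params * (2 + 4 + 4 + 4) / 1e9

print(f"FP8 base weights:  ~{fp8_weights_gb:.1f} GB")   # ~13 GB
print(f"BF16 base weights: ~{bf16_weights_gb:.1f} GB")  # ~26 GB
print(f"LoRA + optimizer:  ~{lora_train_gb:.2f} GB")    # well under 1 GB
# The rest of a 24 GB budget goes to activations, which gradient checkpointing
# and precomputed conditions/latents (#129) keep comparatively small.
```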
Sadly, I still OOM even after precomputing the conditions and latents.
Just to confirm, are you using the bash script from the README or a custom launch script? And are you sure …
Yeah, using the bash script from the README. --gradient_checkpointing and --precompute_conditions are both being passed.
Hi, I also tried the bash script in your README.md and loaded the checkpoint you provide at https://huggingface.co/hunyuanvideo-community/HunyuanVideo, but I get OOM even on an 80 GiB H800 when loading the HunyuanVideo transformer, before training starts. My training device is 1/2 H800.
I have the same OOM problem with --precompute_conditions and --gradient_checkpointing from the README script on an A100.
I'm unable to replicate this, unfortunately. I just verified once again that I can run training in about 42 GB of memory when precomputation and gradient checkpointing are enabled with …

Can you also try running training with the resolution buckets set as …?
Setting the bucket size to 1x512x12 still OOMs.
I see. I'll give PyTorch 2.4 a try and profile it tomorrow. Could you try upgrading to PyTorch 2.5.1, or the nightly 2.6.0, and see if the issue goes away?
Also, on 2.4, could you first check whether inference produces a normal video or a black video with the example code here: https://huggingface.co/docs/diffusers/en/api/pipelines/hunyuan_video

There have been reports of it not working, and I suspect it has something to do with the torch version. If inference is not working, there's a slim chance training would work well. The example doesn't mention it, but if you're facing OOM for inference, …
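For anyone hitting memory limits on the inference side as well: the snippet below follows the diffusers HunyuanVideo docs page linked above, with the two usual OOM mitigations (model CPU offload and VAE tiling) added. Since the comment above is cut off, treat those two calls as my own addition rather than the exact suggestion that followed:

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)

# Memory-saving measures (added here; the truncated comment above likely pointed
# to something similar):
pipe.enable_model_cpu_offload()  # keep submodules on CPU until they are needed
pipe.vae.enable_tiling()         # decode video latents in tiles to cap VRAM use

output = pipe(
    prompt="A cat walks on the grass, realistic style",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
```

If this produces a black video rather than an OOM, that points at the torch-version issue discussed above rather than a memory problem.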
I was able to run training with the accelerate_configs/uncompiled_1.yaml config, but during training the loss was NaN. The output of this LoRA model after training is a black screen. Can you explain the difference between these configs, please? PyTorch 2.4.0
@generalsvr It seems like 2.4 might be a problematic torch version for some operations. I'm going through the relevant commits in PyTorch to try and see what exactly causes this, but I believe upgrading to 2.5.1 will fix the NaN loss. Could you give that a try?
The configs are simply some rules that tell …
After updating PyTorch to 2.5.1 on the same machine, the original Hunyuan model started to generate videos. But training is still a problem: the loss shows up once on step 1, but then it's NaN again. Video generation with the LoRA still results in a black screen.

Training run 1 log:
Training steps: 0%| | 0/20 [00:00<?, ?it/s]12/24/2024 16:47:37 - DEBUG - finetrainers - Starting epoch (1/1)

Training run 2 log:
Training steps: 0%| | 0/20 [00:00<?, ?it/s]12/24/2024 17:10:49 - DEBUG - finetrainers - Starting epoch (1/1)
It should be possible to leverage FP8-cast models, or torchao quantization, to support training in under 24 GB up to a reasonable resolution. Or at least that's the hope when combined with precomputation from #129. Will take a look soon 🤗
TorchAO docs: https://huggingface.co/docs/diffusers/main/en/quantization/torchao
FP8 casting: huggingface/diffusers#10347
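A sketch of what the TorchAO route could look like for the Hunyuan transformer, following the diffusers TorchAO docs linked above. The `int8wo` config name comes from that integration; whether finetrainers can consume a pre-quantized transformer like this is exactly the open question in this issue, so treat it as illustrative only:

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel, TorchAoConfig

model_id = "hunyuanvideo-community/HunyuanVideo"

# Int8 weight-only quantization via TorchAO (see the diffusers TorchAO docs above).
quantization_config = TorchAoConfig("int8wo")
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

# The FP8 casting route (huggingface/diffusers#10347) would instead keep weights
# in float8 storage and upcast per layer at compute time; the exact API depends
# on the diffusers version, so it is not shown here.
```

Either way, only the frozen base weights are quantized or downcast; the LoRA parameters being trained would stay in higher precision.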