Replies: 6 comments 3 replies
-
It's not a bug, though. Have you tried using the memory optimization techniques mentioned in the docs?
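For reference, a minimal sketch of those switches for the CogVideoX VAE; the model id and dtype below are placeholders, not taken from this thread:

```python
import torch
from diffusers import CogVideoXPipeline

# Placeholder checkpoint and dtype; adjust to whatever you actually run.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

# Decode in spatial tiles and batch slices instead of all at once,
# trading some speed for a smaller peak memory footprint.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
```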
-
Hey @sayakpaul. In 0.30.3, memory consumption does not grow with the temporal dimension, which suggests the computation is serialised along that axis. In the screenshots I provided above, that was 52.8 GB for inputs with either 5 or 13 temporal steps. With the latest changes that is no longer the case: memory consumption is not just higher overall, it now depends on the temporal axis, going from 56.4 GB to a whopping 72.1 GB with the same inputs as before. My main point remains: in 0.30.3 memory is O(1) in the temporal input dimension, while in 0.31.0 it becomes O(n). If this is not a bug, then it is a very serious regression, and it should be explicitly documented that the VAE no longer scales with input size.
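For reference, the kind of measurement behind those numbers looks roughly like this; the model id and latent shape are stand-ins, not the exact tensors from the profiles above:

```python
import torch
from diffusers import AutoencoderKLCogVideoX

# Placeholder model id; the latent shape (batch, channels, temporal steps,
# height, width) is an assumption, not the exact input from the profiles.
vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

latents = torch.randn(1, 16, 13, 60, 90, dtype=torch.float16, device="cuda")

# Record the peak GPU memory used by a single decode call.
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    vae.decode(latents)
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
```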
-
@sayakpaul Hello, is there an update on this? I'd like to reopen this as an issue: the memory consumption is not sustainable even on an H100, and offloading to the CPU negates the benefit of using GPUs in the first place.
-
Will let @a-r-r-o-w and @DN6 comment here. It would be helpful if you provided actual code snippets instead of screenshots so that the team members can try it out.
-
@sayakpaul @a-r-r-o-w @DN6 actual code snippet is provided in the
-
Thanks for the issue!
-
Describe the bug
The CogVideoX decoder in diffusers 0.31.0 consumes significantly more memory, to the point where the model goes OOM even on 80 GB H100 GPUs with a relatively modest frame count.
I include two profiles for very small input tensors of only 5 frames, where it is visible how much larger the VAE memory consumption is.
Memory footprints for different input sizes are shown below. As you can see, with the latest version memory keeps growing with the frame count.
Reproduction
Run the CogVideoXDecoder3D model with diffusers 0.30.3 and 0.31.0 on inputs of the same shape and measure the memory consumption as the frame count increases.
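A hedged sketch of such a measurement; the model id and latent shapes are assumptions, and the same script should be run once under each diffusers version:

```python
import torch
from diffusers import AutoencoderKLCogVideoX

# Placeholder model id and latent shapes; run this script under diffusers
# 0.30.3 and again under 0.31.0 and compare the printed peaks.
vae = AutoencoderKLCogVideoX.from_pretrained(
    "THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

for num_latent_frames in (5, 9, 13):
    latents = torch.randn(
        1, 16, num_latent_frames, 60, 90, dtype=torch.float16, device="cuda"
    )
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        vae.decode(latents)
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{num_latent_frames} latent frames -> peak {peak_gib:.1f} GiB")
```

According to the report above, the printed peak stays roughly flat across frame counts in 0.30.3 and grows with them in 0.31.0.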
Logs
No response
System Info
Python 3.11.
Diffusers 0.30.3 vs 0.31.0
Who can help?
@sayakpaul @DN6 @yiyixuxu