When available VRAM runs low, the Vulkan backend now appears to allocate the compute buffer in shared memory, which causes a very significant slowdown even though there is actually enough VRAM available. The older version of GGML used before c3eeb66 didn't have this issue.
I've had no luck finding the commit that introduced this behavior in ggml so far.
Example when generating an 896 x 896 image with Flux Schnell Q3_k, with an idle VRAM usage of 1.2 GB (Chrome and VS Code are open in the background):
(dumb question if you already know this, but are you using git bisect?)
I tried, but with the API changes it was annoying to try to fix things at every bisect step. I also tried reverting Vulkan-related commits one by one, but I couldn't easily identify the culprit that way either.
Good. Yeah, it's annoying to also have to change the sd.cpp code. But it still works. :)
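For what it's worth, `git bisect run` treats exit code 125 from the test script as "skip this commit", which can help with steps that don't build because of the API changes. Below is a minimal sketch of such a test script in Python. It assumes the regression shows up as a run time above some threshold. The function names, the threshold, and the build and benchmark commands are all hypothetical and not taken from sd.cpp or ggml.

```python
import subprocess
import time

# Exit codes understood by `git bisect run`:
# 0 = good commit, 1 = bad commit, 125 = cannot test, skip this commit.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_ok, seconds, threshold_s):
    """Map the outcome of one bisect step to an exit code."""
    if not build_ok:
        # The commit does not build (e.g. API mismatch), so skip it.
        return SKIP
    # Assumption: a run slower than the threshold means the buffer
    # ended up in shared memory, i.e. the regression is present.
    return BAD if seconds > threshold_s else GOOD


def run_step(build_cmd, bench_cmd, threshold_s):
    """Build the current checkout, time one generation, return an exit code.

    build_cmd and bench_cmd are hypothetical argument lists; replace them
    with the real build and image generation commands.
    """
    if subprocess.run(build_cmd).returncode != 0:
        return classify(False, 0.0, threshold_s)
    start = time.perf_counter()
    bench = subprocess.run(bench_cmd)
    elapsed = time.perf_counter() - start
    if bench.returncode != 0:
        # A crash is not the regression being hunted, so skip as well.
        return SKIP
    return classify(True, elapsed, threshold_s)
```

To use it, the script would end with something like `sys.exit(run_step(build_cmd, bench_cmd, 60.0))` and be passed to `git bisect run` after marking one known-good and one known-bad commit. Skipped commits can leave bisect with a range of candidates instead of a single commit, but that range is usually much smaller than the full history.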
Relevant logs (identical between the two runs):