Vulkan: memory management issue with ggml update #481

stduhpf · 2024-11-25T17:57:45Z

When available VRAM becomes low, it looks like the Vulkan backend now allocates the compute buffer in shared memory, which causes a very significant slowdown even when there is actually enough VRAM available. The older version of GGML used before c3eeb66 didn't have this issue.
So far I've had no luck finding the ggml commit that introduced this behavior.

Example: generating an 896 x 896 image with Flux Schnell Q3_k, with idle VRAM usage of 1.2 GB (Chrome and VS Code are open in the background):

| | current | reverting `c3eeb66` |
| --- | --- | --- |
| Taskmgr screenshot | (image) | (image) |
| s/it | 147.07 | 23.34 |

Relevant logs (identical between the two runs):

[INFO ] stable-diffusion.cpp:514  - total params memory size = 7658.71MB (VRAM 4978.14MB, RAM 2680.56MB): clip 2680.56MB(RAM), unet 4883.57MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:518  - loading model from '' completed, taking 5.92s
[INFO ] stable-diffusion.cpp:535  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:589  - finished loaded file
[DEBUG] stable-diffusion.cpp:1463 - txt2img 896x896
[DEBUG] stable-diffusion.cpp:1193 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:672  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1198 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1027 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] clip.hpp:736  - Missing text_projection matrix, assuming identity...
[DEBUG] ggml_extend.hpp:1026 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1142 - computing condition graph completed, taking 3158 ms
[INFO ] stable-diffusion.cpp:1331 - get_learned_condition completed, taking 3162 ms
[INFO ] stable-diffusion.cpp:1354 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1358 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1026 - flux compute buffer size: 1715.46 MB(VRAM)
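As a rough sanity check on the "enough VRAM" claim, the figures from the log can be added up. This is only a sketch: it assumes the logged "MB" values and the 1.2 GB idle figure use 1 GB = 1024 MB, and the 8 GB card size is a hypothetical value, since the issue does not state the card's capacity.

```python
# Figures copied from the log lines above, in MB as printed.
UNET_MB = 4883.57            # unet weights (VRAM)
VAE_MB = 94.57               # vae weights (VRAM)
COMPUTE_BUFFER_MB = 1715.46  # flux compute buffer (VRAM)
IDLE_MB = 1.2 * 1024         # idle usage of 1.2 GB, assuming 1 GB = 1024 MB


def required_vram_mb():
    """Sum of everything expected to be resident in VRAM during sampling."""
    return UNET_MB + VAE_MB + COMPUTE_BUFFER_MB + IDLE_MB


def headroom_mb(total_vram_mb):
    """VRAM left over on a card of the given size (negative means it does not fit)."""
    return total_vram_mb - required_vram_mb()


if __name__ == "__main__":
    # Hypothetical card size; not stated anywhere in the issue.
    HYPOTHETICAL_TOTAL_MB = 8 * 1024
    print(f"required: {required_vram_mb():.2f} MB")
    print(f"headroom on a {HYPOTHETICAL_TOTAL_MB} MB card: "
          f"{headroom_mb(HYPOTHETICAL_TOTAL_MB):.2f} MB")
```

Under those assumptions the total stays below 8 GB, but only by a few hundred MB, so the margin is thin.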


Green-Sky · 2024-11-25T19:31:03Z

(dumb question if you already know this, but are you using git bisect ?)

stduhpf · 2024-11-25T20:44:31Z

> (dumb question if you already know this, but are you using git bisect ?)

I tried, but with the API changes it was annoying to try and fix things at every bisect step. I also tried reverting Vulkan related commits one by one, but I couldn't identify the culprit easily this way either.

Green-Sky · 2024-11-26T11:03:56Z

> > (dumb question if you already know this, but are you using git bisect ?)
>
> I tried, but with the API changes it was annoying to try and fix things at every bisect step. I also tried reverting Vulkan related commits one by one, but I couldn't identify the culprit easily this way either.

Good. Yeah, it's annoying to also have to change the sd.cpp code, but it still works. :)

Update the commit range if you know it.
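One way to make the bisect less painful is `git bisect run` with a helper that skips commits that fail to build, since `git bisect run` treats exit code 125 as "cannot test this commit". The script below is a minimal sketch, not something from this thread: the 60 s/it threshold is a hypothetical cutoff chosen to sit between the two timings reported above, and the wrapped command is assumed to build the project, run one generation, and print the measured s/it as the last line of its output.

```python
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD = 0
BAD = 1
SKIP = 125  # commit cannot be tested (e.g. it does not build)

# Hypothetical cutoff between the fast (23.34 s/it) and slow (147.07 s/it) runs.
THRESHOLD_S_PER_IT = 60.0


def verdict(build_ok, seconds_per_it, threshold=THRESHOLD_S_PER_IT):
    """Map one bisect step to a `git bisect run` exit code."""
    if not build_ok or seconds_per_it is None:
        return SKIP
    return BAD if seconds_per_it > threshold else GOOD


def main(argv):
    """Run the given build-and-time command and classify the current commit."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Build or run failed: ask git to skip this commit.
        return verdict(False, None)
    try:
        s_per_it = float(proc.stdout.strip().splitlines()[-1])
    except (IndexError, ValueError):
        # No usable timing was printed: skip instead of guessing.
        return verdict(True, None)
    return verdict(True, s_per_it)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

After `git bisect start <bad> <good>`, this could be driven with something like `git bisect run python bisect_step.py ./build_and_time.sh`, where both file names are placeholders. Commits broken by the API changes are then skipped automatically, though git may end up reporting a range of candidate commits if too many are skipped.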
