
Huge VRAM usage with start/stop #154

Open · slashedstar opened this issue Mar 8, 2024 · 3 comments
@slashedstar

[screenshot]
Is this expected when using start/stop? I was getting OOM errors and had to change a setting in the NVIDIA Control Panel to allow fallback to system RAM, but with that enabled the it/s drops sharply: I go from 6 it/s to 1.5 it/s after the LoRA is stopped/started by the extension. (I'm on Forge, b9705c58f66c6fd2c4a0168b26c5cf1fa6c0dde3.)
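
For context on where the extra VRAM goes: when the extension stops/starts the LoRA it re-applies the weight patches to the UNet mid-sampling (that is the patch_model call in the traceback below), and building each patched weight needs temporary full-size copies, including fp32 upcasts, alongside the copy already resident on the GPU. Below is a minimal sketch of measuring that spike with plain PyTorch; the layer shape and the pre-merged delta are made-up stand-ins, not Forge's actual patching code.

import torch

def patched_copy(weight: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    # The patched weight is built from fp32 temporaries while the original
    # stays resident, so peak VRAM for this layer spikes well above its size.
    return (weight.float() + delta.float()).to(weight.dtype)

if torch.cuda.is_available():
    dev = torch.device("cuda")
    # Arbitrary stand-in shape; the delta is not a real LoRA, just noise.
    w = torch.randn(10240, 1280, dtype=torch.float16, device=dev)
    d = torch.randn_like(w) * 0.01
    torch.cuda.reset_peak_memory_stats(dev)
    base = torch.cuda.memory_allocated(dev)
    patched = patched_copy(w, d)
    peak = torch.cuda.max_memory_allocated(dev)
    print(f"extra VRAM during the patch: {(peak - base) / 2**20:.1f} MiB")

On an 8 GiB card that is already close to full, even a few tens of MiB of transient allocations per layer is enough to trip the OOM shown in the logs further down.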

@hako-mikan (Owner)

Does this issue also occur with the latest version of Forge?
I tested it and did not encounter any problems.

@slashedstar (Author)

[screenshot]
Brand-new installation: I just git cloned Forge, started it, and installed the extension. The OOM happens with SDXL but not with 1.5, though it's still able to complete the image.

With start=10

Moving model(s) has taken 1.59 seconds
 55%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                                                         | 11/20 [00:01<00:01,  6.03it/s]ERROR diffusion_model.output_blocks.0.1.transformer_blocks.2.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.06 GiB is allocated by PyTorch, and 218.22 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR diffusion_model.output_blocks.0.1.transformer_blocks.3.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.13 GiB is allocated by PyTorch, and 177.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
*** Error executing callback cfg_denoiser_callback for E:\blankforge\stable-diffusion-webui-forge\extensions\sd-webui-lora-block-weight\scripts\lora_block_weight.py
    Traceback (most recent call last):
      File "E:\blankforge\stable-diffusion-webui-forge\modules\script_callbacks.py", line 233, in cfg_denoiser_callback
        c.callback(params)
      File "E:\blankforge\stable-diffusion-webui-forge\extensions\sd-webui-lora-block-weight\scripts\lora_block_weight.py", line 455, in denoiser_callback
        shared.sd_model.forge_objects.unet.patch_model()
      File "E:\blankforge\stable-diffusion-webui-forge\ldm_patched\modules\model_patcher.py", line 216, in patch_model
        out_weight = self.calculate_weight(self.patches[key], temp_weight, key).to(weight.dtype)
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.13 GiB is allocated by PyTorch, and 177.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

---
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.48it/s]
To load target model AutoencoderKL██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  5.41it/s]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  1908.81689453125
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  725.2598133087158
Moving model(s) has taken 0.11 seconds
Total progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.11it/s]
Total progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  5.41it/s]

(This was to generate a single 512x512 image.)
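
The OOM messages above suggest tuning the caching allocator via PYTORCH_CUDA_ALLOC_CONF / max_split_size_mb. As a sketch only (the 128 value is an untested guess, not a confirmed fix for this issue), the variable has to be set before PyTorch initializes CUDA:

# Sketch: apply the allocator hint from the OOM message before torch
# touches CUDA. 128 MiB is an arbitrary starting point, not a verified fix.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the env var so the caching allocator picks it up

In a Forge install the equivalent would be a set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 line in webui-user.bat before launching.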

@PANyZHAL commented Apr 8, 2024

Same problem on version f0.0.17v1.8.0rc-latest-276-g29be1da7.
