CUDA out of Memory max_split_size_mb ERROR (Creating smaller batch sizes when working with CU files or GPU) #4931
Replies: 5 comments 9 replies
-
Set max_split_size_mb to 128.
-
Has anyone figured this out? Where does one set this environment variable: under user or system variables?
-
Using set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:64 together with the method here https://www.youtube.com/watch?v=EmA0RwWv-os fixed the CUDA out-of-memory block for me (most of the time; you can occasionally get the VAE NaN error, but after a few retries it passes through in the end). I can now render big images above roughly 1350px, which used to be my maximum. What the community has managed to do is actually amazing.
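For reference, the allocator reads PYTORCH_CUDA_ALLOC_CONF when PyTorch initializes CUDA, so if you set it from Python instead of the shell, the assignment has to happen before torch is imported. A minimal sketch, using the same values as above:

```python
import os

# The CUDA caching allocator reads this variable once, during
# initialization, so it must be set before `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = (
    "garbage_collection_threshold:0.8,max_split_size_mb:64"
)

# import torch  # must come after the assignment above
```

Setting it in the shell (set on Windows, export on Linux/macOS) before launching the script has the same effect.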
-
Hi, maybe this can help. I was playing with the parameters; my video card has only 4 GB, and I fixed the problem with this: set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:512
-
The system cannot find the path specified.
-
I'm sure my error is a niche one, but it is prevalent nonetheless. My GPU has only 4 GB of memory, and when working with Stable Diffusion there is a function that uses PyTorch to send data from the system to the GPU; if that function sends more than 150 MB at once, the GPU can't receive it.
The only time this error occurs is when using the Generate function in the Img2Img program.
I have set the environment variable like this: set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
I set it to half of 512 in order to keep any matrix math the same and avoid fragmenting the data.
However, I'm still getting this issue.
I believe the limit is either a global variable in a PyTorch file, which I have no idea where to find, or something that can be changed from a Stable Diffusion variable, perhaps by checking the size of an array before sending it through a PyTorch function (I wish I knew which functions were called):
if array.size > GPU.get_max_memory_size() / 4, then split the array in half. Something like that.
Anyway, if anyone has a workaround, or knows where I can change these variables in the code, please let me know.
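The splitting idea above can be sketched without a GPU at all. The names here (split_until_fits, item_bytes, free_bytes, budget_fraction) are made up for illustration; a real version would estimate tensor sizes from dtype and shape and query free memory with something like torch.cuda.mem_get_info:

```python
def split_until_fits(batch, item_bytes, free_bytes, budget_fraction=0.25):
    """Repeatedly halve a batch until every chunk's estimated footprint
    fits within a fraction of the reported free GPU memory.

    batch           -- a list of items to transfer
    item_bytes      -- estimated size of one item in bytes
    free_bytes      -- free GPU memory in bytes
    budget_fraction -- portion of free memory a single transfer may use
    """
    limit = free_bytes * budget_fraction
    chunks = [batch]
    changed = True
    while changed:
        changed = False
        next_chunks = []
        for chunk in chunks:
            # Only split chunks that are too large and still divisible.
            if len(chunk) * item_bytes > limit and len(chunk) > 1:
                mid = len(chunk) // 2
                next_chunks.extend([chunk[:mid], chunk[mid:]])
                changed = True
            else:
                next_chunks.append(chunk)
        chunks = next_chunks
    return chunks
```

Each resulting chunk could then be sent to the GPU separately. With 8 one-byte items and a 2-byte budget, for example, the batch ends up as four chunks of two items each; a single oversized item is returned as-is rather than looping forever.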