Suggestion to devs: how to get rid of the NaN exception errors that are very annoying and common #12292
Replies: 6 comments 4 replies
-
Works on the dev branch.
-
Where did the NaN happen? And what do you mean by "switch model after NaN"?
-
I'm seeing this issue too, in the UNet, using a41 in a server setting (no UI).
-
For me, --no-half doesn't solve it, or at least not reliably, and neither does the automatic revert (I suppose it's the same thing, just switched on in case of an error). What seems to work is using a custom VAE.
-
Adding --disable-nan-check to the command-line arguments fixed the issue for me. Before using it I would get the error constantly, but now it never happens. I'm pretty sure the VAE itself causes this, because before using the flag the error would occur with certain VAEs and almost never with others.
-
Not true. With --disable-nan-check I get black images from the very first generation, literally every time, so I'm removing the flag; it might as well give me errors. My A1111: v1.5.2. Clearly, this isn't always caused by one specific thing. For me, the issues started when I installed new Nvidia drivers and CUDA 12.3, which I did after I started getting Cublas64.dll errors. Even though I reinstalled different drivers and CUDA versions (including the original ones), I get NaNs every time in A1111, although it doesn't happen on the same system in Vladmandic's automatic.
-
How I solved it: I copied my model so I have 2 copies of it, just renamed v1 and v2. Then I switch back and forth between them whenever I get a NaN, which is fast because they're in VRAM.
The best result I get is when I get the exception on model 1, switch to model 2, and then switch back to model 1 again. This way it's flushed, so I get a lot of generations before the exception comes back.
Disable xformers; don't use them, because they prevent this fix from working. The optimizations in settings give the same speeds as xformers, so you should be fine.
When I get a NaN exception I switch to v2; when I get a NaN again I switch back to v1. This way the NaN exception is not an issue anymore... ever.
So... I don't want to be a smartass, but obviously the NaN exception never happens when you generate with a model right after you've loaded it...
How about adding code that reflushes the current model before generating, so I don't have to keep 2 copies of the same model and keep loading them manually back and forth just to get past the NaN exception?
Code that would automate this and would not require a copy of the current model.
It doesn't solve the core problem, but it makes the issue bearable. So whatever the webui does when loading a model from VRAM (I have it set to keep 1 model in VRAM), it would be nice to apply that to the model once a NaN exception is detected.
It's a very annoying issue, but this simple switch of models from VRAM makes it go away.
What do you guys think? Is it doable?
I'd do it myself if I could find the part of the code that's responsible for reloading the model from VRAM when you choose it from the list, but I'm not that good at coding. Maybe with the help of Bing Chat... but probably only as a manual switch to reflush, whereas I'd like it to be automatic when a NaN exception is detected.
A simple pull request with extra code that does a model reflush, like the one during swapping, should solve this crappy error for good.
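A minimal sketch of the proposed automation, in Python since that's what the webui is written in. It assumes the generation step and the model reload can be passed in as plain callables. `generate_with_reload`, `reload_model`, `NansException`, `contains_nan` and `ToyModel` are all hypothetical names made up for illustration, not the webui's actual code. The NaN check is a plain scan over a list of floats standing in for whatever check the webui runs on its tensors. `ToyModel` only simulates a model that produces NaNs until it's reloaded, mirroring the behaviour described above.

```python
import math
from typing import Callable, List, Sequence


class NansException(Exception):
    """Hypothetical stand-in for the error raised when the output contains NaNs."""


def contains_nan(values: Sequence[float]) -> bool:
    # True if any element is NaN (stand-in for a tensor-level NaN check).
    return any(math.isnan(v) for v in values)


def generate_with_reload(
    generate: Callable[[], List[float]],
    reload_model: Callable[[], None],
    max_retries: int = 1,
) -> List[float]:
    """Run generate(); on NaN output, reload the model and retry up to max_retries times."""
    attempt = 0
    while True:
        result = generate()
        if not contains_nan(result):
            return result
        if attempt >= max_retries:
            raise NansException(f"NaNs persisted after {max_retries} model reload(s)")
        # Hypothetical hook: would do whatever a manual checkpoint switch does.
        reload_model()
        attempt += 1


class ToyModel:
    """Toy model that returns NaNs until reload() is called (simulation only)."""

    def __init__(self) -> None:
        self.corrupted = True  # start in the "bad" state to exercise the retry path

    def generate(self) -> List[float]:
        return [float("nan")] if self.corrupted else [0.5, 0.25]

    def reload(self) -> None:
        self.corrupted = False


model = ToyModel()
print(generate_with_reload(model.generate, model.reload))  # prints [0.5, 0.25]
```

Capping the retries keeps the underlying problem visible: if a reload doesn't help, the error is still raised instead of silently producing black images, which is what the comments above report with --disable-nan-check.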