Suggestion to devs: how to get rid of NaN exception errors that are very annoying and common #12292

2blackbar · 2023-08-03T22:38:19Z

2blackbar
Aug 3, 2023

How I solved it: I copied my model so I have 2 copies of it, renamed v1 and v2, then I just switch back and forth between them when I get a NaN, which is fast because they're in VRAM.
I get the best result when I get the exception on model 1, switch to model 2, and then switch back to model 1 again. This way it's flushed, so I get a lot of gens before the exception comes back.

Disable xformers, don't use them, they prevent this fix from working. The optimisations in settings have the same speeds as xformers, so you should be fine.
When I get a NaN exception I switch to v2, and when I get a NaN again I switch to v1. This way the NaN exception is not an issue anymore... ever.

So... I don't want to be a smartass, but obviously the NaN exception never happens when you generate with a model right after you loaded it...
How about adding code that reflushes the current model before generating, so I don't have to keep 2 copies of the same model and keep loading them manually back and forth just to move on without a NaN exception?
Code that would automate it and would not require a copy of the current model.

It's not solving the core problem, but it makes the issue bearable. So whatever webui does when loading a model from VRAM (I have it set to keep 1 model in VRAM) would be nice to apply to the model once a NaN exception is detected.

It's a very annoying issue, but this simple switch of models from VRAM makes it go away.
What do you guys think? Is it doable?
I'd do it myself if I could find the part of the code that's responsible for reloading a model from VRAM when you choose it from the list, but I am not that smart with coding. Maybe with the help of Bing Chat... but probably only as a switch to reflush, and I'd like it to be automatic when a NaN exception is detected.
A simple pull request with extra code that does a model reflush like during swapping should solve this crappy error for good.
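
A rough sketch of what the automatic version of this could look like, in plain Python. Everything here is an assumption for illustration: `NansException` is a stand-in error type defined locally, and `generate` and `reload_model` are hypothetical callables supplied by the caller, not the actual webui functions.

```python
class NansException(Exception):
    """Hypothetical stand-in for the error raised when NaNs are detected."""


def generate_with_reload(generate, reload_model, max_reloads=1):
    """Call generate(); on a NaN error, reload the model and try again.

    generate and reload_model are caller-supplied callables (assumed here).
    """
    reloads = 0
    while True:
        try:
            return generate()
        except NansException:
            if reloads >= max_reloads:
                raise  # give up: reloading did not help
            reloads += 1
            reload_model()  # the manual "switch model and switch back" step
```

This only automates the workaround. It does not address whatever produces the NaN in the first place, and it does not need a second copy of the model.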

freecoderwaifu · 2023-08-03T23:04:17Z

freecoderwaifu
Aug 3, 2023

23c947a

Works on the dev branch.

0 replies

w-e-w · 2023-08-04T05:35:32Z

w-e-w
Aug 4, 2023
Collaborator

Where did the NaN happen?
VAE or U-Net?

And what do you mean by switching models after the NaN?
Do you mean you switch models and run the generation again with the same seed and all other parameters being the same?

2 replies

2blackbar Aug 4, 2023
Author

Parameters are up to you. When you get a NaN exception, just pick another model from the list at the top above the prompt, let it load, and then switch back to your main model again. You can regenerate the image if you lock the seed and really want it back. This way the exception won't happen for a long time, like hours.

w-e-w Aug 4, 2023
Collaborator

NaN in VAE or U-Net?
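
To illustrate why the stage matters, here is a minimal sketch of a NaN check that names where the failure happened. It is plain Python over a flat list of floats, not the webui implementation: `NansException` and `check_for_nans` are hypothetical names, and treating "every value is NaN" as the failure condition is an assumption of this sketch.

```python
import math


class NansException(Exception):
    """Hypothetical error type for this sketch."""


def check_for_nans(values, where):
    """Raise if every value in the flat list is NaN, naming the stage."""
    if values and all(math.isnan(v) for v in values):
        raise NansException(f"A tensor with all NaNs was produced in {where}.")
```

Calling this once after the U-Net step and once after the VAE decode would show which of the two is producing the NaNs.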

pablo-mayrgundter · 2023-08-15T17:32:14Z

pablo-mayrgundter
Aug 15, 2023

I'm seeing this issue too in UNet, using a41 in a server setting (no UI).

0 replies

rkfg · 2023-08-15T17:43:34Z

rkfg
Aug 15, 2023

For me --no-half doesn't solve it, or at least not reliably, and neither does the automatic revert (I suppose it's the same, just switching on in case of an error). What seems to work is using a custom VAE, kl-f8-anime2.vae.pt, that I put near the model using a hard link so as not to waste disk space. Perhaps the VAE in the model is somewhat broken, so replacing it with this one helps. Also, it doesn't happen with every model, mostly the anime ones, and even then not with every single one of them.
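
The hard-link step can be sketched roughly like this. It assumes the VAE should sit beside the checkpoint under the checkpoint's own name with a `.vae.pt` suffix; that naming, the function name, and the paths in the usage line are assumptions for illustration.

```python
import os
from pathlib import Path


def link_vae_next_to_model(vae_path, checkpoint_path):
    """Hard-link vae_path to '<checkpoint stem>.vae.pt' beside the checkpoint."""
    checkpoint = Path(checkpoint_path)
    target = checkpoint.with_name(checkpoint.stem + ".vae.pt")
    if not target.exists():
        # A hard link uses no extra disk space, but both paths must be
        # on the same filesystem.
        os.link(vae_path, target)
    return target
```

For example, `link_vae_next_to_model("VAE/kl-f8-anime2.vae.pt", "Stable-diffusion/mymodel.safetensors")` would create `Stable-diffusion/mymodel.vae.pt` pointing at the same data.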

1 reply

DarthMov Nov 3, 2023

Adding --disable-nan-check to the command arguments fixed the issue for me. Before using that I would get it rapidly, but now it never happens anymore. I'm pretty sure the VAE itself causes this, because before using it the error would occur with certain VAEs and almost never happened with others.

DarthMov · 2023-11-03T00:38:03Z

DarthMov
Nov 3, 2023

Adding --disable-nan-check to the command arguments fixed the issue for me. Before using that I would get it rapidly, but now it never happens anymore. I'm pretty sure the VAE itself causes this, because before using it the error would occur with certain VAEs and almost never happened with others.

1 reply

freecoderwaifu Nov 3, 2023

The NAI VAE is the root cause of most of these issues, yeah. It being distributed under a bunch of different names despite being the exact same file doesn't help either, and the VAE being baked into some checkpoints makes things worse. Merging it with other VAEs or converting it to FP16 also doesn't help. It's a bug with the VAE itself, and there are much better VAEs anyway.

Not like it matters really but still kinda funny, it's a bit like certain anti piracy in games, by reporting on its NaN errors people sort of out that they're using at least part of the leak.

Vendaciousness · 2023-11-04T21:40:27Z

Vendaciousness
Nov 4, 2023

SO... I dont want to be a smartass but obviously Nan exception never happens when you generate with a model after you just loaded it...

Not true. I get black images from the very first generation, assuming I use --disable-nan-check, which I'm removing, since I still get black images literally every time, so it might as well give me errors.

My A1111: v1.5.2
Torch: 2.01

Clearly, this is not something that is always caused by one specific thing. For me, the issues started when I installed new Nvidia drivers and CUDA 12.3, after I started getting Cublas64.dll errors. And even though I re-installed different drivers and CUDA (including the original versions), I get NaNs every time in A1111, although it doesn't happen on the same system in Vladmandic's automatic.

0 replies
