PyTorch 2.0.0 is now GA #8691
Replies: 32 comments · 112 replies
-
With torch 2.0 you don't need the xformers library for inference. You can replace all arguments related to xformers with `--opt-sdp-attention`.
-
So are you saying I can just edit my webui-user .bat file and add the following to increase my performance? `pip3 install torch==2.0.0+cu118 --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118` I'm sorry, I am new to all these pythons and pips and gits. I did once before mess with downloading cuDNN and some DLLs etc., but I am pretty sure I messed up automatic completely doing that and had to start from scratch again. Right now it says I have 1.13 and cu117, and torchvision 0.14 and cu117. If anyone can do a step-by-step for those of us a little slower on this stuff, so I can squeeze some more outta this 4090, I would be eternally grateful.
-
Note to all: I've updated the initial post above with a warning about xformers and modified the pip install to get rid of some dependency warnings. I should create a feature request such that if a user has torch 2.0 installed and they use "--xformers", it won't install it unless it can find a torch 2.0 compatible version of it. The only way you can use both is to do a manual build of xformers. When I was installing the nightly build, before the GA came out and before I knew about SDP, I rebuilt it each time after I changed my torch version and waited about 5 minutes for it to build (see the sketch below). I have no idea if this "build-via-pip-install" works on Windows.
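For reference, a source build via pip looks something like this (a sketch; the exact command used above was lost, and the `ninja` install is only there to speed up compilation):

```sh
# Build xformers against whatever torch is already installed, so it stays
# compatible with torch 2.0. Expect the compile to take several minutes.
pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```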
-
At the end, it comes down to the pip install command, and for launch flags you can use either `--opt-sdp-attention` or `--opt-sdp-no-mem-attention` (see the sketch below).
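In webui-user.bat, that would look something like this (a sketch assuming the stock A1111 file; pick one line):

```bat
rem Default SDP attention (may be non-deterministic):
set COMMANDLINE_ARGS=--opt-sdp-attention
rem Or the deterministic variant, at the cost of a bit more VRAM:
rem set COMMANDLINE_ARGS=--opt-sdp-no-mem-attention
```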
-
Torch 2.0 is slower for me. Numbers are from the performance column of the System Info extension.
The GPU hits max clock speeds and 100% utilization during the test.
-
Hi, I'm trying to follow the instructions, but I'm not sure I'm doing things properly. First I changed my .bat file accordingly.
But when I launched the UI with this script, nothing happened (no install of PyTorch 2.0). So I manually entered a command in my command line to force installation.
It seemed to install properly, but when I then run the UI, I get only around 3.1 it/s with my RTX 2060. If instead I use xformers, with those commandline_args, I get around 4.1 it/s.
So it seems xformers is still much better for me. Do you think I did something wrong? Are there some instructions I can type to check if everything is set up properly? Thank you!
-
To everyone here: if you are struggling to know for sure whether your hacks or torch 2.0 are set up correctly, please do the following: `python3 -c 'import torch;print(f"torch {torch.__version__}, cuda {torch.version.cuda}, cudnn {torch.backends.cudnn.version()}")'` On Windows, in webui.bat, add it just before the TWO launch.py lines (see the Windows variant below). You will then know for sure whether you have what you think you have correctly installed.
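Windows cmd chokes on the single-quoted form above, so the webui.bat variant needs the quotes swapped (this is the fix Mickoko92 mentions in a later reply):

```bat
rem Add just before the two launch.py lines in webui.bat.
rem Note the swapped single/double quotes for cmd.exe.
python -c "import torch;print(f'torch {torch.__version__}, cuda {torch.version.cuda}, cudnn {torch.backends.cudnn.version()}')"
```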
-
Does anyone with an AMD card know if there is any benefit to updating PyTorch? And how did they do it? `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2`
-
Shouldn't this be applied as an official pull request, or no?
-
Man, I got excited. I thought I could add a couple lines to the webui-user batch file, and now I'm over here not even getting an output on my torch version besides the one at the bottom of automatic1111. I have no idea if it's even activated. Considering I spend entire days trying to fiddle with even a single update, and this is my first github/python app I have been playing around with, I figured it would have clicked by now and I could just know to go into x, y, or z file and add this or that. I thought I was pretty smart.

Is there a place to better learn the basics of this stuff, or is it just trial and error for a couple years that gets you to a point where you intrinsically know to add x, y, z to a given file to get what you want? Right now I feel like I need a literal youtube video or screen cap or exact copy-and-paste of exactly where in each file each thing goes, etc., and I honestly feel bad asking so many questions.
-
Yeah, I'll check it out soon; that was what I was talking about. I am running to work rn, but I never got any kind of error like he did, which makes me wonder if I put it in the correct spot. Anyways, I'll get it going.
> On Fri, Mar 17, 2023 at 8:38 PM, aifartist wrote:
> I wish I knew what you were talking about. Since you are saying "getting an output on my torch version", are you referring to the python command I gave to print the torch version? If so, did you see Mickoko92's comment above that he had to switch the single/double quotes to get it to work?
--
Pete Stueve
-
FYI, I deleted my venv folder first, then added the lines. The first thing that happened was torch 2 got installed with cuda 118. Hopefully nothing messes it up. Shouldn't I have to change something in launch.py, since it references cu117? The pip output showed `Collecting torch`, then:
`The detected CUDA version (12.1) mismatches the version that was used to compile`
-
Doesn't this conflict with the launch.py line? And can't it be installed just by changing said line? `torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")`
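It doesn't have to conflict: launch.py only falls back to that cu117 default when `TORCH_COMMAND` is unset, so a less invasive route is overriding it in webui-user.bat (a sketch; the version pins are illustrative):

```bat
rem Overrides launch.py's default torch install without editing the file.
set TORCH_COMMAND=pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```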
-
So I gave this a run for comparison purposes on my rig with a 3080 10GB card (which can be memory challenged at times; if only I knew then what I know now). I saw very minimal speed improvements on benchmark iterations (~5%) with PyTorch 2.0 and SDP flags versus using xformers. I also discovered that my maximum hires resolutions had dropped significantly; I was hitting out-of-memory errors at resolutions that worked before. For example, with xformers and the medvram flag, I could hires a 640x768 image to 1600x1920 (2.5x). So it appears that for earlier-gen cards, due to memory issues and no noticeable speed gains, this is something of a downgrade. I've reverted to using xformers again.
-
Going to add my 5 cents to this discussion. I made a separate install, just to avoid any possible conflicts with my current setup and for easier testing/comparing. I'm using a 1660 Super with 6GB VRAM, so I'm most likely not even the target audience for these improvements. I used the same prompt, same checkpoint, one lora, and Coyote-A/ultimate-upscale-for-automatic1111, then hires fix. Args for comparison: the most commonly used samplers. Coyote-A/ultimate-upscale-for-automatic1111 params: Euler a, x2 with R-ESRGAN 4x+, chess type, 512 tile, mask 12, padding 32, denoising strength 0.3. Hires fix params: upscaler R-ESRGAN 4x, hires steps 10, denoising strength 0.6, upscale by 1.5.
VRAM info from vladmandic/sd-extension-system-info after generations, with and without hires fix, was attached for each configuration.
-
I'm on an RTX 2060 mobile with 6GB VRAM. It worked perfectly on 30 steps. My process was as follows: I opened cmd in the install folder and ran the install manually (roughly the sketch below). Edit: Unfortunately I can't seem to hires fix anymore... the memory usage is slightly too high and it gets stuck.
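The manual route looks roughly like this (a sketch assuming a stock Windows install; the exact command run above was lost):

```bat
rem From the stable-diffusion-webui folder: activate the venv, then upgrade.
venv\Scripts\activate
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118
```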
-
I get a really good performance bump with this on my 4070 Ti. However, I run out of VRAM faster when using hires fix, which is really a problem. Before, I used to be able to generate images at 1280x720 and upscale them 2x to 2560x1440, which was nice for wallpapers. Occasionally this already crashed and I had to use 1.8x instead of 2x in the hires fix and use normal upscaling to WQHD size after. Now I cannot even use hires fix at 1.8x, even when setting --medvram. How do I fix the VRAM problems?
-
Has anyone tried to use `torch.compile()`? I am not familiar with the codebase, so I am not sure whether this would require a major rewrite or not.
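For context, the API itself is a one-line wrap around any nn.Module; the open question is where in the webui pipeline to apply it. A minimal, standalone sketch (not webui code):

```python
import torch
import torch.nn as nn

# torch.compile (new in 2.0) wraps a module; the first call triggers
# compilation, subsequent calls run the optimized graph.
# Note: not supported on Windows in the 2.0.0 release.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
compiled = torch.compile(model)

x = torch.randn(8, 64)
print(compiled(x).shape)  # torch.Size([8, 64])
```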
-
So after the changes it seems faster, but my gens also seem worse in quality. Am I crazy?
-
Anyone else running into this error on Linux? Running Fedora 37.
-
xformers dev/rc481 pre-built Torch 2.0.0+cu118 builds are now available on pip as of yesterday.
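Presumably installable with pip's pre-release flag, something like the following (an assumption; check the exact version tag on PyPI before pinning):

```sh
pip install --pre -U xformers
```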
-
On a side note, the pull request I was waiting for, which added BFloat16 support to nn.functional.interpolate, barely missed Torch 2.0.0 GA (it's only available in torch-2.0.0.dev20230228 Nightly or newer), which was disappointing. This means I'll personally need to continue using Nightly builds until Torch 2.1.0 GA. So for anyone still dealing with VAE NaN (black output) issues on Ampere+, you'd need to wait until Torch 2.1.0 GA to be able to use BFloat16 VAE bias (without additional torch modifications), which resolves the VAE NaN issue without needing to use --no-half-vae. For that reason, if webui were planning to migrate all users fully to Torch 2.x in the near future, it may be better to wait until the 2.1.0 release rather than migrating to 2.0.0 and then 2.1.0 shortly after (which, my guess is, wouldn't happen, leaving users on 2.0.0 for a long time).
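For the curious, the limitation is easy to probe. This toy stand-in for the VAE decoder's upsampling step is a sketch (not webui code) that simply reports whether the running torch build accepts bfloat16 in interpolate():

```python
import torch
import torch.nn.functional as F

# The VAE decoder upsamples via interpolate(); on builds that lack the
# BFloat16 support referenced above, the bf16 path raises a RuntimeError.
x = torch.randn(1, 4, 64, 64, dtype=torch.bfloat16)
try:
    y = F.interpolate(x, scale_factor=2.0, mode="nearest")
    print("bf16 interpolate OK:", y.dtype, tuple(y.shape))
except RuntimeError as e:
    print("bf16 interpolate unsupported on this build:", e)
```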
-
Will replacing \stable-diffusion-webui\venv\Lib\site-packages\torch\lib with \cudnn-windows-x86_64-8.8.1.3_cuda11-archive\bin improve things? Everything else remains unchanged.
-
I updated using this instruction (no xformers). I looked into this further and it appears…
-
If you read the thread history, it tells you that SDP is not a fix-all. SDP appears better than xformers when you have a high-end GPU and you're CPU bound, and xformers is better when you have a low-end GPU. For any mid-range GPU/CPU combo, they are nearly identical.
-
I am trying to run Dreambooth on RunPod. Unfortunately the PyTorch team removed the older xformers version. Here are the errors and the steps I tried to solve the problem. I installed Torch 2 via this command on a RunPod io instance.
Everything installed perfectly fine. With Torch 1 and CUDA 11.7 I was not getting any error, but with Torch 2 the below error was produced.
How to fix? It is on Unix; on Windows the same procedure works very well, using the Automatic1111 web UI for Stable Diffusion. The above I couldn't solve, therefore I did the following things: apt update, then after installing all of the above, I now have this warning and training never progresses.
Now when I run the Python code below, everything looks good (a sketch of that kind of check follows).
test.py result:
It is able to generate images at 15.58 it/s, which is very fast. Any help appreciated very much.
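A sanity check along the lines of that test.py might look like this (a sketch; the original script was lost):

```python
import torch

# Print the bits that matter for a Torch 2 / xformers mismatch.
print(f"torch {torch.__version__}, cuda {torch.version.cuda}, "
      f"cudnn {torch.backends.cudnn.version()}")
print("cuda available:", torch.cuda.is_available())
try:
    import xformers
    print("xformers", xformers.__version__)
except ImportError:
    print("xformers not installed")
```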
-
@FurkanGozukara, don't copy & paste the same message on multiple discussions. I answered in a different thread.
-
So, mostly for the sake of newcomers to this discussion with not much context:
-
I've created a PR: #9191
-
Anyone getting something like this with hires fix?
Everything else works. It only happens with hires fix, just after 100%.
-
PyTorch 2.0.0 went GA in the last 24 hours, and it has the cuDNN v8.7 fix if you get the correct version of it.
In other words, no more file-copying hacks.
However, there are two versions of 2.0.0:
torch==2.0.0+cu117 still uses cuDNN 8.5,
and
torch==2.0.0+cu118 uses cuDNN 8.7.
Also, the default repos for "pip install torch" only have the cu117 version.
Thus you need to use the extra-index-url option, as follows:
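Something like this (a sketch; double-check the pins against pytorch.org's install selector):

```sh
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```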
Danger, danger... If you let A1111 install your xformers, it will downgrade your pytorch.
I missed this when I first created this post because I always built xformers myself to install it on Linux, and thus got an xformers compatible with torch 2.0.
Either don't use xformers, because SDP seems to work just as well, or you'll need to get an xformers build for torch 2.0; don't ask me where to download it from, I don't know.
https://pytorch.org/blog/pytorch-2.0-release/
@vladmandic