Windows + AMD GPUs (DirectML) #7870

ClashSAN · 2023-02-17T09:18:02Z

ClashSAN
Feb 17, 2023
Collaborator

Post a comment if you got @lshqqytiger's fork working with your GPU.

It's good to see whether it works on a variety of GPUs.

Small (4 GB) RX 570 GPU
~4s/it for 512x512 on Windows 10,
slow, since I had to use --opt-sub-quad-attention --lowvram
Maximum sizes: 512x768, 640x640

I tested LoRAs and the ControlNet extension; both work.

Commit used: lshqqytiger@dce51c5
Modified modules used:
https://github.com/lshqqytiger/k-diffusion-directml
https://github.com/lshqqytiger/stablediffusion-directml

lshqqytiger · 2023-02-17T09:31:16Z

lshqqytiger
Feb 17, 2023

But training doesn't work because torch.autocast is not implemented yet.

3 replies

ClashSAN Feb 17, 2023
Collaborator Author

That's alright, it's still wonderful that other extensions like ControlNet, and many other things, will work.
👍 ty

ClashSAN Feb 24, 2023
Collaborator Author

@lshqqytiger you may want to open up discussions on your fork.
glad to see that many of us got it working!

lshqqytiger Feb 24, 2023

Are discussions disabled? I opened it.

Shad0wyDr3amz · 2023-02-17T14:05:37Z

Shad0wyDr3amz
Feb 17, 2023

I've had 4 people test this on RX 6000 series cards. 2 of them had to copy https://github.com/crowsonkb/k-diffusion and
https://github.com/Stability-AI/stablediffusion (this one had to be named stable-diffusion-stability-ai) into the same folder as the other 2 repositories to get it working, because otherwise it would error out and fail to install fully. Once they did that, they were able to run the webui with no issues.

Based on the feedback I received, there appears to be a memory leak for the people who had to use the method above.
I don't know whether the people who used the normal method (that is, those who didn't need the method described here) have that issue.
I will update with more info once they message back.

For all 4 people, a batch size above 1 appears to max out their GPU memory or cause issues. The bigger the batch size, the more it appears to leak memory with each subsequent image after the first.

2 replies

zdl5320 Dec 31, 2023

I also have an RX 6600. I added --opt-split-attention --opt-sub-quad-attention --medvram --disable-nan-check to webui-user, and the following error occurred. How can I use it properly?

lshqqytiger Dec 31, 2023

Add --use-directml.

qwerkilo · 2023-02-17T14:10:34Z

qwerkilo
Feb 17, 2023

set COMMANDLINE_ARGS=--opt-split-attention --medvram --disable-nan-check --autolaunch
My graphics card is a 6800 XT. I started with the above parameters and generated a 768x512 image with Euler a: 1.09s/it when staying within my card's memory, 2.05s/it when going over 16 GB of VRAM. I am currently using the ControlNet extension and it works.

11 replies

Shad0wyDr3amz Feb 17, 2023

So below 4 there are no issues for you, just memory sitting at 15.6 GB every time?

qwerkilo Feb 17, 2023

So below 4 there are no issues for you, just memory sitting at 15.6 GB every time?

yes

KangbingZhao Feb 20, 2023

This saves my day! I'm using VEGA iGPU of 5700G, the 16GB vram is insufficient without this COMMANDLINE_ARGS, but this works!

ArchAngelAries Feb 20, 2023

Using a 7900 XT and this helped some but even with 20GB of VRAM I'm not able to generate anything above 768x768 without error

GrgMdmn Jul 15, 2023

Thank you for your advice: I have an RX 6800 and I went from ~40s/it to 3s/it!

qwerkilo · 2023-02-17T14:41:43Z

qwerkilo
Feb 17, 2023

768x512 resolution always uses a bit more than 16 GB of VRAM.

4 replies

Shad0wyDr3amz Feb 17, 2023

Does it stay consistent each time, with no memory overload or errors?

qwerkilo Feb 17, 2023

Almost yes

Shad0wyDr3amz Feb 17, 2023

ok thanks for the info

GREG114 Feb 18, 2023

Good. My 3050 Ti makes a 704x960 picture at 2.03s/it.

Shad0wyDr3amz · 2023-02-17T15:34:48Z

Shad0wyDr3amz
Feb 17, 2023

This is what all 4 of the people had an issue with, yet the guide does not mention it, which is why I had to have them add the repositories that I listed above.
@qwerkilo did you have to add this file to get it to work or not?

9 replies

lshqqytiger Feb 17, 2023

You can use git submodule to skip renaming steps.

$ git submodule init
$ git submodule update

Wotonly Feb 17, 2023

Holy smokes mates, I just came here by accident and now I am using my 6700 XT with the nicest WebUI out there and getting 2.2s/it.

Far from what I achieve on my GF's 2060 Super, but still better than anything I was able to get before.

lshqqytiger Feb 17, 2023

Try --opt-sub-quad-attention. An RX 6700 XT should be able to go faster, because the fastest I get on my RX 5700 XT is 1.01s/it.

Shad0wyDr3amz Feb 17, 2023

You can use git submodule to skip renaming steps.
$ git submodule init
$ git submodule update

this might be good to add to what i changed so people can understand the install instructions better

Wotonly Feb 17, 2023

--opt-sub-quad-attention

And AGAIN a huge THAAANKS!

Hitting 1.5it/s now on average.

You have to know, I am a huge newbie on this whole topic, and dudes like you just make the world a better place.

Kaori42 · 2023-02-17T15:49:39Z

Kaori42
Feb 17, 2023

Hi, so I've got it working on a 7900 XTX, but I don't know if I did something wrong:

Using those 2 repo :
https://github.com/lshqqytiger/k-diffusion-directml
https://github.com/lshqqytiger/stablediffusion-directml

Windows 11, 23.2.1 drivers
VRAM 17/24GB used
first run 5.7it/s 512x512
second run (same prompt) can be as slow as 11s/it (getting slower over time, starting at 2s/it)
I'm running AbyssOrangeMix2_sfw safetensors/ckpt (tried both) with its vae.pt

I can't get anything higher than 640x512 (if I'm lucky; more often 512x512) to run fast on the first run. Trying a higher resolution gives me either a runtime error or a random error that I think is VRAM-related (I couldn't reproduce it). Sometimes it works, but at 10s/it or slower.
It seems that when it's slow my GPU is drawing around 200-250 W with the core clock maxed out around 3000 MHz; when it's fast it's running at 350 W and 2600 MHz.
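Speeds in this thread are reported both as it/s (higher is better) and as s/it (lower is better), which makes them easy to misread. As a rough aid, here is a small sketch that normalizes both to seconds per iteration. The function names are made up for illustration, and the default of 20 steps is an assumption taken from the step count visible in the logs in this thread.

```python
def seconds_per_iteration(value: float, unit: str) -> float:
    """Normalize a reported speed to seconds per iteration."""
    if value <= 0:
        raise ValueError("speed must be positive")
    if unit == "s/it":
        return float(value)
    if unit == "it/s":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit!r}")


def estimated_seconds(value: float, unit: str, steps: int = 20) -> float:
    """Rough sampling time for one image (ignores model load and VAE decode)."""
    return seconds_per_iteration(value, unit) * steps


fast = estimated_seconds(5.7, "it/s")  # first run reported above
slow = estimated_seconds(11, "s/it")   # slowest second run reported above
print(f"fast run: ~{fast:.1f}s, slow run: ~{slow:.0f}s for 20 steps")
```

Under those assumptions, the first run works out to roughly 3.5 s of sampling and the slow run to roughly 220 s, so the slowdown is about 60x rather than the 2x that "5.7 vs 11" might suggest at a glance.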

Runtime error here : (trying to run 1024x1024)

Traceback (most recent call last):
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\txt2img.py", line 56, in txt2img
processed = process_images(p)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\processing.py", line 486, in process_images
res = process_images_inner(p)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\processing.py", line 628, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\processing.py", line 828, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 221, in launch_sampling
return func()
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\k-diffusion\k_diffusion\sampling.py", line 150, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 116, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
out = self.diffusion_model(x, t, context=cc)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 778, in forward
h = module(h, emb, context)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
x = layer(x, context)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 334, in forward
x = block(x, context=context[i])
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 269, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 129, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 272, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 246, in split_cross_attention_forward_invokeAI
r = einsum_op(q, k, v)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 221, in einsum_op
return einsum_op_dml(q, k, v)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 209, in einsum_op_dml
return einsum_op_tensor_mem(q, k, v, mem_reserved - mem_active if mem_reserved > mem_active else 1)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 193, in einsum_op_tensor_mem
return einsum_op_slice_0(q, k, v, q.shape[0] // div)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 162, in einsum_op_slice_0
r[i:end] = einsum_op_compvis(q[i:end], k[i:end], v[i:end])
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\modules\sd_hijack_optimizations.py", line 154, in einsum_op_compvis
s = einsum('b i d, b j d -> b i j', q, k)
File "G:\SD\WINAMD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError

14 replies

Kaori42 Feb 17, 2023

OK, now I got a "driver" crash while generating (probably because I was out of VRAM), see the comment above. Leaving the log here in case you want it (a runtime error without more information):

Error completing request█████████████████████████████████▎ | 11/20 [00:05<00:04, 2.08it/s]
Arguments: ('task(kjlfgs38p2clilx)', '1girl, solo', 'sketch by bad-artist', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 800, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, 0, 0, 4, 512, 512, True, 'None', 'None', 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
File "G:\SD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "G:\SD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\modules\txt2img.py", line 56, in txt2img
processed = process_images(p)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 486, in process_images
res = process_images_inner(p)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 628, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 828, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 221, in launch_sampling
return func()
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\repositories\k-diffusion\k_diffusion\sampling.py", line 150, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "G:\SD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 139, in forward
devices.test_for_nans(x_out, "unet")
File "G:\SD\stable-diffusion-webui-directml-master\modules\devices.py", line 162, in test_for_nans
if not torch.all(torch.isnan(x)).item():
RuntimeError

Kaori42 Feb 17, 2023

I know I write a lot, but I really want this to work, because finally having this webui on Windows with the latest AMD hardware is insane. I get this error while trying to generate a 512x512 image with --opt-sub-quad-attention (no more performance issues except VRAM; it was also working earlier with this argument at 512x512):

edit: after a restart it's working

Arguments: ('task(qywx25ll0cc0855)', '1girl, solo', 'sketch by bad-artist', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 520, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, 0, 0, 4, 512, 512, True, 'None', 'None', 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
File "G:\SD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "G:\SD\stable-diffusion-webui-directml-master\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\modules\txt2img.py", line 56, in txt2img
processed = process_images(p)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 486, in process_images
res = process_images_inner(p)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 628, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "G:\SD\stable-diffusion-webui-directml-master\modules\processing.py", line 828, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 221, in launch_sampling
return func()
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 323, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "G:\SD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\repositories\k-diffusion\k_diffusion\sampling.py", line 150, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "G:\SD\stable-diffusion-webui-directml-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "G:\SD\stable-diffusion-webui-directml-master\modules\sd_samplers_kdiffusion.py", line 139, in forward
devices.test_for_nans(x_out, "unet")
File "G:\SD\stable-diffusion-webui-directml-master\modules\devices.py", line 181, in test_for_nans
raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Spaceman466 Feb 22, 2023

I was getting the "Could not allocate tensor with []bytes" error, but after using --opt-sub-quad-attention, I now crash after generating 1024x1024 instead of before. It's very fast, but I get the error after finishing the 20th step. Any clue?

Spaceman466 Feb 22, 2023

In addition to this, is the byte number the total, or something else? It is crashing at 1 GB, and I'm on a 12 GB 6700 XT.

Lycoris-SF Feb 26, 2023

In addition to this, is the byte number the total, or something else? It is crashing at 1 GB, and I'm on a 12 GB 6700 XT.

A CSDN blog post shows a strange case; I don't know if it means anything.
https://blog.csdn.net/MirageTanker/article/details/127998036
By the way, it's a CUDA case, so you can only take it as a reference.

fanisspr · 2023-02-17T17:33:11Z

fanisspr
Feb 17, 2023

I eventually could get it to run stable diffusion 2.1.

RX 580 8gb
~1.3s/it for 512x512 on windows 10,
I used arguments --no-half --medvram

I opened this issue about problems I encountered and how I got it to work:
#lshqqytiger#5 (comment)

0 replies

CowboyWanderlust · 2023-02-17T17:50:14Z

CowboyWanderlust
Feb 17, 2023

I seem to be having issues getting past gfpgan. Reading some of the other comments, it looks like the program can hang here and the step can be skipped, but once I hit the "any" key, hah, it just closes the program. Don't mind my abhorrent file structure... This is my 3rd attempt at getting anyone's SD setup running. I feel like I'm missing something.

3 replies

Shad0wyDr3amz Feb 17, 2023

If it hangs for longer than 30 minutes, then hit Enter. It all depends on internet speed.

CowboyWanderlust Feb 17, 2023

Oh wow, I had no idea it was that large a download. It doesn't indicate that it's downloading anything. What size is the gfpgan file it needs? Also, I was trying to run this on shaky internet. Should I try again when I get home?

lshqqytiger Feb 17, 2023

launch.py failed to find git on your PATH.
Make sure that git is installed on your PC and that the folder containing git.exe is on PATH.
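One quick way to check this outside of launch.py is to ask Python where it finds git. The sketch below is only a diagnostic, not part of the webui; the function names are made up for illustration. It uses the standard library's `shutil.which`, which searches PATH the same way a command prompt would.

```python
import shutil
import subprocess


def find_git():
    """Return the full path of the git executable found on PATH, or None."""
    return shutil.which("git")


def git_version(git_path):
    """Ask the located git executable for its version string."""
    result = subprocess.run(
        [git_path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


git_path = find_git()
if git_path is None:
    print("git was not found on PATH; install git or add its folder to PATH")
else:
    print(f"found {git_path}: {git_version(git_path)}")
```

If this prints a path, git is visible to Python and the problem is elsewhere; if it prints the "not found" line, open a new command prompt after fixing PATH, since already-open prompts keep the old value.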

GrahamboJangles · 2023-02-17T18:21:04Z

GrahamboJangles
Feb 17, 2023

I had this error:

venv "D:\...\stable-diffusion-webui-directml-master\venv\Scripts\Python.exe"
Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)]
Commit hash: <none>
Installing gfpgan
Traceback (most recent call last):
  File "D:\...\stable-diffusion-webui-directml-master\launch.py", line 352, in <module>
    prepare_environment()
  File "D:\...\stable-diffusion-webui-directml-master\launch.py", line 271, in prepare_environment
    run_pip(f"install {gfpgan_package}", "gfpgan")
  File "D:\...\stable-diffusion-webui-directml-master\launch.py", line 137, in run_pip
    return run(f'"{python}" -m pip {args} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}")
  File "D:\stable diffusion\stable-diffusion-webui-directml-master\launch.py", line 105, in run
    raise RuntimeError(message)
RuntimeError: Couldn't install gfpgan.
Command: "D:\...\python.exe" -m pip install git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379 --prefer-binary
Error code: 1
stdout: Collecting git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379
  Cloning https://github.com/TencentARC/GFPGAN.git (to revision 8d2447a2d918f8eba5a4a01463fd48e45126a379) to c:\users\ryzen\appdata\local\temp\pip-req-build-b2ih0yg6

stderr:   Running command git clone --filter=blob:none --quiet https://github.com/TencentARC/GFPGAN.git 'C:\Users\Username\AppData\Local\Temp\pip-req-build-b2ih0yg6'
  fatal: unable to access 'https://github.com/TencentARC/GFPGAN.git/': error setting certificate verify locations:
    CAfile: A:/Coding_and_Scripting/Git/mingw64/ssl/certs/ca-bundle.crt
    CApath: none
  error: subprocess-exited-with-error

  git clone --filter=blob:none --quiet https://github.com/TencentARC/GFPGAN.git 'C:\Users\Ryzen\AppData\Local\Temp\pip-req-build-b2ih0yg6' did not run successfully.
  exit code: 128

  See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

git clone --filter=blob:none --quiet https://github.com/TencentARC/GFPGAN.git 'C:\Users\Ryzen\AppData\Local\Temp\pip-req-build-b2ih0yg6' did not run successfully.
exit code: 128

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: You are using pip version 22.0.4; however, version 23.0 is available.
You should consider upgrading via the 'D:\...\python.exe -m pip install --upgrade pip' command.

And I fixed it by adding set GIT_SSL_NO_VERIFY=true at the top of webui-user.bat
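For anyone hitting the same log: the stderr above shows git looking for a CA bundle at a specific path (the `CAfile:` line). A possible cause, and this is only a guess from the log, is that git is configured with a bundle path that no longer exists. Here is a small sketch that pulls that path out of the error text and checks whether the file is there; the helper name and the embedded sample are for illustration only.

```python
import re
from pathlib import Path

# Sample copied from the error output above.
SAMPLE_STDERR = """\
fatal: unable to access 'https://github.com/TencentARC/GFPGAN.git/': error setting certificate verify locations:
  CAfile: A:/Coding_and_Scripting/Git/mingw64/ssl/certs/ca-bundle.crt
  CApath: none
"""


def extract_cafile(stderr: str):
    """Return the CA bundle path that git reported, or None if there is none."""
    match = re.search(r"^\s*CAfile:\s*(.+?)\s*$", stderr, flags=re.MULTILINE)
    return match.group(1) if match else None


cafile = extract_cafile(SAMPLE_STDERR)
if cafile is not None and not Path(cafile).is_file():
    print(f"git is pointed at a CA bundle that does not exist here: {cafile}")
```

If the file really is missing, pointing git's `http.sslCAInfo` setting at an existing bundle may be an alternative to `GIT_SSL_NO_VERIFY=true`, which turns off certificate checking for every git call made from the script.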

1 reply

CowboyWanderlust Feb 17, 2023

Nice!

cyatarow · 2023-02-18T01:37:37Z

cyatarow
Feb 18, 2023

Images are generated successfully, but live previews are not displayed.

My environment:

Windows 11 22H2
Radeon RX 5500 XT 8GB
COMMANDLINE_ARGS=--opt-sub-quad-attention --medvram --no-half-vae

9 replies

malkasun Jul 27, 2023

My desktop specs are the same as your specs. How to install it? Can you give an installation tutorial?

LyonelDangue Jul 27, 2023

malkasun, try following this tutorial, but instead of cloning the same git as him, clone the directml version: https://www.youtube.com/watch?v=onmqbI5XPH8

malkasun Jul 27, 2023

After installation, it shows this error.

malkasun Jul 27, 2023

How to change this? COMMANDLINE_ARGS=--opt-sub-quad-attention --medvram --no-half-vae

lshqqytiger Jul 27, 2023

You should clone https://github.com/lshqqytiger/stable-diffusion-webui-directml, not AUTOMATIC1111/stable-diffusion-webui.

qwerkilo · 2023-02-18T02:05:44Z

qwerkilo
Feb 18, 2023

I have found that when batch size is set to more than 1 initially, it stays faster when batch size is changed to 1 after running the program once

set COMMANDLINE_ARGS=--opt-split-attention-v1 --opt-sub-quad-attention --medvram --disable-nan-check --autolaunch

0 replies

5ft4 · 2023-02-18T02:14:56Z

5ft4
Feb 18, 2023

I got it set up successfully following what @Shad0wyDr3amz listed above.

I just came from a Linux Install and it seems to be substantially slower.

My environment:

Windows 10
AMD Radeon RX 6600XT
COMMANDLINE_ARGS=--opt-sub-quad-attention --medvram --disable-nan-check --autolaunch

Not sure why it is slower (I'm a bit of a dumbass with this stuff) but it is working.

1 reply

Shad0wyDr3amz Feb 18, 2023

Everyone I've talked to said it's slower than SHARK etc. It could be because this is a newer approach and the code needs more work.

GREG114 · 2023-02-18T08:09:06Z

GREG114
Feb 18, 2023

There is a notice, "Memory optimization for DirectML is disabled. Because this is not Windows platform." I don't know why.
My environment:
Windows 11
AMD Radeon RX 6850M XT

C:\stable-diffusion-webui-directml>webui --opt-sub-quad-attention
venv "C:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Commit hash: 9360590
Installing requirements for Web UI
Launching Web UI with arguments: --opt-sub-quad-attention
Memory optimization for DirectML is disabled. Because this is not Windows platform.
Disabled experimental graphic memory optimizations.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.

6 replies

GREG114 Feb 18, 2023

512*768,2.18it/s

ClashSAN Feb 18, 2023
Collaborator Author

Well, that still seems pretty close to what you should be getting; DirectML is slow compared to ROCm + Linux. You could try increasing your batch size as mentioned above #7870 (comment) and accept using fewer steps with optimized samplers (DPM++ 2M Karras).

I also tried DirectML with ONNX https://github.com/azuritecoin/OnnxDiffusersUI
and it is a bit faster, but I don't think it benefits from increasing batch size (it is broken).
With that, I could run fp16 ONNX models only up to 512x512.

lshqqytiger Feb 18, 2023

Did you install the driver distributed by AMD.com instead of the one from Windows?

GREG114 Feb 18, 2023

Yes, whql-amd-software-adrenalin-edition-22.11.2-win10-win11-dec8

GREG114 Feb 18, 2023

Well, that still seems pretty close to what you should be getting; DirectML is slow compared to ROCm + Linux. You could try increasing your batch size as mentioned above #7870 (comment) and accept using fewer steps with optimized samplers (DPM++ 2M Karras).

I also tried DirectML with ONNX https://github.com/azuritecoin/OnnxDiffusersUI and it is a bit faster, but I don't think it benefits from increasing batch size (it is broken). With that, I could run fp16 ONNX models only up to 512x512.

Thanks, I tried DPM++ 2M Karras: 1024x768 at 3.46s/it, very good.

JlECHOY6OJlBAH · 2023-02-18T12:22:50Z

JlECHOY6OJlBAH
Feb 18, 2023

Has anyone encountered a similar error? Please help a newbie
My environment:
Windows 10
AMD Radeon RX 6600 XT

1 reply

lshqqytiger Feb 18, 2023

I fixed this issue. Pull latest commit and try again.

Rainz3 · 2023-02-18T13:49:21Z

Rainz3
Feb 18, 2023

I've got it working on a 5700xt. Using --no-half --medvram I get around 2.2s/it. If I use --opt-sub-quad-attention --medvram I can get as low as 1.5s/it but will regularly have modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's no enough precision to represent this picture. Try adding --no-half-vae commandline to fix this.

Also, Dreambooth still seems to use only the CPU. I got it to run on the GPU once, until an error popped up about running out of VRAM. Now it just uses the CPU.

7 replies

lshqqytiger Feb 18, 2023

You need to modify the source code of accelerate to run Dreambooth with accelerate.
Open state.py in a text editor and give AcceleratorState a DirectML device property.
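The exact change to state.py is not shown in this thread, so the following is not that patch. It is only a minimal sketch of how a DirectML device is typically obtained from the torch-directml package via `torch_directml.device()`, with a fallback when the package is missing. The function name `pick_device` and the "cpu" fallback are assumptions made for illustration.

```python
def pick_device():
    """Return a DirectML device if torch-directml is installed, else "cpu"."""
    try:
        import torch_directml  # provided by the torch-directml package
    except ImportError:
        # Fallback chosen for this sketch only; accelerate has its own logic.
        return "cpu"
    return torch_directml.device()


device = pick_device()
print(f"using device: {device}")
```

Whatever gets edited inside accelerate would need to end up handing a device like this one to the training code instead of a CUDA or CPU device.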

brianmcgrory Feb 18, 2023

You need to modify the source code of accelerate to run Dreambooth with accelerate. Open state.py in a text editor and give AcceleratorState a DirectML device property.

So, just in stable-diffusion-webui-directml\venv\Lib\site-packages\accelerate,
edit state.py and switch it to use torch_directml.device()? Also, when you make changes to these files and re-run Stable Diffusion, it seems to overwrite your changes anyway?

DoctorPavel Feb 18, 2023

I've got it working on a 5700xt. Using --no-half --medvram I get around 2.2s/it. If I use --opt-sub-quad-attention --medvram I can get as low as 1.5s/it but will regularly have modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's no enough precision to represent this picture. Try adding --no-half-vae commandline to fix this.

Also, Dreambooth still seems to use only the CPU. I got it to run on the GPU once, until an error popped up about running out of VRAM. Now it just uses the CPU.

What resolutions can you go up to? Having the same card I struggle to get any sensible resolutions and hires fix is also pretty much a no-go.

Rainz3 Feb 19, 2023

I've got it working on a 5700xt. Using --no-half --medvram I get around 2.2s/it. If I use --opt-sub-quad-attention --medvram I can get as low as 1.5s/it but will regularly have modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's no enough precision to represent this picture. Try adding --no-half-vae commandline to fix this.
Also, Dreambooth still seems to use only the CPU. I got it to run on the GPU once, until an error popped up about running out of VRAM. Now it just uses the CPU.

What resolutions can you go up to? Having the same card I struggle to get any sensible resolutions and hires fix is also pretty much a no-go.

I can do up to 768*768 before it starts to become an issue. I can't use hires at all. Also, eventually no matter what setting I use it says there isn't enough gpu memory.

Brutalbeard Feb 19, 2023

i'm on 6700xt, and using --opt-sub-quad-attention --medvram --disable-nan-check (thanks @Rainz3) has it cruising pretty quickly.

getting about 1.25s/it at 512x512, euler a, 20 sampling steps (ya know, the default)

edit: 4 batches, 3 per batch
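
Speeds in this thread are reported both as s/it (seconds per iteration) and as it/s (iterations per second), which are reciprocals of each other. As a rough worked example, assuming sampling time dominates and ignoring model loading and VAE decoding, the 1.25s/it and 20 steps quoted above work out as follows. The function names are made up for illustration.

```python
def seconds_per_image(steps, s_per_it):
    """Rough sampling time for one generation: steps times seconds per step."""
    return steps * s_per_it


def to_it_per_s(s_per_it):
    """Convert seconds-per-iteration into iterations-per-second."""
    return 1.0 / s_per_it


print(seconds_per_image(20, 1.25))  # 25.0 seconds of sampling
print(to_it_per_s(1.25))            # 0.8 it/s
```

This makes it easier to compare posts that quote s/it with posts that quote it/s.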

fanisspr · 2023-03-01T14:58:34Z

fanisspr
Mar 1, 2023

Anyone else having trouble loading safetensors?

I just can't find a way to use safetensors without errors.

I must use --medvram or else I will have a "can't allocate memory" error. But if I use it, safetensors won't load and I get this error:

And of course, if I load a checkpoint, then I can't load a safetensors file from the UI.

I am also using set SAFETENSORS_FAST_GPU=1, which is supposed to be used.

Any ideas how I can load safetensors, while using --medvram?

9 replies

ClashSAN Mar 2, 2023
Collaborator Author

fyi, --lowram and --lowvram are different!

elen07zz Mar 7, 2023

thanks

Xeltosh Mar 7, 2023

thanks for the info.

for my 2GB setup, i tried different options and somehow some worked for a little while but after that never again. removed those args and it works again after one hiccup

ClashSAN Mar 7, 2023
Collaborator Author

Can you share your settings?

Xeltosh Mar 8, 2023

nothing much, just

set COMMANDLINE_ARGS=--lowvram --always-batch-cond-uncond --disable-nan-check --disable-safe-unpickle
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:24
set CUDA_LAUNCH_BLOCKING=1

on my NVIDIA 960M in my "crappy" old laptop. Don't know if this works for AMD.
CUDA_LAUNCH_BLOCKING is supposed to prevent the system from terminating the process because of a timeout. If that doesn't help and you still get a timeout error with 2GB VRAM, try disabling the GPU watchdog as explained here:
https://learn.microsoft.com/en-us/windows-hardware/drivers/display/disabling-the-watchdog-timer-while-testing-display-drivers

with this setup, i can generate 384x512 pictures and can upscale them x2 with hires.fix. Just takes around 8-10 minutes per picture though XD better than nothing i guess

ElusiveT · 2023-03-03T23:26:59Z

ElusiveT
Mar 3, 2023

This works! I'd been having a ton of problems trying to use the ROCM webui version in linux, so I thought there was just no good way to get webui running on an amd gpu with windows, but this works just fine. Thanks for posting this

5700XT gpu
using command line arguments:
--opt-split-attention --disable-nan-check --autolaunch --medvram
getting around 1.6it/s

0 replies

jiangfeng79 · 2023-03-06T04:16:53Z

jiangfeng79
Mar 6, 2023

Pulled and installed on 3/3/2023
7900xtx driver version: Adrenalin 23.2.2, 3A platform, 5900x, 32G Ddr4

Everything works fine with default settings. Lora worked well. ControlNet does not seem to work when trying to get pose info from an image; I guess I am missing CUDA.

One strange issue:
When the webUI starts, a fresh run with batch count set to 1 is very fast: a complex scenario with Lora yields 3 it/s. When the batch count is increased to 2 or more, the second image's speed drops to 1.5 it/s, a 50% drop. When the batch count is set to a large value, e.g. 100, OOM also occurs sporadically.

Will play around with optimization flags, "Hardware GPU scheduling" and "browser hardware acceleration" to see any performance impact.

Also played with Nod.AI; it seems good and fast, but requires additional compilation (disk space and CPU time) and, most importantly, is missing features like extensions, Lora, ControlNet, etc.

0 replies

Rainz3 · 2023-03-07T01:09:08Z

Rainz3
Mar 7, 2023

Not really used to github, can someone explain how to update? Do I just run git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml && cd stable-diffusion-webui-directml && git submodule init && git submodule update in the stable-diffusion root directory? I've seen someone mention using git pull but when I run it I get "Please commit your changes or stash them before you merge."

I had to do a reinstall for an unrelated windows issue and now I can't even do a 512*768 without running out of memory. This is when running with --opt-sub-quad-attention --no-half --precision full --medvram --disable-nan-check.

Also, several of my models won't run with control net. Like when using IlluminatiDiffusion I get "RuntimeError: mat1 and mat2 shapes cannot be multiplied (77x1024 and 768x320)" while using canny model and canny preprocessor.
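
The shapes in that error can be read directly: a matrix product needs the inner dimensions to agree, and here 1024 does not match 768. One plausible cause, stated as an assumption rather than a diagnosis, is a checkpoint whose text encoder produces 1024-wide embeddings (as Stable Diffusion 2.x models do) being paired with a ControlNet model built for 768-wide embeddings (Stable Diffusion 1.x). The check below is plain Python with a made-up helper name and only illustrates the shape rule.

```python
def can_matmul(a_shape, b_shape):
    """True if a 2-D matrix of a_shape can be multiplied by one of b_shape."""
    # (rows_a, cols_a) @ (rows_b, cols_b) requires cols_a == rows_b.
    return a_shape[1] == b_shape[0]


print(can_matmul((77, 1024), (768, 320)))  # False: the shapes from the error
print(can_matmul((77, 768), (768, 320)))   # True: matching inner dimension
```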

5 replies

lshqqytiger Mar 7, 2023

If you edited some files, copy them to other path and run git stash.
And then you can pull latest commits.

dennis-gonzales Mar 7, 2023

Answering the question `how to update?`

So far you're on the right track, since you're able to see the message "Please commit your changes or stash them before you merge." That just means you have made some changes to the codebase that conflict with the changes you're trying to pull.

Two ways to resolve this issue:

Revert your changes.
As the previous commenter said, stash your changes and pop them later (you might need to resolve a merge conflict).

Rainz3 Mar 7, 2023

If you edited some files, copy them to other path and run git stash. And then you can pull latest commits.

Thanks! That did it. Will that update the repositories as well, or do I need to do a git pull in the k-diffusion and stable-diffusion-stability-ai folders as well?

lshqqytiger Mar 8, 2023

git pull doesn't pull latest commit of submodules (in repositories folder)
Although submodules are rarely updated, if you want, you can pull latest commit of submodule using git submodule update.
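
The update steps mentioned in this thread (git stash, git pull, git submodule update) can be collected into one small script. This is a sketch, assuming git is on PATH and that repo_dir points at the webui checkout; the function names and the dry_run switch are hypothetical.

```python
import subprocess


def update_commands(stash_first=True):
    """Build the list of git commands for an update, in order."""
    cmds = []
    if stash_first:
        # Set local edits aside so the pull is not blocked.
        cmds.append(["git", "stash"])
    cmds.append(["git", "pull"])
    # git pull alone does not move submodules to their recorded commits.
    cmds.append(["git", "submodule", "update"])
    return cmds


def run_update(repo_dir, stash_first=True, dry_run=True):
    """Print the commands (dry run) or execute them inside repo_dir."""
    for cmd in update_commands(stash_first):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)


run_update(".", dry_run=True)
```

With dry_run=True nothing is executed, which makes it safe to check the order first. Stashed edits can be brought back afterwards with git stash pop, which may require resolving a merge conflict.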

DLockholm Mar 8, 2023

git pull doesn't pull latest commit of submodules (in repositories folder) Although submodules are rarely updated, if you want, you can pull latest commit of submodule using git submodule update.

Hey, how can I know what the latest version is? And in general, what is the way to update your version of SD?
Sorry for the dumb questions, it's just that all of this is new to me and I can't seem to find a straightforward answer; I've never used GitHub before.
:(

WalterShomer · 2023-04-13T21:38:00Z

WalterShomer
Apr 13, 2023

I'm sorry if the formatting is bad, the site's arrangement isn't intuitive.

Is it possible to train loras, or do any sort of training for that matter (embeddings or models)? I have an rx5700xt, which if I understood correctly is somewhere at the limit of being possible with its VRAM? (I might be misunderstanding something here.)
How do I remove the forced fallback to cpu?

4 replies

ClashSAN Apr 14, 2023
Collaborator Author

@WalterShomer

on cpu, see if you can find how to use cpu from the wiki.
a1111 extension - https://github.com/d8ahazard/sd_dreambooth_extension

currently, not very sure that training on amd gpu is possible on windows.

this team at SHARK may eventually achieve that goal:
nod-ai/SHARK-Studio@da449b7

also directml's team, i don't know https://github.com/microsoft/DirectML/commits/master

when using linux and rocm, training should be possible with 16gb or so using diffusers training scripts, or maybe less when using loras. you would need bitsandbytes-rocm for this..

With nvidia cards, we can train with less, since memory saving attention from xformers is applied, and stacking with bitsandbytes during training, reducing vram requirements.

something we aren't sure about is if torch2.0's sdp attention could provide memory savings when using rocm, currently it does not do anything like this for rocm users.

WalterShomer Apr 14, 2023

Thanks for the detailed reply.

I was asking about AMD DirectML because somewhere I've seen this: "You should modify source code of accelerate to run dreambooth using accelerate. Open state.py with text editor, and let AcceleratorState have DirectML device property." It was being discussed in connection with GPU and Dreambooth, so I thought it might work (not perfectly, but somewhat) and that somebody might know something.

and thanks again, did not know that about the training part with rocm and amd never seen it mentioned.

ClashSAN Apr 15, 2023
Collaborator Author

@WalterShomer hi, see huggingface/diffusers#684 (comment)

training part with rocm

It would actually supposedly only need 13gb for training, so that's very nice.

XanderTheDev Jul 14, 2023

Has anyone already got Dreambooth or LoRA training working (on a 12GB VRAM GPU)? When I start training I get the error: Exception training model: ''H:\Dreambooth\01.jpg' is not in list'. And yes, nothing is wrong with my images. The alternatives to A1111 that I could find need CUDA to work. I would really like to train my own model on my pc.

nikolov6819 · 2023-05-08T11:50:01Z

nikolov6819
May 8, 2023

Hi. I have such a problem. I start the second generation and everything breaks when the picture is almost ready. I changed launch settings many times, but always the same problem. I am powerless to fix it. Help me please.

6 replies

motorist828 May 8, 2023

video card type Radeon Pro Vega 56 8GB

i also use vega 56
--medvram --disable-nan-check --autolaunch --theme dark --no-half --no-half-vae --opt-sub-quad-attention

nikolov6819 May 9, 2023

This option doesn't work. Thrown out of Windows and a blue screen appeared with a qr code

nikolov6819 May 9, 2023

The --medvram setting did not fit at all, it throws it out of the system. Only --lowvram works but only for one generation.

motorist828 May 9, 2023

The --medvram setting did not fit at all, it throws it out of the system. Only --lowvram works but only for one generation.

This is already a system problem: for example, the power supply cannot withstand such a load, or there are problems with the video card.
--lowvram just limits the GPU very much; it runs at half its capacity at best.
You can try lowering the GPU power limit in the Adrenalin driver by 30-40%.

nikolov6819 May 14, 2023

Thank you. I reinstalled the driver and everything worked.

nikolov6819 · 2023-05-09T08:08:35Z

nikolov6819
May 9, 2023

COMMANDLINE_ARGS=--no-half --always-batch-cond-uncond --opt-sub-quad-attention --lowvram --disable-nan-check --autolaunch --theme dark
Now I use these parameters, the generation has become faster. But the error still remains.

0 replies

jvp18ob · 2023-05-20T10:04:41Z

jvp18ob
May 20, 2023

Hello everyone. Happy to say SD is working with my PC Build
GPU is RX 6600 (Non XT)

Currently generating at 512x768

0 replies

bryankhamly · 2023-06-01T07:31:21Z

bryankhamly
Jun 1, 2023

I'm getting an error I haven't seen anyone else get nor is Googling it helping me :( . Any ideas? AMD 6700XT.

0 replies

Estrich1 · 2023-06-02T17:21:01Z

Estrich1
Jun 2, 2023

I don't know what I'm doing wrong. I've looked for 3 consecutive days now. I try to press the generate button in SD it immediately fails. I check the log and it says (RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'). Easy fix I think and add --no-half to "COMMANDLINE_ARGS=". It runs! Nice! It's only using CPU at 12it/s. Not so nice... It just will not use my GPU. I have tried (--opt-sub-quad-attention, --disable-nan-check, --medvram, and --lowvram) to no avail. It's strange because many other users in here are running my same specs without --no-half with no problem. I have been up for 24 hours and I'm going to lay down for a few hours now...

Windows 10 22H2 x64 Up to date.
AMD Radeon RX 6700 XT (12GB VRAM) v23.5.1

Without --no-half:

With --no-half: (It takes 5 minutes for one image! That's enough time to cook an entire cup of instant noodles each time!)

4 replies

lshqqytiger Jun 2, 2023

You are not on latest commit.
git pull and try again.

Estrich1 Jun 2, 2023

you're kidding me... I've never been so relieved. thank you so much, I have never used git before. It works perfectly! 1.4it/s

etfleck Jun 8, 2023

hello! I had the same issue with same specs as above. Whenever I try to run images it uses my CPU instead of GPU.
I'm also a complete novice with github, and have limited computer knowledge.

You are not on latest commit. git pull and try again.
I don't know how to use git pull. After some googling, I went to command line and input

Is this correct? Sorry if this is too basic.

lshqqytiger Jun 8, 2023

Run these lines below.

$ git switch master
$ git pull

If the problem still exists, run launch.py with --backend directml. (or add it to webui-user.bat)

DarthKX · 2023-06-03T23:52:53Z

DarthKX
Jun 3, 2023

Windows 10 - Ryzen CPU and AMD Radeon Pro WX 9100 (24Gb Vram)
Only managed to render one image at 512x512; tried doing 512x768 and I get a BSOD during generation.
Does anyone have any idea why? It's a fresh Windows install, with only the requirements installed: git, python 3.10.6 & AMD drivers (automatic detection and install for my board / GPU).

Thanks in advance, because I'm at a loss as to why it would bluescreen. This CPU has an integrated Vega in it; could this be the issue?

OPT: --opt-sub-quad-attention, --disable-nan-check --autolaunch

1 reply

lshqqytiger Jun 4, 2023

Could you check which GPU works for stable diffusion? (using task manager or something) Vega or Radeon Pro?
DirectML has BSOD issue with Vega.

DarthKX · 2023-06-04T08:41:05Z

DarthKX
Jun 4, 2023

How do I check which GPU is being used? Unfortunately the WX9100 is also Vega based :(, so this might be the issue.

1 reply

lshqqytiger Jun 4, 2023

Oh, I see. I think you should wait until this issue is fixed. microsoft/DirectML#379

DarthKX · 2023-06-04T09:01:46Z

DarthKX
Jun 4, 2023

Thanks, though it may be worth trying to identify which GPU is used (can I just disable the Ryzen iGPU in Device Manager to test?). This GPU is quite fussy with Intel boards... on my 12th-gen rig it simply won't POST :(. My WX 9100 has 16GB of VRAM (not 24, that was a mistake on my part), so it shouldn't be a small-VRAM issue like the one mentioned in that DirectML thread. If it still BSODs I should probably give up on that GPU and see if I can resell it, but that's going to be tricky given its apparent inability to play nice with Intel mobos. Thx.

0 replies

XanderTheDev · 2023-07-14T18:13:31Z

XanderTheDev
Jul 14, 2023

Has anyone already got Dreambooth or LoRA training working (on a 12GB VRAM GPU)? When I start training I get the error: Exception training model: ''H:\Dreambooth\01.jpg' is not in list'. And yes, nothing is wrong with my images. The alternatives to A1111 that I could find need CUDA to work. I would really like to train my own model on my pc.

0 replies

pandawhore · 2023-09-17T04:17:07Z

pandawhore
Sep 17, 2023

post a comment if you got @lshqqytiger 's fork working with your gpu.

Its good to observe if it works for a variety of gpus.
small (4gb) RX 570 gpu
~4s/it for 512x512 on windows 10,
slow, since I had to use --opt-sub-quad-attention --lowvram
maximum sizes: 512x768, 640x640
I did test loras, and control net extension, they work.

commit used lshqqytiger@dce51c5 used the modified modules: https://github.com/lshqqytiger/k-diffusion-directml https://github.com/lshqqytiger/stablediffusion-directml

I have to say lshqqytiger, Sir, you are a legend.

I also have a 5700XT and all things considered it works a treat, as much as can be expected anyway with AMD.
I am running Win 11.
I installed as per the instructions, with no other requirements.
I launch the webUI with arguments --opt-sub-quad-attention --lowvram --disable-nan-check
I get no error and no crashes related to GPU memory.
Everything that is supposed to work looks to be working.
@ 512x512 I get 1.73s/it and when using Hires.fix to scale up to 1280x1280 I get 11.78 to 16.06s/it without error.
I generally use DPM++ 2M Karras as sampler and Latent (nearest) as upscaler.

I had tried Linux but it was an epic fail with my card, rocm just didn't work.

So thank you, you have now fuelled my new addiction.

Anyone having memory issues on older cards try the --lowvram instead of --medvram and see if that helps.
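
For anyone editing webui-user.bat by hand, the swap suggested above amounts to replacing one flag in the COMMANDLINE_ARGS string. A small sketch, with a hypothetical helper name, that makes the replacement without touching the other flags:

```python
def swap_flag(args, old="--medvram", new="--lowvram"):
    """Replace one command-line flag in a space-separated argument string."""
    parts = args.split()
    swapped = [new if part == old else part for part in parts]
    return " ".join(swapped)


line = "--opt-sub-quad-attention --medvram --disable-nan-check"
print("set COMMANDLINE_ARGS=" + swap_flag(line))
```

The example argument string is only an illustration built from flags mentioned in this thread; if --medvram is not present, the string is returned unchanged.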

Cheers.

0 replies

Replies: 86 comments · 237 replies
