How to improve performance on M1 / M2 Macs #7453
-
I did test this configuration in an upgrade scenario as I think it's the most likely to occur for most users:
As expected, the script created the new venv environment and downloaded the PyTorch 2.0 nightly build. This is the setup I ended up with after the process. With this setup, I was able to generate an image in just 4 s (not sure why the screenshot below reports 6.34; it's not correct). This is more than a 50% time reduction from what I could get out of the previous configuration. Other configurations have generated images with various performance improvements.
So, the bottom line is that the configuration works and it's faster. However, there is still a significant problem, which I have documented in #6923 (see my message at the bottom of the thread).
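As a quick sanity check after this kind of upgrade (a hypothetical snippet, not from the original comment), you can confirm which PyTorch build the new venv actually picked up and whether the MPS backend is available:

```shell
# Folder names follow the opening post; adjust if your install differs.
cd stable-diffusion-webui
./venv-torch-nightly/bin/python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```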
-
One other important piece of feedback, @brkirch: before this improved configuration, my M2 system would become incredibly hot during image generation, triggering the fan at maximum speed. That never happened with A1111 on my previous M1 system (even during a 6h-long batch generation), and it doesn't happen with any other AI model I use (TTS, STT, etc.). However, it just happened during my test of the new InvokeAI 2.3 RC. So, both A1111 (latest commit) and InvokeAI 2.3 RC somehow push the M2 system to its limits (probably not for the right reasons), but your configuration for A1111 seems to solve/mitigate the problem. I'll run longer batches to see if that's still true during long inference sessions.
-
Hey guys! Any good ideas on how to work around this issue?
-
Tried to do this update, and now I get this error: `launch.py: error: unrecognized arguments: --upcast-sampling`. Any ideas?
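Not part of the original question, but `unrecognized arguments` from launch.py usually means the local web UI checkout predates the `--upcast-sampling` flag. A hedged sketch of the usual fix is simply updating the repository:

```shell
cd stable-diffusion-webui   # path is an assumption; use your install location
git pull                    # update to a version that recognizes --upcast-sampling
./webui.sh
```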
-
Trying to update by replacing
Tried manually installing
-
It works, great news! I am able to generate images up to 1472x1472 with hires fix without using swap on a 32 GB machine; that was previously possible only in InvokeAI. And despite other comments, it works flawlessly on Monterey.
-
Big thanks to @brkirch! This is definitely an improvement in performance. I'm using an M1 Mac mini with 16 GB RAM. Before this, it took me 81 s (3.7 s/it) to generate a 512x768 picture; now it's 64 s (3.04 s/it). Hires.fix is faster too: it was 15m17s to upscale a 512x768 picture by 2 using the latent bicubic antialiased upscaler, and now it takes 10m21s. Another benefit of this improvement is that I can now do other things smoothly (like browsing webpages or checking e-mail) while using Hires.fix; it was very laggy before this update. And a big thumbs up to @mandyohhh, I was facing the same problem you had, and updating to Ventura solved it. The only question is: why the heck don't I get a 5x speedup like you do, any ideas?
-
Thank you very much. I can confirm the improvements on my MacBook Pro 16 2021 with M1 Pro.
-
I think I found a bug
-
I hadn't found an appropriate place to ask, and I think there is no need for a new discussion. The question is: should
-
@brkirch I started a fresh installation of sd-webui and used your magic file. The speed improvement is working, but I'm facing a new problem. If I use an upscaler to upscale 512x768 images (by 2), the result is very strange (512x512 is fine), no matter which upscaler I use, hires.fix or scale-up in img2img. I've attached the original and the result below. This hasn't happened before. Thanks for your time.
-
Hi @brkirch, thank you for your tips. They really help me a lot. I am not sure whether it is a known issue or not, but I found that by using
-
-
With your update - "Fixed in the latest PyTorch builds (pytorch/pytorch@075a494)" - what should the
-
My Mac Studio M1 Max with 32 GB RAM runs a 512x512 img2img with 26 steps (actually shown as 14 steps in the command line) at 1.28 it/s. Is that normal?
-
Hi everyone :) Can't remember (for a new download) what the settings here should be for an iMac M1?
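Not an answer from the thread itself, but as a reference point, an Apple silicon arg line in webui-user.sh typically combines the flags recommended in the opening post; exact defaults vary by version, so treat this as an assumption rather than the canonical settings:

```shell
# Example only: --upcast-sampling and --opt-sub-quad-attention are the flags recommended in the opening post;
# the other two are commonly used on Macs but are an assumption here.
export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --no-half-vae"
```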
-
You can't because Apple has shitty graphics
-
@jrittvo Hi my friend. Using a great model from Civitai (SDXL), I was generating amazing images by using my own art as init_imgs with that Civitai model along with SDXL_vae.safetensors inside img2img. I have NO idea what I have changed in settings, as I've been through everything (so I thought?), but if I push the Denoise setting past 50% I now get blurry, weird renders. This was working fine before. Coincidentally, I went over to a Colab Notebook so I could make these beautiful images larger/faster, and that's when I started getting the first sets of blurry images inside img2img. I came back to the iMac (where I must have changed something by accident?) and now I have exactly the same issue there - it has just baffled me. Here are my command line args currently:
If you can see anything wrong, or if anyone has a suggestion, please help! :)
-
Morning all :) Very excited to share that I downloaded the new A1111 1.6.0-RC, and not only has it fixed my img2img issues but WOW!! It's faster, more optimised for Mac, uses less swap, and has some fabulous new additions. Just check it out here and get it downloaded (zip file): https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.6.0-RC I renamed my SD directory to stable-diffusion-webui-AAA; make sure you rename the new directory 'stable-diffusion-webui' or the install won't work properly. Move your models, LoRAs, embeddings etc. and fire it up! You won't be disappointed. Don't forget to copy the info from 'webui-macos-env.sh' to your 'webui.sh'. I've been running it with Civitai SDXL models and it's just amazing! B.
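A rough sketch of the steps described above, for anyone following along; the archive name and folder paths are assumptions, so adjust them to whatever the release zip actually extracts to:

```shell
mv stable-diffusion-webui stable-diffusion-webui-AAA        # keep the old install as a backup
unzip ~/Downloads/stable-diffusion-webui-1.6.0-RC.zip       # archive name/location are assumptions
mv stable-diffusion-webui-1.6.0-RC stable-diffusion-webui   # the new folder must use this exact name

# Bring your models, LoRAs and embeddings over (subfolder names may differ in your setup)
mkdir -p stable-diffusion-webui/models/Stable-diffusion stable-diffusion-webui/models/Lora
cp -R stable-diffusion-webui-AAA/models/Stable-diffusion/* stable-diffusion-webui/models/Stable-diffusion/
cp -R stable-diffusion-webui-AAA/models/Lora/* stable-diffusion-webui/models/Lora/
cp -R stable-diffusion-webui-AAA/embeddings/* stable-diffusion-webui/embeddings/

cd stable-diffusion-webui && ./webui.sh
```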
-
Hi, and thanks for your comments :) Yes, I saw that error message and I should have known better, as I have seen it in the past. Thanks
-
After using the nightly PyTorch to improve performance, I am getting a MallocStackLogging warning and the performance does not differ from before. Am I doing something wrong?
-
I'm getting lots of memory crashes and mostly meaningless results (noisy, bad renders, saturated) with AnimateDiff on my M2 Mac. Has anybody been successful using AnimateDiff on M1/M2?
-
Hello! I have installed the version; it is good for some images, but when I try again I get this at launch:
And I get this error after 100%: `python3.10(28413) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.`
-
"To use all of these new improvements, you don't need to do much; just unzip this webui-user.sh file and replace the webui-user.sh file in stable-diffusion-webui. The next time you run ./webui.sh the web UI dependencies will be reinstalled, along with the latest nightly build of PyTorch." Did this on M1 macbook air. Huge improvement! |
-
What error do you get when replacing the file? It doesn't really do anything other than change a few small arguments to allow the update and then launch the program as normal.
You could always restore the file to its original contents and see if it launches again. If it doesn't, the problem is something else; if it does, then the replacement file was copied incorrectly.
Keep us updated.
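One way to restore the stock webui-user.sh for that test, sketched here under the assumption that the install is a git checkout (the opening post mentions the same command for reverting):

```shell
cd stable-diffusion-webui
git checkout -- webui-user.sh   # restore the original webui-user.sh from the repo
./webui.sh
```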
…On Sun, Jan 7, 2024, 2:14 AM darkcrocodile ***@***.***> wrote:
dependencies get reinstalled but I can no longer launch automatic 1111. I
replaced the file as you said. What am I doing wrong?
-
Downloaded and replaced the webui-user.sh file, and the update went through fine. Unfortunately, the web UI does not start for me anymore. I'm on a Mac Studio M1 Max. Launching launch.py... Traceback (most recent call last):
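Not part of the original report, but when an update like this leaves the web UI unable to start, the recovery path described in the opening post is to revert the script and remove the nightly venv:

```shell
cd stable-diffusion-webui
git checkout webui-user.sh   # revert to the stock launch script
rm -rf venv-torch-nightly    # remove the nightly-PyTorch venv so it gets rebuilt cleanly
./webui.sh
```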
-
Just run through and follow the steps again from wherever you need to, to replace whatever files you may have changed. It's the easiest way to fix whatever problems.
https://stable-diffusion-art.com/install-mac/
Then follow this link to get a first speed boost:
https://www.reddit.com/r/StableDiffusion/s/xuychoxnU8
Second:
https://www.reddit.com/r/StableDiffusion/s/SLQOs374Qr
And then obviously follow:
Easy auto updates! In your folder, right click on "webui-user.bat" and click edit (I use Notepad). Add git pull between the last two lines, "set" and "call", like below.
(--medvram --autolaunch) are optional: make bigger images with --medvram, auto launch the web UI with --autolaunch.
`set COMMANDLINE_ARGS= --medvram --autolaunch`
`git pull`
`call webui.bat`
Done! Every time you start your "webui-user.bat" it will update.
You don't have to use the medium VRAM option, but it will usually help.
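The tip above is written for the Windows webui-user.bat; the macOS analogue, sketched here as an assumption rather than something from the comment, is simply pulling before launching (webui.sh passes extra flags through, as in the opening post's `./webui.sh --no-half` example):

```shell
cd stable-diffusion-webui
git pull                            # update the web UI before every launch
./webui.sh --medvram --autolaunch   # both flags optional, as noted above
```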
…On Tue, Jan 9, 2024, 4:32 AM Brownbagel ***@***.***> wrote:
Noticed the PNG info seems not work anymore. Something I could do to fix
it?
-
Update (April 12, 2023):
If you use an Apple silicon Mac and want to try the latest features and optimizations, look here:
https://github.com/brkirch/stable-diffusion-webui/releases
The "experimental" builds will usually have the latest features and optimizations, and other builds will be intended to provide a stable version that may include new features once they've been tested sufficiently. Note that these currently are not intended to be considered official builds and they will not necessary reflect exactly what this main repository has at any given point in time.
Also, Intel versions are planned in the near future when I can track down exactly what is causing the reported PyTorch 2.0 issues with k-diffusion and UniPC samplers.
Original Post:
There have been several additions and changes made recently that can improve performance on macOS:

- Full precision (`--no-half`) can now be replaced with `--upcast-sampling`. This significantly lowers memory usage and improves performance. This has been the default for macOS since e0df864, so other than upgrading web UI, no action is needed - unless you've edited webui-user.sh to override the default command line arguments and included `--no-half`. In that case you'll need to remove `--no-half` and make sure `--upcast-sampling` is used instead. Note that if you want to train embeddings or hypernetworks, you should start web UI with `--no-half` (e.g. `./webui.sh --no-half`; you don't have to fully override the default command line arguments to remove `--upcast-sampling`, as `--no-half` overrides it).
- Add `--opt-sub-quad-attention` to use sub-quadratic attention. This is the recommended cross attention optimization to use with newer PyTorch versions. It manages memory far better than any of the other cross attention optimizations available to Macs and is required for large image sizes.

To use all of these new improvements, you don't need to do much; just unzip this webui-user.sh file and replace the `webui-user.sh` file in `stable-diffusion-webui`. The next time you run `./webui.sh` the web UI dependencies will be reinstalled, along with the latest nightly build of PyTorch.

Keep in mind that the nightly PyTorch builds may have issues, especially since they are updated every day. If you encounter problems, you can always revert back by running `git checkout webui-user.sh` and then deleting the `venv-torch-nightly` folder.

If you are having problems but instead want to try reverting to an older PyTorch nightly, replace the line in `webui-user.sh` that starts with `export TORCH_COMMAND=` with `export TORCH_COMMAND="pip install --pre torch==2.0.0.dev20230131 torchvision==0.15.0.dev20230131 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html"`, replace the two `20230131` dates with whatever previous date you would like to use a PyTorch nightly build from, then delete the `venv-torch-nightly` folder and run `./webui.sh`.
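For reference, after that edit the relevant part of webui-user.sh would look roughly like the following; the dates shown are just the example from above, and venv-torch-nightly is the folder name used in this post:

```shell
# Pin the PyTorch nightly to a specific date (2023-01-31 here; change both dates to the build you want)
export TORCH_COMMAND="pip install --pre torch==2.0.0.dev20230131 torchvision==0.15.0.dev20230131 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html"

# Then remove the nightly venv and relaunch so the pinned build gets installed:
# rm -rf venv-torch-nightly && ./webui.sh
```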
Please post any issues and feedback here regarding the above instructions or anything else related to web UI performance on Macs.
Edit:
~~As of February 11, the PyTorch nightly builds have broken the ability to use `torch.nn.functional.layer_norm` with half precision, and web UI doesn't currently have a patch to fix it. I'll implement a patch and put in a PR if newer nightly builds show a performance improvement, but right now the latest build has slightly worse performance. For the time being, I've modified the webui-user.sh file in the zip linked above to use the February 10 PyTorch build.~~ Fixed in the latest PyTorch builds (pytorch/pytorch@075a494).