How to improve performance on M1 / M2 Macs #7453
-
I did test this configuration in an upgrade scenario as I think it's the most likely to occur for most users:
As expected, the script created the new venv environment and downloaded the PyTorch 2.0 nightly build. This is the setup I ended up with after the process. With this setup, I was able to generate an image in just 4 s (not sure why the screenshot below reports 6.34; it's not correct). This is more than a 50% time reduction from what I could get out of the previous configuration. Other configurations have generated images with various performance improvements.
So, the bottom line is that the configuration works and it's faster. However, there is still a significant problem, which I have documented in #6923 (see my message at the bottom of the thread).
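As a quick sanity check after this kind of upgrade (a hypothetical snippet, not from the original comment), you can confirm which PyTorch build the new venv actually picked up and whether the MPS backend is available:

```shell
# Folder names follow the opening post; adjust if your install differs.
cd stable-diffusion-webui
./venv-torch-nightly/bin/python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```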
-
One other important piece of feedback, @brkirch: before this improved configuration, my M2 system would become incredibly hot during image generation, triggering the fan at maximum speed. That never happened with A1111 on my previous M1 system (even during a 6h-long batch generation), and it doesn't happen with any other AI model I use (TTS, STT, etc.). However, it just happened during my test of the new InvokeAI 2.3 RC. So, both A1111 (latest commit) and InvokeAI 2.3 RC somehow push the M2 system to its limits (probably not for the right reasons), but your configuration for A1111 seems to solve/mitigate the problem. I'll run longer batches to see if that's still true during long inference sessions.
-
Hey guys! Any good ideas on how to work around this issue?
-
Tried to do this update, and now I get this error: `launch.py: error: unrecognized arguments: --upcast-sampling`. Any ideas?
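Not part of the original question, but `unrecognized arguments` from launch.py usually means the local web UI checkout predates the `--upcast-sampling` flag. A hedged sketch of the usual fix is simply updating the repository:

```shell
cd stable-diffusion-webui   # path is an assumption; use your install location
git pull                    # update to a version that recognizes --upcast-sampling
./webui.sh
```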
-
Trying to update by replacing
Tried manually installing
-
It works, great news! I am able to generate images up to 1472x1472 with hires fix without using swap on a 32 GB machine; that was previously possible only in InvokeAI. And despite other comments, it works flawlessly on Monterey.
-
Big thanks to @brkirch! This is definitely an improvement in performance. I'm using an M1 Mac mini with 16 GB RAM. Before this, it took me 81 s (3.7 s/it) to generate a 512x768 picture; now it's 64 s (3.04 s/it). Hires.fix is faster too: it was 15m17s to upscale a 512x768 picture by 2 using the latent bicubic antialiased upscaler, and now it takes 10m21s. Another benefit of this improvement is that I can now do other things smoothly (like browsing webpages or checking e-mail) while using Hires.fix; it was very laggy before this update. And a big thumbs up to @mandyohhh, I was facing the same problem you had, and updating to Ventura solved it. The only question is: why the heck don't I get a 5x speedup like you do, any ideas?
-
Thank you very much. I can confirm the improvements on my MacBook Pro 16 2021 with M1 Pro.
-
I think I found a bug
-
I hadn't found an appropriate place to ask, and I think there is no need for a new discussion. The question is: should
-
@brkirch I started a fresh installation of sd-webui and used your magic file. The speed improvement is working, but I'm facing a new problem. If I use an upscaler to upscale 512x768 images (by 2), the result is very strange (512x512 is fine), no matter which upscaler I use, hires.fix or scale-up in img2img. I've attached the original and the result below. This hasn't happened before. Thanks for your time.
-
Hi @brkirch, thank you for your tips. They really help me a lot. I am not sure whether it is a known issue or not, but I found that by using
-
-
With your update - "Fixed in the latest PyTorch builds (pytorch/pytorch@075a494)" - what should the
-
My Mac Studio M1 Max with 32 GB RAM runs a 512x512 img2img with 26 steps (actually shown as 14 steps in the command line) at 1.28 it/s. Is that normal?
-
Hi everyone :) Can't remember (for a new download) what the settings here should be for an iMac M1?
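Not an answer from the thread itself, but as a reference point, an Apple silicon arg line in webui-user.sh typically combines the flags recommended in the opening post; exact defaults vary by version, so treat this as an assumption rather than the canonical settings:

```shell
# Example only: --upcast-sampling and --opt-sub-quad-attention are the flags recommended in the opening post;
# the other two are commonly used on Macs but are an assumption here.
export COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --no-half-vae"
```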
-
You can't because Apple has shitty graphics
-
@jrittvo Hi my friend. Using a great model from Civitai (SDXL), I was generating amazing images by using my own art as init_imgs with that Civitai model along with SDXL_vae.safetensors inside img2img. I have NO idea what I have changed in settings, as I've been through everything (so I thought?), but if I push the Denoise setting past 50% I now get blurry, weird renders. This was working fine before. Coincidentally, I went over to a Colab Notebook so I could make these beautiful images larger/faster, and that's when I started getting the first sets of blurry images inside img2img. I came back to the iMac (where I must have changed something by accident?) and now I have exactly the same issue there - it has just baffled me. Here are my command line args currently:
If you can see anything wrong, or if anyone has a suggestion, please help! :)
-
Morning all :) Very excited to share that I downloaded the new A1111 1.6.0-RC, and not only has it fixed my img2img issues but WOW!! It's faster, more optimised for Mac, uses less swap, and has some fabulous new additions. Just check it out here and get it downloaded (zip file): https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.6.0-RC I renamed my SD directory to stable-diffusion-webui-AAA; make sure you rename the new directory 'stable-diffusion-webui' or the install won't work properly. Move your models, LoRAs, embeddings etc. and fire it up! You won't be disappointed. Don't forget to copy the info from 'webui-macos-env.sh' to your 'webui.sh'. I've been running it with Civitai SDXL models and it's just amazing! B.
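A rough sketch of the steps described above, for anyone following along; the archive name and folder paths are assumptions, so adjust them to whatever the release zip actually extracts to:

```shell
mv stable-diffusion-webui stable-diffusion-webui-AAA        # keep the old install as a backup
unzip ~/Downloads/stable-diffusion-webui-1.6.0-RC.zip       # archive name/location are assumptions
mv stable-diffusion-webui-1.6.0-RC stable-diffusion-webui   # the new folder must use this exact name

# Bring your models, LoRAs and embeddings over (subfolder names may differ in your setup)
mkdir -p stable-diffusion-webui/models/Stable-diffusion stable-diffusion-webui/models/Lora
cp -R stable-diffusion-webui-AAA/models/Stable-diffusion/* stable-diffusion-webui/models/Stable-diffusion/
cp -R stable-diffusion-webui-AAA/models/Lora/* stable-diffusion-webui/models/Lora/
cp -R stable-diffusion-webui-AAA/embeddings/* stable-diffusion-webui/embeddings/

cd stable-diffusion-webui && ./webui.sh
```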
-
Hi, and thanks for your comments :) Yes, I saw that error message and I should have known better, as I have seen it in the past. Thanks
-
After using the nightly PyTorch to improve performance, I am getting a MallocStackLogging warning and the performance does not differ from before. Am I doing something wrong?
-
I'm getting lots of memory crashes and mostly meaningless results (noisy, bad renders, saturated) with AnimateDiff on my M2 Mac. Has anybody been successful using AnimateDiff on M1/M2?
-
Hello! I have installed the version; it is good for some images, but when I try again I get this at launch:
And I get this error after 100%: `python3.10(28413) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.`
-
"To use all of these new improvements, you don't need to do much; just unzip this webui-user.sh file and replace the webui-user.sh file in stable-diffusion-webui. The next time you run ./webui.sh the web UI dependencies will be reinstalled, along with the latest nightly build of PyTorch." Did this on M1 macbook air. Huge improvement! |
-
What error do you get when replacing the file? It doesn't really do anything other than change a few small arguments to allow the update and then launch the program as normal.
You could always restore the file to its original contents and see if it launches again. If it doesn't, the problem is something else; if it does, then the replacement file was copied incorrectly.
Keep us updated.
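One way to restore the stock webui-user.sh for that test, sketched here under the assumption that the install is a git checkout (the opening post mentions the same command for reverting):

```shell
cd stable-diffusion-webui
git checkout -- webui-user.sh   # restore the original webui-user.sh from the repo
./webui.sh
```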
…On Sun, Jan 7, 2024, 2:14 AM darkcrocodile ***@***.***> wrote:
dependencies get reinstalled but I can no longer launch automatic 1111. I
replaced the file as you said. What am I doing wrong?
-
Downloaded and replaced the webui-user.sh file, and the update went through fine. Unfortunately, the web UI does not start for me anymore. I'm on a Mac Studio M1 Max. Launching launch.py... Traceback (most recent call last):
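Not part of the original report, but when an update like this leaves the web UI unable to start, the recovery path described in the opening post is to revert the script and remove the nightly venv:

```shell
cd stable-diffusion-webui
git checkout webui-user.sh   # revert to the stock launch script
rm -rf venv-torch-nightly    # remove the nightly-PyTorch venv so it gets rebuilt cleanly
./webui.sh
```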
-
Just run through and follow the steps again from wherever you need to, to replace whatever files you may have changed. It's the easiest way to fix whatever problems.
https://stable-diffusion-art.com/install-mac/
Then follow this link to get a first speed boost:
https://www.reddit.com/r/StableDiffusion/s/xuychoxnU8
Second:
https://www.reddit.com/r/StableDiffusion/s/SLQOs374Qr
And then obviously follow:
Easy auto updates! In your folder, right click on "webui-user.bat" and click edit (I use Notepad). Add git pull between the last two lines, "set" and "call", like below.
(--medvram --autolaunch) are optional: make bigger images with --medvram, auto launch the web UI with --autolaunch.
`set COMMANDLINE_ARGS= --medvram --autolaunch`
`git pull`
`call webui.bat`
Done! Every time you start your "webui-user.bat" it will update.
You don't have to use the medium VRAM option, but it will usually help.
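The tip above is written for the Windows webui-user.bat; the macOS analogue, sketched here as an assumption rather than something from the comment, is simply pulling before launching (webui.sh passes extra flags through, as in the opening post's `./webui.sh --no-half` example):

```shell
cd stable-diffusion-webui
git pull                            # update the web UI before every launch
./webui.sh --medvram --autolaunch   # both flags optional, as noted above
```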
…On Tue, Jan 9, 2024, 4:32 AM Brownbagel ***@***.***> wrote:
Noticed the PNG info seems not work anymore. Something I could do to fix
it?
-
Update (April 12, 2023):
If you use an Apple silicon Mac and want to try the latest features and optimizations, look here:
https://github.com/brkirch/stable-diffusion-webui/releases
The "experimental" builds will usually have the latest features and optimizations, and other builds will be intended to provide a stable version that may include new features once they've been tested sufficiently. Note that these currently are not intended to be considered official builds and they will not necessary reflect exactly what this main repository has at any given point in time.
Also, Intel versions are planned in the near future when I can track down exactly what is causing the reported PyTorch 2.0 issues with k-diffusion and UniPC samplers.
Original Post:
There have been several additions and changes made recently that can improve performance on macOS:

- Full precision (`--no-half`) can now be replaced with `--upcast-sampling`. This significantly lowers memory usage and improves performance. This has been the default for macOS since e0df864, so other than upgrading web UI, no action is needed - unless you've edited webui-user.sh to override the default command line arguments and included `--no-half`. In that case you'll need to remove `--no-half` and make sure `--upcast-sampling` is used instead. Note that if you want to train embeddings or hypernetworks, you should start web UI with `--no-half` (e.g. `./webui.sh --no-half`; you don't have to fully override the default command line arguments to remove `--upcast-sampling`, as `--no-half` overrides it).
- Add `--opt-sub-quad-attention` to use sub-quadratic attention. This is the recommended cross attention optimization to use with newer PyTorch versions. It manages memory far better than any of the other cross attention optimizations available to Macs and is required for large image sizes.

To use all of these new improvements, you don't need to do much; just unzip this webui-user.sh file and replace the `webui-user.sh` file in `stable-diffusion-webui`. The next time you run `./webui.sh` the web UI dependencies will be reinstalled, along with the latest nightly build of PyTorch.

Keep in mind that the nightly PyTorch builds may have issues, especially since they are updated every day. If you encounter problems, you can always revert back by running `git checkout webui-user.sh` and then deleting the `venv-torch-nightly` folder.

If you are having problems but instead want to try reverting to an older PyTorch nightly, replace the line in `webui-user.sh` that starts with `export TORCH_COMMAND=` with `export TORCH_COMMAND="pip install --pre torch==2.0.0.dev20230131 torchvision==0.15.0.dev20230131 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html"`, replace the two `20230131` dates with whatever previous date you would like to use a PyTorch nightly build from, then delete the `venv-torch-nightly` folder and run `./webui.sh`.
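For reference, after that edit the relevant part of webui-user.sh would look roughly like the following; the dates shown are just the example from above, and venv-torch-nightly is the folder name used in this post:

```shell
# Pin the PyTorch nightly to a specific date (2023-01-31 here; change both dates to the build you want)
export TORCH_COMMAND="pip install --pre torch==2.0.0.dev20230131 torchvision==0.15.0.dev20230131 -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html"

# Then remove the nightly venv and relaunch so the pinned build gets installed:
# rm -rf venv-torch-nightly && ./webui.sh
```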
Please post any issues and feedback here regarding the above instructions or anything else related to web UI performance on Macs.
Edit:
~~As of February 11, the PyTorch nightly builds have broken the ability to use `torch.nn.functional.layer_norm` with half precision, and web UI doesn't currently have a patch to fix it. I'll implement a patch and put in a PR if newer nightly builds show a performance improvement, but right now the latest build has slightly worse performance. For the time being, I've modified the webui-user.sh file in the zip linked above to use the February 10 PyTorch build.~~ Fixed in the latest PyTorch builds (pytorch/pytorch@075a494).