[Performance 4/6] Precompute is_sdxl_inpaint flag #15806

Merged (3 commits) on Jun 8, 2024

Conversation

huchenlei
Contributor

Description

According to lllyasviel/stable-diffusion-webui-forge#716 (comment), the check for whether the model is an SDXL inpaint model calls state_dict on every sampling step. state_dict is a very expensive function that costs ~40ms. This overhead applies to all inference regardless of model type, which is dumb.

This PR precomputes is_sdxl_inpaint flag so that we do not call state_dict on every sampling step.
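
A minimal sketch of the idea, not the actual webui code: `FakeUNet`, `load_model`, `sample`, the state-dict key, and the "9 input channels means inpaint" rule are all hypothetical stand-ins chosen for illustration. It only shows how moving the check from the sampling loop to load time reduces the number of `state_dict` calls from one per step to one per model load.

```python
class FakeUNet:
    """Hypothetical stand-in for a diffusion model (not the webui's class)."""

    def __init__(self, in_channels):
        self.in_channels = in_channels
        self.state_dict_calls = 0  # counts how often the expensive call runs

    def state_dict(self):
        # In a real model this walks every parameter and is expensive.
        self.state_dict_calls += 1
        # Assumed layout: map a weight name to its shape tuple.
        return {"input_blocks.0.0.weight": (320, self.in_channels, 3, 3)}


def check_inpaint_via_state_dict(model):
    # Assumption for this sketch: an inpaint model has 9 input channels.
    return model.state_dict()["input_blocks.0.0.weight"][1] == 9


def load_model(in_channels):
    model = FakeUNet(in_channels)
    # Precompute the flag once, at load time.
    model.is_sdxl_inpaint = check_inpaint_via_state_dict(model)
    return model


def sample(model, steps, use_precomputed_flag):
    """Run a dummy sampling loop and return how many steps saw an inpaint model."""
    inpaint_steps = 0
    for _ in range(steps):
        if use_precomputed_flag:
            is_inpaint = model.is_sdxl_inpaint  # cheap attribute lookup
        else:
            is_inpaint = check_inpaint_via_state_dict(model)  # expensive every step
        inpaint_steps += int(is_inpaint)
    return inpaint_steps


model = load_model(in_channels=9)
sample(model, steps=20, use_precomputed_flag=True)
calls_with_flag = model.state_dict_calls      # only the load-time call
sample(model, steps=20, use_precomputed_flag=False)
calls_without_flag = model.state_dict_calls   # load-time call plus one per step
```

With the flag, the 20-step loop adds no `state_dict` calls beyond the one made at load time. Without it, each step pays for another call.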

Original PR that introduced this check: #14390


@huchenlei huchenlei requested a review from AUTOMATIC1111 as a code owner May 15, 2024 20:36
@huchenlei huchenlei changed the title Precompute is_sdxl_inpaint flag [Performance 4/6] Precompute is_sdxl_inpaint flag May 15, 2024
@huchenlei huchenlei changed the base branch from master to dev May 15, 2024 20:50
@Panchovix

Panchovix commented May 16, 2024

Just wanted to comment that all these performance PRs are amazing! I get pretty similar speeds vs Forge on an RTX 4090. (It seems that A1111 with these PRs actually generates a tad bit faster vs Forge, but the former takes a bit more time to start generating)

@huchenlei
Contributor Author

> Just wanted to comment that all these performance PRs are amazing! I get pretty similar speeds vs Forge on an RTX 4090. (It seems that A1111 with these PRs actually generates a tad bit faster vs Forge, but the former takes a bit more time to start generating)

There are 2 more PRs to come, but they are not as straightforward, so they might take longer to prepare. I am also merging all of the performance fixes into https://github.com/huchenlei/stable-diffusion-webui/tree/all_perf so you don't need to apply these PRs one by one.

@AUTOMATIC1111 AUTOMATIC1111 merged commit 6450d24 into AUTOMATIC1111:dev Jun 8, 2024
3 checks passed
@lawchingman lawchingman mentioned this pull request Oct 5, 2024