
stable-diffusion: implement ESRGAN upscaler + Metal Backend #104

Merged
merged 31 commits into leejet:master on Dec 28, 2023


FSSRepo
Contributor

@FSSRepo FSSRepo commented Dec 5, 2023

In recent days, I have been working on implementing this upscaler, which could be useful to some. I compared its results with those of the original PyTorch implementation, and they don't differ much. I did my best to implement the architecture, although I may have made a mistake. I'll review it carefully in the next few days.
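For readers following the architecture: in the BasicSR RRDBNet that Real-ESRGAN's models use, every residual dense block (RDB) and every residual-in-residual dense block (RRDB) scales its residual branch by 0.2 before adding it back. The sketch below shows only that wiring, under stated assumptions: `Tensor` is a hypothetical flat float buffer and the dense convolution chains are left abstract as hypothetical `Layer` callables. It is not the sd.cpp implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Tensor = std::vector<float>;                   // hypothetical flat feature map
using Layer = std::function<Tensor(const Tensor&)>;  // stands in for a dense conv chain

// out = x + beta * fx: the scaled residual used throughout RRDBNet.
Tensor scaled_residual(const Tensor& x, const Tensor& fx, float beta = 0.2f) {
    assert(x.size() == fx.size());
    Tensor out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + beta * fx[i];
    return out;
}

// One RRDB: three RDBs in sequence (each RDB = input + 0.2 * dense_body(input)),
// followed by an outer residual: x + 0.2 * (output of the three RDBs).
Tensor rrdb(const Tensor& x, const Layer& body1, const Layer& body2, const Layer& body3) {
    Tensor h = scaled_residual(x, body1(x));
    h = scaled_residual(h, body2(h));
    h = scaled_residual(h, body3(h));
    return scaled_residual(x, h);
}
```

With real weights, `body1` to `body3` would each be a five-convolution dense chain; the 0.2 factors keep the deep residual stack numerically tame.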

Results

| Image 128x128 (rescaled) | ESRGAN (sd.cpp) | Original 512x512 |
| --- | --- | --- |
| rescaled_image_1 | upscaled_image_1_sdcpp | original_image_1 |
| rescaled_image_2 | upscaled_image_2_sdcpp | original_image_2 |

At the moment, only the RealESRGAN_x4plus_anime_6B.pth model is supported; you can use it by specifying -um RealESRGAN_x4plus_anime_6B.pth.
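In the BasicSR RRDBNet used by the Real-ESRGAN x4 models, the 4x factor comes from two nearest-neighbour 2x interpolations, each followed by a convolution, which is how a 128x128 input becomes 512x512 in the table above. A minimal sketch of the interpolation step alone, assuming a hypothetical CHW float buffer (the convolutions are omitted, and this is not the sd.cpp code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-neighbour 2x upsampling of a CHW float image: every source pixel
// is copied into a 2x2 block of the output.
std::vector<float> upsample_nearest_2x(const std::vector<float>& src,
                                       int channels, int height, int width) {
    assert(src.size() == static_cast<std::size_t>(channels) * height * width);
    const int out_h = height * 2;
    const int out_w = width * 2;
    std::vector<float> dst(static_cast<std::size_t>(channels) * out_h * out_w);
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x)
                dst[(static_cast<std::size_t>(c) * out_h + y) * out_w + x] =
                    src[(static_cast<std::size_t>(c) * height + y / 2) * width + x / 2];
    return dst;
}
```

Calling it twice (with a convolution after each call in the real network) takes an HxW input to 4H x 4W.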

@FSSRepo FSSRepo marked this pull request as draft December 6, 2023 03:07
@FSSRepo FSSRepo marked this pull request as ready for review December 8, 2023 20:27
@FSSRepo
Contributor Author

FSSRepo commented Dec 8, 2023

@leejet I have already reviewed the architecture and performed various tests on the upscaler, and everything seems to be working well. At the moment, it will only support one model due to the inflexibility of the current implementation.

Additionally, I have added the latest changes from ggml (backend). I had to make some modifications to model.cpp since the ggml_get_backend function no longer exists. I created a function sd_tiling that could be useful for VAE tiling, but for now, it is only used by the upscaler.
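The sd_tiling function itself isn't shown in this thread, so the following is only a hypothetical sketch of the general idea: cut the input into tiles, run the model on each tile, and paste each result at its scaled position. `Image` and `run_tiled` are made-up names, the image is single-channel, and the tiles don't overlap, which is exactly the setup that leaves visible seams at tile borders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical single-channel, row-major image (not an sd.cpp type).
struct Image {
    int w, h;
    std::vector<float> px;
    Image(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0f) {}
    float& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    float at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Cuts `in` into non-overlapping tiles of at most tile x tile pixels, runs
// `process` on each (it must return the tile enlarged by `scale`), and pastes
// every result at the matching, scaled position of the output image.
Image run_tiled(const Image& in, int tile, int scale,
                const std::function<Image(const Image&)>& process) {
    assert(tile > 0 && scale > 0);
    Image out(in.w * scale, in.h * scale);
    for (int ty = 0; ty < in.h; ty += tile) {
        for (int tx = 0; tx < in.w; tx += tile) {
            const int tw = std::min(tile, in.w - tx);  // edge tiles may be smaller
            const int th = std::min(tile, in.h - ty);
            Image t(tw, th);
            for (int y = 0; y < th; ++y)
                for (int x = 0; x < tw; ++x) t.at(x, y) = in.at(tx + x, ty + y);
            const Image r = process(t);
            assert(r.w == tw * scale && r.h == th * scale);
            for (int y = 0; y < r.h; ++y)
                for (int x = 0; x < r.w; ++x)
                    out.at(tx * scale + x, ty * scale + y) = r.at(x, y);
        }
    }
    return out;
}
```

Tiling bounds peak memory by the tile size instead of the full image, at the cost of each tile losing context from its neighbours near the borders.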

The missing kernels needed to add the Metal backend may be available within 2 days.

Here's a test of VAE tiling. I think we need to overlap and blend the colors to eliminate the seams.

output2
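One common way to implement the overlap-and-blend idea is linear feathering: neighbouring tiles are processed with a few pixels of overlap, and inside that band the output fades from one tile to the other. A minimal 1D sketch for one row of two horizontally adjacent tiles; the function name and the linear ramp are assumptions, and a 2D version would apply the same ramp along rows and along columns.

```cpp
#include <cassert>
#include <vector>

// Joins two horizontally adjacent rows that overlap by `overlap` pixels:
// `left` ends with the overlap band and `right` starts with it. Inside the
// band the right tile's weight ramps linearly, strictly between 0 and 1,
// so there is no hard edge where the tiles meet.
std::vector<float> blend_rows(const std::vector<float>& left,
                              const std::vector<float>& right, int overlap) {
    assert(overlap >= 0);
    assert(overlap <= static_cast<int>(left.size()) && overlap <= static_cast<int>(right.size()));
    std::vector<float> out(left.begin(), left.end() - overlap);
    for (int i = 0; i < overlap; ++i) {
        const float w = static_cast<float>(i + 1) / static_cast<float>(overlap + 1);
        const float l = left[left.size() - overlap + i];
        out.push_back((1.0f - w) * l + w * right[i]);
    }
    out.insert(out.end(), right.begin() + overlap, right.end());
    return out;
}
```

For example, blending a row of 0s into a row of 3s with a 2-pixel overlap yields 0, 0, 1, 2, 3, 3 instead of a jump from 0 to 3.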

@Jonathhhan

Jonathhhan commented Dec 9, 2023

@FSSRepo thank you, it works great. The only thing is that the seams are also visible with the upscaler (but much less than in your VAE tiling example).

@FSSRepo
Contributor Author

FSSRepo commented Dec 9, 2023

@Jonathhhan can you share an example, along with the image dimensions? I can't see seams in the images I upscaled.

@Jonathhhan

Jonathhhan commented Dec 9, 2023

@FSSRepo True, it depends.
Here you can see it, for example (it's not much):
ofxStableDiffusion-2023-12-09-01-48-59

@FSSRepo
Contributor Author

FSSRepo commented Dec 9, 2023

If I have time tomorrow, I'll see about adding something to make the seams less noticeable.

@leejet
Owner

leejet commented Dec 9, 2023


It seems like there are different color variations between different blocks. Could this be due to some subtle differences in the computation during the process?

@FSSRepo
Contributor Author

FSSRepo commented Dec 9, 2023


The example I showed you uses VAE tiling. With the upscaler, the seams are subtle: you only notice the discontinuities if you look closely at the details, and how visible they are depends on the complexity of the image.

For the upscaler, this can be mitigated to some extent by overlapping the tiles and interpolating the colors across the overlap. For VAE tiling, it would need something like color leveling plus overlapping, although that still doesn't give a convincing result. In any case, I'm going to run some tests.
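"Color leveling" can mean several things; one simple interpretation, assumed here and not necessarily what was tried in this PR, is to shift each channel of a tile so that its mean over the shared overlap region matches the neighbouring (reference) tile's mean there, and only then blend the overlap. A single-channel sketch with hypothetical names:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of a non-empty list of values.
float mean_of(const std::vector<float>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0f) / static_cast<float>(v.size());
}

// Shifts every pixel of `tile` (one channel) by the difference between the
// reference tile's mean and this tile's mean over their shared overlap, so
// both tiles agree on average brightness where they meet.
void level_tile(std::vector<float>& tile,
                const std::vector<float>& ref_overlap,
                const std::vector<float>& tile_overlap) {
    const float offset = mean_of(ref_overlap) - mean_of(tile_overlap);
    for (float& p : tile) p += offset;
}
```

A pure offset only corrects brightness drift per channel; matching contrast as well would also require scaling by the ratio of standard deviations.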

@Jonathhhan

Jonathhhan commented Dec 9, 2023

I found the top answer in this reddit thread helpful for my understanding of optimizing upscaling:
https://www.reddit.com/r/StableDiffusion/comments/11ahfyo/seams_help_with_ultimate_upscale_for_a1111/?sort=new
Some comments are specific to A1111, but I guess most can be applied to all upscalers (I doubt that anything in the link is new to you).

@Green-Sky
Contributor

My go-to test fails in this PR.

The log looks normal:

$ sd -m models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir models/og/sd1/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat, cinematic" --sampling-method lcm --steps 4 --cfg-scale 1 -b 1
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5
[INFO]  stable-diffusion.cpp:4914 - loading model from 'models/epicphotogasm_lastUnicorn-f16.gguf'
[INFO]  model.cpp:615  - load models/epicphotogasm_lastUnicorn-f16.gguf using gguf format
[INFO]  stable-diffusion.cpp:4936 - Stable Diffusion 1.x
[INFO]  stable-diffusion.cpp:4942 - Stable Diffusion weight type: f16
[INFO]  stable-diffusion.cpp:5092 - total memory buffer size = 1972.80MB (clip 236.18MB, unet 1641.16MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5098 - loading model from 'models/epicphotogasm_lastUnicorn-f16.gguf' completed, taking 0.51s
[INFO]  stable-diffusion.cpp:5112 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:4520 - loading LoRA from 'models/og/sd1/lcm-lora-sdv1-5.safetensors'
[INFO]  model.cpp:618  - load models/og/sd1/lcm-lora-sdv1-5.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5213 - lora 'lcm-lora-sdv1-5' applied, taking 0.47s
[INFO]  stable-diffusion.cpp:5940 - apply_loras completed, taking 0.47s
[INFO]  stable-diffusion.cpp:5969 - get_learned_condition completed, taking 11 ms
[INFO]  stable-diffusion.cpp:5979 - sampling using LCM method
[INFO]  stable-diffusion.cpp:5983 - generating image: 1/1 - seed 42
  |==================================================| 4/4 - 6.00it/s
[INFO]  stable-diffusion.cpp:5995 - sampling completed, taking 0.69s
[INFO]  stable-diffusion.cpp:6003 - generating 1 latent images completed, taking 0.69s
[INFO]  stable-diffusion.cpp:6005 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6017 - latent 1 decoded, taking 0.73s
[INFO]  stable-diffusion.cpp:6021 - decode_first_stage completed, taking 0.73s
[INFO]  stable-diffusion.cpp:6026 - txt2img completed in 1.44s
[INFO]  main.cpp:521  - save result image to 'output.png'

But the generated image does not:
output

@Jonathhhan

@Green-Sky you are still using .gguf model files. Just use the unconverted .ckpt or .safetensors file (not sure if that's the issue).

@Green-Sky
Contributor

Green-Sky commented Dec 10, 2023


I prefer the smaller gguf files. On that topic, what happened to the converter program?

edit: Also, the gguf works fine on master.

@leejet
Owner

leejet commented Dec 10, 2023


The gguf file can still be used, and this issue should be independent of the file format.

@FSSRepo
Contributor Author

FSSRepo commented Dec 10, 2023

@Green-Sky can you test the commits of this PR to locate the root cause of this issue?

@Green-Sky
Contributor

Green-Sky commented Dec 10, 2023


The first commit in this PR (the first non-PR commit, 968226a) works:
output
The cat looks a bit funny, though. (edit: removing "cinematic" gives it its face back :) )

@leejet
Owner

leejet commented Dec 10, 2023

I tested it on the latest master code and it works fine.

.\bin\Release\sd.exe -m ..\models\epicphotogasm_lastUnicorn.safetensors --lora-model-dir ..\models\ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat, cinematic" --sampling-method lcm --steps 4 --cfg-scale 1 -b 1

output

@FSSRepo
Contributor Author

FSSRepo commented Dec 10, 2023

@leejet can you test this PR? I have not been able to replicate that color inversion error.

@Cyberhan123
Contributor

@FSSRepo Okay, I'll try migrating your code now.

@leejet
Owner

leejet commented Dec 23, 2023

> @leejet I hope you can pin this issue, so we can collect it in one place

Done.

@Cyberhan123
Contributor


Thx

@leejet
Owner

leejet commented Dec 23, 2023

@Green-Sky I couldn't replicate your issue using Colab. Based on the available information, I attempted a simple fix. Could you pull the latest code from this PR to see if it resolves your problem?

@FSSRepo
Contributor Author

FSSRepo commented Dec 23, 2023

@leejet slaren has already fixed the synchronization issues in ggml, and those changes have also been synchronized into my ggml repository. GreenSky has confirmed that everything works well except VAE tiling, which still seems to cause issues. However, I have been running tests, and I cannot reproduce the VAE tiling error.

@leejet
Owner

leejet commented Dec 23, 2023

> @FSSRepo keep in mind that it also happens when not tiling. (see my last picture)

@FSSRepo In the latest comment from GreenSky, he mentioned that it also occurs when not tiling.

@leejet
Owner

leejet commented Dec 23, 2023

@Green-Sky After pulling the latest code, did you run git submodule update to update ggml?

@leejet
Owner

leejet commented Dec 23, 2023

@FSSRepo I'm trying an approach similar to the one used with ggml in the master branch when using CUDA: calling get_tensor_async followed by synchronize, to see if this helps resolve the issue.
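The rule behind this approach is general: after an asynchronous device-to-host copy, the host must wait on a synchronization point before reading the buffer, otherwise it may see stale or partially written data. The sketch below does not call ggml; `std::async` stands in for the asynchronous copy and `std::future::wait` for the synchronize step, and `copy_async` / `read_back` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Stand-in for an asynchronous device-to-host copy: the copy runs on another
// thread and finishes some time after this function returns.
std::future<void> copy_async(const std::vector<float>& device, std::vector<float>& host) {
    return std::async(std::launch::async, [&device, &host] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated latency
        host.assign(device.begin(), device.end());
    });
}

// Correct ordering: issue the copy, synchronize, and only then read `host`.
std::vector<float> read_back(const std::vector<float>& device) {
    std::vector<float> host;
    std::future<void> pending = copy_async(device, host);
    assert(pending.valid());
    pending.wait();  // plays the role of the backend's synchronize call
    return host;
}
```

Reading `host` before `wait()` would be a data race here, which is the class of bug a missing synchronize can cause on a GPU backend.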

@Amin456789

Amin456789 commented Dec 23, 2023

@FSSRepo @Cyberhan123 So excited about this, finally a GUI! If possible, please make it a simple GUI for Windows so we don't need to run it in a browser; browsers take RAM, and this way all of the RAM can be used by sd.cpp.
Also, a dark mode would be great.

@Green-Sky
Contributor

. . .
output output_2 output_3
output_4 output_5 output_6
output_7 output_8 output_9

with the simplest possible invocation:

result/bin/sd -m ../../stable-diffusion-webui/models/Stable-diffusion/epicphotogasm_lastUnicorn.safetensors -p "a lovely cat" -b 9

421e39b

$ git submodule
 a0c2ec77a5ef8e630aff65bc535d13b9805cb929 ggml (remotes/origin/sd-cpp)

@leejet
Owner

leejet commented Dec 24, 2023

@FSSRepo I've consistently been unable to reproduce GreenSky's issue. I'm unsure whether to merge this PR now or wait until this issue is fixed. While everything works fine on my own device, problems evidently arise on some devices, and this issue has kept blocking me from merging this PR. I'd like to hear your opinion.

@FSSRepo
Contributor Author

FSSRepo commented Dec 24, 2023

@leejet I haven't been able to reproduce the issue either. I thought that if someone with a T4 GPU (the same GPU Google Colab provides) ran into a similar problem, I might finally discover the root cause. However, after testing, I found no issues on Google Colab. It's frustrating.

I'm still not entirely sure whether this problem originates from the first commit of this pull request or from after I introduced the changes from ggml. If it's the latter, I could revert the ggml changes (removing Metal support) and keep only the components essential for the upscaler to function.

@leejet
Owner

leejet commented Dec 24, 2023


@Green-Sky Can you help us test to determine which commit introduced the issue?

@Amin456789

Why don't you guys merge it now and let people report the bugs? After all, img2img also got fixed after some time. I'm so hyped for the new features of this lovely sd.cpp!

@FSSRepo
Contributor Author

FSSRepo commented Dec 27, 2023

@leejet are you working to bring control net to this project?

@Cyberhan123
Contributor


@Green-Sky I spent a long time tracing this and even compared the output tensor files. I found that the CUDA backend has precision problems in the decoder stage, which may cause an overflow; that would explain why half of your pictures are solid colors.
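If the suspected cause is values overflowing in reduced precision, a cheap check is to scan the decoder output for non-finite values and for finite values above 65504, the largest finite IEEE half-precision (fp16) value, and to flag channels that are constant. The sketch below assumes the tensor has already been copied into a host `std::vector<float>`; `TensorHealth`, `check_tensor`, and `looks_solid` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct TensorHealth {
    std::size_t non_finite = 0;      // NaN or +/-Inf values
    std::size_t above_fp16_max = 0;  // finite values that would overflow in fp16
    float min = std::numeric_limits<float>::infinity();
    float max = -std::numeric_limits<float>::infinity();
};

// Scans a decoded tensor for overflow symptoms; min/max cover finite values only.
TensorHealth check_tensor(const std::vector<float>& data) {
    assert(!data.empty());
    constexpr float kFp16Max = 65504.0f;
    TensorHealth h;
    for (float v : data) {
        if (!std::isfinite(v)) { ++h.non_finite; continue; }
        if (std::fabs(v) > kFp16Max) ++h.above_fp16_max;
        h.min = std::min(h.min, v);
        h.max = std::max(h.max, v);
    }
    return h;
}

// True when every value is finite and all values lie within `eps` of each other.
bool looks_solid(const TensorHealth& h, float eps = 1e-6f) {
    return h.non_finite == 0 && h.max - h.min <= eps;
}
```

Run it per channel: a solid red image has three constant channels with different values, so a whole-image check would miss it.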

@Cyberhan123
Contributor

Cyberhan123 commented Dec 28, 2023

Thanks to @bssrdf's reminder, I can now debug, if clumsily. I traced the whole pipeline and found that mul_mat may not be the source of the problem. On my cloud host, a complete image is generated, but parsing it may fail. I now need someone to suggest testing techniques, because debugging CUDA is daunting: there are 512x512x3 float values to inspect.
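One way to make 512x512x3 floats manageable is to dump the same tensor from a CPU run and a CUDA run and reduce the comparison to a few numbers: the largest absolute difference, the mean absolute difference, and where the worst element sits. A sketch with hypothetical names; the HWC layout in `index_to_hwc` is an assumption and must be adapted to the actual tensor layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DiffStats {
    double max_abs = 0.0;   // largest |a - b|
    double mean_abs = 0.0;  // average |a - b|
    std::size_t worst = 0;  // flat index of the largest difference
};

// Element-wise comparison of two dumps of the same tensor. NaNs propagate
// into mean_abs, so check for non-finite values separately.
DiffStats compare_tensors(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    DiffStats s;
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = std::fabs(static_cast<double>(a[i]) - static_cast<double>(b[i]));
        sum += d;
        if (d > s.max_abs) { s.max_abs = d; s.worst = i; }
    }
    s.mean_abs = sum / static_cast<double>(a.size());
    return s;
}

// Maps a flat index back to (y, x, channel), assuming HWC layout with width W and C channels.
void index_to_hwc(std::size_t i, std::size_t W, std::size_t C,
                  std::size_t& y, std::size_t& x, std::size_t& c) {
    assert(W > 0 && C > 0);
    c = i % C;
    x = (i / C) % W;
    y = i / (C * W);
}
```

Comparing intermediate tensors layer by layer with the same statistics narrows down the first op where the two backends diverge.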

@leejet
Owner

leejet commented Dec 28, 2023

@FSSRepo @Green-Sky Given the relatively low probability of encountering the issue of generating invalid images, I'll merge this PR and #117. Then, I'll open a separate issue to follow up on this problem because it has been blocking the merging of this PR for quite some time.

@leejet
Owner

leejet commented Dec 28, 2023

> @leejet are you working to bring control net to this project?

Not yet. My next step is to work on implementing SVD.

@leejet leejet merged commit 004dfbe into leejet:master Dec 28, 2023
7 checks passed
@leejet
Owner

leejet commented Dec 28, 2023

This PR has been merged. Thank you for your contributions!

@FSSRepo
Contributor Author

FSSRepo commented Dec 28, 2023

@leejet so cool, bro. In that case, I will work on adding ControlNet. I more or less understand how it works now, but what's still not entirely clear to me is how the zero_convolution it applies works. Anyway, I'm reviewing it in more detail now that I have some time.

Screen-Shot-2023-02-17-at-5.27.29-PM.png
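In ControlNet, a "zero convolution" is a 1x1 convolution whose weight and bias are both initialized to zero, so at the start of training the control branch adds nothing to the frozen model's features and the combined model behaves exactly like the original. A minimal CPU sketch of such a layer on a CHW buffer; `Conv1x1` is a hypothetical type, not a ggml or sd.cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1x1 convolution over a CHW tensor: out[o][p] = bias[o] + sum_i weight[o][i] * in[i][p].
struct Conv1x1 {
    int in_ch;
    int out_ch;
    std::vector<float> weight;  // out_ch x in_ch, row-major
    std::vector<float> bias;    // out_ch

    // "Zero convolution": weight and bias start at exactly zero.
    static Conv1x1 zero(int in_ch, int out_ch) {
        return {in_ch, out_ch,
                std::vector<float>(static_cast<std::size_t>(in_ch) * out_ch, 0.0f),
                std::vector<float>(out_ch, 0.0f)};
    }

    std::vector<float> forward(const std::vector<float>& in, int pixels) const {
        assert(in.size() == static_cast<std::size_t>(in_ch) * pixels);
        std::vector<float> out(static_cast<std::size_t>(out_ch) * pixels);
        for (int o = 0; o < out_ch; ++o)
            for (int p = 0; p < pixels; ++p) {
                float acc = bias[o];
                for (int i = 0; i < in_ch; ++i)
                    acc += weight[static_cast<std::size_t>(o) * in_ch + i] *
                           in[static_cast<std::size_t>(i) * pixels + p];
                out[static_cast<std::size_t>(o) * pixels + p] = acc;
            }
        return out;
    }
};
```

For inference, the layer is just an ordinary 1x1 convolution whose trained weights are loaded from the ControlNet checkpoint; the zero initialization only matters during training.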

@bssrdf
Contributor

bssrdf commented Dec 29, 2023

Thank you, @FSSRepo, for implementing ESRGAN. It's really nice to be able to try this upscaler in stable-diffusion.cpp.
I managed to get another model, RealESRGAN_x4plus, working. This model is intended for general images, and it requires a larger graph (I bumped MAX_NODES to 4096). I'll try to get a PR ready with this model.

  • Original SD generated 512x512
    artius28

  • Upscaled 2048x2048
    artius29
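The larger graph follows from the architecture: in the BasicSR RRDBNet configuration used by Real-ESRGAN, RealESRGAN_x4plus has 23 RRDB blocks, while RealESRGAN_x4plus_anime_6B has 6. Each RRDB contains 3 RDBs of 5 convolutions, plus 6 convolutions outside the trunk. The arithmetic below counts convolutions only; how many ggml graph nodes each convolution, activation, concatenation, and residual add expands into is not modelled here and is an assumption left to the actual implementation.

```cpp
// Convolution count of a scale-4 RRDBNet:
// conv_first + num_block * (3 RDBs * 5 convs) + conv_body
// + conv_up1 + conv_up2 + conv_hr + conv_last.
constexpr int rrdbnet_x4_conv_count(int num_block) {
    return 1 + num_block * 3 * 5 + 1 + 4;
}

static_assert(rrdbnet_x4_conv_count(6) == 96, "RealESRGAN_x4plus_anime_6B");
static_assert(rrdbnet_x4_conv_count(23) == 351, "RealESRGAN_x4plus");
```

That is roughly 3.7 times as many convolutions, so a node limit sized for the 6-block model is plausibly too small for the 23-block one.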

@voidastro4

There are a multitude of upscalers far better than the two in this thread.
It would be nice if they worked with stable-diffusion.cpp.

This site contains them all: https://openmodeldb.info

Some of the generally well-regarded ESRGAN ones:
general purpose: https://openmodeldb.info/models/4x-NMKD-Siax-CX
anime: https://openmodeldb.info/models/4x-IllustrationJaNai-V1-ESRGAN

There are also the DAT models; they are generally better but much slower.

@FSSRepo
Contributor Author

FSSRepo commented Sep 24, 2024

Here we try to add simple architectures since they are easier to debug.
