stable-diffusion: implement ESRGAN upscaler + Metal Backend #104

FSSRepo · 2023-12-05T15:58:58Z

In recent days, I have been working on implementing this upscaler that could be useful to some. I compared the results generated by the original implementation in PyTorch, and they don't differ much. I did my best to implement the architecture, although I may have made a mistake. In the next few days, I'll be reviewing it carefully.

Results

image 128x128 rescaled	ESRGAN	Original 512 x 512

At the moment, only the RealESRGAN_x4plus_anime_6B.pth model is supported; you can use it by specifying -um RealESRGAN_x4plus_anime_6B.pth.

FSSRepo · 2023-12-08T20:38:29Z

@leejet I have already reviewed the architecture and performed various tests on the upscaler, and everything seems to be working well. At the moment, it will only support one model due to the inflexibility of the current implementation.

Additionally, I have added the latest changes from ggml (backend). I had to make some modifications to model.cpp since the ggml_get_backend function no longer exists. I created a function sd_tiling that could be useful for VAE tiling, but for now, it is only used by the upscaler.

In 2 days, the missing kernels may already be available to add the Metal backend.

Here's a test of VAE tiling. I think we need to overlap and blend the colors to eliminate the seams.
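As a rough illustration of the overlapping part, the sketch below computes where overlapping tiles could start along one image axis. The name `tile_starts` and its parameters are hypothetical and chosen for illustration only; this is not the `sd_tiling` code from this PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the PR): start offsets of tiles of length
// `tile` that cover an axis of length `size`, where neighbouring tiles share
// `overlap` pixels. The last tile is shifted back so it ends at the edge,
// which means it may overlap its neighbour by more than `overlap`.
std::vector<int> tile_starts(int size, int tile, int overlap) {
    assert(tile > 0 && overlap >= 0 && overlap < tile);
    std::vector<int> starts;
    if (size <= tile) {
        starts.push_back(0);  // a single tile already covers the axis
        return starts;
    }
    const int stride = tile - overlap;  // distance between tile origins
    for (int x = 0;; x += stride) {
        if (x + tile >= size) {
            starts.push_back(size - tile);  // clamp the final tile to the edge
            break;
        }
        starts.push_back(x);
    }
    return starts;
}
```

Calling it once for the width and once for the height gives the grid of tile origins; each tile can then be processed independently and merged back with some form of blending.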

Jonathhhan · 2023-12-09T00:19:14Z

@FSSRepo thank you. It works great. The only thing is that the seams are also visible with the upscaler (but much less than in your VAE tiling example).

FSSRepo · 2023-12-09T00:37:35Z

@Jonathhhan Could you share an example and the dimensions of the image? I can't see seams in the upscaled images.

Jonathhhan · 2023-12-09T00:47:39Z

@FSSRepo true. It depends.
Here you can see it for example (not much):

FSSRepo · 2023-12-09T01:19:51Z

I will see if tomorrow I have time to add something to make the seams less noticeable.

leejet · 2023-12-09T09:43:31Z

> @leejet I have already reviewed the architecture and performed various tests on the upscaler, and everything seems to be working well. At the moment, it will only support one model due to the inflexibility of the current implementation.
>
> Additionally, I have added the latest changes from ggml (backend). I had to make some modifications to model.cpp since the ggml_get_backend function no longer exists. I created a function sd_tiling that could be useful for VAE tiling, but for now, it is only used by the upscaler.
>
> In 2 days, the missing kernels may already be available to add the Metal backend.
>
> Here's a test of VAE tiling. I think we need to overlap and blend the colors to eliminate the seams.

It seems like there are different color variations between different blocks. Could this be due to some subtle differences in the computation during the process?

FSSRepo · 2023-12-09T18:41:46Z

> It seems like there are different color variations between different blocks. Could this be due to some subtle differences in the computation during the process?

The example I showed you is with VAE tiling. With the upscaler, the seams are subtle: you only notice them if you pay attention to the details, and how visible the discontinuous cuts are depends on the complexity of the image.

In the case of the upscaler, it can be mitigated to some extent by overlapping and combining colors in an interpolated manner. For VAE tiling, it would be something like color leveling and overlapping, although it still doesn't achieve a convincing result. In any case, I'm going to run tests.
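A minimal sketch of the overlap-and-interpolate idea for the upscaler case, shown in one dimension for brevity. It assumes linear feathering weights and a weighted average in the overlap; `blend_weight`, `accumulate_tile`, and `normalize_blend` are hypothetical helpers for illustration, not functions from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Weight of pixel i inside a tile of length `tile`: ramps up linearly over
// `overlap` pixels from each border and is 1 in the interior.
float blend_weight(int i, int tile, int overlap) {
    if (overlap <= 0) return 1.0f;
    const int edge = std::min(i, tile - 1 - i);  // distance to nearest border
    return std::min(1.0f, (edge + 1) / static_cast<float>(overlap + 1));
}

// Add one tile into the running weighted sum at offset `start`.
void accumulate_tile(const std::vector<float>& tile_px, int start, int overlap,
                     std::vector<float>& sum, std::vector<float>& wsum) {
    const int tile = static_cast<int>(tile_px.size());
    assert(sum.size() == wsum.size());
    assert(start >= 0 && start + tile <= static_cast<int>(sum.size()));
    for (int i = 0; i < tile; ++i) {
        const float w = blend_weight(i, tile, overlap);
        sum[start + i] += w * tile_px[i];
        wsum[start + i] += w;
    }
}

// Divide by the accumulated weights; pixels covered by a single tile keep
// their original value, pixels in an overlap become a weighted average.
std::vector<float> normalize_blend(const std::vector<float>& sum,
                                   const std::vector<float>& wsum) {
    std::vector<float> out(sum.size(), 0.0f);
    for (std::size_t i = 0; i < sum.size(); ++i) {
        if (wsum[i] > 0.0f) out[i] = sum[i] / wsum[i];
    }
    return out;
}
```

With two 4-pixel tiles overlapping by 2 pixels, one all 0.0 and one all 1.0, the merged row steps through roughly 0, 0, 1/3, 2/3, 1, 1 instead of jumping at the tile boundary. This only hides a hard edge; it does not correct a per-tile color shift such as the one seen with VAE tiling.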

Jonathhhan · 2023-12-09T19:06:54Z

I found the top answer in this reddit thread helpful for my understanding of optimizing upscaling:
https://www.reddit.com/r/StableDiffusion/comments/11ahfyo/seams_help_with_ultimate_upscale_for_a1111/?sort=new
Some comments are specific to A1111, but I guess most of them apply to all upscalers (I doubt that anything in the link is new to you).

Green-Sky · 2023-12-10T12:33:13Z

My go-to test fails in this PR:

The log looks normal.

$ sd -m models/epicphotogasm_lastUnicorn-f16.gguf --lora-model-dir models/og/sd1/ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat, cinematic" --sampling-method lcm --steps 4 --cfg-scale 1 -b 1
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5
[INFO]  stable-diffusion.cpp:4914 - loading model from 'models/epicphotogasm_lastUnicorn-f16.gguf'
[INFO]  model.cpp:615  - load models/epicphotogasm_lastUnicorn-f16.gguf using gguf format
[INFO]  stable-diffusion.cpp:4936 - Stable Diffusion 1.x
[INFO]  stable-diffusion.cpp:4942 - Stable Diffusion weight type: f16
[INFO]  stable-diffusion.cpp:5092 - total memory buffer size = 1972.80MB (clip 236.18MB, unet 1641.16MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5098 - loading model from 'models/epicphotogasm_lastUnicorn-f16.gguf' completed, taking 0.51s
[INFO]  stable-diffusion.cpp:5112 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:4520 - loading LoRA from 'models/og/sd1/lcm-lora-sdv1-5.safetensors'
[INFO]  model.cpp:618  - load models/og/sd1/lcm-lora-sdv1-5.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5213 - lora 'lcm-lora-sdv1-5' applied, taking 0.47s
[INFO]  stable-diffusion.cpp:5940 - apply_loras completed, taking 0.47s
[INFO]  stable-diffusion.cpp:5969 - get_learned_condition completed, taking 11 ms
[INFO]  stable-diffusion.cpp:5979 - sampling using LCM method
[INFO]  stable-diffusion.cpp:5983 - generating image: 1/1 - seed 42
  |==================================================| 4/4 - 6.00it/s
[INFO]  stable-diffusion.cpp:5995 - sampling completed, taking 0.69s
[INFO]  stable-diffusion.cpp:6003 - generating 1 latent images completed, taking 0.69s
[INFO]  stable-diffusion.cpp:6005 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6017 - latent 1 decoded, taking 0.73s
[INFO]  stable-diffusion.cpp:6021 - decode_first_stage completed, taking 0.73s
[INFO]  stable-diffusion.cpp:6026 - txt2img completed in 1.44s
[INFO]  main.cpp:521  - save result image to 'output.png'

but the generated image,

does not.

Jonathhhan · 2023-12-10T13:06:58Z

@Green-Sky you are still using .gguf model files. Just use the unconverted .ckpt or .safetensors file (not sure if that's the issue).

Green-Sky · 2023-12-10T13:33:43Z

> @Green-Sky you are still using .gguf model files. Just use the unconverted .ckpt or .safetensors file (not sure if that's the issue).

I prefer the smaller gguf files. On that topic, what happened to the converter program?

edit: Also, the gguf works fine on master.

leejet · 2023-12-10T14:12:15Z

> > @Green-Sky you are still using .gguf model files. Just use the unconverted .ckpt or .safetensors file (not sure if that's the issue).
>
> I prefer the smaller gguf files. On that topic, what happened to the converter program?
>
> edit: Also, the gguf works fine on master.

The gguf file can still be used, and this issue should be independent of the file format.

FSSRepo · 2023-12-10T14:16:49Z

@Green-Sky can you test the commits of this PR, to locate the root of this issue?

Green-Sky · 2023-12-10T14:37:12Z

> @Green-Sky can you test the commits of this PR, to locate the root of this issue?

the first commit in this pr, the first non-pr commit works 968226a

the cat looks a bit funny tho. (edit: removing the cinematic gives it its face back :) )

leejet · 2023-12-10T15:54:22Z

I tested it on the latest master code and it works fine.

.\bin\Release\sd.exe -m ..\models\epicphotogasm_lastUnicorn.safetensors --lora-model-dir ..\models\ -p "<lora:lcm-lora-sdv1-5:1>a lovely cat, cinematic" --sampling-method lcm --steps 4 --cfg-scale 1 -b 1

FSSRepo · 2023-12-10T16:05:36Z

@leejet can you test this PR? I have not been able to replicate that color inversion error.

Cyberhan123 · 2023-12-23T03:36:43Z

@FSSRepo Okay, I'll try migrating your code now.

leejet · 2023-12-23T03:38:14Z

> @leejet I hope you can pin this issue, so we can collect it in one place

Done.

Cyberhan123 · 2023-12-23T03:38:57Z

> > @leejet I hope you can pin this issue, so we can collect it in one place
>
> Done.

Thx

leejet · 2023-12-23T05:05:05Z

@Green-Sky I couldn't replicate your issue using Colab. Based on the available information, I attempted a simple fix. Could you pull the latest code from this PR to see if it resolves your problem?

FSSRepo · 2023-12-23T05:48:22Z

@leejet slaren has already solved the synchronization issues in ggml and has also synchronized the changes in my ggml repository. GreenSky has confirmed that everything is working well, except for VAE tiling, which seems to be causing issues. However, I have been conducting tests, and I cannot reproduce the VAE tiling error.

leejet · 2023-12-23T06:13:46Z

> @FSSRepo keep in mind that it also happens when not tiling. (see my last picture)

@FSSRepo In the latest comment from GreenSky, he mentioned that it also occurs when not tiling.

leejet · 2023-12-23T06:27:55Z

@Green-Sky After pulling the latest code, did you run git submodule update to update ggml?

leejet · 2023-12-23T06:44:15Z

@FSSRepo I'm trying an approach similar to what the master branch does with ggml when using CUDA: calling get_tensor_async followed by synchronize, to see if it helps resolve the issue.

Amin456789 · 2023-12-23T11:51:55Z

@FSSRepo @Cyberhan123 so excited about this, finally a gui! please make it a simple gui if possible for windows so we don't need to run it on a browser as browsers take ram, so the whole ram will be used by sd cpp only
also a dark mode will be great

Green-Sky · 2023-12-23T13:20:55Z

.	.	.

with the simplest invocation

result/bin/sd -m ../../stable-diffusion-webui/models/Stable-diffusion/epicphotogasm_lastUnicorn.safetensors -p "a lovely cat" -b 9

421e39b

$ git submodule
 a0c2ec77a5ef8e630aff65bc535d13b9805cb929 ggml (remotes/origin/sd-cpp)

leejet · 2023-12-24T10:23:25Z

@FSSRepo I've consistently been unable to reproduce GreenSky's issue. I'm unsure whether to merge this PR now or wait until this issue is fixed. While everything works fine on my own device, problems evidently arise on some devices. This issue has been blocking me from merging this PR. I'd like to hear your opinion.

FSSRepo · 2023-12-24T10:55:42Z

@leejet I haven't been able to reproduce the issue either. I thought that if someone experiencing a similar problem with a T4 GPU, which is the same as Google Colab's, came across it, I might finally discover the root cause. However, after testing, I found no issues with Google Colab. It's frustrating.

I'm still not entirely sure if this problem originates from the first commit of this pull request or after I introduced the changes from ggml. If so, I could revert the ggml changes (removing Metal support) and keep only the essential components for the upscaler to function.

leejet · 2023-12-24T11:01:13Z

> I'm still not entirely sure if this problem originates from the first commit of this pull request or after I introduced the changes from ggml.

@Green-Sky Can you help us test to determine which commit introduced the issue?

Amin456789 · 2023-12-26T12:25:01Z

why dont u guys merge it now and let people report the bugs, after all img2img got fixed after sometime too, im so hyped for new features of this lovely sd cpp!

FSSRepo · 2023-12-27T21:58:06Z

@leejet are you working on bringing ControlNet to this project?

Cyberhan123 · 2023-12-28T12:03:59Z

> . . .
>
> with the simplest invocation
>
> result/bin/sd -m ../../stable-diffusion-webui/models/Stable-diffusion/epicphotogasm_lastUnicorn.safetensors -p "a lovely cat" -b 9
>
> 421e39b
>
> $ git submodule
>  a0c2ec77a5ef8e630aff65bc535d13b9805cb929 ggml (remotes/origin/sd-cpp)

@Green-Sky I spent a long time tracking and even compared the output tensor files. I found that the CUDA core has accuracy problems in the decoder stage, which may cause memory overflow, so half of your pictures are solid colors.

Cyberhan123 · 2023-12-28T12:15:44Z

Thanks to @bssrdf's reminder, I can now debug, if clumsily. I traced all the processes and found that mul-mat may not be the source of the problem. On my cloud host, a complete image is generated, but the parsing may fail. I now need someone to suggest testing techniques, because debugging CUDA is daunting: there are 512x512x3 float values to inspect.
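One low-tech way to handle that much data is to avoid reading it by eye and instead summarize how one backend's output differs from a reference buffer (for example, the same tensor from a CPU run). The sketch below is only an assumption about how that could look; `DiffReport` and `compare_buffers` are hypothetical names, not part of ggml or stable-diffusion.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of how `test` deviates from `ref` (hypothetical helper type).
struct DiffReport {
    std::size_t non_finite = 0;     // NaN/Inf values found in `test`
    std::size_t mismatched = 0;     // finite values differing by more than tol
    float max_abs_diff = 0.0f;      // largest |ref - test| over finite values
    std::ptrdiff_t first_bad = -1;  // index of the first problem, -1 if none
};

// Compare two equally sized float buffers element by element.
// Assumes `ref` itself contains only finite values.
DiffReport compare_buffers(const std::vector<float>& ref,
                           const std::vector<float>& test, float tol) {
    assert(ref.size() == test.size());
    DiffReport r;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        bool bad = false;
        if (!std::isfinite(test[i])) {
            ++r.non_finite;
            bad = true;
        } else {
            const float d = std::fabs(ref[i] - test[i]);
            r.max_abs_diff = std::max(r.max_abs_diff, d);
            if (d > tol) {
                ++r.mismatched;
                bad = true;
            }
        }
        if (bad && r.first_bad < 0) {
            r.first_bad = static_cast<std::ptrdiff_t>(i);
        }
    }
    return r;
}
```

Running this on the output of each graph stage in turn would show the first stage where `non_finite` or `mismatched` becomes nonzero, which narrows the search without inspecting individual pixels.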

leejet · 2023-12-28T13:59:23Z

@FSSRepo @Green-Sky Given the relatively low probability of encountering the issue of generating invalid images, I'll merge this PR and #117. Then, I'll open a separate issue to follow up on this problem because it has been blocking the merging of this PR for quite some time.

leejet · 2023-12-28T14:01:01Z

> @leejet are you working on bringing ControlNet to this project?

Not yet. My next step is to work on implementing SVD.

leejet · 2023-12-28T15:48:05Z

This PR has been merged. Thank you for your contributions!

FSSRepo · 2023-12-28T15:48:51Z

@leejet So cool, bro. In that case, I will work on adding ControlNet. I more or less understand how it works now, but what's still not entirely clear to me is what the zero_convolution it applies does. Anyway, I'm reviewing it in more detail now that I have some time.

bssrdf · 2023-12-29T20:36:04Z

Thank you, @FSSRepo, for implementing ESRGAN. Really nice to have a try of this upscaler in stable-diffusion.cpp.
I managed to make another model RealESRGAN_x4plus work. This model is supposed to be for general images and it requires a bigger sized graph (bumped MAX_NODES to 4096). I'll try to get a PR ready with this model.

Original SD generated 512x512
Upscaled 2048x2048

voidastro4 · 2024-09-24T06:01:04Z

There are a multitude of upscalers far better than the 2 in this thread.
Would be nice if they worked with stable-diffusion.cpp

This site contains them all: https://openmodeldb.info

Some of the generally well regarded ESRGAN ones:
general purpose: https://openmodeldb.info/models/4x-NMKD-Siax-CX
anime: https://openmodeldb.info/models/4x-IllustrationJaNai-V1-ESRGAN

There are also the DAT models, they are generally better but much slower.

FSSRepo · 2024-09-24T12:27:33Z

Here we try to add simple architectures since they are easier to debug.

add esrgan upscaler

2a74c4a

FSSRepo marked this pull request as draft December 6, 2023 03:07

FSSRepo added 2 commits December 8, 2023 15:16

add sd_tiling

f140532

ggml: adapt to new backend

b5ade20

FSSRepo marked this pull request as ready for review December 8, 2023 20:27

FSSRepo added 2 commits December 9, 2023 13:05

Merge remote-tracking branch 'upstream/master'

d99e650

fix some conflicts

f83b742

FSSRepo and others added 7 commits December 10, 2023 13:42

Merge branch 'leejet:master' into master

10ac491

sd_tiling support overlapping + vae tiling

136474d

Merge branch 'master' of https://github.com/FSSRepo/stable-diffusion.cpp

1ad22f2

prepare to use metal as backend

2bcca30

support metal backend

05101e4

fix submodule

e641b8c

fix metal ggml-submodule

e1f5a1c

Cyberhan123 mentioned this pull request Dec 23, 2023

Is there anyone who can't generate images correctly? #122

Open

4 tasks

leejet added 2 commits December 23, 2023 12:32

Merge branch 'master' into pr75.head

fd5726c

synchronize after get tensor from backend

9ea2bcd

use ggml_backend_tensor_get_async and sync for cuda backend

421e39b

Cyberhan123 mentioned this pull request Dec 28, 2023

Fix mul-mat error for older GPUs ggerganov/ggml#669

Merged

leejet merged commit 004dfbe into leejet:master Dec 28, 2023
7 checks passed
