stable-diffusion: implement ESRGAN upscaler + Metal Backend #104
@leejet I have already reviewed the architecture and performed various tests on the upscaler, and everything seems to be working well. At the moment, it will only support one model due to the inflexibility of the current implementation. Additionally, I have added the latest changes from ggml (backend). I had to make some modifications to In 2 days, the missing kernels may already be available to add the Metal backend. Here's a test of VAE tiling. I think we need to overlap and blend the colors to eliminate the seams. |
@FSSRepo thank you. It works great. The only catch is that seams are also visible with the upscaler (though much less than in your VAE tiling example). |
@Jonathhhan can you share an example and the dimensions of the image? I can't see seams in the upscaled images. |
@FSSRepo true. It depends. |
I'll see tomorrow whether I have time to add something to make the seams less noticeable. |
It seems like there are different color variations between different blocks. Could this be due to some subtle differences in the computation during the process? |
The example I showed you is with VAE tiling; with the upscaler, it's subtle only if you pay attention to the details, or it will depend on the complexity of the image to notice the discontinuous cuts. In the case of the upscaler, it can be mitigated to some extent by overlapping and combining colors in an interpolated manner. For VAE tiling, it would be something like color leveling and overlapping, although it still doesn't achieve a convincing result. In any case, I'm going to run tests. |
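The overlap-and-interpolate idea described above can be sketched in one dimension. This is a minimal Python sketch, not the code in this PR: the names `feather_weights` and `blend_tiles`, the linear ramp, and the tile sizes are all assumptions chosen for illustration. Each tile is weighted by a ramp that fades toward its edges, overlapping samples are accumulated, and the sum is normalized by the total weight.

```python
def feather_weights(tile_len, overlap):
    """Hypothetical helper: weights ramp up linearly near each tile edge."""
    weights = []
    for i in range(tile_len):
        edge_dist = min(i, tile_len - 1 - i)  # distance to the nearest edge
        weights.append(min(1.0, (edge_dist + 1) / (overlap + 1)))
    return weights


def blend_tiles(tiles, tile_len, overlap):
    """Blend equally sized 1D tiles laid out with a fixed overlap."""
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must be in [0, tile_len)")
    stride = tile_len - overlap
    total_len = stride * (len(tiles) - 1) + tile_len
    acc = [0.0] * total_len   # weighted sum of tile values
    wsum = [0.0] * total_len  # sum of weights, used for normalization
    weights = feather_weights(tile_len, overlap)
    for t, tile in enumerate(tiles):
        start = t * stride
        for i, value in enumerate(tile):
            acc[start + i] += weights[i] * value
            wsum[start + i] += weights[i]
    return [a / w for a, w in zip(acc, wsum)]


# Two tiles that disagree by a constant offset, as in a visible seam.
dark = [0.0] * 8
bright = [1.0] * 8
blended = blend_tiles([dark, bright], tile_len=8, overlap=4)
print([round(v, 2) for v in blended])
# [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 1.0, 1.0]
```

Instead of a hard step from 0 to 1 at the tile boundary, the overlap region moves between the two values in small increments. This hides a seam but does not remove the underlying color difference between tiles, which matches the observation that VAE tiling would also need some form of color leveling.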
I found the top answer in this reddit thread helpful for my understanding of optimizing upscaling: |
My go-to test fails in this PR; the log looks normal.
|
@Green-Sky you are still using |
I prefer the smaller edit: Also, the |
The gguf file can still be used, and this issue should be independent of the file format. |
@Green-Sky can you test the commits of this PR, to locate the root of this issue? |
the first commit in this pr, the first non-pr commit works 968226a |
@leejet can you test this PR? I have not been able to replicate that color inversion error. |
@FSSRepo Okay, I'll try migrating your code now. |
Done. |
Thx |
@Green-Sky I couldn't replicate your issue using Colab. Based on the available information, I attempted a simple fix. Could you pull the latest code from this PR to see if it resolves your problem? |
@leejet slaren had already solved the synchronization issues in ggml, and had also synchronized the changes in my ggml repository. GreenSky has confirmed that everything is working well except for VAE tiling, which seems to be causing issues. However, I have been running tests and cannot reproduce the VAE tiling error. |
@Green-Sky After pulling the latest code, did you run |
@FSSRepo I'm trying a similar approach to what was used in the master branch with ggml when using CUDA: using |
@FSSRepo @Cyberhan123 so excited about this, finally a GUI! Please make it a simple GUI for Windows if possible, so we don't need to run it in a browser. Browsers take RAM, and this way all the RAM can be used by sd cpp alone. |
@FSSRepo I've consistently been unable to reproduce GreenSky's issue. I'm unsure whether to merge this PR now or wait until this issue is fixed. While everything works fine on my own device, evidently, problems arise in some devices. This issue has been consistently blocking me from merging this PR. I'd like to hear your opinion. |
@leejet I haven't been able to reproduce the issue either. I thought that if someone experiencing a similar problem with a T4 GPU, which is the same as Google Colab's, came across it, I might finally discover the root cause. However, after testing, I found no issues with Google Colab. It's frustrating. I'm still not entirely sure if this problem originates from the first commit of this pull request or after I introduced the changes from ggml. If so, I could revert the ggml changes (removing Metal support) and keep only the essential components for the upscaler to function. |
@Green-Sky Can you help us test to determine which commit introduced the issue? |
Why don't you merge it now and let people report the bugs? After all, img2img got fixed after some time too. I'm so hyped for the new features of this lovely sd cpp! |
@leejet are you working to bring control net to this project? |
@Green-Sky I spent a long time tracking this down and even compared the output tensor files. I found that the CUDA core has accuracy problems in the decoder stage, which may cause memory overflow, so half of your pictures are solid colors. |
Thanks to @bssrdf's reminder, I can now debug, if clumsily. I traced the whole process and found that mul-mat may not be the source of the problem. On my cloud host, a complete image is generated, but the parsing may fail. I now need someone to suggest testing techniques, because debugging CUDA is daunting: there is 512x512x3 of float data to inspect. |
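One way to make that much float data manageable is to reduce two tensor dumps to a few summary numbers instead of reading values by eye. The following is a minimal Python sketch under stated assumptions: the dumps are raw float32 buffers in native byte order, and `load_f32` and `compare_tensors` are hypothetical helpers, not functions from ggml or this repository.

```python
import math
from array import array


def load_f32(raw):
    """Interpret raw bytes as native-endian float32 values (assumed dump format)."""
    values = array("f")
    values.frombytes(raw)
    return values


def compare_tensors(ref, test, tol=1e-3):
    """Summarize how a test tensor deviates from a reference tensor."""
    if len(ref) != len(test):
        raise ValueError("tensors must have the same number of elements")
    non_finite = 0     # elements where either side is NaN or infinite
    max_abs_diff = 0.0
    first_bad = None   # first index that is non-finite or exceeds tol
    for i, (r, t) in enumerate(zip(ref, test)):
        if not (math.isfinite(r) and math.isfinite(t)):
            non_finite += 1
            if first_bad is None:
                first_bad = i
            continue
        diff = abs(r - t)
        max_abs_diff = max(max_abs_diff, diff)
        if diff > tol and first_bad is None:
            first_bad = i
    return {
        "non_finite": non_finite,
        "max_abs_diff": max_abs_diff,
        "first_bad": first_bad,
    }


# Small stand-in for a CPU reference and a CUDA result.
cpu = array("f", [0.0, 0.5, 1.0, 0.25])
gpu = array("f", [0.0, 0.5, float("nan"), 0.75])
print(compare_tensors(cpu, gpu))
```

Running the same comparison after each stage of the decoder, rather than only on the final image, would narrow down the first operation whose output diverges or becomes non-finite.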
@FSSRepo @Green-Sky Given the relatively low probability of encountering the issue of generating invalid images, I'll merge this PR and #117. Then, I'll open a separate issue to follow up on this problem because it has been blocking the merging of this PR for quite some time. |
Not yet. My next step is to work on implementing SVD. |
This PR has been merged. Thank you for your contributions! |
@leejet so cool, bro. In that case, I will work on adding control net. I more or less understand how it works now, but what's still not entirely clear to me is what the zero_convolution it applies actually is. Anyway, I'm reviewing it in more detail now that I have some time. |
Thank you, @FSSRepo, for implementing ESRGAN. It's really nice to be able to try this upscaler in stable-diffusion.cpp. |
There are a multitude of upscalers far better than the 2 in this thread. This site contains them all: https://openmodeldb.info Some of the generally well regarded ESRGAN ones: There are also the DAT models, they are generally better but much slower. |
Here we try to add simple architectures since they are easier to debug. |
In recent days, I have been working on implementing this upscaler that could be useful to some. I compared the results generated by the original implementation in PyTorch, and they don't differ much. I did my best to implement the architecture, although I may have made a mistake. In the next few days, I'll be reviewing it carefully.
Results
At the moment, only the RealESRGAN_x4plus_anime_6B.pth model is supported; you can use it by specifying -um RealESRGAN_x4plus_anime_6B.pth.