Supporting for Ternary DiT #470
Comments
I think just updating the ggml submodule to a more recent version should be most of the work. |
Thank you for your suggestion. Updating the ggml submodule to a more recent version sounds like a good starting point. However, I must admit that I have very limited experience with writing kernel code 😵. |
There has been one for a while that uses a categorical classifier. Do you mean embedding-based? Here: #331 Edit: oh, it's you. Haha |
😇👀 |
@stduhpf I will try to make a PR to update to the latest, or at least a newer, ggml. We can then try to do some stuff based on that. @Lucky-Lance Why did you use labels and not embedding(s) for the classifier? This makes it somewhat unusable for text-to-image. Are there any plans to "distil" something like Flux schnell, i.e. training a new TerDiT on its outputs? |
Label-based generation was just an attempt I made previously. In fact, I've always wanted to work on a text-to-image model, but the actual deployment only resulted in reduced memory usage without improving inference speed. This has made me less confident about further pursuing text-to-image models. If I receive support, I would certainly train a text-to-image model afterwards. Thanks a lot for your support 🤩🥳. |
I noticed you're facing some problems while upgrading ggml. :( Just checking in to see if you're still planning to support it, and if so, can it be completed within one or two months..? |
Well, it all depends on the individual's motivation and time, so no promises. 😅 That being said, after updating ggml, I did a test where I quantized Flux to tq1_0/tq2_0 (5 weights/byte and 4 weights/byte), and it runs, on CPU only, and produces noise. So it might or might not work. I will probably continue updating ggml and adapting the code changes to sd.cpp before trying any architectural stuff. |
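For anyone wondering where the "5 weights/byte and 4 weights/byte" figures come from, here is a minimal Python sketch of the arithmetic. It is an illustration only and does not reproduce the actual ggml tq1_0/tq2_0 block layouts (which also store a per-block scale); the function names are hypothetical. Four ternary weights fit in a byte at 2 bits each, and five fit because 3^5 = 243 ≤ 256.

```python
# Illustrative only: NOT the real ggml TQ1_0/TQ2_0 layout.
# All function names here are hypothetical.

def pack_2bit(trits):
    """Pack ternary weights (-1, 0, 1) four per byte, 2 bits each."""
    assert len(trits) % 4 == 0
    out = bytearray()
    for i in range(0, len(trits), 4):
        b = 0
        for j, t in enumerate(trits[i:i + 4]):
            b |= (t + 1) << (2 * j)  # map -1/0/1 to 0/1/2
        out.append(b)
    return bytes(out)


def unpack_2bit(data):
    """Inverse of pack_2bit."""
    return [((b >> (2 * j)) & 3) - 1 for b in data for j in range(4)]


def pack_base3(trits):
    """Pack ternary weights five per byte as a base-3 number (max 242)."""
    assert len(trits) % 5 == 0
    out = bytearray()
    for i in range(0, len(trits), 5):
        v = 0
        # First weight becomes the least significant base-3 digit.
        for t in reversed(trits[i:i + 5]):
            v = v * 3 + (t + 1)
        out.append(v)
    return bytes(out)


def unpack_base3(data):
    """Inverse of pack_base3."""
    trits = []
    for b in data:
        for _ in range(5):
            trits.append(b % 3 - 1)
            b //= 3
    return trits


weights = [-1, 0, 1, 1, 0, -1, -1, 1, 0, 0,
           1, 1, -1, 0, 1, -1, -1, 0, 1, 0]
packed4 = pack_2bit(weights)   # 20 weights at 4 per byte
packed5 = pack_base3(weights)  # 20 weights at 5 per byte
```

With 20 weights, the 2-bit packing needs 5 bytes and the base-3 packing needs 4, which is the trade-off between the two types: denser storage versus simpler unpacking.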
Oh, truly grateful for your efforts! 😆 Hoping everything goes smoothly. |
Link to the "quantization" PR in llama.cpp that added tq1/2: ggerganov/llama.cpp#8151 |
Another thing that I will leave for the future is looking into ik's fork with better bitnet support: https://github.com/ikawrakow/ik_llama.cpp |
Hi, a month has slipped away, and I was wondering if the support is still part of the plan 😌 |
Ternary data types are now supported, which means that in theory, any model with the same overall architecture as a supported model like SD3 or Flux, but trained in ternary, would work. |
Haven't had time to work on sd.cpp this month, sorry. Yeah, the bitnets have extra normalization layers in places. |
OK I will give it a try 😆 |
Hi,
Ternary quantization has become popular and has been shown to deliver computational speedups and power reductions in projects like llama.cpp and bitnet.cpp. We trained the first ternary DiT network; DiT is a popular architecture nowadays for text-to-image generation. We would like to know whether we could get help deploying it on stable-diffusion.cpp.
We asked llama.cpp for help, and they advised us to come here for guidance (link).