Replies: 3 comments
-
I started a similar thread several days ago titled "Improving prompt understanding in Stable Diffusion (e.g. LLM integration)," but no one responded. Maybe this isn't the right place for that type of discussion? If not, where?
-
Well, I do like the idea of combining both of them to create this new product. However, I think the process of merging them into a single product with both capabilities could be lengthy. To start, I believe it would be better to create an extension that either connects directly to the OpenAI API or runs locally through the oobabooga API or something similar. Running both models locally can be quite demanding unless you have a powerful PC, so connecting to the OpenAI API seems like the better option at first. It should work somewhat like Open Interpreter, which offers the choice of either connecting to the API or running locally. Then we can simply ask the LLM to rewrite the prompt and send it off for generation, and so on. Many people already use LLMs for prompt creation, so having it directly in the WebUI would make it even more enjoyable. I don't know how many people are interested, but I'm definitely up for it.
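To make this concrete, here is a minimal sketch of what such a bridge could look like as a standalone script. It assumes an OpenAI-compatible chat completions endpoint (api.openai.com, or a local server such as the one text-generation-webui can expose) and the WebUI launched with `--api` so that `/sdapi/v1/txt2img` is reachable; the model name, environment variables, and sampling settings are placeholders to verify, not a definitive implementation.

```python
# Rough sketch: ask an LLM to rewrite a draft prompt, then hand the result to the
# A1111 WebUI for generation. Endpoint paths and field names are the commonly
# documented ones, but treat them as assumptions and check against your setup.
import base64
import os

import requests

# Point this at api.openai.com, or at a local OpenAI-compatible server to keep
# everything on your own machine.
LLM_BASE_URL = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
LLM_API_KEY = os.environ.get("OPENAI_API_KEY", "")
WEBUI_URL = "http://127.0.0.1:7860"  # A1111 started with --api


def enhance_prompt(draft: str) -> str:
    """Ask the LLM to turn a rough idea into a detailed image prompt."""
    resp = requests.post(
        f"{LLM_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {LLM_API_KEY}"},
        json={
            "model": "gpt-4",  # placeholder; use whatever the server actually serves
            "messages": [
                {"role": "system",
                 "content": "Rewrite the user's idea as a detailed Stable Diffusion "
                            "prompt. Reply with the prompt only."},
                {"role": "user", "content": draft},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()


def generate_image(prompt: str, out_path: str = "out.png") -> None:
    """Send the enhanced prompt to the WebUI's txt2img endpoint and save the result."""
    resp = requests.post(
        f"{WEBUI_URL}/sdapi/v1/txt2img",
        json={"prompt": prompt, "steps": 25},
        timeout=300,
    )
    resp.raise_for_status()
    image_b64 = resp.json()["images"][0]
    # Strip a possible "data:image/png;base64," prefix before decoding.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64.split(",", 1)[-1]))


if __name__ == "__main__":
    enhanced = enhance_prompt("a cozy cabin in the woods at night")
    print("Enhanced prompt:", enhanced)
    generate_image(enhanced)
```

The same logic could just as well live inside a WebUI extension instead of a standalone script; the point is that both APIs already speak JSON over HTTP, so gluing them together is mostly plumbing.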
-
I had the same idea, and it would be so cool. The main reason I would use this is to generate prompts or improve them, if I understand the discussion correctly.
-
First of all, I'd like to thank everyone in the community who works tirelessly to make AI image generation accessible to all of us. Also, thank you for the recent 1.6.0 release with SDXL support, it's just crazy.
I'd like to start a new discussion here and try to bring the key developers together. With the recent announcement of DALL-E 3 (https://openai.com/dall-e-3), I became aware that OpenAI has successfully incorporated text into image generation, effectively combining a language model with an image model. Sure, with ControlNet it is also possible to incorporate logos or letters, but this is not the same.
The ability to enhance prompts through a large language model (LLM), or to integrate legible text into images, should be a new goal for us as an open community.
After some research, I came across this team (https://github.com/oobabooga/text-generation-webui), which is trying to do for text generation what A1111 did for image generation. Unfortunately, I'm not a programmer and don't know if this is possible, but my idea is: "Could it be possible in the future to combine both projects into one AI hub?"
I believe that if we can achieve this, it would open up an entirely new path for the future.
I'm eagerly looking forward to your comments :)