Replies: 3 comments
-
I started a similar thread several days ago titled "Improving prompt understanding in Stable Diffusion (e.g. LLM integration)," but no one responded. Maybe this isn't the right place for that type of discussion? If not, where?
-
Well, I do like the idea of combining both of them to create this new product. However, I think the process of merging them into a single product with both capabilities could be lengthy. To start, I believe it would be better to create an extension that either connects directly to the OpenAI API or runs locally through the oobabooga API or something similar. Running both models locally can be quite demanding unless you have a powerful PC, so connecting to the OpenAI API seems like the better option at first. It should work somewhat like Open Interpreter, which offers the choice of either connecting to the API or running locally. Then we can simply ask the LLM to rewrite the prompt and send it off for generation, and so on. Many people already use LLMs for prompt creation, so having it directly in the WebUI would make it even more enjoyable. I don't know how many people are interested, but I'm definitely up for it.
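To make this concrete, here is a minimal sketch of what such a bridge could look like as a standalone script. It assumes an OpenAI-compatible chat completions endpoint (api.openai.com, or a local server such as the one text-generation-webui can expose) and the WebUI launched with `--api` so that `/sdapi/v1/txt2img` is reachable; the model name, environment variables, and sampling settings are placeholders to verify, not a definitive implementation.

```python
# Rough sketch: ask an LLM to rewrite a draft prompt, then hand the result to the
# A1111 WebUI for generation. Endpoint paths and field names are the commonly
# documented ones, but treat them as assumptions and check against your setup.
import base64
import os

import requests

# Point this at api.openai.com, or at a local OpenAI-compatible server to keep
# everything on your own machine.
LLM_BASE_URL = os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1")
LLM_API_KEY = os.environ.get("OPENAI_API_KEY", "")
WEBUI_URL = "http://127.0.0.1:7860"  # A1111 started with --api


def enhance_prompt(draft: str) -> str:
    """Ask the LLM to turn a rough idea into a detailed image prompt."""
    resp = requests.post(
        f"{LLM_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {LLM_API_KEY}"},
        json={
            "model": "gpt-4",  # placeholder; use whatever the server actually serves
            "messages": [
                {"role": "system",
                 "content": "Rewrite the user's idea as a detailed Stable Diffusion "
                            "prompt. Reply with the prompt only."},
                {"role": "user", "content": draft},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()


def generate_image(prompt: str, out_path: str = "out.png") -> None:
    """Send the enhanced prompt to the WebUI's txt2img endpoint and save the result."""
    resp = requests.post(
        f"{WEBUI_URL}/sdapi/v1/txt2img",
        json={"prompt": prompt, "steps": 25},
        timeout=300,
    )
    resp.raise_for_status()
    image_b64 = resp.json()["images"][0]
    # Strip a possible "data:image/png;base64," prefix before decoding.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64.split(",", 1)[-1]))


if __name__ == "__main__":
    enhanced = enhance_prompt("a cozy cabin in the woods at night")
    print("Enhanced prompt:", enhanced)
    generate_image(enhanced)
```

The same logic could just as well live inside a WebUI extension instead of a standalone script; the point is that both APIs already speak JSON over HTTP, so gluing them together is mostly plumbing.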
-
I had the same idea, and it would be so cool. The main reason I would use this is to generate prompts or improve them, if I understand the discussion correctly.
-
First of all, I'd like to thank everyone in the community who works tirelessly to make AI image generation accessible to all of us. Also, thank you for the recent 1.6.0 release with SDXL support, it's just crazy.
I'd like to start a new discussion here and try to bring the key developers together. With the recent announcement of DALL-E 3 (https://openai.com/dall-e-3), I became aware that OpenAI has successfully incorporated text into image generation, effectively combining a language model with an image model. Sure, with ControlNet it is also possible to incorporate logos or letters, but this is not the same.
The ability to enhance prompts through a large language model (LLM), or to integrate legible text into images, should be a new goal for us as an open community.
After some research, I came across this team (https://github.com/oobabooga/text-generation-webui), which is trying to do for text generation what A1111 did for image generation. Unfortunately, I'm not a programmer and don't know if this is possible, but my idea is: "Could it be possible in the future to combine both projects into one AI hub?"
I believe that if we can achieve this, it would open up an entirely new path for the future.
I'm eagerly looking forward to your comments :)