
diffusers multicontrolnet pipeline with paint with words #44

Open
alexblattner opened this issue Apr 19, 2023 · 3 comments

Comments

@alexblattner

alexblattner commented Apr 19, 2023

Hi, I'm not at your level and was wondering how I could add paint-with-words to my multi-ControlNet pipeline. Here's code that works, for example (partial):

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

controlnet = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    requires_safety_checker=False,
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()

fimage = pipe(
    prompt,
    image=images,
    num_inference_steps=20,
    negative_prompt=n_prompt,
    controlnet_conditioning_scale=weights,
)

I would appreciate your input on it, and I'm ready to pay if necessary.

@lwchen6309
Collaborator

Hi there,

(1) I might be able to revise it directly if it's open-source. If so, please show me the repo link.

(2) To revise the multi-control (mc) pipeline, you can refer to the pipeline implementation

class PaintWithWord_StableDiffusionPipeline(StableDiffusionPipeline):

in 'paint_with_words.py' at
https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L513
especially the denoising step of the __call__ function at
https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L783

        # 7. Denoising loop
        num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
        with self.progress_bar(total=num_inference_steps) as progress_bar:
            for i, t in enumerate(timesteps):
                step_index = (self.scheduler.timesteps == t).nonzero().item()
                sigma = self.scheduler.sigmas[step_index]

                latent_model_input = self.scheduler.scale_model_input(latents, t)
                
                # _t = t if not is_mps else t.float()
                encoder_hidden_states.update({
                        "SIGMA": sigma,
                        "WEIGHT_FUNCTION": weight_function,
                    })
                noise_pred_text = self.unet(
                    latent_model_input,
                    t,
                    encoder_hidden_states=encoder_hidden_states,
                ).sample   

(a) As you can see here, the unet receives the conditioning input encoder_hidden_states, which used to be a plain tensor but is now replaced by a dict consisting of the sigma, the weight_function, and the original text-embedding tensor.
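To make (a) concrete, here is a minimal sketch of packing the conditioning into a dict each denoising step; only SIGMA and WEIGHT_FUNCTION appear in the quoted loop above, so the key for the original embedding ("CONTEXT_TENSOR") is a hypothetical name, not necessarily the repo's actual key:

```python
# Sketch: bundling the pww conditioning into a dict before the unet call.
# The patched cross-attention later unpacks these three entries.

def pack_conditioning(text_embeddings, sigma, weight_function):
    """Bundle everything the patched cross-attention will need."""
    return {
        "CONTEXT_TENSOR": text_embeddings,   # what the unet originally received
        "SIGMA": sigma,                      # current noise level of this step
        "WEIGHT_FUNCTION": weight_function,  # callable: (weight_map, sigma, scores) -> bias
    }

cond = pack_conditioning(
    text_embeddings=[[0.1, 0.2]],
    sigma=1.5,
    weight_function=lambda w, sigma, qk: w * sigma,
)
```

The unet's cross-attention then reads these entries back out instead of treating encoder_hidden_states as a tensor.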

(b) You also have to replace the forward function of the cross-attention modules, as at https://github.com/lwchen6309/paint-with-words-sd/blob/ae75a8f6d1279c501c17a2482164571962761816/paint_with_words/paint_with_words.py#L539
You can add (a) and (b) to the mc pipeline to combine pww and the mc pipeline.
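The essence of (b) is that the patched cross-attention adds a per-pixel, per-token bias to the attention scores before the softmax, so that regions of the color map attend more strongly to their assigned tokens. A rough numpy sketch of that idea (shapes, names, and the weight function are illustrative, not the repo's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pww_cross_attention(q, k, v, weight_map, sigma, weight_function):
    """Cross-attention with a paint-with-words bias.

    q: (n_pixels, d) queries from the image latents
    k, v: (n_tokens, d) keys/values from the text embeddings
    weight_map: (n_pixels, n_tokens) how strongly each token should
        attend at each spatial location (derived from the color map)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # The pww trick: bias the scores with the (sigma-scaled) weight map.
    scores = scores + weight_function(weight_map, sigma, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(3, 8))
v = rng.normal(size=(3, 8))
w = np.zeros((4, 3))
w[0, 1] = 10.0  # push pixel 0 strongly toward token 1
out = pww_cross_attention(q, k, v, w, sigma=1.0,
                          weight_function=lambda w, s, qk: w * s)
```

With a large positive bias at (pixel 0, token 1), pixel 0's output is dominated by v[1], which is exactly how a painted region gets tied to its word.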

(3) I'd also recommend you try [latent couple](https://github.com/opparco/stable-diffusion-webui-two-shot), which simply modifies the unet input by adding a text-weighted map. With this approach you don't even need to inject the forward function of the cross-attention module; you can directly revise the denoising steps.
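For reference, the latent-couple idea can be sketched as: denoise once per region prompt and blend the noise predictions with normalized spatial masks (a simplified illustration of the concept, not the extension's actual code):

```python
import numpy as np

def blend_noise_preds(noise_preds, masks):
    """Blend per-prompt noise predictions with spatial masks.

    noise_preds: list of (h, w) arrays, one per region prompt
    masks: list of (h, w) arrays; weights are normalized per pixel
    """
    total = np.sum(masks, axis=0)
    out = np.zeros_like(noise_preds[0])
    for pred, mask in zip(noise_preds, masks):
        out += pred * (mask / np.maximum(total, 1e-8))
    return out

left = np.zeros((4, 4)); left[:, :2] = 1.0    # left half -> prompt A
right = np.zeros((4, 4)); right[:, 2:] = 1.0  # right half -> prompt B
blended = blend_noise_preds(
    [np.full((4, 4), 1.0), np.full((4, 4), 2.0)],  # stand-in noise preds
    [left, right],
)
```

Because only the denoising output is mixed, this needs no cross-attention patching, which is why it is the easier route.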

@alexblattner
Author

alexblattner commented Apr 23, 2023

  1. I appreciate your willingness to help, but honestly there's nothing else that needs to be checked beyond the imports:
     from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler, ModelMixin
     Also, I want to keep my code private because there's sensitive info in it, sorry about that.
  2. Maybe we can pass the multicontrolnet pipeline in there or something? Also, I didn't understand (a) or (b), I'm sorry about that...

I'll be completely honest with you. I don't know how either multicontrolnet or paintwithwords work beyond some theory.
For multicontrolnet, all I know is that it uses multiple ControlNet models to generate something (I don't know how they're mixed together or how the weights are taken into account), and for paint-with-words I have no idea how you tell it to focus prompts on specific parts of an image (I know there's color mapping, but not how it works).

Are we just adding more and more weights, and it works? But paint-with-words doesn't use weights, and won't things get corrupted eventually with all those models and weights?
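On the multi-ControlNet question: each ControlNet produces residual feature maps that are scaled by its conditioning weight and then summed element-wise before being added into the unet, roughly like this (a simplified sketch of the idea, not diffusers' actual implementation):

```python
import numpy as np

def combine_controlnet_residuals(residuals, scales):
    """Sum per-ControlNet residuals, each scaled by its conditioning scale.

    residuals: list of (h, w) residual maps, one per ControlNet
    scales: list of floats, e.g. the controlnet_conditioning_scale values
    """
    out = np.zeros_like(residuals[0])
    for res, scale in zip(residuals, scales):
        out += scale * res
    return out

openpose_res = np.ones((2, 2))          # stand-in residual from ControlNet 1
depth_res = np.full((2, 2), 2.0)        # stand-in residual from ControlNet 2
combined = combine_controlnet_residuals([openpose_res, depth_res], [1.0, 0.5])
```

So the per-model weights control how much each ControlNet's guidance contributes to the shared sum, which is independent of the attention-score weighting that paint-with-words uses.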

@lwchen6309
Collaborator

In that case, I'm afraid it would be hard for you to revise the code without knowing the theory behind these methods, which is probably why (a) and (b) are difficult to understand.
To mix the mc pipeline and pww, it's essential to know how they work.
I'm afraid I can't help with this, unfortunately.
