Improving STOP Sign Graffiti Generation Using ControlNet and Stable Diffusion #9584

davidemerolla · 2024-10-04T18:01:19Z

Hi everyone,

I'm working on a project where I need to generate realistic images of damaged road signs. I am using Stable Diffusion and ControlNet. Specifically, I'm trying to create an image where a road sign (e.g., a STOP sign) appears vandalized or damaged with graffiti, scratches, or other wear and tear.

I've been using Stable Diffusion v1.5 with ControlNet, conditioning the generation on normal maps, but I can't get the "damaged" or "vandalized" effect right. The results look too abstract; I'm aiming for something closer to a real damaged road sign.

Here’s what I’ve done so far:

Input image: I’ve used a clean STOP sign as input.
Normal map: Generated using the Dense Prediction Transformer (DPT) model.
ControlNet: Implemented with Stable Diffusion v1.5.
Pipeline: I use StableDiffusionControlNetPipeline with CUDA enabled, and inference works fine.
Output: I'm getting graffiti, but it doesn't look as realistic or weathered as I'd like. Below are examples of the input, normal map, and output.

INPUT IMAGE:

REFERENCE IMAGE:

MASK:

OUTPUT IMAGE:

QUESTION
Does anyone have suggestions for improving the realism of the graffiti/damage on the road sign? Should I tweak the prompt, the normal maps, or perhaps experiment with a different ControlNet model?

I've also considered adjusting parameters like guidance_scale and num_inference_steps, but I'm not quite hitting the target.
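One way to make that tuning systematic is to sweep both parameters with a fixed seed, so that differences between outputs come from the settings rather than from the initial noise. The sketch below is illustrative only: the value lists, the `make_grid` and `sweep` helpers, and the output file names are assumptions, and `pipe` and `control_image` stand for the pipeline and normal map built in the script in this post.

```python
from itertools import product


def make_grid(guidance_scales, step_counts):
    """Return every (guidance_scale, num_inference_steps) combination."""
    return list(product(guidance_scales, step_counts))


def sweep(pipe, prompt, control_image, grid, seed=0, device="cuda"):
    """Run the pipeline once per setting, reusing the same seed each time."""
    import torch  # imported lazily so make_grid works without a GPU stack

    paths = []
    for guidance_scale, steps in grid:
        # Re-create the generator so every run starts from identical noise.
        generator = torch.Generator(device=device).manual_seed(seed)
        image = pipe(
            prompt,
            image=control_image,
            num_inference_steps=steps,
            guidance_scale=guidance_scale,
            generator=generator,
        ).images[0]
        path = f"sweep_gs{guidance_scale}_steps{steps}.png"
        image.save(path)
        paths.append(path)
    return paths


# Hypothetical values; adjust the ranges to taste.
grid = make_grid([5.0, 7.5, 10.0], [20, 30, 50])
print(len(grid))
```

Calling `sweep(pipe, prompt, image_normal, grid)` would then write one image per setting, which makes side-by-side comparison easier than changing one value at a time.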

Any advice, tips, or examples from your own projects would be greatly appreciated!

CODE

from PIL import Image
from transformers import pipeline
import numpy as np
import cv2
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import torch
from diffusers.utils import load_image

# CUDA info (per-device queries are only valid when CUDA is available)
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")

# Set the device to CUDA if available, otherwise use CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

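# Load the Dense Prediction Transformer (DPT) depth estimator; the normal map is derived from its depth output
# device=0 selects the first GPU, device=-1 falls back to the CPU
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas", device=0 if device == "cuda" else -1)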

# Load the ControlNet model for normal maps
controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal", torch_dtype=torch.float16
)

# Load the Stable Diffusion pipeline with ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Move the pipeline to CUDA if available
pipe = pipe.to(device)

# Optional: uncomment to enable xformers memory-efficient attention for faster inference (requires xformers)
# pipe.enable_xformers_memory_efficient_attention()

# Load the STOP sign image
image_input = load_image("STOP_sign.jpg")

# Preprocess to get the normal map using the depth estimator
image = depth_estimator(image_input)['predicted_depth'][0]
image = image.numpy()

# Normalize the depth map
image_depth = image.copy()
image_depth -= np.min(image_depth)
image_depth /= np.max(image_depth)

# Apply Sobel filters for normal map extraction
bg_threshold = 0.4

x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
x[image_depth < bg_threshold] = 0

y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
y[image_depth < bg_threshold] = 0
z = np.ones_like(x) * np.pi * 2.0

# Stack the normal map components
image = np.stack([x, y, z], axis=2)
image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
image_normal = Image.fromarray(image)
image_normal.show()

# Generate the output. Only the prompt and the normal map are passed to the pipeline;
# the pixels of the original STOP sign image are not used directly.
# Prompt aiming for a more realistic graffiti look
prompt = "A weathered black graffiti roughly spray-painted onto an old STOP sign, with faded colors and scratches. The graffiti is imperfect and blends naturally into the worn surface of the sign."

image_output = pipe(prompt, image_normal, num_inference_steps=50, guidance_scale=8.5).images[0]


# Save the result
image_output.save("stop_sign_with_graffiti.png")
image_output.show()

print("Graffiti added to STOP sign successfully!")

a-r-r-o-w (Maintainer) · 2024-10-05T20:25:14Z

cc @asomoza

asomoza (Maintainer) · 2024-10-07T07:26:57Z

Hi,

> Does anyone have suggestions for improving the realism of the graffiti/damage on the road sign? Should I tweak the prompt, the normal maps, or perhaps experiment with a different ControlNet model?

My first suggestion is to switch to one of the newer models, at least SDXL. SD 1.5 wasn't that great at photorealism, and for what you want to do it's the tiny details that make the result realistic; a 512px image won't capture them.
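A minimal sketch of what the switch to SDXL could look like, assuming the `stabilityai/stable-diffusion-xl-base-1.0` base model and the `diffusers/controlnet-canny-sdxl-1.0` checkpoint (an edge-based ControlNet picked only for illustration; neither checkpoint is named in this thread). The `sdxl_size`, `load_sdxl_controlnet`, `canny_control_image`, and `generate` helpers are hypothetical, as are the Canny thresholds and the conditioning scale. `sdxl_size` scales the control image so its longer side is about 1024px instead of 512px.

```python
def sdxl_size(width, height, long_side=1024, multiple=8):
    """Scale (width, height) so the longer side is `long_side`,
    rounding both sides to a multiple of `multiple`."""
    scale = long_side / max(width, height)
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


def load_sdxl_controlnet(device="cuda"):
    """Load SDXL with an edge ControlNet (checkpoint names are assumptions)."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to(device)


def canny_control_image(pil_image, low=100, high=200):
    """Build a 3-channel Canny edge map at an SDXL-friendly resolution."""
    import cv2
    import numpy as np
    from PIL import Image

    resized = pil_image.convert("RGB").resize(sdxl_size(*pil_image.size))
    edges = cv2.Canny(np.array(resized), low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=2))


def generate(image_path, prompt, device="cuda"):
    """Run one SDXL + ControlNet generation from an image file."""
    from diffusers.utils import load_image

    pipe = load_sdxl_controlnet(device)
    control = canny_control_image(load_image(image_path))
    return pipe(
        prompt,
        image=control,
        controlnet_conditioning_scale=0.5,  # assumed value, tune as needed
        num_inference_steps=30,
    ).images[0]


print(sdxl_size(512, 512))
```

Something like `generate("STOP_sign.jpg", "photo of an old STOP sign covered in spray-painted graffiti")` would run the full pipeline; it needs a GPU and downloads both checkpoints on first use.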

Beyond that, I don't have time right now to test these methods myself, but here are some things you can try:

Use a lineart ControlNet for SDXL (standard lineart, TEED, or Anyline), because these capture the tiny details much better, and you can even draw over the lineart images.
You can make some templates for placing the graffiti over the lineart images. This will require some clever code so that you place them only over the signs, but it should be doable with masking and something like SAM.
You can also try IP-Adapters: use one or several images of signs with graffiti, together with a mask and only certain blocks, to generate the graffiti on the new image.
You can use an instruction-based model, like instruct-pix2pix or CosXL, and ask it to add the graffiti to the sign; this sounds like an ideal task for those models. There are also some new ones coming from Meta and Google that sound promising for this task too.
You can also, again with some clever technique, simply paste the graffiti over the signs and then run a low-strength img2img pass to finish the image with a realistic result.
Train a LoRA.
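The paste-then-refine idea from the list can be sketched roughly as follows. Everything named here is an assumption for illustration: the `fit_overlay`, `paste_graffiti`, and `refine` helpers, the sign bounding box (which would come from a mask or a model such as SAM), the SDXL base checkpoint, and the `strength=0.3` value. The graffiti template is assumed to be a PIL image with a transparent background.

```python
def fit_overlay(sign_box, overlay_size, coverage=0.6):
    """Return (x, y, w, h) for an overlay centred in sign_box.

    sign_box is (left, top, right, bottom). The overlay keeps its aspect
    ratio and fills at most `coverage` of the box in each dimension.
    """
    left, top, right, bottom = sign_box
    box_w, box_h = right - left, bottom - top
    ov_w, ov_h = overlay_size
    scale = min(box_w * coverage / ov_w, box_h * coverage / ov_h)
    w, h = max(1, round(ov_w * scale)), max(1, round(ov_h * scale))
    x = left + (box_w - w) // 2
    y = top + (box_h - h) // 2
    return x, y, w, h


def paste_graffiti(sign_image, graffiti_rgba, sign_box, coverage=0.6):
    """Alpha-composite a graffiti template (PIL image) onto the sign."""
    x, y, w, h = fit_overlay(sign_box, graffiti_rgba.size, coverage)
    overlay = graffiti_rgba.convert("RGBA").resize((w, h))
    out = sign_image.convert("RGB")  # convert() returns a new image
    out.paste(overlay, (x, y), mask=overlay)  # alpha channel acts as the mask
    return out


def refine(composite, prompt, strength=0.3, device="cuda"):
    """Low-strength img2img pass so the pasted graffiti blends into the photo."""
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to(device)
    # A low strength keeps most of the composite and only re-renders details.
    return pipe(prompt, image=composite, strength=strength).images[0]


# Hypothetical sign box and template size, just to show the placement maths.
print(fit_overlay((100, 100, 500, 500), (200, 100)))
```

Keeping the placement logic separate from the diffusion call means the compositing step can be checked visually before spending GPU time on the refinement pass.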

These are some ideas off the top of my head; there are probably more you can play with. You can also mix them to get even better results, since they aren't mutually exclusive.
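For the instruction-based route mentioned in the list, a minimal sketch with the diffusers InstructPix2Pix pipeline might look like this. The `timbrooks/instruct-pix2pix` checkpoint, the `build_instruction` and `edit_sign` helpers, and the guidance values are assumptions chosen for illustration; note that this checkpoint is SD 1.5-based, so the resolution caveat above still applies.

```python
def build_instruction(damage_types, target="the STOP sign"):
    """Turn a list of damage types into a single edit instruction."""
    if not damage_types:
        raise ValueError("need at least one damage type")
    if len(damage_types) == 1:
        listed = damage_types[0]
    else:
        listed = ", ".join(damage_types[:-1]) + " and " + damage_types[-1]
    return f"add {listed} to {target}"


def edit_sign(image, instruction, device="cuda"):
    """Apply a text instruction to an existing photo of the sign."""
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix",
        torch_dtype=torch.float16,
        safety_checker=None,
    ).to(device)
    return pipe(
        instruction,
        image=image,
        num_inference_steps=20,
        image_guidance_scale=1.5,  # how closely to stay to the input photo
        guidance_scale=7.5,  # how strongly to follow the instruction
    ).images[0]


instruction = build_instruction(["black spray-paint graffiti", "scratches"])
print(instruction)
```

Unlike the ControlNet setup, this approach takes the original photo as input, so the sign's shape and lettering are preserved and only the requested damage is added.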

1 reply

davidemerolla Oct 7, 2024
Author

thanks!!!
