
[Bug]: Error with SD15 Attention Injection when batch size = 2; When max pixels > 2**21 #109

Open
Lia-C opened this issue Oct 9, 2024 · 1 comment

Lia-C commented Oct 9, 2024
What happened?

I am using SD15. When the batch size on "Empty Latent Image" is set to 2, I get a CUDA error from torch.nn.functional.scaled_dot_product_attention, reached via attention_sharing.py and attention_pytorch in comfy/ldm/modules/attention.py.

When the batch size is 1 with SD15, there is no issue.

SDXL models are unaffected: both "SDXL Conv Injection" and "SDXL Attention Injection" run without error at larger batch sizes.
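To sanity-check whether the failure comes from the attention kernel itself rather than from layerdiffuse, here is a minimal standalone sketch (mine, not from the issue) that calls scaled_dot_product_attention with roughly the shapes SD15 self-attention would produce. The 8x VAE downscale, 8 heads, head dim of 40, and the cond/uncond batch doubling from CFG are all assumptions about what reaches attention_pytorch:

```python
import torch
import torch.nn.functional as F

def try_sdpa(height, width, batch_size):
    # Assumed SD15 first-block attention geometry: 8 heads x 40 dims,
    # one token per 8x8 latent patch, batch doubled for cond/uncond.
    heads, dim_head = 8, 40
    tokens = (height // 8) * (width // 8)
    b = batch_size * 2
    q = torch.randn(b, heads, tokens, dim_head, device="cuda", dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)
    try:
        F.scaled_dot_product_attention(q, k, v)
        print(f"{height}x{width} batch {batch_size}: ok")
    except RuntimeError as e:
        print(f"{height}x{width} batch {batch_size}: {e}")

try_sdpa(960, 960, 2)    # reported working
try_sdpa(1024, 1024, 2)  # reported failing
```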

Steps to reproduce the problem

  1. Load the workflow.json
  2. Run the workflow with SD15, and change the batch size to 2. You should get an error. If you reduce the batch size to 1, the error should go away.

What should have happened?

SD15 with transparency should have run with batch size 2, and produced 2 transparent images.

Commit where the problem happens

ComfyUI: 7718ada4eddf101d088b69e159011e4108286b5b
ComfyUI-layerdiffuse: 6e4aeb2

Sysinfo

Linux, NVIDIA L4 (Google Cloud):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4           Off  | 00000000:00:03.0 Off |                    0 |
| N/A   70C    P0    32W /  72W |   3598MiB / 23034MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Console logs

got prompt
model_type EPS
Using xformers attention in VAE
Using xformers attention in VAE
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
Requested to load BaseModel
Loading 1 new model
  0%|                                                              | 0/20 [00:00<?, ?it/s]
!!! Exception during processing!!! CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/ComfyUI/execution.py", line 186, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/home/ComfyUI/execution.py", line 86, in get_output_data
    return_values = map_node_over_list(
  File "/home/ComfyUI/execution.py", line 78, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/home/ComfyUI/nodes.py", line 2016, in sample
    return common_ksampler(
  File "/home/ComfyUI/nodes.py", line 1868, in common_ksampler
    samples = comfy.sample.sample(
  File "/home/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/sample_error_enhancer.py", line 22, in informative_sample
    raise e
  File "/home/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/sample_error_enhancer.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)  # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
  File "/home/ComfyUI/comfy/sample.py", line 85, in sample
    samples = sampler.sample(
  File "/home/ComfyUI/comfy/samplers.py", line 1118, in sample
    return sample(
  File "/home/ComfyUI/comfy/samplers.py", line 972, in sample
    return cfg_guider.sample(
  File "/home/ComfyUI/comfy/samplers.py", line 934, in sample
    output = self.inner_sample(
  File "/home/ComfyUI/comfy/samplers.py", line 888, in inner_sample
    samples = sampler.sample(
  File "/home/ComfyUI/comfy/samplers.py", line 703, in sample
    samples = self.sampler_function(
  File "/home/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ComfyUI/comfy/k_diffusion/sampling.py", line 175, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/home/ComfyUI/comfy/samplers.py", line 378, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/home/ComfyUI/comfy/samplers.py", line 845, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/home/ComfyUI/comfy/samplers.py", line 848, in predict_noise
    return sampling_function(
  File "/home/ComfyUI/comfy/samplers.py", line 341, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/home/ComfyUI/comfy/samplers.py", line 248, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/home/ComfyUI/comfy/model_base.py", line 120, in apply_model
    model_output = self.diffusion_model(
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 1058, in forward
    h = forward_timestep_embed(
  File "/home/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 64, in forward_timestep_embed
    x = layer(x, context, transformer_options)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ComfyUI/comfy/ldm/modules/attention.py", line 854, in forward
    x = block(x, context=context[i], transformer_options=transformer_options)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ComfyUI/custom_nodes/ComfyUI-layerdiffuse/lib_layerdiffusion/attention_sharing.py", line 253, in forward
    return func(self, x, context, transformer_options)
  File "/home/ComfyUI/comfy/ldm/modules/attention.py", line 691, in forward
    n = self.attn1(n, context=context_attn1, value=value_attn1)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ComfyUI/custom_nodes/ComfyUI-layerdiffuse/lib_layerdiffusion/attention_sharing.py", line 239, in forward
    x = optimized_attention(q, k, v, self.heads)
  File "/home/ComfyUI/comfy/ldm/modules/attention.py", line 406, in attention_xformers
    return attention_pytorch(q, k, v, heads, mask)
  File "/home/ComfyUI/comfy/ldm/modules/attention.py", line 435, in attention_pytorch
    out = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Prompt executed in 13.60 seconds

Workflow json file

workflow (2).json

Additional information

No response

Lia-C (Author) commented Oct 15, 2024

I have debugged this and concluded that SD15 errors out when the total pixel count exceeds 2**21.

The rule is: H x W x N must stay below 2**21, where H = height, W = width, N = batch_size. (The bound appears to be strict: 1024 x 1024 x 2 equals 2**21 exactly and fails; see the checker sketched after the examples below.)

So:
Batch size of 1: works up to 1408 x 1408; fails at 1472 x 1472 and above.
Batch size of 2: works up to 960 x 960; fails at 1024 x 1024 and above.

Here are some (height, width, batch_size) examples that fail with the torch.nn.functional.scaled_dot_product_attention error above:
(1472, 1472, 1)
(1536, 1536, 1)
(2048, 2048, 1)

And here are ones that work:
(344, 1344, 1)
(1408, 1408, 1)
(1408, 1472, 1)
(960, 1024, 2)
(960, 960, 2)
(512, 512, 2)
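For quick checks, here is a tiny helper (a hypothetical convenience of mine, not part of ComfyUI-layerdiffuse) encoding the observed limit against the data points above:

```python
# Strict "<" matches the observations: 1024 * 1024 * 2 == 2**21 exactly,
# and that configuration fails.
def fits_sd15_limit(height: int, width: int, batch_size: int) -> bool:
    return height * width * batch_size < 2**21

assert fits_sd15_limit(1408, 1408, 1)       # works
assert fits_sd15_limit(960, 1024, 2)        # works
assert not fits_sd15_limit(1472, 1472, 1)   # fails
assert not fits_sd15_limit(1024, 1024, 2)   # fails (exactly 2**21)
```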

Lia-C changed the title to "[Bug]: Error with SD15 Attention Injection when batch size = 2; When max pixels > 2**21" Oct 15, 2024