triton + sageattention error: RuntimeError: PassManager::run failed #6228

Open · zhaberator opened this issue Dec 26, 2024 · 3 comments
Labels: Potential Bug

zhaberator commented Dec 26, 2024

Expected Behavior

Expected image generation to start as intended.

Actual Behavior

Image generation crashed on the first step.

Steps to Reproduce

  1. Added the --use-sage-attention flag
  2. Ran the default workflow

Debug Logs

loaded completely 9.5367431640625e+25 4897.0483474731445 True
  0%|                                                                                           | 0/40 [00:00<?, ?it/s]ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'quant_per_block_int8_kernel' for 'sm_61'
ptxas info    : Function properties for quant_per_block_int8_kernel
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 80 registers, used 1 barriers, 384 bytes cmem[0]
main.c
   Creating library main.lib and object main.exp
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'quant_per_block_int8_kernel' for 'sm_61'
ptxas info    : Function properties for quant_per_block_int8_kernel
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 48 registers, used 1 barriers, 384 bytes cmem[0]
loc(callsite("C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":18:23 at "C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'
  0%|                                                                                           | 0/40 [00:02<?, ?it/s]
!!! Exception during processing !!! PassManager::run failed
Traceback (most recent call last):
  File "C:\cui\ComfyUI\execution.py", line 328, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
  File "C:\cui\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 662, in run
    kernel = self.compile(
             ^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\compiler\compiler.py", line 286, in compile
    next_module = compile_ir(module, metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\backends\nvidia\compiler.py", line 329, in <lambda>
    stages["ttgir"] = lambda src, metadata: self.make_ttgir(src, metadata, options, self.capability)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\backends\nvidia\compiler.py", line 195, in make_ttgir
    pm.run(mod)
RuntimeError: PassManager::run failed

Prompt executed in 0.12 seconds

Other

GPU: NVIDIA GTX 1080
Steps:

  1. Installed Python 3.11
  2. Installed torch 2.5.1
  3. Installed triton-3.1.0-cp311 via ComfyUI manager
  4. Installed sageattention-1.0.6 via ComfyUI manager
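For completeness, the installed versions can be verified from the embedded Python that ComfyUI uses (C:\cui\python_embeded\python.exe). A minimal sketch; the expected values in the comments are assumptions taken from the steps above:

import torch
import triton

# Confirm the environment described in the steps above.
print(torch.__version__)              # expected: 2.5.1
print(triton.__version__)             # expected: 3.1.0
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce GTX 1080"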
doctorpangloss commented:

The 1080 is compute capability 6.1 (sm_61), which is not supported by Triton; it requires 7.5 or higher.
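A quick way to confirm what the card actually reports, as a minimal sketch using torch's built-in query:

import torch

# Report the compute capability of the active CUDA device.
# A GTX 1080 reports (6, 1), i.e. sm_61, matching the ptxas log above.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: sm_{major}{minor}")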

zhaberator (Author) commented Dec 27, 2024

> which is not supported by triton

OK, then why does this test program (below) run fine for me, without any errors?

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous BLOCK_SIZE slice.
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-bounds lanes in the last block.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    assert x.is_cuda and y.is_cuda and output.is_cuda
    n_elements = output.numel()
    # Launch one program per BLOCK_SIZE chunk of the output.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output

a = torch.rand(3, device="cuda")
b = a + a
b_compiled = add(a, a)
print(b_compiled - b)
print("If you see tensor([0., 0., 0.], device='cuda:0'), then it works")

doctorpangloss commented Dec 28, 2024

I am trying to tell you that your error is caused by your GPU being too old. CUDA features are added progressively across hardware generations, so it's possible for a simple test program to pass while your actual application fails.

https://github.com/triton-lang/triton?tab=readme-ov-file#compatibility

SM 8.0 is actually required by Triton.

> loc(callsite("C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":18:23 at "C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'

The int8-int8 dot op that is referenced there, and that SageAttention uses, isn't implemented by Triton for SM 6.1.

In principle, Pascal could support it just fine; Triton simply doesn't implement it.
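To illustrate, here is a minimal sketch (hypothetical, not from this thread) that exercises the same int8 tl.dot path that SageAttention relies on. On sm_61 it would be expected to fail during compilation much like the error above, while on a supported card it compiles:

import torch
import triton
import triton.language as tl

@triton.jit
def int8_dot_kernel(a_ptr, b_ptr, c_ptr, M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    # Load int8 tiles; tl.dot on int8 operands accumulates into int32.
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])
    c = tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)

M = N = K = 32
a = torch.randint(-128, 128, (M, K), device="cuda", dtype=torch.int8)
b = torch.randint(-128, 128, (K, N), device="cuda", dtype=torch.int8)
c = torch.empty((M, N), device="cuda", dtype=torch.int32)
int8_dot_kernel[(1,)](a, b, c, M=M, N=N, K=K)
print(c)

If this sketch fails on the 1080 while the vector-add above passes, that matches the explanation: elementwise kernels never touch the int8 dot path.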
