triton + sageattention error: RuntimeError: PassManager::run failed #6228

Open · zhaberator opened this issue Dec 26, 2024 · 3 comments
Labels: Potential Bug

zhaberator commented Dec 26, 2024

Expected Behavior

Expected image generation to start as intended.

Actual Behavior

Image generation crashed on the first step.

Steps to Reproduce

  1. Added the --use-sage-attention flag
  2. Ran the default workflow

Debug Logs

loaded completely 9.5367431640625e+25 4897.0483474731445 True
  0%|                                                                                           | 0/40 [00:00<?, ?it/s]ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'quant_per_block_int8_kernel' for 'sm_61'
ptxas info    : Function properties for quant_per_block_int8_kernel
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 80 registers, used 1 barriers, 384 bytes cmem[0]
main.c
   Creating library main.lib and object main.exp
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'quant_per_block_int8_kernel' for 'sm_61'
ptxas info    : Function properties for quant_per_block_int8_kernel
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 48 registers, used 1 barriers, 384 bytes cmem[0]
loc(callsite("C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":18:23 at "C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'
  0%|                                                                                           | 0/40 [00:02<?, ?it/s]
!!! Exception during processing !!! PassManager::run failed
Traceback (most recent call last):
  File "C:\cui\ComfyUI\execution.py", line 328, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
  File "C:\cui\python_embeded\Lib\site-packages\triton\runtime\jit.py", line 662, in run
    kernel = self.compile(
             ^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\compiler\compiler.py", line 286, in compile
    next_module = compile_ir(module, metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\backends\nvidia\compiler.py", line 329, in <lambda>
    stages["ttgir"] = lambda src, metadata: self.make_ttgir(src, metadata, options, self.capability)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cui\python_embeded\Lib\site-packages\triton\backends\nvidia\compiler.py", line 195, in make_ttgir
    pm.run(mod)
RuntimeError: PassManager::run failed

Prompt executed in 0.12 seconds

Other

GPU: NVIDIA GTX 1080
Steps:

  1. Installed Python 3.11
  2. Installed torch 2.5.1
  3. Installed triton-3.1.0-cp311 via ComfyUI manager
  4. Installed sageattention-1.0.6 via ComfyUI manager
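For completeness, the installed versions can be verified from the embedded Python that ComfyUI uses (C:\cui\python_embeded\python.exe). A minimal sketch; the expected values in the comments are assumptions taken from the steps above:

import torch
import triton

# Confirm the environment described in the steps above.
print(torch.__version__)              # expected: 2.5.1
print(triton.__version__)             # expected: 3.1.0
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce GTX 1080"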
doctorpangloss commented:

The 1080 is compute capability 6.1 (sm_61), which is not supported by Triton; it requires 7.5 or higher.
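A quick way to confirm what the card actually reports, as a minimal sketch using torch's built-in query:

import torch

# Report the compute capability of the active CUDA device.
# A GTX 1080 reports (6, 1), i.e. sm_61, matching the ptxas log above.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: sm_{major}{minor}")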

zhaberator (Author) commented Dec 27, 2024

> which is not supported by triton

OK, then why does this test program (below) run fine for me, without any errors?

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous BLOCK_SIZE slice.
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-bounds lanes in the last block.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    assert x.is_cuda and y.is_cuda and output.is_cuda
    n_elements = output.numel()
    # Launch one program per BLOCK_SIZE chunk of the output.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output

a = torch.rand(3, device="cuda")
b = a + a
b_compiled = add(a, a)
print(b_compiled - b)
print("If you see tensor([0., 0., 0.], device='cuda:0'), then it works")

doctorpangloss commented Dec 28, 2024

I am trying to tell you that your error is caused by your GPU being too old. CUDA features are added progressively across hardware generations, so it's possible for a simple test program to pass while your actual application fails.

https://github.com/triton-lang/triton?tab=readme-ov-file#compatibility

SM 8.0 is actually required by Triton.

> loc(callsite("C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":18:23 at "C:\\cui\\python_embeded\\Lib\\site-packages\\sageattention\\attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'

The int8-int8 dot op that is referenced there, and that SageAttention uses, isn't implemented by Triton for SM 6.1.

In principle, Pascal could support it just fine; Triton simply doesn't implement it.
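To illustrate, here is a minimal sketch (hypothetical, not from this thread) that exercises the same int8 tl.dot path that SageAttention relies on. On sm_61 it would be expected to fail during compilation much like the error above, while on a supported card it compiles:

import torch
import triton
import triton.language as tl

@triton.jit
def int8_dot_kernel(a_ptr, b_ptr, c_ptr, M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    # Load int8 tiles; tl.dot on int8 operands accumulates into int32.
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])
    c = tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)

M = N = K = 32
a = torch.randint(-128, 128, (M, K), device="cuda", dtype=torch.int8)
b = torch.randint(-128, 128, (K, N), device="cuda", dtype=torch.int8)
c = torch.empty((M, N), device="cuda", dtype=torch.int32)
int8_dot_kernel[(1,)](a, b, c, M=M, N=N, K=K)
print(c)

If this sketch fails on the 1080 while the vector-add above passes, that matches the explanation: elementwise kernels never touch the int8 dot path.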
