pycuda does not support triton.autotune when len(configs) > 1 #466
Replies: 4 comments
-
What interface does Triton expect from third-party GPU array types?
-
@inducer Triton needs a uint64 memory address. It seems the first use of the address works, but the second use raises an error.
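For context, Triton duck-types pointer arguments: an object that exposes a `data_ptr()` method returning the device address as an integer (plus a `dtype` attribute the JIT can stringify, e.g. "float32") is treated like a tensor. A minimal sketch of such an adapter for a pycuda GPUArray; the `TritonArg` name and details are illustrative, not part of either library:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates and activates a CUDA context)
import pycuda.gpuarray as gpuarray


class TritonArg:
    """Adapter exposing the attributes Triton duck-types on."""

    def __init__(self, arr: gpuarray.GPUArray):
        self.arr = arr
        self.dtype = arr.dtype  # numpy dtype; str() gives "float32" etc.

    def data_ptr(self) -> int:
        # Triton reads the device address as a uint64.
        return int(self.arr.gpudata)


x = TritonArg(gpuarray.to_gpu(np.arange(1024, dtype=np.float32)))
```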
-
May or may not be relevant, but I have been successfully using https://github.com/jax-ml/jax-triton/tree/main if you want to check how they pass in their arrays.
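For reference, the pattern from the jax-triton README condenses to the sketch below: JAX arrays are passed positionally to `jax_triton.triton_call`, which allocates the output and handles the pointer plumbing. The kernel and block size are adapted from their example, not from this thread:

```python
import jax
import jax.numpy as jnp
import jax_triton as jt
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, BLOCK_SIZE: tl.constexpr):
    # One program handles one BLOCK_SIZE-sized slice of the inputs.
    offsets = tl.program_id(axis=0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    tl.store(out_ptr + offsets,
             tl.load(x_ptr + offsets) + tl.load(y_ptr + offsets))


def add(x, y, block_size=8):
    # Inputs go in positionally; jax-triton appends the output pointer.
    return jt.triton_call(
        x, y,
        kernel=add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        grid=(x.size // block_size,),  # assumes size is a multiple of block_size
        BLOCK_SIZE=block_size)


x = jnp.arange(1024, dtype=jnp.float32)
print(add(x, x))
```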
-
@mitkotak I tried triton.autotune with jax-triton, and it does not support triton.autotune either. Fortunately, using the cupy library instead of pycuda works. Thanks very much.
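A sketch of that working CuPy setup: the same autotuned vector-add, with the device arrays allocated by cupy instead of pycuda. Block sizes and shapes here are illustrative (see the linked vector-add.py for the original), and depending on your Triton version the cupy arrays may still need the thin `data_ptr()` adapter sketched in the first reply:

```python
import cupy as cp
import triton
import triton.language as tl


@triton.autotune(
    configs=[triton.Config({"BLOCK_SIZE": 1024}),
             triton.Config({"BLOCK_SIZE": 2048})],  # len(configs) > 1
    key=["n_elements"],
)
@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(output_ptr + offsets, x + y, mask=mask)


size = 98432
a = cp.random.rand(size, dtype=cp.float32)
b = cp.random.rand(size, dtype=cp.float32)
c = cp.empty(size, dtype=cp.float32)
grid = lambda meta: (triton.cdiv(size, meta["BLOCK_SIZE"]),)
add_kernel[grid](a, b, c, size)  # benchmarking both configs succeeds
```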
-
Describe the bug
I am using OpenAI Triton to write some operators. I want to use pycuda arrays instead of PyTorch tensors so that I do not depend on PyTorch (some platforms do not support it), but pycuda arrays do not work with triton.autotune when len(configs) > 1. The source code is here:
https://github.com/l1351868270/implicit_gemm.triton/blob/main/vector-add.py
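For readers skimming, the failing setup condenses to roughly the following. This is a best-effort reconstruction from the traceback below, not the linked file itself: `TritonArg` is the illustrative pycuda adapter from the replies above, and the block sizes are guesses.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates and activates a CUDA context)
import pycuda.gpuarray as gpuarray
import triton
import triton.language as tl


@triton.autotune(
    configs=[triton.Config({"BLOCK_SIZE": 1024}),
             triton.Config({"BLOCK_SIZE": 2048})],  # works with one config, fails with >1
    key=["n_elements"],
)
@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    tl.store(output_ptr + offsets,
             tl.load(x_ptr + offsets, mask=mask) + tl.load(y_ptr + offsets, mask=mask),
             mask=mask)


class TritonArg:
    """Illustrative pycuda adapter (see the replies above)."""

    def __init__(self, arr):
        self.arr, self.dtype = arr, arr.dtype

    def data_ptr(self):
        return int(self.arr.gpudata)


def add(x, y, output, n_elements):
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements)
    return output


size = 98432
tl_a = TritonArg(gpuarray.to_gpu(np.random.rand(size).astype(np.float32)))
tl_b = TritonArg(gpuarray.to_gpu(np.random.rand(size).astype(np.float32)))
tl_c = TritonArg(gpuarray.empty(size, np.float32))
output_triton = add(tl_a, tl_b, tl_c, size)  # raises the ValueError below
```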
To Reproduce
Steps to reproduce the behavior:
File "/root/implicit_gemm.triton/vector-add.py", line 97, in
output_triton = add(tl_a, tl_b, tl_c, size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/implicit_gemm.triton/vector-add.py", line 80, in add
add_kernel[grid](x, y, output, n_elements)
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/jit.py", line 347, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 156, in run
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 156, in
timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 133, in _bench
return do_bench(kernel_call, warmup=self.num_warmups, rep=self.num_reps, quantiles=(0.5, 0.2, 0.8))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/testing.py", line 120, in do_bench
fn()
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/autotuner.py", line 114, in kernel_call
self.fn.run(
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/runtime/jit.py", line 693, in run
kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
File "/root/miniconda3/envs/cu121/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 366, in call
self.launch(*args, **kwargs)
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
The relevant Triton code is here: https://github.com/triton-lang/triton/blob/185299e204ea6e163cc717da137526939002659b/third_party/nvidia/backend/driver.py#L273-L278
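At those lines, the generated launcher validates every pointer argument by asking the CUDA driver to resolve it, and raises the ValueError above when that fails. A rough Python paraphrase of the check via ctypes; this is an illustration, not Triton's actual code, which is generated C:

```python
import ctypes

# CUpointer_attribute enum from cuda.h: CU_POINTER_ATTRIBUTE_DEVICE_POINTER = 3
CU_POINTER_ATTRIBUTE_DEVICE_POINTER = 3

libcuda = ctypes.CDLL("libcuda.so.1")
libcuda.cuInit(0)


def is_device_pointer(ptr: int) -> bool:
    """True if the driver can resolve ptr to a device pointer in this context."""
    dev_ptr = ctypes.c_uint64(0)
    status = libcuda.cuPointerGetAttribute(
        ctypes.byref(dev_ptr),
        CU_POINTER_ATTRIBUTE_DEVICE_POINTER,
        ctypes.c_uint64(ptr),
    )
    # Anything other than CUDA_SUCCESS (0) is what Triton reports as
    # "Pointer argument ... cannot be accessed from Triton (cpu tensor?)".
    return status == 0
```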