Flash Attention for Neuron #883
base: main
Conversation
eab90f9 to 347f522
@@ -0,0 +1,129 @@
from absl import logging
Copyright?
Also please add comments for the file.
return out


def _mha_forward(query, key, value, bias, causal, softmax_scale):
Can we get support for segment IDs and dropout as well? Both are quite needed nowadays.
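For reference, segment IDs are typically enforced by masking cross-segment attention before the softmax. A minimal JAX sketch of the idea (function names and shapes here are illustrative, not from this PR):

```python
import jax.numpy as jnp

def segment_mask(q_segment_ids, kv_segment_ids):
    """Tokens attend only within their own segment.

    q_segment_ids: [batch, q_seq_len]; kv_segment_ids: [batch, kv_seq_len].
    Returns a boolean mask of shape [batch, q_seq_len, kv_seq_len].
    """
    return q_segment_ids[:, :, None] == kv_segment_ids[:, None, :]

def apply_segment_mask(bias, q_segment_ids, kv_segment_ids):
    # Fold the mask into the additive attention bias: cross-segment
    # positions get a large negative value so softmax zeroes them out.
    mask = segment_mask(q_segment_ids, kv_segment_ids)[:, None, :, :]
    return jnp.where(mask, bias, jnp.finfo(bias.dtype).min)
```

Inside a flash attention kernel the same check would be applied per tile rather than materializing the full mask, but the bias-level version above is enough to express the semantics.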
@@ -159,6 +159,21 @@ def jit_attn(query, key, value, bias, segment_ids):

return jit_attn

elif backend == "neuron":
from axlearn.common.flash_attention.neuron_attention import ( |
On-demand imports are kind of risky. We can live with them for functions inside neuron_attention.py, but can we at least get a header import for files outside of neuron_attention.py?
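One common pattern for this is a guarded module-level import. A sketch, assuming a `flash_attention` symbol (the actual import list is truncated above, so the name is an assumption):

```python
# Hypothetical header-import guard (not from this PR): keeps the dependency
# visible at the top of the file while degrading gracefully on hosts
# without the Neuron toolchain installed.
try:
    from axlearn.common.flash_attention.neuron_attention import flash_attention
except ImportError:  # e.g. neuronxcc is not available on this host.
    flash_attention = None
```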
k = key.transpose(0, 2, 3, 1)  # [batch_size, num_heads, d_model, kv_seq_len]
v = value.transpose(0, 2, 1, 3)  # [batch_size, num_heads, kv_seq_len, d_model]

import neuronxcc.nki.language as nl |
Please add a pylint disable comment here.
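For example, a function-local import like this one is usually annotated so the linter does not flag it (the exact disable code is an assumption, not taken from the PR):

```python
import neuronxcc.nki.language as nl  # pylint: disable=import-outside-toplevel
```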
Will defer to @kelvin-zou for approval.
This PR adds support for a flash attention kernel for Neuron, implemented through the Neuron Kernel Interface (NKI).
The flash attention kernel works on TRN1 and TRN2.
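A minimal invocation sketch based on the `_mha_forward` signature visible in the diff. Treat the import path and the use of the private entry point as assumptions; the PR's full public API is not shown here:

```python
import jax

# Hypothetical usage mirroring the signature
# _mha_forward(query, key, value, bias, causal, softmax_scale) from the diff.
from axlearn.common.flash_attention.neuron_attention import _mha_forward

rng = jax.random.PRNGKey(0)
shape = (1, 8, 2048, 128)  # [batch, num_heads, seq_len, d_model]
q = jax.random.normal(rng, shape)
k = jax.random.normal(rng, shape)
v = jax.random.normal(rng, shape)

out = _mha_forward(q, k, v, bias=None, causal=True, softmax_scale=shape[-1] ** -0.5)
```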