llama tests #157

zzhhjjj · 2024-04-30T13:23:59Z

Add end-to-end test for llama.

Train a 190M-parameter llama model for 200 steps with a batch size of 1M tokens (tp=2, dp=4), and assert that the loss is lower than the target.
Assert that examples/train_tiny_llama.sh runs successfully.
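
For illustration, the loss assertion described above could be sketched as follows. This is not the code in tests/test_llama.py: the log line format, the regular expression, and the helper names `extract_losses` and `assert_final_loss_below` are all assumptions made up for this sketch.

```python
import re
from typing import Dict

# Hypothetical log line format, e.g. "iteration: 200 / 200 | lm_loss: 3.4".
# The real trainer output may look different.
LOSS_PATTERN = re.compile(
    r"iteration:\s*(\d+)\s*/\s*\d+.*?lm_loss:\s*([0-9]+(?:\.[0-9]+)?)"
)


def extract_losses(log_text: str) -> Dict[int, float]:
    """Map each logged step to its loss value."""
    losses = {}
    for line in log_text.splitlines():
        match = LOSS_PATTERN.search(line)
        if match:
            losses[int(match.group(1))] = float(match.group(2))
    return losses


def assert_final_loss_below(log_text: str, target: float, final_step: int = 200) -> float:
    """Fail if the run never reached final_step or its loss is not below target."""
    losses = extract_losses(log_text)
    assert final_step in losses, f"no loss logged for step {final_step}"
    final_loss = losses[final_step]
    assert final_loss < target, f"loss {final_loss} is not below target {target}"
    return final_loss
```

A test would capture the training output (for example from a subprocess running the training script) and pass it to `assert_final_loss_below` together with the chosen target.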

tests/test_llama.py

src/nanotron/nn/layer_norm.py

xrsrke · 2024-05-06T09:30:07Z

src/nanotron/scaling/parametrization.py

@@ -37,6 +37,7 @@ def __init__(self, config: ModelArgs):
            TensorParallelColumnLinear: self._parametrize_column_linear,
            TensorParallelRowLinear: self._parametrize_row_linear,
            TritonRMSNorm: self._parametrize_layer_norm,
+            RMSNorm: self._parametrize_layer_norm,


Add the same mapping for SpectralMupParametrizator.
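
A minimal sketch of why the mapping has to be repeated, assuming each parametrizator looks up its handler by the module's exact type. The classes below are stand-ins written for this example, not the nanotron implementations, and the `handlers` attribute and `parametrize` method are hypothetical names.

```python
# Stand-in module classes; the real ones live in nanotron.
class TritonRMSNorm:
    pass


class RMSNorm:
    pass


class SketchParametrizator:
    """Hypothetical parametrizator that dispatches on the exact module type."""

    def __init__(self):
        # Both norm variants must be registered, otherwise the lookup fails.
        self.handlers = {
            TritonRMSNorm: self._parametrize_layer_norm,
            RMSNorm: self._parametrize_layer_norm,
        }

    def _parametrize_layer_norm(self, module):
        return "layer_norm"

    def parametrize(self, module):
        handler = self.handlers.get(type(module))
        if handler is None:
            raise TypeError(f"no parametrization registered for {type(module).__name__}")
        return handler(module)
```

With a type-keyed table like this, a model that swaps TritonRMSNorm for RMSNorm hits the error branch in any parametrizator that registers only the Triton variant.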

examples/config_tiny_llama.yaml

.github/workflows/3d_parallelism_unit_tests.yaml

src/nanotron/models/llama.py

src/nanotron/nn/layer_norm.py

xrsrke

Good job! Left some changes

NouamaneTazi · 2024-05-13T12:57:46Z

We can disable flash attention automatically for old hardware:

import torch


def supports_flash_attention(device_id):
    """Check if a GPU supports FlashAttention."""
    major, minor = torch.cuda.get_device_capability(device_id)

    # Check if the GPU architecture is Ampere/Ada (SM 8.x) or Hopper (SM 9.0)
    is_sm8x = major == 8
    is_sm90 = major == 9 and minor == 0

    return is_sm8x or is_sm90
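
One way to make this check testable on machines without a GPU is to separate the capability rule from the CUDA query. The sketch below assumes that split. The names `capability_supports_flash_attention` and `pick_attention_backend` and the `"flash"` / `"eager"` labels are hypothetical and not part of nanotron.

```python
from typing import Tuple


def capability_supports_flash_attention(capability: Tuple[int, int]) -> bool:
    """Apply the same rule as above to a (major, minor) compute capability."""
    major, minor = capability
    return major == 8 or (major == 9 and minor == 0)


def pick_attention_backend(device_id: int = 0) -> str:
    """Return a hypothetical backend label for the given CUDA device."""
    import torch  # imported lazily so the pure check above works without torch

    if not torch.cuda.is_available():
        return "eager"
    capability = torch.cuda.get_device_capability(device_id)
    return "flash" if capability_supports_flash_attention(capability) else "eager"
```

Under this rule a T4 (compute capability 7.5) falls back to the non-flash path, while an A100 (8.0) or H100 (9.0) keeps flash attention enabled.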

zzhhjjj added 10 commits April 30, 2024 13:21

llama tests

c565621

add dependencies

8f01f82

yaml

7033d24

end2end test with 8-t4 gpus, add disable flash attention

1df2792

Merge remote-tracking branch 'upstream/main' into haojun/tests

da7cf7a

test llama example with 8-t4

a3efab8

add flash attention

280cb6c

try to solve fsdp bug

673237b

reduce memory usage

cb5d5af

rename

e0f89c1

zzhhjjj requested review from xrsrke, NouamaneTazi and 3outeille May 6, 2024 08:27

Merge remote-tracking branch 'upstream/main' into haojun/tests

0e7a08e