Add initial support for RWKV v6 #174
Conversation
Sequence inference doesn't work correctly yet. Signed-off-by: Molly Sophia <[email protected]>
Thanks to @cryscan Signed-off-by: Molly Sophia <[email protected]>
Signed-off-by: Molly Sophia <[email protected]>
Hi!
Testing
No one can expect an engineer to test every single combination after each commit. I definitely did not expect it from myself, so that's why these tests were added. I would strongly recommend training a tiny test model for v6 as well. Unfortunately, the code that I've used to train the existing tiny models …
Hi,
The tiny model can output correct words, so I guess this is enough?
Update 1: I've corrected some buggy code, and the diff sums of the sanity checks don't look insane now :D
The expected_difference_sum values need to be determined after making sure the FP32 logits are nearly the same. Signed-off-by: Molly Sophia <[email protected]>
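For context, a minimal sketch of what such a diff-sum sanity check could look like, assuming a hypothetical diff_sum helper and a pre-measured expected_difference_sum per quantization format. This is an illustration of the idea discussed above, not the actual test code added in this PR.

```c
// Hypothetical sketch (not the PR's actual test code): sum the absolute differences
// between FP32 reference logits and the logits of a quantized model, then compare
// the result against a pre-measured expected_difference_sum.
#include <math.h>
#include <stddef.h>
#include <stdio.h>

static float diff_sum(const float * reference, const float * actual, size_t n_vocab) {
    float sum = 0.0f;
    for (size_t i = 0; i < n_vocab; i++) {
        sum += fabsf(reference[i] - actual[i]);
    }
    return sum;
}

int main(void) {
    // Dummy values just to show the calculation; real tests would use model logits.
    const float reference[] = {0.10f, -1.25f, 3.40f};
    const float actual[]    = {0.11f, -1.20f, 3.38f};
    printf("diff sum: %f\n", diff_sum(reference, actual, 3));

    // Usage idea: after evaluating the same prompt with the FP32 and quantized models,
    // assert that the measured sum stays close to the expected value, e.g.:
    // assert(fabsf(diff_sum(fp32_logits, q5_1_logits, n_vocab) - expected_difference_sum) < tolerance);
    return 0;
}
```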
e7a1f2a to d1f95d0
Signed-off-by: Molly Sophia <[email protected]>
I suggest some minor code style improvements.
Great work! There are a couple of things remaining:
I will be able to approve the PR, but I'm not sure I can merge it. Four months ago I handed the repository over to the RWKV Foundation and stepped down as a maintainer. Last time I checked, @LaylBongers was the new maintainer. In any case, I think the PR should be reviewed and merged by a new maintainer, whoever they are.
Thanks! I'll make the changes later today.
Signed-off-by: Molly Sophia <[email protected]>
@saharNooby Hi! I've applied the changes mentioned above.
Regarding this, the assertion happens at:
Using lldb, I can see that it happens in the FFN's mul_mat(vw, k), where vw has the shape [n_embed, dim_ffn = int((n_embd * 3.5) // 32 * 32)]. I wonder what's the best solution here? Re-train a tiny-rwkv v6 with n_embed = 128, or fix ggml?
There is a concern that this assertion also fails for larger RWKV v6 models. If so, it would mean macOS inference is broken. Is it possible for you to verify it? (Probably just need to know the relevant dims of the models.) If larger models are OK, then I think retraining the tiny model is the way to go.
Although, if this is a ggml limitation, maybe it's already fixed in upstream ggml?
It should be okay if the dims of these matmul operands are all even multiples of 32. For the ChannelMixing weights, there won't be any problem as long as n_embed >= 128. dim_att is equal to n_embed by default, which seems to be okay for all the existing models for now (n_embed = [512, 1024, 2048, 4096]).
Unfortunately, it isn't fixed upstream either :P
I just tested RWKV v6 1.6B/3B/7B with Q5_0/Q5_1 on my M1 MacBook, and they all generate completions normally without failing the assertion. So I guess I can just re-train a tiny-rwkv with n_embed=128 tomorrow :P Maybe fix ggml someday when an irregular rwkv6 is released.
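As a small sketch of the dim check being discussed, the snippet below evaluates the dim_ffn formula quoted earlier and tests the "even multiple of 32" (i.e. multiple of 64) condition. The n_embd values are the ones mentioned in this thread plus 64 as a hypothetical tiny test size; this is an assumption-based illustration, not code from the PR.

```c
// Sketch (assumption-based): compute RWKV v6 dim_ffn from n_embd using the formula
// quoted above, and check whether it is an even multiple of 32 (a multiple of 64),
// which is the condition discussed for the Metal matmul path.
#include <stdio.h>

static int rwkv6_dim_ffn(int n_embd) {
    // C equivalent of the Python expression int((n_embd * 3.5) // 32 * 32).
    return (int)(n_embd * 3.5) / 32 * 32;
}

int main(void) {
    // 64 is a hypothetical tiny test size; the rest are the sizes mentioned above.
    const int sizes[] = {64, 128, 512, 1024, 2048, 4096};

    for (int i = 0; i < (int)(sizeof(sizes) / sizeof(sizes[0])); i++) {
        const int n_embd  = sizes[i];
        const int dim_ffn = rwkv6_dim_ffn(n_embd);
        printf("n_embd=%4d  dim_ffn=%5d  multiple of 64: %s\n",
               n_embd, dim_ffn, dim_ffn % 64 == 0 ? "yes" : "no");
    }

    return 0;
}
```

Under these assumptions, n_embd = 64 gives dim_ffn = 224 (not a multiple of 64), while n_embd = 128 and the larger sizes listed above all pass the check, which is consistent with re-training the tiny model at n_embed = 128.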
Signed-off-by: Molly Sophia <[email protected]>
Signed-off-by: Molly Sophia <[email protected]>
I've looked over the code and run a smoke test. Everything looks good to me. I see you've already included a tiny-rwkv for the tests by now. If there's nothing else left, I'll have it merged in.
I guess there's nothing else left for now?
Add initial support for RWKV v6.
Tested sequence prefill and normal inference with RWKV v6 1.6B; it produces exactly the same text as rwkv-pip-package when generating with FP32, top_k=0.
Precise testing of logit differences is not done yet.
Regarding the tests, I wonder whether training a tiny-rwkv-v6 is needed (since rwkv-x060-173m-pile is still a bit large for testing use)?
Edit: I'm new to ggml :P Feel free to point out anything optimizable.
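For illustration, here is a rough sketch of how the sequence-prefill vs. token-by-token comparison described above could be checked against the rwkv.cpp C API. The function names and signatures (rwkv_init_from_file, rwkv_eval, rwkv_eval_sequence, rwkv_get_state_len, rwkv_get_logits_len, rwkv_free) reflect my reading of rwkv.h rather than anything taken from this PR, and the model path and token IDs are placeholders; treat this as a sketch under those assumptions.

```c
// Hedged sketch: prefill a prompt in one rwkv_eval_sequence() call, then feed the
// same prompt token by token with rwkv_eval(), and compare the resulting logits.
// API names/signatures are assumed from rwkv.h; verify against the actual header.
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "rwkv.h"

int main(void) {
    const uint32_t prompt[] = {510, 3158, 2385, 2};   // placeholder token IDs
    const size_t   n_tokens = sizeof(prompt) / sizeof(prompt[0]);

    // Placeholder model path; 4 CPU threads.
    struct rwkv_context * ctx = rwkv_init_from_file("rwkv-v6-1b6.bin", 4);
    if (!ctx) { return 1; }

    const size_t state_len  = rwkv_get_state_len(ctx);
    const size_t logits_len = rwkv_get_logits_len(ctx);

    float * state_seq  = calloc(state_len,  sizeof(float));
    float * state_tok  = calloc(state_len,  sizeof(float));
    float * logits_seq = calloc(logits_len, sizeof(float));
    float * logits_tok = calloc(logits_len, sizeof(float));

    // Path 1: sequence prefill in a single call (NULL state_in = fresh state).
    rwkv_eval_sequence(ctx, prompt, n_tokens, NULL, state_seq, logits_seq);

    // Path 2: the same prompt fed one token at a time, reusing the state buffer in place.
    rwkv_eval(ctx, prompt[0], NULL, state_tok, logits_tok);
    for (size_t i = 1; i < n_tokens; i++) {
        rwkv_eval(ctx, prompt[i], state_tok, state_tok, logits_tok);
    }

    // The two paths should produce (nearly) identical final logits.
    float diff_sum = 0.0f;
    for (size_t i = 0; i < logits_len; i++) {
        diff_sum += fabsf(logits_seq[i] - logits_tok[i]);
    }
    printf("logits diff sum: %f\n", diff_sum);

    free(state_seq); free(state_tok); free(logits_seq); free(logits_tok);
    rwkv_free(ctx);
    return 0;
}
```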