
Misc. bug: context shift results in error #10958

Open
gompa opened this issue Dec 23, 2024 · 0 comments
gompa commented Dec 23, 2024

Name and Version

build/bin/./llama-server --version
version: 4384 (14b699e)
built with cc (Debian 14.2.0-11) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

When running llama-server with the following command:
./build/bin/llama-server -fa -ctk q8_0 -ctv q8_0 -m ../models/phi-4-Q6_K.gguf --host 0.0.0.0 --port 8085
The same happens with llama3.2-3b, so I don't think it's model specific.

Sending a large request with chat history (filling the full context length) crashes the server with:
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
New requests to the server are then ignored.
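For reference, a request roughly like the one below reproduces it once the accumulated chat history approaches the 4096-token context; the message content and max_tokens here are placeholders, not the exact request from my client:

curl http://localhost:8085/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "<long chat history that fills most of the context window>"}
        ],
        "max_tokens": 512
      }'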
I think it's related to the function ggml_compute_forward_dup: the dst->type and src->type mismatch (8 vs 0, i.e. Q8_0 vs F32) and there is no q* handler for that copy.
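To illustrate what I mean, here is a simplified, hypothetical sketch of the dispatch pattern (not the actual ggml-cpu.c source): a copy op that switches on the tensor type and falls through to an abort when a quantized type has no matching handler, which matches the "fatal error" I see when the context shift touches the q8_0 KV cache.

/* Hypothetical illustration only -- names and structure are simplified,
 * not copied from ggml-cpu.c. */
#include <stdio.h>
#include <stdlib.h>

enum tensor_type { TYPE_F32 = 0, TYPE_F16 = 1, TYPE_Q8_0 = 8 };

static void dup_from_f32(void) { puts("copy from f32 source"); }
static void dup_from_f16(void) { puts("copy from f16 source"); }

/* dispatch on the source type; quantized sources have no handler */
static void compute_forward_dup(enum tensor_type src, enum tensor_type dst) {
    switch (src) {
        case TYPE_F32: dup_from_f32(); break;
        case TYPE_F16: dup_from_f16(); break;
        default:
            /* no q* handler for this src/dst combination */
            fprintf(stderr, "fatal error: dup %d -> %d not implemented\n", src, dst);
            abort();
    }
}

int main(void) {
    compute_forward_dup(TYPE_F32, TYPE_F32);  /* fine */
    compute_forward_dup(TYPE_Q8_0, TYPE_F32); /* 8 vs 0: aborts, like the crash above */
    return 0;
}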

First Bad Commit

No response

Relevant log output

request: POST /v1/chat/completions 192.168.1.59 200
slot launch_slot_: id  0 | task 613 | processing task
slot update_slots: id  0 | task 613 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 3817
slot update_slots: id  0 | task 613 | kv cache rm [3520, end)
slot update_slots: id  0 | task 613 | prompt processing progress, n_past = 3817, n_tokens = 297, progress = 0.077810
slot update_slots: id  0 | task 613 | prompt done, n_past = 3817, n_tokens = 297
slot update_slots: id  0 | task 613 | slot context shift, n_keep = 0, n_left = 4095, n_discard = 2047
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
fatal error
fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
fatal error
fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error