Name and Version
build/bin/./llama-server --version
version: 4384 (14b699e)
built with cc (Debian 14.2.0-11) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
When running llama-server with the following command:
./build/bin/llama-server -fa -ctk q8_0 -ctv q8_0 -m ../models/phi-4-Q6_K.gguf --host 0.0.0.0 --port 8085
The same happens with llama3.2-3b, so I don't think it's model-specific.
Sending a large request with chat history (full context length) crashes the server with:
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
New requests to the server are then ignored.
I think it's related to the function:
ggml_compute_forward_dup
The dst->type and src->type mismatch (8 vs 0), and there is no q* handler for the copy.
First Bad Commit
No response
Relevant log output