Name and Version
version: 4149 (1bb30bf)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
The param "cache_prompt" seems not working as expected.
I'm running llama.cpp in a CPU-only environment. Command used to start the server:
/llama-server -m /models/qwen2.5-1.5b-q8/qwen2.5-1.5b-instruct-q8_0.gguf -c 1024 --host 0.0.0.0 --port 8000 -dkvc --metrics --file /models/promts/tool.txt --keep -1
The endpoint I'm calling is /v1/completions, with a request body like this:
{
  "prompt": "LONG_PROMPT + short_question1",
  "cache_prompt": true,
  ... // other params
}
If I invoke the API with prompts of the form "LONG_PROMPT" + short_question1 again and again, it always works well and replies quickly. But if I send a different prompt, such as "short_question2" on its own, and then go back to "LONG_PROMPT" + short_question1, the request takes much longer. It seems the prompt cache for "LONG_PROMPT" is lost whenever I send a prompt that does not contain "LONG_PROMPT".
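For reference, a minimal sketch of the call pattern that reproduces the slowdown (assumptions: the server is started with the command above and reachable on localhost:8000; LONG_PROMPT and max_tokens are illustrative placeholders, not the real values):

```python
import time
import requests

URL = "http://localhost:8000/v1/completions"  # assumed host/port from the command above
LONG_PROMPT = "<contents of /models/promts/tool.txt>"  # placeholder for the real long prompt

def ask(prompt: str) -> None:
    # Send one completion request with prompt caching enabled and print its latency.
    t0 = time.time()
    r = requests.post(URL, json={
        "prompt": prompt,
        "cache_prompt": True,
        "max_tokens": 32,  # illustrative; any small value works
    })
    r.raise_for_status()
    print(f"{time.time() - t0:.2f}s")

ask(LONG_PROMPT + "short_question1")  # slow: cache is cold
ask(LONG_PROMPT + "short_question1")  # fast: long prefix reused from cache
ask("short_question2")                # unrelated prompt without the long prefix
ask(LONG_PROMPT + "short_question1")  # slow again: long prefix re-evaluated
```

The fourth call is the one that becomes slow again, even though its prompt is identical to the first two.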
First Bad Commit
No response
Relevant log output
No response