llama-server prompt cache support #10937
-
Does llama-server support prompt caching between requests, similar to llama-cli's prompt cache file (--prompt-cache)? I have a use case where the prefix stays the same between requests, with just a few characters changing at the end.
Answered by mashdragon on Dec 27, 2024 (2 replies)
-
Oh wait, maybe that's on by default?
-
Yes, it's on by default. If you submit multiple requests sequentially with the same prefix, the prompt decoding for that common prefix will be reused for subsequent requests.
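For reference, here is a minimal sketch of how you might exercise this, assuming a llama-server instance listening on http://localhost:8080. The /completion endpoint, the n_predict field, and the cache_prompt field are part of the llama.cpp server API; cache_prompt is set explicitly here to make the intent visible even though it is reported to be the default. The long shared prefix and the timing comparison are just illustration.

```python
# Sketch: two sequential requests with the same long prefix. The second
# request should only decode the few tokens after the shared prefix,
# so it typically returns noticeably faster.
import time
import requests

SERVER = "http://localhost:8080"  # assumed llama-server address

# A long shared prefix; only the tail differs between requests.
PREFIX = "You are a helpful assistant. " * 50

def complete(suffix: str) -> float:
    """Send one completion request and return the wall-clock time it took."""
    start = time.time()
    resp = requests.post(
        f"{SERVER}/completion",
        json={
            "prompt": PREFIX + suffix,
            "n_predict": 16,
            "cache_prompt": True,  # reuse the KV cache for the matching prefix
        },
    )
    resp.raise_for_status()
    return time.time() - start

print(f"first:  {complete('Question A?'):.2f}s")  # decodes the full prompt
print(f"second: {complete('Question B?'):.2f}s")  # reuses the cached prefix
```

The server's per-request timing output should also reflect this: prompt evaluation cost on the second request should be much smaller, since only the differing tail is decoded.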
Answer selected by 0x0539