llama-server prompt cache support #10937
-
Does llama-server support prompt caching between requests, similar to llama-cli's prompt cache file (--prompt-cache)? I have a use case where the prefix stays the same between requests, with just a few characters changing at the end.
Answered by mashdragon on Dec 27, 2024 (2 replies)
-
Oh wait, maybe that's on by default?
-
Yes, it's on by default. If you submit multiple requests sequentially with the same prefix, the prompt decoding for that common prefix will be reused for subsequent requests.
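For reference, here is a minimal sketch of how you might exercise this, assuming a llama-server instance listening on http://localhost:8080. The /completion endpoint, the n_predict field, and the cache_prompt field are part of the llama.cpp server API; cache_prompt is set explicitly here to make the intent visible even though it is reported to be the default. The long shared prefix and the timing comparison are just illustration.

```python
# Sketch: two sequential requests with the same long prefix. The second
# request should only decode the few tokens after the shared prefix,
# so it typically returns noticeably faster.
import time
import requests

SERVER = "http://localhost:8080"  # assumed llama-server address

# A long shared prefix; only the tail differs between requests.
PREFIX = "You are a helpful assistant. " * 50

def complete(suffix: str) -> float:
    """Send one completion request and return the wall-clock time it took."""
    start = time.time()
    resp = requests.post(
        f"{SERVER}/completion",
        json={
            "prompt": PREFIX + suffix,
            "n_predict": 16,
            "cache_prompt": True,  # reuse the KV cache for the matching prefix
        },
    )
    resp.raise_for_status()
    return time.time() - start

print(f"first:  {complete('Question A?'):.2f}s")  # decodes the full prompt
print(f"second: {complete('Question B?'):.2f}s")  # reuses the cached prefix
```

The server's per-request timing output should also reflect this: prompt evaluation cost on the second request should be much smaller, since only the differing tail is decoded.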
Answer selected by 0x0539