llama-server prompt cache support #10937

Answered by mashdragon
0x0539 asked this question in Q&A

Yes, it's on by default. If you submit multiple requests sequentially with the same prefix, the prompt decoding for that common prefix will be reused for subsequent requests.
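To illustrate, here is a minimal sketch of two sequential `/completion` requests that share a common prompt prefix. The `cache_prompt` request field and the payload shapes below are based on the llama-server HTTP API; the exact system-prompt text and `n_predict` value are illustrative assumptions.

```python
# Hedged sketch: two sequential llama-server /completion payloads that
# share a long common prefix, so the KV cache built while decoding the
# first request's prefix can be reused for the second.
import json

# Shared prefix: the longer this is, the more decoding work is saved.
SYSTEM_PREFIX = "You are a helpful assistant. Answer concisely.\n\n"

def build_request(question: str) -> str:
    """Build a /completion request body whose prompt starts with the
    shared prefix, so prompt decoding for that prefix is reusable."""
    return json.dumps({
        "prompt": SYSTEM_PREFIX + question,
        "cache_prompt": True,   # reuse the cached prefix KV state
        "n_predict": 64,        # illustrative generation length
    })

first = build_request("What is the capital of France?")
second = build_request("What is the capital of Japan?")

# Both prompts share SYSTEM_PREFIX; only the differing suffix of the
# second prompt needs to be decoded again by the server.
shared = len(SYSTEM_PREFIX)
assert json.loads(first)["prompt"][:shared] == json.loads(second)["prompt"][:shared]
```

Sending these two payloads one after another to the same server (e.g. with `curl` or `requests.post`) should show a noticeably shorter prompt-processing time on the second request.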

Answer selected by 0x0539