Name and Version
version: 4391 (9ba399d)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.1.0
Operating systems
Mac (M4 Max / 128 GB)
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
./build/bin/llama-server -m /Users/mattsinalco/.cache/huggingface/hub/models--unsloth--Llama-3.3-70B-Instruct-GGUF/snapshots/0c14ebbedd129fb190c8241facca9a360e81c650/Llama-3.3-70B-Instruct-Q4_K_M.gguf -md /Users/mattsinalco/.cache/huggingface/hub/models--unsloth--Llama-3.2-1B-Instruct-GGUF/snapshots/a5594fb18df5dfc6b43281423fcce6750cd92de5/Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 99 -ngld 99 -fa --port 8034 --ctx-size 8192 --ctx-size-draft 8192 --draft-min 0 --draft-max 16 -np 7 --host 0.0.0.0 --slots --slot-save-path /Users/mattsinalco/mathias/caching -ctk q4_1 -ctv q4_1
This sometimes (but reproducibly) gives me:
/Users/mattsinalco/mathias/llama.cpp/ggml/src/ggml-metal/ggml-metal.m:1263: unsupported op
ggml_metal_encode_node: error: unsupported op 'CPY'
Other quantizations give me this:
zsh: segmentation fault ./build/bin/llama-server -m -md -ngl 99 -ngld 99 -fa --port 8034 --ctx-size
Related question: as long as KV cache quantization is not working reliably, can I resize the KV cache? I can't seem to load slots of 200 MB (100 MB is possible).
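For context on the sizes involved, here is a rough back-of-the-envelope sketch of the KV cache footprint per slot. It is an estimate only, not llama.cpp's own accounting. Assumptions: Llama 3.3 70B uses 80 layers, 8 KV heads and a head dimension of 128; the context is split evenly across the `-np` slots; and the helper names (`kv_bytes_per_token`, `kv_mib`) are made up for this illustration. A saved slot file may also contain data beyond the raw K/V tensors, and the draft model's cache is not counted here.

```python
# Approximate bytes per cached element for a few ggml types.
# Quantized types store 32 values per block: q4_1 = 20 bytes, q4_0 = 18, q8_0 = 34.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_1": 20 / 32,
    "q4_0": 18 / 32,
}


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, type_k="f16", type_v="f16"):
    """Estimated bytes of K plus V cache needed for one token (hypothetical helper)."""
    elements_per_layer = n_kv_heads * head_dim
    bytes_k = BYTES_PER_ELEMENT[type_k]
    bytes_v = BYTES_PER_ELEMENT[type_v]
    return n_layers * elements_per_layer * (bytes_k + bytes_v)


def kv_mib(n_tokens, **model):
    """Estimated KV cache size in MiB for n_tokens cached tokens."""
    return n_tokens * kv_bytes_per_token(**model) / (1024 * 1024)


if __name__ == "__main__":
    # Assumed model shape for Llama 3.3 70B (not taken from the log above).
    llama_70b = dict(n_layers=80, n_kv_heads=8, head_dim=128)

    # Assumption: --ctx-size 8192 is divided evenly across -np 7 slots.
    tokens_per_slot = 8192 // 7

    for cache_type in ("f16", "q8_0", "q4_1"):
        size = kv_mib(tokens_per_slot, type_k=cache_type, type_v=cache_type, **llama_70b)
        print(f"{cache_type}: ~{size:.1f} MiB per full slot")
```

Under these assumptions the per-slot size scales linearly with the number of cached tokens, so a larger `--ctx-size` or a smaller `-np` increases the size of a full slot.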
First Bad Commit
No response
Relevant log output
No response