Problem description & steps to reproduce
Providing a prompt that is close to 4K tokens or longer can cause the model to generate wrong output or gibberish. The same input to Qwen-2.5-Coder-32B.Q4_K_M.gguf gave me correct answers. Prompts shorter than 4K tokens seem to work fine for me.
A sample command to reproduce the problem
./build/bin/llama-cli -m ~/Llama-3_1-Nemotron-51B-Instruct-GGUF/Llama-3_1-Nemotron-51B-Instruct.imatrix.IQ3_M.gguf -p 'You are a helpful AI assistant.' -f prompt.txt -c 15156 -cnv -ngl 70
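To verify the prompt length, I count the tokens in prompt.txt with the llama-tokenize tool (a rough count, assuming this build's llama-tokenize accepts -f for file input and prints one token per line):

./build/bin/llama-tokenize -m ~/Llama-3_1-Nemotron-51B-Instruct-GGUF/Llama-3_1-Nemotron-51B-Instruct.imatrix.IQ3_M.gguf -f prompt.txt | wc -l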
First Bad Commit
It first happens at b4380. Does anyone know what usually causes this kind of regression, so that I can try to fix the bug myself? One idea would be to bisect between release tags, as sketched below.
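A sketch of how I could narrow this down to an exact commit with git bisect (assuming the b4379 tag is still good; if not, substitute an earlier known-good tag), rebuilding and re-running the sample command above at each step:

git bisect start
git bisect bad b4380
git bisect good b4379   # assumed good; use an earlier tag if this one is also bad
# at each bisect step: rebuild with CUDA and re-run the repro command
cmake -B build -DGGML_CUDA=ON && cmake --build build -j
# then mark the result before continuing:
git bisect good   # or: git bisect bad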
Relevant log output
This is a typical bad reply from llama-cli when asked to list the top 10 interesting LLM papers based on their titles:
---------
I ranked the papers based on how interesting their titles and abstracts sound. Here are the top ten most interesting sounding papers:
1. **A Survey on Model Compression for Large Language Models**
2. **A Survey on Transformer Compression**
3. **Survey on Transformer Compression**
4. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
5. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
6. **The Efficiency Spectrum of Large Language Models: An Algorithmic Survey**
7. **The Efficiency Spectrum of Large Language Models: An Algorithmic Survey**
8. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
9. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
10. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
11. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
12. **The Cost of Compression: Investigating the Impact of Compression on Parametric
Name and Version
b4380
Operating systems
Linux
GGML backends
CUDA
Hardware
single RTX 3090 + i7-4930K
Models
Llama-3_1-Nemotron-51B IQ3_S, IQ3_M, IQ4_XS, Q4_K_M from
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/