
Eval bug: Llama-3_1-Nemotron-51B ggufs generates incorrect answers/gibberish when prompt near or exceed 4K tokens #11002

Open
ymcki opened this issue Dec 28, 2024 · 0 comments


@ymcki
Contributor

ymcki commented Dec 28, 2024

Name and Version

b4380

Operating systems

Linux

GGML backends

CUDA

Hardware

single 3090 + i7 4930K

Models

Llama-3_1-Nemotron-51B IQ3_S, IQ3_M, IQ4_XS, Q4_K_M from
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/

Problem description & steps to reproduce

Providing a prompt that is close to 4K tokens or longer can cause the model to generate wrong output or gibberish. The same input given to Qwen-2.5-Coder-32B.Q4_K_M.gguf produced correct answers. Prompts shorter than 4K tokens seem to work fine for me.

A sample command to reproduce the problem:
./build/bin/llama-cli -m ~/Llama-3_1-Nemotron-51B-Instruct-GGUF/Llama-3_1-Nemotron-51B-Instruct.imatrix.IQ3_M.gguf -p 'You are a helpful AI assistant.' -f prompt.txt -c 15156 -cnv -ngl 70
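To reproduce without my exact document, a prompt.txt of roughly the right size can be generated with a short script. This is a minimal sketch, assuming the common rough heuristic of about 4 characters per token; the exact token count depends on the model's tokenizer, so treat the target as approximate:

```python
# Build a prompt.txt that is roughly 4K tokens long.
# CHARS_PER_TOKEN = 4 is a heuristic assumption, not an exact tokenizer count.
TARGET_TOKENS = 4096
CHARS_PER_TOKEN = 4

filler = "The quick brown fox jumps over the lazy dog. "
# Repeat the filler until we have at least TARGET_TOKENS * CHARS_PER_TOKEN
# characters, then trim to exactly that length.
text = filler * (TARGET_TOKENS * CHARS_PER_TOKEN // len(filler) + 1)
text = text[: TARGET_TOKENS * CHARS_PER_TOKEN]

with open("prompt.txt", "w") as f:
    f.write("Summarize the repeated text below in one sentence:\n" + text)
```

Passing the resulting file via `-f prompt.txt` as in the command above should put the prompt near the 4K-token boundary where the bad output starts.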

First Bad Commit

It happens at least as of b4380; I have not bisected to a first bad commit. Does anyone know the usual causes of this kind of failure, so that I can try to fix the bug myself?

Relevant log output

This is a typical bad reply from llama-cli when asked to list the top 10 most interesting LLM papers based on their titles:
---------
I ranked the papers based on how interesting their titles and abstracts sound. Here are the top ten most interesting sounding papers:

1. **A Survey on Model Compression for Large Language Models**
2. **A Survey on Transformer Compression**
3. **Survey on Transformer Compression**
4. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
5. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
6. **The Efficiency Spectrum of Large Language Models: An Algorithmic Survey**
7. **The Efficiency Spectrum of Large Language Models: An Algorithmic Survey**
8. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
9. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
10. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
11. **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
12. **The Cost of Compression: Investigating the Impact of Compression on Parametric