Challenges in Quantizing llama.cpp Models on Windows #10730
jasonsu123 asked this question in Q&A
-
🤖: Sure, here's a concise guide to help you through the process on Windows 10:
This should help you quantize your model to
👨: Btw, if step 2 fails, you can download the prebuilt executables from https://github.com/ggerganov/llama.cpp/releases
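For reference, here is a rough sketch of the two-step flow on Windows: convert to an f16 GGUF first (the converter accepts `f16`, per its own error message), then run the quantize executable on that file. It is only a sketch under assumptions. The folders `D:\llama.cpp` and `D:\llama.cpp-bin` are hypothetical, the executable name `llama-quantize.exe` is assumed to match what ships in the release zip, and `Q4_0` is just one example type (run the tool with `--help` to see what your build supports). The script only prints the two commands so you can check them before pasting them into CMD.

```python
from pathlib import PureWindowsPath

# Hypothetical locations, adjust to your own machine.
LLAMA_CPP_DIR = PureWindowsPath(r"D:\llama.cpp")      # assumed: source folder with convert_hf_to_gguf.py
LLAMA_BIN_DIR = PureWindowsPath(r"D:\llama.cpp-bin")  # assumed: extracted prebuilt release zip
MODEL_DIR = PureWindowsPath(r"D:\Ollama\TAIDE-LX-8B-Chat-Alpha1")


def build_commands(model_dir, quant_type="Q4_0"):
    """Return (convert_cmd, quantize_cmd) as argument lists."""
    # Intermediate full-precision file and final quantized file, next to the model folder.
    f16_file = model_dir.with_name(model_dir.name + "-f16.gguf")
    out_file = model_dir.with_name(f"{model_dir.name}-{quant_type.lower()}.gguf")

    # Step 1: Hugging Face safetensors -> f16 GGUF.
    convert_cmd = [
        "python", str(LLAMA_CPP_DIR / "convert_hf_to_gguf.py"), str(model_dir),
        "--outfile", str(f16_file),
        "--outtype", "f16",
    ]
    # Step 2: f16 GGUF -> quantized GGUF (input file, output file, type).
    quantize_cmd = [
        str(LLAMA_BIN_DIR / "llama-quantize.exe"),  # assumed executable name
        str(f16_file), str(out_file), quant_type,
    ]
    return convert_cmd, quantize_cmd


if __name__ == "__main__":
    for cmd in build_commands(MODEL_DIR):
        # Dry run: print only. To execute, use subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

Starting from f16 rather than the existing q8_0 file is a deliberate choice here: quantizing an already quantized file generally loses more quality than quantizing from the full-precision conversion.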
-
Hello everyone,
Previously, I asked how to convert the safetensors model from the Hugging Face website into a GGUF file. Later, someone provided instructional resources, and I'm currently able to convert it to a GGUF file using the convert_hf_to_gguf.py script from llama.cpp.
The process is as follows:
Enter the following commands in the CMD:
Then, from the llama.cpp folder of the downloaded source code, I run the following command:
python convert_hf_to_gguf.py D:\Ollama\TAIDE-LX-8B-Chat-Alpha1 --outfile D:\Ollama\TAIDE-LX-8B-Chat-Alpha1-q8_0.gguf --outtype q8_0
However, I'm unable to proceed further with the quantization.
For example, when I try to quantize to q4_0 by passing --outtype q4_0, I get this error:
error: argument --outtype: invalid choice: 'q4_0' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto')
It seems that I need to use the ./quantize or ./llama-quantize command, such as the examples in the tutorials:
However, I'm using Windows 10, so how can I modify these commands to work in my terminal?
It seems that the quantization process can only be done in a Linux environment, but I'm a programming newbie and don't know how to compile the quantize tool and then use it to quantize the GGUF model.
Could someone please provide a simple tutorial on how to do this?
I would really appreciate it.
Thank you.