# LLMs Quantization Recipes

Intel® Neural Compressor supports advanced quantization technologies for large language models (LLMs), including SmoothQuant (SQ) and Weight-Only Quantization (WOQ), and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with PyTorch, Intel® Extension for PyTorch, and Intel® Extension for Transformers.
This document publishes the specific recipes we achieved for popular LLMs, helping users quickly obtain an optimized LLM with no more than 1% accuracy loss.
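As a minimal sketch of how such recipes are typically applied (not the project's official recipe scripts), the two techniques named above map onto Intel® Neural Compressor's 2.x post-training API roughly as follows. The model name, `alpha` value, 4-bit settings, and calibration dataloader are illustrative assumptions, and the heavy imports are kept inside the functions so the sketch stays readable without the libraries installed:

```python
def quantize_smoothquant(model_name="EleutherAI/gpt-j-6b", alpha=0.5,
                         calib_dataloader=None):
    """Sketch: SmoothQuant (SQ) INT8 post-training quantization.

    Assumes Intel Neural Compressor 2.x and Hugging Face Transformers are
    installed; alpha=0.5 is a common illustrative default, not a verified recipe.
    """
    from transformers import AutoModelForCausalLM
    from neural_compressor import PostTrainingQuantConfig, quantization

    model = AutoModelForCausalLM.from_pretrained(model_name)
    conf = PostTrainingQuantConfig(
        recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": alpha}}
    )
    # Calibration data is required for static INT8 activation ranges.
    return quantization.fit(model, conf, calib_dataloader=calib_dataloader)


def quantize_weight_only(model_name="EleutherAI/gpt-j-6b"):
    """Sketch: Weight-Only Quantization (WOQ) INT4 via the GPTQ algorithm.

    The op_type_dict schema below is an assumption about the 2.x config
    format; real recipes tune bits/group_size/scheme per model.
    """
    from transformers import AutoModelForCausalLM
    from neural_compressor import PostTrainingQuantConfig, quantization

    model = AutoModelForCausalLM.from_pretrained(model_name)
    conf = PostTrainingQuantConfig(
        approach="weight_only",
        op_type_dict={
            ".*": {  # match all quantizable op types
                "weight": {
                    "bits": [4],
                    "group_size": [32],
                    "scheme": ["sym"],
                    "algorithm": ["GPTQ"],
                },
            },
        },
    )
    return quantization.fit(model, conf)
```

Only the function definitions are shown here; the per-model hyperparameters that make these runs land within the 1% accuracy target are exactly what the recipe links below provide.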


## Large Language Models Recipes

| Models | SQ INT8 | WOQ INT8 | WOQ INT4 |
|---|:---:|:---:|:---:|
| EleutherAI/gpt-j-6b | ✔ | ✔ | ✔ |
| facebook/opt-1.3b | ✔ | ✔ | ✔ |
| facebook/opt-30b | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-7b-hf | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-13b-hf | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-70b-hf | ✔ | ✔ | ✔ |
| tiiuae/falcon-7b | ✔ | ✔ | ✔ |
| tiiuae/falcon-40b | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-7B-Chat | ✔ | ✔ | ✔ |
| bigscience/bloom-1b7 | ✔ | ✔ | ✔ |
| databricks/dolly-v2-12b | | ✔ | |
| EleutherAI/gpt-neox-20b | | ✔ | ✔ |
| mistralai/Mistral-7B-v0.1 | | ✔ | ✔ |
| THUDM/chatglm2-6b | ✔ | ✔ | ✔ |
| THUDM/chatglm3-6b | WIP | ✔ | WIP |

Detailed recipes can be found HERE.

Notes:

- This model list comes from IPEX.
- The WIP recipes will be published soon.

## Large Language Models Accuracy

Accuracy (ACC) is measured on the lambada_openai task; each Ratio column is the quantized accuracy divided by the FP32 accuracy.

| Model | FP32 ACC | SQ INT8 ACC | Ratio | WOQ INT8 ACC | Ratio | WOQ INT4 GPTQ ACC | Ratio | WOQ INT4 AutoRound ACC | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| baichuan-inc/Baichuan-13B-Chat | 67.57% | 69.07% | 1.0222 | 67.55% | 0.9997 | 68.12% | 1.0081 | 66.93% | 0.9905 |
| baichuan-inc/Baichuan2-13B-Chat | 71.51% | 75.57% | 1.0568 | 71.57% | 1.0008 | 70.81% | 0.9902 | N/A | N/A |
| baichuan-inc/Baichuan2-7B-Chat | 67.67% | 68.06% | 1.0058 | 67.61% | 0.9991 | 67.90% | 1.0034 | N/A | N/A |
| bigscience/bloom-1b7 | 46.34% | 47.99% | 1.0356 | 46.21% | 0.9972 | 46.90% | 1.0121 | N/A | N/A |
| databricks/dolly-v2-12b | 64.35% | N/A | N/A | 63.92% | 0.9933 | N/A | N/A | N/A | N/A |
| EleutherAI/gpt-j-6b | 68.31% | 68.27% | 0.9994 | 68.27% | 0.9994 | 68.35% | 1.0006 | 68.02% | 0.9958 |
| EleutherAI/gpt-neox-20b | 72.33% | N/A | N/A | 72.29% | 0.9994 | 71.74% | 0.9918 | N/A | N/A |
| facebook/opt-1.3b | 57.89% | 57.68% | 0.9964 | 58.12% | 1.0040 | 58.26% | 1.0064 | N/A | N/A |
| facebook/opt-30b | 71.49% | 71.78% | 1.0041 | 71.53% | 1.0006 | 71.59% | 1.0014 | 71.80% | 1.0043 |
| meta-llama/Llama-2-13b-hf | 76.77% | 76.25% | 0.9932 | 76.89% | 1.0016 | 77.66% | 1.0116 | 76.60% | 0.9978 |
| meta-llama/Llama-2-70b-hf | 79.64% | 79.14% | 0.9937 | 79.62% | 0.9997 | 80.09% | 1.0057 | 79.68% | 1.0005 |
| meta-llama/Llama-2-7b-hf | 73.92% | 73.45% | 0.9936 | 73.90% | 0.9997 | 73.84% | 0.9989 | N/A | N/A |
| mistralai/Mistral-7B-v0.1 | 75.90% | N/A | N/A | 75.80% | 0.9987 | 76.25% | 1.0046 | 75.74% | 0.9979 |
| THUDM/chatglm2-6b | 53.23% | 52.86% | 0.9930 | 53.00% | 0.9957 | 52.90% | 0.9938 | 52.92% | 0.9942 |
| THUDM/chatglm3-6b | 59.09% | N/A | N/A | 59.03% | 0.9990 | N/A | N/A | N/A | N/A |
| tiiuae/falcon-40b | 77.22% | 76.95% | 0.9965 | 77.18% | 0.9995 | 77.55% | 1.0043 | 77.82% | 1.0078 |
| tiiuae/falcon-7b | 74.67% | 76.63% | 1.0262 | 74.73% | 1.0008 | 75.06% | 1.0052 | 74.00% | 0.9910 |
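The Ratio columns above are just the quantized accuracy divided by the FP32 baseline, so the document's "within 1% accuracy loss" goal corresponds to a ratio of at least 0.99. A small self-contained check (the 0.99 threshold is our reading of the 1% criterion, and the helper names are illustrative):

```python
def accuracy_ratio(quant_acc: float, fp32_acc: float) -> float:
    """Ratio as reported in the table: quantized ACC / FP32 ACC."""
    return quant_acc / fp32_acc


def within_one_percent(quant_acc: float, fp32_acc: float) -> bool:
    """True when relative accuracy loss is at most 1% (ratio >= 0.99)."""
    return accuracy_ratio(quant_acc, fp32_acc) >= 0.99


# Example from the table: Baichuan-13B-Chat, SQ INT8 69.07% vs FP32 67.57%.
ratio = accuracy_ratio(0.6907, 0.6757)  # -> 1.0222 (quantization even helped here)

# Example from the table: chatglm2-6b, SQ INT8 52.86% vs FP32 53.23%.
ok = within_one_percent(0.5286, 0.5323)  # -> True (ratio 0.9930 >= 0.99)
```

Every published row in the accuracy table satisfies this criterion; the lowest ratio listed is 0.9902 (Baichuan2-13B-Chat, WOQ INT4 GPTQ).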