What's Changed
🎉 2025!
⚡ Added `QuantizeConfig.device` to clearly define which device is used for quantization: default = `auto`. Non-quantized models are always loaded on CPU by default, and each layer is moved to `QuantizeConfig.device` during quantization to minimize VRAM usage.
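A minimal sketch of the new field in use, assuming the `GPTQModel.load` / `quantize` / `save` entry points from the project README; the model id and one-line calibration set are placeholders:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# `device` is new in this release: "auto" (the default) picks an
# accelerator automatically, while an explicit value such as "cuda:0"
# pins quantization to that device. The model itself is loaded on CPU,
# and layers are moved to `device` one at a time to keep VRAM usage low.
quant_config = QuantizeConfig(bits=4, group_size=128, device="cuda:0")

model = GPTQModel.load("meta-llama/Llama-3.2-1B", quant_config)  # placeholder model id
model.quantize(["gptqmodel is an llm model quantization toolkit."])  # placeholder calibration data
model.save("Llama-3.2-1B-gptq-4bit")
```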
💫 Improve `QuantLinear` selection from `optimum`.
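This selection happens transparently when a GPTQ checkpoint is loaded through `transformers`/`optimum`, which call back into `hf_select_quant_linear` to pick a kernel; a hedged example of exercising that path (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading a GPTQ checkpoint routes through optimum, which asks
# gptqmodel (hf_select_quant_linear) for the best QuantLinear
# kernel for the current device/backend combination.
model_id = "ModelCloud/Llama-3.2-1B-gptqmodel-4bit"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("gptqmodel is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```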
🐛 Fix `attn_implementation_autoset` compat in latest transformers.
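The underlying issue is that `_attn_implementation_autoset` only exists on newer transformers configs (see #982 below), so it has to be read defensively; a minimal sketch of the kind of guard involved, not the library's exact code:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("gpt2")

# Older transformers releases have no _attn_implementation_autoset
# attribute, so use getattr with a default instead of touching it directly.
if not getattr(config, "_attn_implementation_autoset", False):
    # attn implementation was not auto-selected; request one explicitly
    config._attn_implementation = "eager"
```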
- Add QuantizeConfig.device and use. by @Qubitium in #950
- fix hf_select_quant_linear by @LRL-ModelCloud in #966
- update vllm gptq_marlin code by @ZX-ModelCloud in #967
- fix cuda:0 not an enum device by @CSY-ModelCloud in #968
- fix marlin info for non-cuda device by @Qubitium in #972
- fix backend str bug by @CL-ModelCloud in #973
- hf select quant_linear with pack by @LRL-ModelCloud in #969
- remove auto select BACKEND.IPEX by @CSY-ModelCloud in #975
- fix autoround received a device_map by @CSY-ModelCloud in #976
- use enum instead of magic number by @CSY-ModelCloud in #979
- use new ci docker images by @CSY-ModelCloud in #980
- fix flash attention being auto-loaded on cpu for pretrained model by @CSY-ModelCloud in #981
- fix older transformers lacking _attn_implementation_autoset by @CSY-ModelCloud in #982
- fix gptbigcode test temporarily by @CSY-ModelCloud in #983
- fix version parsing by @CSY-ModelCloud in #985
Full Changelog: v1.5.0...v1.5.1