Releases: huggingface/optimum

v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ

13 Dec 18:23

Transformers 4.36 compatibility

Notably, the ONNX export now handles aten::scaled_dot_product_attention in a standardized way for the compatible models.

Extended ONNX support: timm, sentence-transformers, Phi, ESM
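
For example, a sentence-transformers model can now be exported through the CLI (a minimal sketch; the checkpoint name is only illustrative, and the library is inferred from the checkpoint):

optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 st_onnx/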

GPTQ for Mixtral

Work in progress.

  • add modules_in_block_to_quantize arg for gptq by @SunMarc in #1585 (see the sketch below)
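
A minimal sketch of the new argument (the module names are illustrative and must match the layer names inside each transformer block of the model being quantized):

from optimum.gptq import GPTQQuantizer

# Quantize only the listed submodules of each block; names grouped in the same
# inner list are handled at the same quantization step.
quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
    ],
)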

Full Changelog: v1.15.0...v1.16.0

v1.15.0: ROCMExecutionProvider support

06 Dec 10:34

ROCMExecutionProvider support

The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.
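
A minimal sketch, assuming a ROCm-enabled build of ONNX Runtime is installed and an AMD GPU is available:

from optimum.onnxruntime import ORTModelForSequenceClassification

# Export the model to ONNX and run inference on an AMD GPU through ROCm.
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="ROCMExecutionProvider",
)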

Extended ONNX export

The Swin2sr, DPT, GLPN and ConvNextv2 architectures are now supported in the ONNX export.
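
They can be exported through the CLI as usual, for example (the checkpoint name is only illustrative):

optimum-cli export onnx --model caidas/swin2SR-classical-sr-x2-64 swin2sr_onnx/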

Full Changelog: v1.14.0...v1.15.0

v1.14.1: Patch release

14 Nov 17:50

v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactoring

07 Nov 13:54

ONNX

New architectures

  • Falcon
  • SpeechT5
  • Mistral
  • TrOCR

LCMs

  • Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469

from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images

The ONNX export can also be done through the CLI:

optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/

Decoder refactoring

  • Add position ids as input during ONNX export by @fxmarty in #1381
  • Enable the export of only one decoder for decoder-only models by @echarlaix in #1257 (see the sketch after this list)
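
This is transparent when loading a decoder-only model for ONNX Runtime inference; a minimal sketch:

from optimum.onnxruntime import ORTModelForCausalLM

# A single decoder ONNX is exported and reused both for the first forward pass
# and for the subsequent passes that consume the past key/value cache.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_cache=True)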

GPTQ

  • Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
  • Disable exllamav2 for quantization by @SunMarc in #1482
  • Default to exllama when exllamav2 is disabled by @SunMarc in #1494
  • Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in #1479
  • Add support for CPU Inference by @vivekkhandelwal1 in #1496
  • Fix minimum version of auto-gptq by @fxmarty in #1504
  • switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505 (see the sketch after this list)
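
A minimal sketch of selecting the kernel version from Transformers through exllama_config (the checkpoint name is only illustrative; requires a recent Transformers version):

from transformers import AutoModelForCausalLM, GPTQConfig

# Request the exllamav2 kernels for a quantized GPTQ checkpoint.
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    quantization_config=gptq_config,
    device_map="auto",
)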

v1.13.3: Patch release

03 Nov 19:13

Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!

v1.13.2: Patch release

21 Sep 18:33
  • Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
  • Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in #1405

v1.13.1: Patch release

08 Sep 15:57

Fix ONNX fp16 export that broke in 1.13.0.

What's Changed

  • Fix wrong dtype in the ONNX export by @fxmarty in #1369
  • Fix tests collection for TFLite export and trigger TFLite tests only when relevant by @fxmarty in #1368
  • upgrade min compatible optimum-intel version by @echarlaix in #1371
  • Fix fp16 ONNX export test by @fxmarty in #1373

v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

08 Sep 09:30

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: pytorch/pytorch#108342. For small enough models, this results in up to a 50% decrease in the serialized ONNX model size.

  • Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
  • Fix initializer detection for weight deduplication by @fxmarty in #1333

Extended ONNX Runtime support

The ONNX Runtime integration now supports the Pix2Struct and MPT architectures, Donut now supports IO Binding, and Encoder-Decoder models are supported as well.

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, the SAM model is now by default exported as vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
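
For example (the checkpoint name is only illustrative):

optimum-cli export onnx --model facebook/sam-vit-base sam_onnx/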

BetterTransformer supports Falcon

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

  • Version bump + add max_input_length to gptq by @SunMarc in #1329
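
A minimal sketch, assuming model is a Transformers GPTQ model already loaded with the exllama kernels:

from auto_gptq import exllama_set_max_input_length

# Resize the exllama buffers to accept longer prompts.
model = exllama_set_max_input_length(model, max_input_length=4096)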

Full Changelog: v1.12.0...v1.13.0

v1.12.0: AutoGPTQ integration, extended BetterTransformer support

23 Aug 12:27

AutoGPTQ integration

Part of the AutoGPTQ library has been integrated into Optimum, with utilities to ease its integration into other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization
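
A minimal sketch of quantizing a model with these utilities (the checkpoint and calibration dataset are only illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Calibrate on the c4 dataset and quantize the linear layers to 4 bits.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)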

Extended BetterTransformer support

BetterTransformer now supports BLOOM and GPT-BigCode architectures.
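
A minimal sketch with a BLOOM checkpoint (the model name is only illustrative):

from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
# Swap the supported modules for their BetterTransformer counterparts.
model = BetterTransformer.transform(model)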

Full Changelog: v1.11.2...v1.12.0

v1.11.2: Patch release

17 Aug 11:47

Remove the Transformers version constraint on optimum[habana].

  • Remove Transformers version constraint on Optimum Habana by @regisss in #1290

Full Changelog: v1.11.1...v1.11.2