Releases: huggingface/optimum
v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ
Transformers 4.36 compatibility
Notably, the ONNX export now relies on aten::scaled_dot_product_attention in a standardized way for the compatible models.
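As an illustration, a minimal sketch of exporting a model to ONNX with this release (the checkpoint is only an example; the standardized SDPA path applies to the compatible architectures):
from optimum.onnxruntime import ORTModelForCausalLM

# gpt2 is used purely as an example checkpoint.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)
model.save_pretrained("gpt2_onnx/")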
Extended ONNX support: timm, sentence-transformers, Phi, ESM
- Add ONNX export for phi models by @xenova in #1579
- Add ESM onnx support by @xenova in #1581
- Add timm models export by @mht-sharma in #1587
- Proper sentence-transformers ONNX export support by @fxmarty in #1589
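For example, a sentence-transformers checkpoint can now be exported directly from the CLI (the model id and output directory below are only illustrative):
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 all_minilm_onnx/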
GPTQ for Mixtral
Work in progress.
What's Changed
- Update version to 1.16.0.dev0 by @fxmarty in #1571
- Use doc links in the README for subpackages by @fxmarty in #1572
- Fix GPTQ compatibility with AutoGPTQ by @fxmarty in #1574
- Refactoring EC2 CIs by @JingyaHuang in #1575
- Remove inputs from sentence-transformers ONNX output by @fxmarty in #1593
- Gptq tokenized dataset by @SunMarc in #1584
- Run timm ONNX CI only once per day by @fxmarty in #1594
- Run timm ONNX CI nightly v2 by @fxmarty in #1595
Full Changelog: v1.15.0...v1.16.0
v1.15.0: ROCMExecutionProvider support
ROCMExecutionProvider support
The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.
- Add AMD GPU support by @mht-sharma in #1546
- Update ROCM ORT doc by @mht-sharma in #1564
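As a sketch, running a model on an AMD GPU through the ROCMExecutionProvider could look like the following (the checkpoint is only an example, and an ONNX Runtime build with ROCm support is assumed to be installed):
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
ort_model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# On ROCm builds of PyTorch, the GPU device string is still "cuda".
inputs = tokenizer("Optimum now runs on AMD GPUs", return_tensors="pt").to("cuda")
outputs = ort_model(**inputs)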
Extended ONNX export
The Swin2sr, DPT, GLPN and ConvNextv2 architectures are now supported in the ONNX export.
- Swin2sr onnx by @baskrahmer in #1492
- Add depth-estimation w/ DPT+GLPN by @xenova in #1529
- Add convnextv2 onnx export by @xenova in #1560
What's Changed
- Add OV export CLI to README by @echarlaix in #1526
- Refactor NormalizedConfigs for GQA by @michaelbenayoun in #1539
- Fix model patcher ONNX decoder export by @fxmarty in #1547
- Add AMD to the documentation main page by @mfuntowicz in #1540
- Add Optimum-amd documentation to the PR & release doc by @fxmarty in #1562
- Add amd documentation by @echarlaix in #1557
- Remove delete_doc_comment workflows by @regisss in #1565
- optimum-nvidia by @mfuntowicz in #1566
- Update installation instructions in README by @echarlaix in #1568
- Update doc for AMD by @mht-sharma in #1570
- Add amd extra to setup.py by @echarlaix in #1567
Full Changelog: v1.14.0...v1.15.0
v1.14.1: Patch release
- Update optimum-intel required version by @echarlaix in #1521
- Swin2sr onnx by @baskrahmer in #1492
- Fix Falcon ONNX export with alibi by @fxmarty in #1524
- Fix whisper v3 ONNX export by @fxmarty in #1525
- Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in #1535
- Add depth-estimation w/ DPT+GLPN by @xenova in #1529
v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization
ONNX
New architectures
Falcon
SpeechT5
Mistral
- Add Mistral models ONNX export support by @echarlaix in #1425
TrOCR
LCMs
Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
The ONNX export can also be done using the CLI:
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
Decoder refactorization
- Add position ids as input during ONNX export by @fxmarty in #1381
- Enable the export of only one decoder for decoder-only models by @echarlaix in #1257
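As an example, a decoder-only checkpoint can now be exported as a single decoder ONNX with KV cache support (the model and output directory are only illustrative):
optimum-cli export onnx --model gpt2 --task text-generation-with-past gpt2_onnx/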
GPTQ
- Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
- Disable exllamav2 for quantization by @SunMarc in #1482
- Default to exllama when exllamav2 is disabled by @SunMarc in #1494
- Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in #1479
- Add support for CPU Inference by @vivekkhandelwal1 in #1496
- Fix minimum version of auto-gptq by @fxmarty in #1504
- switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505
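A minimal sketch of selecting the exllamav2 kernels through exllama_config when loading a GPTQ model with Transformers (the checkpoint name is only an example):
from transformers import AutoModelForCausalLM, GPTQConfig

# {"version": 2} selects the exllamav2 kernels; {"version": 1} falls back to exllama.
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example GPTQ checkpoint
    device_map="auto",
    quantization_config=gptq_config,
)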
Other changes and bugfixes
- Fix wrong dtype in the ONNX export by @fxmarty in #1369
- Add support for loading quantization from config by @aarnphm #1363
- Guard multiprocessing set start method by @fxmarty in #1377
- Do not output KV cache when not using with-past in the ONNX export by @fxmarty in #1358
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix quantization for onnxruntime v1.16.0 by @echarlaix in #1405
- Fix normalized config key for models architecture by @echarlaix in #1408
- Fix arg in bettertransformer llama attention by @SunMarc in #1421
- Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in #1428
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in #1431
- Fix llama ONNX export by @fxmarty in #1432
- Update attention.py by @DongHande in #1416
- Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in #1443
- Fix owlvit task detection by @fxmarty in #1453
- Improve ONNX quantization doc by @fxmarty in #1451
- Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in #1449
- Disable bart onnx export for text-classification and question-answering by @fxmarty in #1457
- Fix ONNX exporter library_name by @baskrahmer in #1460
- [ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in #1335
- Fix typo in BetterTransformer CLIP by @fxmarty in #1468
- Fix custom architecture detection in onnx export by @fxmarty in #1472
- Fix whisper export by @mht-sharma in #1503
- Update Transformers dependency for Habana extra by @regisss in #1508
- Fix argument error by @ranchlai in #1501
- Remove attention mask patching by @fxmarty in #1509
- Fix generation input by @echarlaix in #1512
- Fix tests ORTModel by @fxmarty in #1517
- Fix BT on transformers 4.35 release by @fxmarty in #1518
New Contributors
- @aarnphm made their first contribution in #1363
- @DongHande made their first contribution in #1416
- @AlexKoff88 made their first contribution in #1479
- @vivekkhandelwal1 made their first contribution in #1496
- @ranchlai made their first contribution in #1501
v1.13.3: Patch release
Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!
v1.13.2: Patch release
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in #1405
v1.13.1: Patch release
Fix ONNX fp16 export that broke in 1.13.0.
v1.13.0: ONNX weight deduplication, ONNX export and ORT extension
Deduplicate Embedding / LM head weight in the ONNX export
Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: pytorch/pytorch#108342. For small enough models, this results in up to a 50% decrease in the serialized ONNX model size.
- Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
- Fix initializer detection for weight deduplication by @fxmarty in #1333
Extended ONNX Runtime support
The ONNX Runtime integration now supports the Pix2Struct and MPT architectures, and Donut now supports IO Binding. Encoder-Decoder models are now supported as well.
- Pix2Struct onnxruntime support by @krathul in #1296
- Add MPT onnx and ORT support by @jiqing-feng in #1161
- Donut iobinding by @IlyasMoutawwakil in #1209
- Add encoder decoder model by @mht-sharma in #851
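For instance, a Donut-style model can now be run with IO Binding enabled on GPU (the checkpoint is only an illustration, and a CUDA-capable ONNX Runtime build is assumed):
from optimum.onnxruntime import ORTModelForVision2Seq

model = ORTModelForVision2Seq.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa",  # example checkpoint
    export=True,
    provider="CUDAExecutionProvider",
    use_io_binding=True,
)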
Extended ONNX export: MPT, TIMM models, Encoder-Decoder
Additionally, the SAM model is now by default exported as vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
- Add MPT onnx and ORT support by @jiqing-feng in #1161
- Adds ONNX Export Support for Timm Models by @mht-sharma in #965
- Add encoder decoder model by @mht-sharma in #851
- Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in #1301
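For example, exporting SAM now produces the two ONNX files mentioned above (the checkpoint and output directory are only illustrative):
optimum-cli export onnx --model facebook/sam-vit-base sam_onnx/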
BetterTransformer supports Falcon
- [BetterTransformer] Add falcon to BetterTransformer by @younesbelkada in #1343
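A minimal sketch of converting a Falcon model with BetterTransformer (the checkpoint is only an example):
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")  # example checkpoint
model = BetterTransformer.transform(model)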
Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration
The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.
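A sketch of how it can be used, assuming a GPTQ model loaded with the exllama kernels (the checkpoint and length are only examples):
from transformers import AutoModelForCausalLM
from auto_gptq import exllama_set_max_input_length

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example GPTQ checkpoint using exllama kernels
    device_map="auto",
)
model = exllama_set_max_input_length(model, max_input_length=4096)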
Other changes and bugfixes
- Update version to 1.12.1.dev0 following release by @fxmarty in #1312
- Improve BetterTransformer backward compatibility by @fxmarty in #1314
- Fix typo in log message by @AAnirudh07 in #1322
- Support customize dtype for dummy generators by @JingyaHuang in #1307
- Fix opset custom onnx export by @mht-sharma in #1331
- Replace mpt to ernie custom export by @mht-sharma in #1332
- Send both negative prompt embeds to ORT SDXL by @ssube in #1339
- Add vae image processor by @echarlaix in #1219
- Add negative prompt test by @echarlaix in #1347
- Add GPT BigCode to the BT documentation by @fxmarty in #1356
- Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in #1338
- Fix sentence transformer export by @mht-sharma in #1366
New Contributors
- @krathul made their first contribution in #1296
- @AAnirudh07 made their first contribution in #1322
- @jiqing-feng made their first contribution in #1161
- @ssube made their first contribution in #1339
Full Changelog: v1.12.0...v1.13.0
v1.12.0: AutoGPTQ integration, extended BetterTransformer support
AutoGPTQ integration
Part of the AutoGPTQ library has been integrated into Optimum, with utilities to ease its integration into other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization
- Add GPTQ Quantization by @SunMarc in #1216
- Fix GPTQ doc by @regisss in #1267
- Add AutoGPTQ benchmark by @fxmarty in #1292
- Fix gptq params by @SunMarc in #1284
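Following the quantization guide linked above, a minimal sketch of quantizing a model with Optimum's GPTQQuantizer (model, dataset and parameters are only illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Calibrate on the "c4" dataset and quantize the model weights to 4 bits.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)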
Extended BetterTransformer support
BetterTransformer now supports BLOOM and GPT-BigCode architectures.
- Bt bloom by @baskrahmer in #1221
- Support gpt_bigcode in bettertransformer by @fxmarty in #1252
- Fix BetterTransformer starcoder init by @fxmarty in #1254
- Fix BT starcoder fp16 by @fxmarty in #1255
- SDPA dispatches to flash for MQA by @fxmarty in #1259
- Check output_attentions is False in BetterTransformer by @fxmarty in #1306
Other changes and bugfixes
- Update bug report template by @fxmarty in #1266
- Fix ORTModule uses fp32 model issue by @jingyanwangms in #1264
- Fix build PR doc workflow by @fxmarty in #1270
- Avoid triggering stop job on label by @fxmarty in #1274
- Update version following 1.11.1 patch by @fxmarty in #1275
- Fix fp16 ONNX detection for decoder models by @fxmarty in #1276
- Update version following 1.11.2 patch by @regisss in #1291
- Pin tensorflow<=2.12.1 by @fxmarty in #1305
- ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in #1308
- Fix staging tests following transformers 4.32 release by @fxmarty in #1309
- More fixes following transformers 4.32 release by @fxmarty in #1311
New Contributors
- @SunMarc made their first contribution in #1216
- @jingyanwangms made their first contribution in #1264
Full Changelog: v1.11.2...v1.12.0
v1.11.2: Patch release
Remove the Transformers version constraint on optimum[habana].
Full Changelog: v1.11.1...v1.11.2