Releases: huggingface/optimum
v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ
Transformers 4.36 compatibility
Notably, the ONNX export now relies on aten::scaled_dot_product_attention in a standardized way for the compatible models.
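As an illustration, a minimal sketch of exporting a model to ONNX with this release (the checkpoint is only an example; the standardized SDPA path applies to the compatible architectures):
from optimum.onnxruntime import ORTModelForCausalLM

# gpt2 is used purely as an example checkpoint.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)
model.save_pretrained("gpt2_onnx/")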
Extended ONNX support: timm, sentence-transformers, Phi, ESM
- Add ONNX export for phi models by @xenova in #1579
- Add ESM onnx support by @xenova in #1581
- Add timm models export by @mht-sharma in #1587
- Proper sentence-transformers ONNX export support by @fxmarty in #1589
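For example, a sentence-transformers checkpoint can now be exported directly from the CLI (the model id and output directory below are only illustrative):
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 all_minilm_onnx/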
GPTQ for Mixtral
Work in progress.
What's Changed
- Update version to 1.16.0.dev0 by @fxmarty in #1571
- Use doc links in the README for subpackages by @fxmarty in #1572
- Fix GPTQ compatibility with AutoGPTQ by @fxmarty in #1574
- Refactoring EC2 CIs by @JingyaHuang in #1575
- Remove inputs from sentence-transformers ONNX output by @fxmarty in #1593
- Gptq tokenized dataset by @SunMarc in #1584
- Run timm ONNX CI only once per day by @fxmarty in #1594
- Run timm ONNX CI nightly v2 by @fxmarty in #1595
Full Changelog: v1.15.0...v1.16.0
v1.15.0: ROCMExecutionProvider support
ROCMExecutionProvider support
The Optimum ONNX Runtime integration is extended to officially support ROCMExecutionProvider. See more details in the documentation.
- Add AMD GPU support by @mht-sharma in #1546
- Update ROCM ORT doc by @mht-sharma in #1564
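As a sketch, running a model on an AMD GPU through the ROCMExecutionProvider could look like the following (the checkpoint is only an example, and an ONNX Runtime build with ROCm support is assumed to be installed):
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
ort_model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# On ROCm builds of PyTorch, the GPU device string is still "cuda".
inputs = tokenizer("Optimum now runs on AMD GPUs", return_tensors="pt").to("cuda")
outputs = ort_model(**inputs)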
Extended ONNX export
The Swin2sr, DPT, GLPN and ConvNextv2 architectures are now supported in the ONNX export.
- Swin2sr onnx by @baskrahmer in #1492
- Add depth-estimation w/ DPT+GLPN by @xenova in #1529
- Add convnextv2 onnx export by @xenova in #1560
What's Changed
- Add OV export CLI to README by @echarlaix in #1526
- Refactor NormalizedConfigs for GQA by @michaelbenayoun in #1539
- Fix model patcher ONNX decoder export by @fxmarty in #1547
- Add AMD to the documentation main page by @mfuntowicz in #1540
- Add Optimum-amd documentation to the PR & release doc by @fxmarty in #1562
- Add amd documentation by @echarlaix in #1557
- Remove delete_doc_comment workflows by @regisss in #1565
- optimum-nvidia by @mfuntowicz in #1566
- Update installation instructions in README by @echarlaix in #1568
- Update doc for AMD by @mht-sharma in #1570
- Add amd extra to setup.py by @echarlaix in #1567
Full Changelog: v1.14.0...v1.15.0
v1.14.1: Patch release
- Update optimum-intel required version by @echarlaix in #1521
- Swin2sr onnx by @baskrahmer in #1492
- Fix Falcon ONNX export with alibi by @fxmarty in #1524
- Fix whisper v3 ONNX export by @fxmarty in #1525
- Add new fusion argument to fix compatibility with onnxruntime v1.16.2 by @echarlaix in #1535
- Add depth-estimation w/ DPT+GLPN by @xenova in #1529
v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization
ONNX
New architectures
Falcon
SpeechT5
Mistral
- Add Mistral models ONNX export support by @echarlaix in #1425
TrOCR
LCMs
Enable LCMs (available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
The ONNX export can also be done using the CLI:
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
Decoder refactorization
- Add position ids as input during ONNX export by @fxmarty in #1381
- Enable the export of only one decoder for decoder-only models by @echarlaix in #1257
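As an example, a decoder-only checkpoint can now be exported as a single decoder ONNX with KV cache support (the model and output directory are only illustrative):
optimum-cli export onnx --model gpt2 --task text-generation-with-past gpt2_onnx/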
GPTQ
- Enable possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
- Disable exllamav2 for quantization by @SunMarc in #1482
- Default to exllama when exllamav2 is disabled by @SunMarc in #1494
- Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by @AlexKoff88 in #1479
- Add support for CPU Inference by @vivekkhandelwal1 in #1496
- Fix minimum version of auto-gptq by @fxmarty in #1504
- switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505
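A minimal sketch of selecting the exllamav2 kernels through exllama_config when loading a GPTQ model with Transformers (the checkpoint name is only an example):
from transformers import AutoModelForCausalLM, GPTQConfig

# {"version": 2} selects the exllamav2 kernels; {"version": 1} falls back to exllama.
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example GPTQ checkpoint
    device_map="auto",
    quantization_config=gptq_config,
)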
Other changes and bugfixes
- Fix wrong dtype in the ONNX export by @fxmarty in #1369
- Add support for loading quantization from config by @aarnphm #1363
- Guard multiprocessing set start method by @fxmarty in #1377
- Do not output KV cache when not using with-past in the ONNX export by @fxmarty in #1358
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix quantization for onnxruntime v1.16.0 by @echarlaix in #1405
- Fix normalized config key for models architecture by @echarlaix in #1408
- Fix arg in bettertransformer llama attention by @SunMarc in #1421
- Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in #1428
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in #1431
- Fix llama ONNX export by @fxmarty in #1432
- Update attention.py by @DongHande in #1416
- Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in #1443
- Fix owlvit task detection by @fxmarty in #1453
- Improve ONNX quantization doc by @fxmarty in #1451
- Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in #1449
- Disable bart onnx export for text-classification and question-answering by @fxmarty in #1457
- Fix ONNX exporter library_name by @baskrahmer in #1460
- [ORT Training] Some important updates of ONNX Runtime training APIs by @JingyaHuang in #1335
- Fix typo in BetterTransformer CLIP by @fxmarty in #1468
- Fix custom architecture detection in onnx export by @fxmarty in #1472
- Fix whisper export by @mht-sharma in #1503
- Update Transformers dependency for Habana extra by @regisss in #1508
- Fix argument error by @ranchlai in #1501
- Remove attention mask patching by @fxmarty in #1509
- Fix generation input by @echarlaix in #1512
- Fix tests ORTModel by @fxmarty in #1517
- Fix BT on transformers 4.35 release by @fxmarty in #1518
New Contributors
- @aarnphm made their first contribution in #1363
- @DongHande made their first contribution in #1416
- @AlexKoff88 made their first contribution in #1479
- @vivekkhandelwal1 made their first contribution in #1496
- @ranchlai made their first contribution in #1501
v1.13.3: Patch release
Patch release for transformers==4.34.1 compatibility. We will do a release next week for transformers==4.35 compatibility and new features. Please bear with us!
v1.13.2: Patch release
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by @echarlaix in #1405
v1.13.1: Patch release
Fix ONNX fp16 export that broke in 1.13.0.
v1.13.0: ONNX weight deduplication, ONNX export and ORT extension
Deduplicate Embedding / LM head weight in the ONNX export
Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: pytorch/pytorch#108342. For small enough models, this results in up to a 50% decrease in the serialized ONNX model size.
- Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
- Fix initializer detection for weight deduplication by @fxmarty in #1333
Extended ONNX Runtime support
The ONNX Runtime integration now supports the Pix2Struct and MPT architectures, and Donut now supports IO Binding. Encoder-Decoder models are now supported as well.
- Pix2Struct onnxruntime support by @krathul in #1296
- Add MPT onnx and ORT support by @jiqing-feng in #1161
- Donut iobinding by @IlyasMoutawwakil in #1209
- Add encoder decoder model by @mht-sharma in #851
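For instance, a Donut-style model can now be run with IO Binding enabled on GPU (the checkpoint is only an illustration, and a CUDA-capable ONNX Runtime build is assumed):
from optimum.onnxruntime import ORTModelForVision2Seq

model = ORTModelForVision2Seq.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-docvqa",  # example checkpoint
    export=True,
    provider="CUDAExecutionProvider",
    use_io_binding=True,
)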
Extended ONNX export: MPT, TIMM models, Encoder-Decoder
Additionally, the SAM model is now by default exported as vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
- Add MPT onnx and ORT support by @jiqing-feng in #1161
- Adds ONNX Export Support for Timm Models by @mht-sharma in #965
- Add encoder decoder model by @mht-sharma in #851
- Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by @fxmarty in #1301
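For example, exporting SAM now produces the two ONNX files mentioned above (the checkpoint and output directory are only illustrative):
optimum-cli export onnx --model facebook/sam-vit-base sam_onnx/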
BetterTransformer supports Falcon
- [BetterTransformer] Add falcon to BetterTransformer by @younesbelkada in #1343
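A minimal sketch of converting a Falcon model with BetterTransformer (the checkpoint is only an example):
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")  # example checkpoint
model = BetterTransformer.transform(model)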
Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration
The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.
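A sketch of how it can be used, assuming a GPTQ model loaded with the exllama kernels (the checkpoint and length are only examples):
from transformers import AutoModelForCausalLM
from auto_gptq import exllama_set_max_input_length

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example GPTQ checkpoint using exllama kernels
    device_map="auto",
)
model = exllama_set_max_input_length(model, max_input_length=4096)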
Other changes and bugfixes
- Update version to 1.12.1.dev0 following release by @fxmarty in #1312
- Improve BetterTransformer backward compatibility by @fxmarty in #1314
- Fix typo in log message by @AAnirudh07 in #1322
- Support customize dtype for dummy generators by @JingyaHuang in #1307
- Fix opset custom onnx export by @mht-sharma in #1331
- Replace mpt to ernie custom export by @mht-sharma in #1332
- Send both negative prompt embeds to ORT SDXL by @ssube in #1339
- Add vae image processor by @echarlaix in #1219
- Add negative prompt test by @echarlaix in #1347
- Add GPT BigCode to the BT documentation by @fxmarty in #1356
- Add text2text-generation-with-past test for encoder-decoder model by @mht-sharma in #1338
- Fix sentence transformer export by @mht-sharma in #1366
New Contributors
- @krathul made their first contribution in #1296
- @AAnirudh07 made their first contribution in #1322
- @jiqing-feng made their first contribution in #1161
- @ssube made their first contribution in #1339
Full Changelog: v1.12.0...v1.13.0
v1.12.0: AutoGPTQ integration, extended BetterTransformer support
AutoGPTQ integration
Part of the AutoGPTQ library has been integrated into Optimum, with utilities to ease its integration into other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization
- Add GPTQ Quantization by @SunMarc in #1216
- Fix GPTQ doc by @regisss in #1267
- Add AutoGPTQ benchmark by @fxmarty in #1292
- Fix gptq params by @SunMarc in #1284
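Following the quantization guide linked above, a minimal sketch of quantizing a model with Optimum's GPTQQuantizer (model, dataset and parameters are only illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "facebook/opt-125m"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Calibrate on the "c4" dataset and quantize the model weights to 4 bits.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)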
Extended BetterTransformer support
BetterTransformer now supports BLOOM and GPT-BigCode architectures.
- Bt bloom by @baskrahmer in #1221
- Support gpt_bigcode in bettertransformer by @fxmarty in #1252
- Fix BetterTransformer starcoder init by @fxmarty in #1254
- Fix BT starcoder fp16 by @fxmarty in #1255
- SDPA dispatches to flash for MQA by @fxmarty in #1259
- Check output_attentions is False in BetterTransformer by @fxmarty in #1306
Other changes and bugfixes
- Update bug report template by @fxmarty in #1266
- Fix ORTModule uses fp32 model issue by @jingyanwangms in #1264
- Fix build PR doc workflow by @fxmarty in #1270
- Avoid triggering stop job on label by @fxmarty in #1274
- Update version following 1.11.1 patch by @fxmarty in #1275
- Fix fp16 ONNX detection for decoder models by @fxmarty in #1276
- Update version following 1.11.2 patch by @regisss in #1291
- Pin tensorflow<=2.12.1 by @fxmarty in #1305
- ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in #1308
- Fix staging tests following transformers 4.32 release by @fxmarty in #1309
- More fixes following transformers 4.32 release by @fxmarty in #1311
New Contributors
- @SunMarc made their first contribution in #1216
- @jingyanwangms made their first contribution in #1264
Full Changelog: v1.11.2...v1.12.0
v1.11.2: Patch release
Remove the Transformers version constraint on optimum[habana].
Full Changelog: v1.11.1...v1.11.2