Releases: huggingface/optimum-neuron
v0.0.27: Qwen2 models, Neuron SDK 2.20.2
What's Changed
- Add support for Qwen2 models (#746); see the sketch after this list
- bump Neuron SDK to 2.20.2 (#743)
- NeuronX TGI: bump router version to 3.0.0 (#748)
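Qwen2 checkpoints should go through the same decoder export path as the other supported architectures. A minimal sketch, assuming the usual `NeuronModelForCausalLM` export arguments; the checkpoint id, shapes and compiler values below are illustrative, not tested settings:

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# Compiler and static-shape arguments (illustrative values).
compiler_args = {"num_cores": 2, "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "sequence_length": 2048}

# Export a Qwen2 checkpoint to Neuron, then generate as usual.
model = NeuronModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    export=True,
    **compiler_args,
    **input_shapes,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
inputs = tokenizer("What is AWS Trainium?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```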
Bug fixes
- training: Fix consolidation issue when TP is enabled (#739)
- inference: Fix T5 decoder compilation error since Neuron SDK 2.20 (#732)
Full Changelog: v0.0.26...v0.0.27
v0.0.26: T5 parallel support and new NeuronORPOTrainer
What's Changed
Bug fixes
- update TGI error message (#659)
- fix errors in vision/audio models docstring (#714)
- fix wrong inputs/model placement when using a single core (#725)
- fix model checkpoint saving issue when using PEFT (#727)
- fix non contiguous tensors in consolidation (#736)
Full Changelog: v0.0.25...v0.0.26
v0.0.25: SFT Trainer, Llama 3.1-3.2, ControlNet, AWS Neuron SDK 2.20
What's Changed
- Use AWS Neuron SDK 2.20 (#696) by @dacorvo
- Bump `optimum` to 1.22 (#686) by @JingyaHuang
- Bump `transformers` to 4.43.2 (#665) by @dacorvo
Inference
- Add support for multiple ControlNet (#691) by @JingyaHuang
- Add ControlNet support for SDXL (#675) by @JingyaHuang
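Taken together, these changes should let an SDXL pipeline be compiled with one or several ControlNets. A minimal sketch, assuming the `NeuronStableDiffusionXLControlNetPipeline` entry point and the `controlnet_ids` argument introduced by these PRs; model ids and shapes are illustrative:

```python
from optimum.neuron import NeuronStableDiffusionXLControlNetPipeline

# Compile SDXL together with a canny ControlNet (ids and shapes illustrative).
pipe = NeuronStableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet_ids="diffusers/controlnet-canny-sdxl-1.0",
    export=True,
    batch_size=1,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
)
# `conditioning` would be a canny-edge image prepared with e.g. OpenCV:
# image = pipe(prompt="aerial view of a futuristic city", image=conditioning).images[0]
```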
Training
- Support SFTTrainer (#682) by @michaelbenayoun
- LoRA finetuning tutorial (#671) by @michaelbenayoun
Full Changelog: v0.0.24...v0.0.25
v0.0.24: PEFT training support, ControlNet, InstructPix2Pix, Audio models, TGI benchmarks
What's Changed
Training
- Initial PEFT support by @michaelbenayoun in #612
- PEFT + TP support by @michaelbenayoun in #620
- Fix MPMD detected error during training with TP by @michaelbenayoun in #648
Inference
- Add Stable Diffusion ControlNet support by @JingyaHuang in #622
- Add InstructPix2Pix pipeline support by @asntr in #625
- Add ViT export support and image classification by @JingyaHuang in #616 (see the sketch after this list)
- Add wav2vec2 support - export and audio tasks modeling by @JingyaHuang in #645
- Add more audio models: ast, hubert, unispeech, unispeech-sat, wavlm by @JingyaHuang in #651
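For the newly supported vision encoders, export should follow the usual encoder pattern. A sketch for ViT image classification, assuming `batch_size` is the only static shape required here (other shape arguments may apply; the checkpoint id is illustrative):

```python
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification

model_id = "google/vit-base-patch16-224"  # illustrative checkpoint
# Export with a static batch size (assumption: image dimensions come
# from the model config rather than extra shape arguments).
model = NeuronModelForImageClassification.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
)
processor = AutoImageProcessor.from_pretrained(model_id)
# inputs = processor(images=pil_image, return_tensors="pt")
# predicted_class = model(**inputs).logits.argmax(-1)
```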
TGI
- Extending TGI benchmarking and documentation by @jimburtoft in #621
- Add support for TGI truncate parameter by @dacorvo in #647
Other changes
- enable unequal height and width by @yahavb in #592
- Skip invalid gen config by @dacorvo in #618
- Deprecate resume_download by @Wauplin in #586
- Remove a line non-intentionally merged by @JingyaHuang in #628
- Add secrets scanning workflow by @mfuntowicz in #631
- fix bad link to distributed-training how-to guide in optimum-neuron docs by @aws-amj in #627
- Do not copy local checkpoint by @dacorvo in #630
- Make `neuron_cc_optlevel` `None` by default by @michaelbenayoun in #632
- Remove print by @michaelbenayoun in #633
- Set bf16 to true when needed by @michaelbenayoun in #635
- Fix gradient checkpointing with PEFT by @michaelbenayoun in #634
- Refactor decoder tests by @dacorvo in #641
- CI cache builder by @dacorvo in #642
- Restore optimized attention score for sd15 & fix the generated images quality issue by @JingyaHuang in #646
- Add and remove some mark steps by @michaelbenayoun in #644
- Fix consolidation for TP by @michaelbenayoun in #649
- Fix spelling in error message by @jimburtoft in #656
- Update docs by @michaelbenayoun in #588
- Fixes NxDPPModel for Neuron SDK 2.19 by @michaelbenayoun in #663
- Various fixes for training by @michaelbenayoun in #654
- migrate ci by @XciD in #662
- ci: fix inference cache pipeline by @dacorvo in #667
- broken link by @pagezyhf in #669
- Bump TGI version and fix bugs by @dacorvo in #666
New Contributors
- @mfuntowicz made their first contribution in #631
- @aws-amj made their first contribution in #627
- @asntr made their first contribution in #625
- @XciD made their first contribution in #662
Full Changelog: v0.0.23...v0.0.24
v0.0.23: Bump transformers and optimum version
What's Changed
- Bump required packages versions: `transformers==4.41.1`, `accelerate==0.29.2`, `optimum==1.20.*`
Inference
- Fix diffusion caching by @oOraph in #594
- Fix inference latency issue when weights/neff are separated by @JingyaHuang in #584
- Enable caching for inlined models by @JingyaHuang in #604
- Patch attention score far off issue for sd 1.5 by @JingyaHuang in #611
TGI
- Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
- Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
- Include tokenizer during export and simplify deployment by @dacorvo in #610
Training
- Performance improvements and neuron_parallel_compile and gradient checkpointing fixes by @michaelbenayoun in #602
Full Changelog: v0.0.22...v0.0.23
v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel
What's Changed
Training
- Integrate new API for saving and loading with `neuronx_distributed` by @michaelbenayoun in #560
Inference
- Add support for Mixtral by @dacorvo in #569
- Improve Llama models performance by @dacorvo in #587
- Make Stable Diffusion pipelines compatible with compel by @JingyaHuang and @neo in #581 (with tests inspired by snippets from @Suprhimp)
- Add `SentenceTransformers` support to `pipeline` for `feature-extraction` by @philschmid in #583 (see the sketch after this list)
- Allow download subfolder for caching models with subfolder by @JingyaHuang in #566
- Do not split decoder checkpoint files by @dacorvo in #567
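A minimal sketch of the new pipeline integration, assuming the `optimum.neuron` `pipeline` helper handles the export when `export=True`; the model id is illustrative, and any static-shape arguments the export may require are omitted here:

```python
from optimum.neuron import pipeline

# Load a sentence-transformers checkpoint through the feature-extraction
# pipeline; the model is exported to Neuron on the fly.
extractor = pipeline(
    "feature-extraction",
    model="sentence-transformers/all-MiniLM-L6-v2",
    export=True,
)
embedding = extractor("A sentence to embed")
```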
TGI
- Set up TGI environment values with the ones used to build the model by @oOraph in #529
- TGI benchmark with llmperf by @dacorvo in #564
- Improve tgi env wrapper for neuron by @oOraph in #589
Caveat
Traced models with `inline_weights_to_neff=False` currently have higher than expected latency during inference, because the weights are not automatically moved to the Neuron devices. The issue will be fixed in #584; please avoid setting `inline_weights_to_neff=False` in this release.
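In practice this means leaving the flag at its default when tracing in this release. A minimal sketch with an encoder model (the checkpoint id and shapes are illustrative):

```python
from optimum.neuron import NeuronModelForSequenceClassification

# Keep the weights inlined into the NEFF (the default) to avoid the
# latency issue described above.
model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    inline_weights_to_neff=True,  # do not set this to False in this release
    batch_size=1,
    sequence_length=128,
)
```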
Other changes
- Improve installation guide by @JingyaHuang in #559
- upgrade optimum and then install optimum-neuron by @shub-kris in #533
- Cleanup obsolete code by @michaelbenayoun in #555
- Extend TGI integration tests by @dacorvo in #561
- Modify benchmarks by @dacorvo in #563
- Bump PyTorch to 2.1 by @JingyaHuang in #502
- fix(decoder): specify libraryname to suppress warning by @dacorvo in #570
- missing \ in quickstart inference guide by @yahavb in #574
- Use AWS 2.18.0 AMI as base by @dacorvo in #572
- Update TGI router version to 2.0.1 by @dacorvo in #577
- Add guide for LoRA adapters by @JingyaHuang in #582
- eos_token_id can be a list in configs by @dacorvo in #580
- Ease the tests when there is no hf token by @JingyaHuang in #585
- Change inline weights to Neff default value to True by @JingyaHuang in #590
Full Changelog: v0.0.21...v0.0.22
v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance
What's Changed
Training
- Add GQA optimization for Tensor Parallel training to support the case `tp_size > num_key_value_heads` by @michaelbenayoun in #498 (illustrated after this list)
- Mixed-precision training with either `torch_xla` or `torch.autocast` by @michaelbenayoun in #523
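To see why the `tp_size > num_key_value_heads` case needs special handling, consider a GQA model with 8 KV heads sharded over 32 tensor-parallel ranks (the numbers below are illustrative):

```python
# With GQA, tensor parallelism shards attention heads across ranks. When
# there are more ranks than KV heads, each KV head must be replicated
# instead of each rank owning a distinct one.
num_key_value_heads = 8   # e.g. a Llama-2-70B-style GQA config
tp_size = 32              # tensor-parallel degree

ranks_per_kv_head = tp_size // num_key_value_heads
print(f"each KV head is replicated across {ranks_per_kv_head} ranks")  # -> 4
```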
Inference
- Add caching support for traced TorchScript models (eg. encoders, stable diffusion models) by @JingyaHuang in #510
- Support phi model on feature-extraction, text-classification, token-classification tasks by @JingyaHuang in #509
Caveat
AWS Neuron SDK 2.18 doesn't support the compilation of SDXL's UNet with weights / neff separation, so `inline_weights_to_neff=True` is forced through:
- Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554
Other changes
- Fix/ami authorized keys by @shub-kris in #517
- Skip weight load during parallel compile by @michaelbenayoun in #524
- fixing format in getting-started.ipynb by @jimburtoft in #526
- Removing colab links in notebooks.mdx by @jimburtoft in #525
- ADD stale bot by @philschmid in #530
- Bump optimum version by @JingyaHuang in #534
- Fix style by @JingyaHuang in #538
- Fix GQA permutation computation and sequential weight initialization / loading when doing TP by @michaelbenayoun in #531
- Add setup runtime step for K8S by @glegendre01 in #541
- Disable logging during precompilation by @michaelbenayoun in #539
- Do not use deprecated list_files_info by @Wauplin in #536
- Adding link to existing Fine-tuning example in Notebooks by @jimburtoft in #527
- Add missing notebooks to doc by @JingyaHuang in #543
- fix: bug in get_available_cores within container by @oOraph in #546
- Init on the `xla` device by @michaelbenayoun in #521
- Adding CodeLlama-7B inference and compilation example notebook by @jimburtoft in #549
- Add tools for auto filling traced models cache by @JingyaHuang in #537
- Remove print that should not be there by @michaelbenayoun in #552
- Use AWS Neuron sdk 2.18 by @dacorvo in #547
- Cache utils related cleanup by @michaelbenayoun in #553
New Contributors
- @glegendre01 made their first contribution in #541
- @Wauplin made their first contribution in #536
Full Changelog: v0.0.20...v0.0.21
v0.0.20: Multi-node training, SD Lora, sentence transformers clip, TGI improvements
What's Changed
Training
- Multi-node training support by @michaelbenayoun (#440)
TGI
- optimize continuous batching and improve export (#506)
Inference
- Add LoRA support to stable diffusion by @JingyaHuang (#483) (see the sketch after this list)
- Support sentence transformers clip by @JingyaHuang (#495)
- Inference compile cache script by @philschmid and @dacorvo (#496, #504)
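A sketch of fusing LoRA weights into a Stable Diffusion pipeline at export time. The `lora_model_ids` / `lora_weight_names` argument names, the adapter id and the shapes are assumptions for illustration, not verified settings:

```python
from optimum.neuron import NeuronStableDiffusionPipeline

pipe = NeuronStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    export=True,
    # Hypothetical adapter id and weight file name, for illustration only.
    lora_model_ids="some-user/some-sd15-lora",
    lora_weight_names="pytorch_lora_weights.safetensors",
    batch_size=1,
    height=512,
    width=512,
)
# image = pipe(prompt="a photo of a corgi in a field").images[0]
```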
Doc
- Update Inference supported models list by @JingyaHuang (#501)
Bug fixes
- inference cache: omit irrelevant config parameters in lookup by @dacorvo (#494)
- Optimize disk usage when fetching model checkpoints by @dacorvo (#505)
Full Changelog: v0.0.19...v0.0.20
v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching
What's Changed
Training
- Integrate new cache system for training by @michaelbenayoun in #472
TGI
- Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488
- Lift max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488
AMI
- Add packer support for building AWS AMI by @shub-kris in #441
- [AMI] Updates base ami to new id by @philschmid in #482
Major bugfixes
- Fix sdxl inpaint pipeline for diffusers 0.26.* by @JingyaHuang in #458
- TGI: update to controller version 1.4.0 & bug fixes by @dacorvo in #470
- Fix optimum-cli export for inf1 by @JingyaHuang in #474
Other changes
- Add TGI tests and CI workflow by @dacorvo in #355
- Bump to optimum 1.17 - Adapt to optimum exporter refactoring by @JingyaHuang in #414
- [Training] Support for Transformers 4.37 by @michaelbenayoun in #459
- Add contribution guide for Neuron exporter by @JingyaHuang in #461
- Fix path, update versions by @shub-kris in #462
- Add issue and PR templates & build optimum env cli for Neuron by @JingyaHuang in #463
- Fix trigger for actions by @philschmid in #468
- TGI: bump rust version by @dacorvo in #477
- [documentation] Add Container overview page. by @philschmid in #481
- Bump to Neuron sdk 2.17.0 by @JingyaHuang in #487
New Contributors
- @shub-kris made their first contribution in #441
Full Changelog: v0.0.18...v0.0.19
v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16.1 (#449)
Inference
- Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
- Allow exporting decoder models using optimum-cli by @dacorvo (#422)
- Add Neuron X cache registry by @dacorvo (#442)
- Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
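A minimal sketch of plugging a custom criterion into `NeuronModelForCausalLM.generate()`, using the standard `transformers` `StoppingCriteria` interface; the stop condition, checkpoint id and shapes are illustrative:

```python
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList
from optimum.neuron import NeuronModelForCausalLM

class StopOnToken(StoppingCriteria):
    """Stop generation as soon as the last emitted token matches a target id."""

    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return input_ids[0, -1].item() == self.stop_token_id

model_id = "gpt2"  # illustrative checkpoint
model = NeuronModelForCausalLM.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=128
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    stopping_criteria=StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```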
Training
- Initial support for pipeline parallelism by @michaelbenayoun (#279)
Tutorials and doc improvement
- Various fixes by @jimburtoft @michaelbenayoun @JingyaHuang (#428 #429 #432)
- Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
- Add Sentence Transformers Guide and Notebook by @philschmid (#434)
- Add benchmark section by @dacorvo (#435)
Major bugfixes
- TGI: correctly identify special tokens during generation by @dacorvo (#438)
- TGI: do not include the input_text in generated text by @dacorvo (#454)
Other changes
- API change to be compatible to Optimum by @JingyaHuang (#421)
New Contributors
- @jimburtoft made their first contribution in #432
Full Changelog: v0.0.17...v0.0.18