Releases: huggingface/optimum-neuron

v0.0.27: Qwen2 models, Neuron SDK 2.20.2

13 Dec 10:00

What's Changed

  • Add support for Qwen2 models (#746)
  • Bump Neuron SDK to 2.20.2 (#743)
  • NeuronX TGI: bump router version to 3.0.0 (#748)
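As a sketch of how the new Qwen2 support can be exercised, the usual decoder export flow should apply; the checkpoint id and compilation shapes below are illustrative assumptions, not values taken from the release notes, and actually running this requires a Neuron instance:

```python
# Sketch only: compiling and loading a Qwen2 checkpoint on Neuron.
# The checkpoint id and shapes are illustrative assumptions.
from optimum.neuron import NeuronModelForCausalLM

model = NeuronModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct",  # illustrative Qwen2 checkpoint
    export=True,                 # compile on load
    batch_size=1,
    sequence_length=1024,
    num_cores=2,
    auto_cast_type="bf16",
)
```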

Bug fixes

  • training: Fix consolidation issue when TP is enabled (#739)
  • inference: Fix T5 decoder compilation error since Neuron SDK 2.20 (#732)

Full Changelog: v0.0.26...v0.0.27

v0.0.26: T5 parallel support and new NeuronORPOTrainer

15 Nov 15:54

What's Changed

Inference

  • Refactor Diffusers pipelines (#711)
  • Add tensor parallel support to T5 via NxD (#697)
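To illustrate, tensor-parallel T5 inference would be driven through the seq2seq Neuron model class. This is a sketch under stated assumptions: the `tensor_parallel_size` argument name, checkpoint, and shapes are guesses based on the NxD integration in #697, so check the release documentation for the exact API:

```python
# Sketch only: exporting T5 with tensor parallelism via NxD.
# `tensor_parallel_size` is an assumed argument name; verify it
# against the optimum-neuron documentation for this release.
from optimum.neuron import NeuronModelForSeq2SeqLM

model = NeuronModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",      # illustrative checkpoint
    export=True,              # compile on load
    tensor_parallel_size=8,   # shard weights across 8 Neuron cores
    batch_size=1,
    sequence_length=128,
)
```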

Training

  • Support resizing embeddings (#670)
  • Add NeuronORPOTrainer (#719)

Bug fixes

  • Update TGI error message (#659)
  • Fix errors in vision/audio model docstrings (#714)
  • Fix wrong inputs/model placement when using a single core (#725)
  • Fix model checkpoint saving issue when using PEFT (#727)
  • Fix non-contiguous tensors in consolidation (#736)

Full Changelog: v0.0.25...v0.0.26

v0.0.25: SFT Trainer, Llama 3.1-3.2, ControlNet, AWS Neuron SDK 2.20

01 Oct 09:49

What's Changed

Full Changelog: v0.0.24...v0.0.25

v0.0.24: PEFT training support, ControlNet, InstructPix2Pix, Audio models, TGI benchmarks

12 Aug 11:41

What's Changed

Full Changelog: v0.0.23...v0.0.24

v0.0.23: Bump transformers and optimum version

31 May 10:09

What's Changed

  • Bump required package versions: transformers==4.41.1, accelerate==0.29.2, optimum==1.20.*
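In an existing environment, the pins can be applied directly; this is simply the pip equivalent of the version bump above:

```shell
# Pin the package versions required by optimum-neuron v0.0.23
pip install "transformers==4.41.1" "accelerate==0.29.2" "optimum==1.20.*"
```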

TGI

  • Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
  • Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
  • Include tokenizer during export and simplify deployment by @dacorvo in #610

Training

  • Performance improvements, plus neuron_parallel_compile and gradient checkpointing fixes by @michaelbenayoun in #602

Full Changelog: v0.0.22...v0.0.23

v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel

07 May 16:51

What's Changed

TGI

  • Set up TGI environment values with the ones used to build the model by @oOraph in #529
  • TGI benchmark with llmperf by @dacorvo in #564
  • Improve tgi env wrapper for neuron by @oOraph in #589

Caveat

Models traced with inline_weights_to_neff=False currently show higher-than-expected latency during inference, because the weights are not automatically moved to Neuron devices. The issue will be fixed in #584; please avoid setting inline_weights_to_neff=False in this release.
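A minimal sketch of the safe export path, assuming a Stable Diffusion checkpoint (the checkpoint id and shapes are illustrative): leaving inline_weights_to_neff at its default keeps the weights inlined in the NEFF and avoids the regression.

```python
# Sketch: trace with the default weight inlining; do not pass
# inline_weights_to_neff=False in v0.0.22 (see #584).
from optimum.neuron import NeuronStableDiffusionPipeline

pipe = NeuronStableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # illustrative checkpoint
    export=True,
    batch_size=1,
    height=512,
    width=512,
)
```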

Full Changelog: v0.0.21...v0.0.22

v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance

09 Apr 08:46

What's Changed

Training

  • Add GQA optimization for Tensor Parallel training to support the case tp_size > num_key_value_heads by @michaelbenayoun in #498
  • Mixed-precision training with either torch_xla or torch.autocast by @michaelbenayoun in #523

Inference

  • Add caching support for traced TorchScript models (e.g. encoders, Stable Diffusion models) by @JingyaHuang in #510
  • Support Phi models for feature-extraction, text-classification and token-classification tasks by @JingyaHuang in #509
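Both entries go through the usual traced-model export. As a hedged sketch, a cache-eligible encoder export might look like the following; the checkpoint and shapes are illustrative assumptions, and running it requires a Neuron instance:

```python
# Sketch: exporting an encoder as a traced TorchScript Neuron model.
# With the new cache support (#510), re-exporting with identical
# shapes should fetch compilation artifacts instead of recompiling.
from optimum.neuron import NeuronModelForFeatureExtraction

model = NeuronModelForFeatureExtraction.from_pretrained(
    "bert-base-uncased",   # illustrative encoder checkpoint
    export=True,
    batch_size=1,
    sequence_length=128,
)
```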

Caveat

AWS Neuron SDK 2.18 doesn't support compiling SDXL's UNet with weights/NEFF separation, so inline_weights_to_neff=True is forced through:

  • Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554

Full Changelog: v0.0.20...v0.0.21

v0.0.20: Multi-node training, SD Lora, sentence transformers clip, TGI improvements

07 Mar 10:14

What's Changed

TGI

  • Optimize continuous batching and improve export (#506)

Bug fixes

  • Inference cache: omit irrelevant config parameters in lookup by @dacorvo (#494)
  • Optimize disk usage when fetching model checkpoints by @dacorvo (#505)

Full Changelog: v0.0.19...v0.0.20

v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching

19 Feb 15:48

What's Changed

TGI

  • Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488
  • Lift the max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488

Full Changelog: v0.0.18...v0.0.19

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

01 Feb 10:18

What's Changed

AWS SDK

  • Use AWS Neuron SDK 2.16.1 (#449)

Inference

  • Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
  • Allow exporting decoder models using optimum-cli by @dacorvo (#422)
  • Add Neuron X cache registry by @dacorvo (#442)
  • Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
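The StoppingCriteria entry reuses the standard transformers interface, so a custom criterion can be defined independently of any Neuron model. Here is a runnable sketch; the class name and the length threshold are illustrative, not part of the release:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class MaxLengthReached(StoppingCriteria):
    """Illustrative criterion: stop once the sequence holds max_length tokens."""

    def __init__(self, max_length: int):
        self.max_length = max_length

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        # input_ids has shape (batch, seq_len); stop at the length limit
        return input_ids.shape[-1] >= self.max_length


criterion = MaxLengthReached(16)
print(bool(criterion(torch.ones(1, 20, dtype=torch.long), None)))  # True

# With a Neuron decoder model this would be passed to generate(), e.g.:
# model.generate(**inputs, stopping_criteria=StoppingCriteriaList([criterion]))
```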

TGI

  • TGI: support vanilla transformer models whose configuration is cached by @dacorvo (#445)

Major bugfixes

  • TGI: correctly identify special tokens during generation by @dacorvo (#438)
  • TGI: do not include the input_text in generated text by @dacorvo (#454)

Full Changelog: v0.0.17...v0.0.18