v1.7: Llama 2, Falcon, LoRA, Transformers v4.31, SynapseAI v1.11
Transformers v4.31
Transformers v4.31 (latest stable release) is fully supported.
SynapseAI v1.11
SynapseAI v1.11 (latest stable release) is fully supported.
Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen
- Added support for OPT-66B #285 @ZhaiFeiyue
- Llama #296 @yeonsily
- Improve Llama2 and gpt_neox performance with Habana fused RoPE and RMSNorm #321 @mandy-li
- Enable Falcon-7b #326 @schoi-habana
- Fix inference with Llama-2-70B #342 @regisss
- Add model optimizations for codegen and gpt_bigcode #322 @PhillipHoward
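The fused RMSNorm kernel referenced above replaces an eager-mode computation whose math is standard. As a point of reference, a pure-Python sketch of unfused RMSNorm (illustration only, not the Habana kernel):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale x by 1/sqrt(mean(x^2) + eps),
    then apply the learned per-channel weight. A fused kernel
    computes the same result in a single op."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

out = rms_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Fusing this into one kernel avoids materializing the intermediate mean and rescaled tensors, which is where the Llama 2 and GPT-NeoX speedups come from.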
Torch Autocast
Torch Autocast is becoming the default for managing mixed-precision runs.
- Fix autocast for BERT-like models #287 @ANSHUMAN87
- Add support for autocast in gradient checkpointing #307 @regisss
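Autocast is a scoped context: ops inside the block run in a lower-precision dtype and the previous precision is restored on exit. The mechanics of that pattern can be sketched in plain Python (a toy stand-in, not PyTorch's implementation; on Gaudi the real entry point is torch.autocast with bfloat16):

```python
from contextlib import contextmanager

_state = {"dtype": "float32"}  # currently active compute dtype

@contextmanager
def autocast(dtype):
    """Toy autocast: switch the active dtype for the duration of
    the block, then restore the previous one even on error."""
    prev, _state["dtype"] = _state["dtype"], dtype
    try:
        yield
    finally:
        _state["dtype"] = prev

def matmul_dtype():
    # A real framework dispatches its kernels on this dtype.
    return _state["dtype"]

with autocast("bfloat16"):
    inside = matmul_dtype()
outside = matmul_dtype()
```

The try/finally restore is what makes autocast compose with gradient checkpointing: recomputed forward segments re-enter the same precision scope they were first run under.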
Improved text-generation example
- Added constrained beam search #281 @vivekgoe
- Fix padding error #282 @sywangyi
- Various improvements for faster checkpoint downloading #284 #286 #294 @regisss
- Add deepspeed TP policy for llama #303 @sywangyi
- Add token and model_revision args for the text-generation example #331 @regisss
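Constrained beam search (#281) steers decoding so that required tokens appear in the final output. A toy sketch of the idea, assuming a hypothetical `scores` callable rather than the Transformers API:

```python
def constrained_beam_search(scores, steps, beam_width, forced_token):
    """Tiny beam search: keep the best `beam_width` partial
    sequences each step; at the end, admit only sequences that
    contain `forced_token` (the constraint)."""
    beams = [((), 0.0)]  # (token tuple, log-prob)
    for t in range(steps):
        candidates = []
        for seq, lp in beams:
            for tok, s in scores(seq, t).items():
                candidates.append((seq + (tok,), lp + s))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    valid = [b for b in beams if forced_token in b[0]]
    return max(valid, key=lambda b: b[1]) if valid else None

# Toy model: token "b" is slightly worse than "a" at every step,
# so unconstrained search would emit "aaa"; the constraint forces a "b" in.
def scores(seq, t):
    return {"a": -0.1, "b": -0.5}

best = constrained_beam_search(scores, steps=3, beam_width=4, forced_token="b")
```

Note this sketch only filters at the end, so a narrow beam could prune every valid candidate; production implementations track constraint state per beam bank so that cannot happen.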
LoRA examples
Two new LoRA examples for fine-tuning and inference.
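LoRA freezes the pretrained weight W and trains only a low-rank update, so the effective layer output is W·x + (α/r)·B·A·x. A minimal pure-Python sketch of that forward pass (illustration of the math only, not the example scripts):

```python
def matvec(M, x):
    """Dense matrix-vector product over nested lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x).
    W (d_out x d_in) is frozen; only A (r x d_in) and
    B (d_out x r) are trained, so the adapter adds just
    r * (d_in + d_out) parameters per layer."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # rank-r adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

y = lora_forward(W=[[1.0, 0.0], [0.0, 1.0]],
                 A=[[1.0, 0.0], [0.0, 1.0]],
                 B=[[0.1, 0.0], [0.0, 0.1]],
                 x=[1.0, 2.0])
```

Because only A and B receive gradients, fine-tuning fits in far less memory, and at inference the update can be merged into W so the adapter adds no latency.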
LDM3D
New Stable Diffusion pipeline that generates both images and depth maps.
- Support for Ldm3d #304 @estelleafl
Added support for Text Generation Inference (TGI)
TGI is now supported on Gaudi.
GaudiGenerationConfig
Transformers' GenerationConfig has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.
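The notes above do not name the two new fields, so as an illustration only, a generation config carrying static-shape controls might look like the following (the class and field names here are hypothetical, not the actual API):

```python
from dataclasses import dataclass

@dataclass
class GenerationConfigSketch:
    """Hypothetical sketch of the extra knobs a static-shape
    generation config could add on top of Transformers'
    GenerationConfig. Not the real field names."""
    max_new_tokens: int = 128
    use_static_shapes: bool = True  # pad inputs to a fixed length so compiled graphs are reused
    run_to_max_length: bool = True  # keep stepping to the padded length instead of stopping early

cfg = GenerationConfigSketch()
```

Static shapes matter on Gaudi because each new input shape triggers a graph compilation; padding to fixed lengths lets every decoding step reuse the same compiled graph.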
Various fixes and improvements
- Fix generation sampling when using repetition_penalty #301 @sywangyi
- Remove KV cache workaround #302 @ZhaiFeiyue
- Fix T5 inference performance regression #310 @libinta
- Fix gptj HCCL issue occurring in DDP #318 @sywangyi
- Partially revert "Enable/Optimize flan t5 xxl on deepspeed z3" #320 @hsubramony
- Modify flan-t5 deepspeed configuration #328 @yeonsily
- Add commands for gptj and gptneox #325 @ankurhabana
- Disable FusedRMSNorm for training #343 @hsubramony
- Enable hpu rms fused kernel for t5 #344 @ZhaiFeiyue
- Remove two workarounds on esmfold #334 @bzhu-habana