v1.7: Llama 2, Falcon, LoRA, Transformers v4.31, SynapseAI v1.11
Transformers v4.31
Transformers v4.31 (latest stable release) is fully supported.
SynapseAI v1.11
SynapseAI v1.11 (latest stable release) is fully supported.
Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen
- Added support for OPT-66B #285 @ZhaiFeiyue
- Llama #296 @yeonsily
- Improve Llama2 and gpt_neox performance with Habana fused RoPE and RMSNorm #321 @mandy-li
- Enable Falcon-7b #326 @schoi-habana
- Fix inference with Llama-2-70B #342 @regisss
- Add model optimizations for codegen and gpt_bigcode #322 @PhillipHoward
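The fused RMSNorm kernel referenced above replaces an eager-mode computation whose math is standard. As a point of reference, a pure-Python sketch of unfused RMSNorm (illustration only, not the Habana kernel):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale x by 1/sqrt(mean(x^2) + eps),
    then apply the learned per-channel weight. A fused kernel
    computes the same result in a single op."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

out = rms_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Fusing this into one kernel avoids materializing the intermediate mean and rescaled tensors, which is where the Llama 2 and GPT-NeoX speedups come from.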
Torch Autocast
Torch Autocast is becoming the default for managing mixed-precision runs.
- Fix autocast for BERT-like models #287 @ANSHUMAN87
- Add support for autocast in gradient checkpointing #307 @regisss
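Autocast is a scoped context: ops inside the block run in a lower-precision dtype and the previous precision is restored on exit. The mechanics of that pattern can be sketched in plain Python (a toy stand-in, not PyTorch's implementation; on Gaudi the real entry point is torch.autocast with bfloat16):

```python
from contextlib import contextmanager

_state = {"dtype": "float32"}  # currently active compute dtype

@contextmanager
def autocast(dtype):
    """Toy autocast: switch the active dtype for the duration of
    the block, then restore the previous one even on error."""
    prev, _state["dtype"] = _state["dtype"], dtype
    try:
        yield
    finally:
        _state["dtype"] = prev

def matmul_dtype():
    # A real framework dispatches its kernels on this dtype.
    return _state["dtype"]

with autocast("bfloat16"):
    inside = matmul_dtype()
outside = matmul_dtype()
```

The try/finally restore is what makes autocast compose with gradient checkpointing: recomputed forward segments re-enter the same precision scope they were first run under.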
Improved text-generation example
- Added constrained beam search #281 @vivekgoe
- Fix padding error #282 @sywangyi
- Various improvements for faster checkpoint downloading #284 #286 #294 @regisss
- Add deepspeed TP policy for llama #303 @sywangyi
- Add token and model_revision args for the text-generation example #331 @regisss
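Constrained beam search (#281) steers decoding so that required tokens appear in the final output. A toy sketch of the idea, assuming a hypothetical `scores` callable rather than the Transformers API:

```python
def constrained_beam_search(scores, steps, beam_width, forced_token):
    """Tiny beam search: keep the best `beam_width` partial
    sequences each step; at the end, admit only sequences that
    contain `forced_token` (the constraint)."""
    beams = [((), 0.0)]  # (token tuple, log-prob)
    for t in range(steps):
        candidates = []
        for seq, lp in beams:
            for tok, s in scores(seq, t).items():
                candidates.append((seq + (tok,), lp + s))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    valid = [b for b in beams if forced_token in b[0]]
    return max(valid, key=lambda b: b[1]) if valid else None

# Toy model: token "b" is slightly worse than "a" at every step,
# so unconstrained search would emit "aaa"; the constraint forces a "b" in.
def scores(seq, t):
    return {"a": -0.1, "b": -0.5}

best = constrained_beam_search(scores, steps=3, beam_width=4, forced_token="b")
```

Note this sketch only filters at the end, so a narrow beam could prune every valid candidate; production implementations track constraint state per beam bank so that cannot happen.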
LoRA examples
Two new LoRA examples for fine-tuning and inference.
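LoRA freezes the pretrained weight W and trains only a low-rank update, so the effective layer output is W·x + (α/r)·B·A·x. A minimal pure-Python sketch of that forward pass (illustration of the math only, not the example scripts):

```python
def matvec(M, x):
    """Dense matrix-vector product over nested lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x).
    W (d_out x d_in) is frozen; only A (r x d_in) and
    B (d_out x r) are trained, so the adapter adds just
    r * (d_in + d_out) parameters per layer."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # rank-r adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

y = lora_forward(W=[[1.0, 0.0], [0.0, 1.0]],
                 A=[[1.0, 0.0], [0.0, 1.0]],
                 B=[[0.1, 0.0], [0.0, 0.1]],
                 x=[1.0, 2.0])
```

Because only A and B receive gradients, fine-tuning fits in far less memory, and at inference the update can be merged into W so the adapter adds no latency.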
LDM3D
New Stable Diffusion pipeline that generates both images and depth maps.
- Support for Ldm3d #304 @estelleafl
Added support for Text Generation Inference (TGI)
TGI is now supported on Gaudi.
GaudiGenerationConfig
Transformers' GenerationConfig has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.
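The notes above do not name the two new fields, so as an illustration only, a generation config carrying static-shape controls might look like the following (the class and field names here are hypothetical, not the actual API):

```python
from dataclasses import dataclass

@dataclass
class GenerationConfigSketch:
    """Hypothetical sketch of the extra knobs a static-shape
    generation config could add on top of Transformers'
    GenerationConfig. Not the real field names."""
    max_new_tokens: int = 128
    use_static_shapes: bool = True  # pad inputs to a fixed length so compiled graphs are reused
    run_to_max_length: bool = True  # keep stepping to the padded length instead of stopping early

cfg = GenerationConfigSketch()
```

Static shapes matter on Gaudi because each new input shape triggers a graph compilation; padding to fixed lengths lets every decoding step reuse the same compiled graph.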
Various fixes and improvements
- Fix generation sampling when using repetition_penalty #301 @sywangyi
- Remove KV cache workaround #302 @ZhaiFeiyue
- Fix T5 inference performance regression #310 @libinta
- Fix gptj HCCL issue occurring in DDP #318 @sywangyi
- Partially revert "Enable/Optimize flan t5 xxl on deepspeed z3" #320 @hsubramony
- Modify flan-t5 deepspeed configuration #328 @yeonsily
- Add commands for gptj and gptneox #325 @ankurhabana
- Disable FusedRMSNorm for training #343 @hsubramony
- Enable hpu rms fused kernel for t5 #344 @ZhaiFeiyue
- Remove two workarounds on esmfold #334 @bzhu-habana