v1.13.0: Stable Diffusion 3, Sentence Transformers, SAM, DETR, Kubernetes example
SynapseAI 1.17
- Upgrade SynapseAI version to 1.17.0 #1217
Transformers 4.43
Diffusers 0.29
Stable Diffusion 3
Training with Sentence Transformers
- Enable Sentence Transformer Trainer with Gaudi #1111 @ZhengHongming888
Model optimizations
- Fix starcoder2 accuracy issue and optimize performance with fused rope #1095 @mandy-li
- Enable FusedRoPE using float32 for gpt-neox model #1104 @yeonsily
- Mamba initial enablement #1122 @libinta
- Adding fused qkv support along with config #1102 @bhargaveede
- Enhance Qwen2 with fast softmax, bf16 RoPE, and cache optimization #1087 @Zhiwei35
- Enable fp8 inference for Llava-Next and add Fused_SDPA #1120 @tthakkal
- Support bucket_internal for MPT #1137 @pk1d3v
- Enable Flash Attention (Fused SDPA) for Starcoder #1114 @abhilash1910
- gpt_bigcode: added FusedSDPA kernel #1138 @mgonchar
- Enable torch.compile for Granite20B #1185 @dvarshney-habana
- Refine use cache for mpt model #1158 @Jing1Ling
- GPT-J support reuse_cache #1094 @atakaha
- Use fast softmax only on prefill #1159 @jaygala223
- Starcoder2: KV cache and flash attention (FusedSDPA) enablement #1149 @abhatkal
- GPT BigCode: fused SDPA #1260 @yeonsily
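Several of the optimizations above (bucket_internal for MPT, the KV-cache and recompilation work) revolve around shape bucketing: rounding sequence lengths up to a small set of boundaries so padded inputs share shapes and previously compiled HPU graphs are reused instead of recompiled. A minimal pure-Python sketch of the idea (the helper name is illustrative, not the actual optimum-habana API):

```python
# Sketch of the shape-bucketing idea behind entries like "Support
# bucket_internal for MPT": rounding lengths up to bucket boundaries keeps
# the set of compiled graph shapes small. Helper name is hypothetical.
import math

def bucket_length(seq_len: int, bucket_size: int) -> int:
    """Round seq_len up to the next multiple of bucket_size."""
    return math.ceil(seq_len / bucket_size) * bucket_size
```

With a bucket size of 128, every length from 1 to 128 maps to 128 and 129 to 256 maps to 256, so only a handful of shapes ever trigger compilation.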
SAM, FastVIT, VideoMAE, OpenCLIP, DETR, Table Transformer, deciLM
- Add an example of Segment Anything Model [Inference] #814 @cfgfung
- Add an example of the FastViT model (Inference) #826 @cfgfung
- VideoMAE Model Enabling and Examples #922 @pi314ever
- OpenCLIP sample for visual question answering #977 @vidyasiv
- Enabled DETR (Object Detection) model #1046 @cfgfung
- Table transformer enabling #978 @pi314ever
- deciLM support #1133 @sywangyi
Stable Diffusion inpainting, unconditional image generation
- Add Stable Diffusion inpainting support #869 @yuanwu2017
- Enable Unconditional Image Generation on Gaudi 2 [Diffuser/Tasks] #859 @cfgfung
Text feature extraction example
- Feature extraction enabling #994 @pi314ever
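The feature-extraction example reduces per-token hidden states to a single sentence vector. A hedged, pure-Python stand-in for the usual mean-pooling step (the real example runs model outputs on HPU; this function name is illustrative):

```python
# Illustrative mean pooling: average token embeddings over non-padded
# positions to obtain one fixed-size sentence embedding. Pure-Python
# stand-in for what the feature-extraction example does with model outputs.
def mean_pool(token_embeddings, attention_mask):
    """Average embeddings over positions where attention_mask is 1."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    kept = 0
    for emb, mask in zip(token_embeddings, attention_mask):
        if mask:  # skip padding positions
            kept += 1
            for j in range(dim):
                pooled[j] += emb[j]
    return [v / kept for v in pooled]
```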
Tensor parallelism
- Tensor parallel distributed strategy without using deepspeed #1121 @kalyanjk
- Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk
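Conceptually, the "tp" strategy shards a layer's weights across devices, each worker computes its slice of the output, and a collective reassembles the result. A pure-Python sketch under that assumption (plain concatenation stands in for the collective; names are illustrative, not the library's API):

```python
# Conceptual sketch of a tensor-parallel linear layer: the weight matrix
# is row-sharded across workers, each computes its output slice, and
# concatenation (playing the all-gather role) reassembles the full output.
# Pure-Python stand-in for what runs across multiple Gaudi devices.
def matvec(weight, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def tp_matvec(weight, x, n_workers):
    shard = len(weight) // n_workers  # rows per worker
    outputs = []
    for rank in range(n_workers):  # each iteration runs on one device in real TP
        rows = weight[rank * shard:(rank + 1) * shard]
        outputs.extend(matvec(rows, x))  # concatenation = all-gather stand-in
    return outputs
```

The sharded result matches the single-device computation exactly, which is the invariant a tensor-parallel implementation must preserve.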
Kubernetes cluster example
- Adds a helm chart, dockerfile, and instructions for running examples using a Kubernetes cluster #1099 @dmsuehir
- Fix PyTorch version in the Kubernetes docker-compose to match image #1246 @dmsuehir
FP8 training
- TE FP8 integration #1096 @SanjuCSudhakaran
Other
- Updates run_lora_clm.py with enhanced dataset support #955 @dmsuehir
- Fix prefix tuning finetune issue and update test #975 @sywangyi
- Fix throughput calculation in image-to-text example #1070 @regisss
- SDXL training: fixed CI, changed gated dataset, fixes for non-square datasets #1038 @imangohari1
- Updating batch_size of Albert-XXL in README #1063 @vineethanandh
- Fix the error when running run_pipeline.py in the text-generation example #1055 @yuanwu2017
- Add a test for llama finetuning with FP8 precision #1106 @SanjuCSudhakaran
- Beam-search fix #1113 @ssarkar2
- Add chat format support dataset in SFT #1066 @libinta
- Fix nan loss of gemma and crash if dataset_concatenation is not set #1088 @sywangyi
- torch.compile: keep input mutation in graph, which avoids unnecessary memcpy #1069 @sushildubey171
- Updated langchain text-generation pipeline to work with latest release 0.2.5 #1084 @rbrugaro
- Add the MC example #891 @yuanwu2017
- Fix recompiles if limit_hpu_graph is False #1129 @ssarkar2
- Update examples' batch size in README #1123 @shepark
- Fix OOM error in SDXL Fine-Tuning validation stage #1134 @dsocek
- Added an example demonstrating deterministic image generation #878 @cfgfung
- SD image variation/InstructPix2Pix/StableDiffusionXLImg2ImgPipeline pipeline #988 @sywangyi
- Add ci test for trl rewarding and ppo, fix backward failure in ppo caused by rmsfusion #1020 @sywangyi
- Llama adapter #983 @sywangyi
- torch.flip issue is fixed in SynapseAI 1.16, so remove the workaround #1092 @sywangyi
- Fix test CausalLanguageModelingLORAExampleTester KeyError #1139 @dmsuehir
- fix(ci): new runs-on #1136 @XciD
- Add trust_remote_code for loading datasets in the audio classification example #1074 @regisss
- Generation example: print number of warmup iterations #1145 @mgonchar
- CI updates: text-gen to receive ranks/bs, updated bs/metric for baselines #1140 @imangohari1
- Support for custom files for run_lora_clm.py #1039 @vidyasiv
- Change the device_id for FSDP plugin #1086 @ckvermaAI
- Set KV Cache update as static method #1160 @ulivne
- Fix CPU tensor issue #1157 @mkumargarg
- Add missing __init__.py to Mistral and Mixtral test packages #1188 @rkumar2patel
- Add example of multitask_prompt/poly tuning #915 @sywangyi
- Fix data-type mismatch for mlperf_inference accuracy test #1146 @kalyanjk
- Fix spawn MP context, limit cpu and download data #1131 @polisettyvarma
- T5 multi card #1222 @yafshar
- Add trust_remote_code for t5 poly-tuning test #1220 @yafshar
- Resolve "empty tensor optional" error with hpu_graphs + kv cache for StarCoder #1181 @vidyasiv
- Fix VIT, add wav2vec comment #1223 @ssarkar2
- Roberta tests were running on CPU #1229 @ssarkar2
- Fix bert/roberta contrastive search tests #1226 @skavulya
- Remove the default env variable to trust remote code by default #1225 @yafshar
- Improve style check workflow #1230 @regisss
- Added scheduler selection for SDXL fine-tuning #867 @kplau1128
- Clarify help message for ignore_eos to avoid misunderstanding @sywangyi
- Support loading Hugging Face checkpoints #1165 @ulivne
- Change triggering event for code style check #1238 @regisss
- gptj: fix missing token_idx #1234 @envsp
- fix(nltk): fixed the version to a working one #1247 @imangohari1
- Updating to avoid hardcoding tests in CI framework #1221 @vidyasiv
- Fix FSDP graph error due to the Transformers 4.43 update #1251 @jiminha
- Fix SD README commands #1250 @imangohari1
- Fix spelling errors #1252 @changwangss
- Set HLS_MODULE_ID only if it wasn't set previously #1254 @astachowiczhabana
- Fix overflow of steps in SDXL for default diffusers scheduler @dsocek
- fix(test_diffusers): automated the checking for tests without upstream HF #1232 @imangohari1
- fix(nltk): revert #1247, update the version, and add the punkt_tab download #1258 @imangohari1
- Set input_embeds before it gets used #1261 @tthakkal
- Update README and more changes, rebase to main #1259 @shepark
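Two entries above touch benchmarking hygiene (the image-to-text throughput fix and printing the number of warmup iterations). The underlying pattern is that warmup runs trigger HPU graph compilation and must be excluded from the timed window; a hedged sketch, with function and parameter names that are illustrative rather than the examples' actual code:

```python
# Sketch of throughput measurement that excludes warmup iterations:
# warmup runs compile graphs and are not timed; only steady-state
# iterations count toward tokens/second. Names are hypothetical.
import time

def measure_throughput(generate_fn, n_iterations, n_warmup):
    """Return tokens/second over timed iterations only; generate_fn
    returns the number of tokens it produced."""
    print(f"Warmup iterations: {n_warmup}")
    for _ in range(n_warmup):  # not timed: graph compilation happens here
        generate_fn()
    start = time.perf_counter()
    total_tokens = 0
    for _ in range(n_iterations):
        total_tokens += generate_fn()
    return total_tokens / (time.perf_counter() - start)
```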
Known limitations
- For Llama, some large batch sizes lead to out-of-memory errors whereas they worked in previous releases