Releases: huggingface/tgi-gaudi
v2.3.1: SynapseAI v1.19.0
The codebase is validated with SynapseAI 1.19.0 and optimum-habana 1.15.0.
Tested models and configurations
Model | BF16 Single Card | BF16 Multi-Card | FP8 Single Card | FP8 Multi-Card |
---|---|---|---|---|
Llama2-7B | ✔ | ✔ | ✔ | ✔ |
Llama2-70B | | ✔ | | ✔ |
Llama3-8B | ✔ | ✔ | ✔ | ✔ |
Llama3-70B | | ✔ | | ✔ |
Llama3.1-8B | ✔ | ✔ | ✔ | ✔ |
Llama3.1-70B | | ✔ | | ✔ |
CodeLlama-13B | ✔ | ✔ | ✔ | ✔ |
Mixtral-8x7B | ✔ | ✔ | ✔ | ✔ |
Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Falcon-180B | | ✔ | | ✔ |
Qwen2-72B | | ✔ | | ✔ |
Starcoder2-3b | ✔ | ✔ | ✔ | |
Starcoder2-15b | ✔ | ✔ | ✔ | |
Starcoder | ✔ | ✔ | ✔ | ✔ |
Gemma-7b | ✔ | ✔ | ✔ | ✔ |
Llava-v1.6-Mistral-7B | ✔ | ✔ | ✔ | ✔ |
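Serving one of the validated models follows the launch pattern documented in the tgi-gaudi README. The sketch below is illustrative, not prescriptive: the model ID, image tag, and token limits are example values you would replace for your deployment, and it requires a machine with Gaudi hardware and the Habana container runtime.

```shell
# Example launch of TGI on Gaudi (model, tag, and limits are placeholders).
model=meta-llama/Llama-2-7b-hf
volume=$PWD/data   # cached model weights are stored here

docker run -p 8080:80 \
  -v $volume:/data \
  --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice \
  --ipc=host \
  ghcr.io/huggingface/tgi-gaudi:2.3.1 \
  --model-id $model \
  --max-input-tokens 1024 \
  --max-total-tokens 2048
```

Once the server reports readiness, it accepts requests on port 8080 (`/generate` and `/generate_stream`).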
Full Changelog: v2.0.6...v2.3.1
v2.0.6: SynapseAI v1.18.0
The codebase is validated with SynapseAI 1.18.0 and optimum-habana 1.14.1.
Tested models and configurations
Model | BF16 Single Card | BF16 Multi-Card | FP8 Single Card | FP8 Multi-Card |
---|---|---|---|---|
Llama2-7B | ✔ | ✔ | ✔ | ✔ |
Llama2-70B | | ✔ | | ✔ |
Llama3-8B | ✔ | ✔ | ✔ | ✔ |
Llama3-70B | | ✔ | | ✔ |
Llama3.1-8B | ✔ | ✔ | ✔ | ✔ |
Llama3.1-70B | | ✔ | | ✔ |
CodeLlama-13B | ✔ | ✔ | ✔ | ✔ |
Mixtral-8x7B | ✔ | ✔ | ✔ | ✔ |
Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Falcon-180B | | ✔ | | ✔ |
Qwen2-72B | | ✔ | | ✔ |
Starcoder2-3b | ✔ | ✔ | ✔ | |
Starcoder2-15b | ✔ | ✔ | ✔ | |
Starcoder | ✔ | ✔ | ✔ | ✔ |
Gemma-7b | ✔ | ✔ | ✔ | ✔ |
Llava-v1.6-Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Full Changelog: v2.0.5...v2.0.6
v2.0.5: Llava multi-card support
- Only Apply the TP in language_model #219 @yuanwu2017
- Llava-next: Added flash_attention_recompute option #220 @tthakkal
- Update README.md with changes related to LLava-next multi card support #221 @tthakkal
- Upgrade to Optimum Habana v1.13.2 #222 @regisss
Tested models and configurations
Model | BF16 | FP8 | Single Card | Multi-Cards |
---|---|---|---|---|
Llama2-7B | ✔ | ✔ | ✔ | ✔ |
Llama2-70B | ✔ | ✔ | | ✔ |
Llama3-8B | ✔ | ✔ | ✔ | ✔ |
Llama3-70B | ✔ | ✔ | | ✔ |
Llama3.1-8B | ✔ | ✔ | ✔ | ✔ |
Llama3.1-70B | ✔ | ✔ | | ✔ |
CodeLlama-13B | ✔ | ✔ | ✔ | |
Mixtral-8x7B | ✔ | ✔ | ✔ | ✔ |
Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Llava-v1.6-Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Full Changelog: v2.0.4...v2.0.5
v2.0.4: SynapseAI v1.17.0
The codebase is validated with SynapseAI 1.17.0 and optimum-habana 1.13.1.
Tested models and configurations
Model | BF16 | FP8 | Single Card | Multi-Cards |
---|---|---|---|---|
Llama2-7B | ✔ | ✔ | ✔ | ✔ |
Llama2-70B | ✔ | ✔ | | ✔ |
Llama3-8B | ✔ | ✔ | ✔ | ✔ |
Llama3-70B | ✔ | ✔ | | ✔ |
Llama3.1-8B | ✔ | ✔ | ✔ | ✔ |
Llama3.1-70B | ✔ | ✔ | | ✔ |
CodeLlama-13B | ✔ | ✔ | ✔ | |
Mixtral-8x7B | ✔ | ✔ | ✔ | ✔ |
Mistral-7B | ✔ | ✔ | ✔ | ✔ |
Llava-v1.6-Mistral-7B | ✔ | ✔ | ✔ | |
Highlights
- Added support for vision-language models
Full Changelog: v2.0.1...v2.0.4
v2.0.1: SynapseAI v1.16.0
The codebase is validated with SynapseAI 1.16.0 and optimum-habana 1.12.0.
Tested configurations
- Llama2 7B BF16 / FP8 on 1xGaudi2
- Llama2 70B BF16 / FP8 on 8xGaudi2
- Falcon 180B BF16 / FP8 on 8xGaudi2
- Mistral 7B BF16 / FP8 on 1xGaudi2
- Mixtral 8x7B BF16 / FP8 on 1xGaudi2
Highlights
- Add support for the grammar feature (grammar-constrained generation)
- Add support for Habana Flash Attention
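The grammar feature follows the upstream text-generation-inference API: a request to the `/generate` endpoint can carry a `grammar` object (of type `json` or `regex`) under `parameters`, and the server constrains decoding so the output matches it. The sketch below only builds an example request body; the prompt, field names inside the JSON schema, and the token limit are illustrative assumptions.

```python
import json

# Illustrative /generate request body using TGI's grammar parameter.
# A "json" grammar takes a JSON Schema that the generated text must satisfy.
payload = {
    "inputs": "Give me the name and age of a fictional person.",
    "parameters": {
        "max_new_tokens": 64,
        "grammar": {
            "type": "json",
            "value": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}

# Serialized body, ready to POST with Content-Type: application/json.
body = json.dumps(payload)
```

Sent to a running server (e.g. `POST http://localhost:8080/generate`), the response's `generated_text` is constrained to a JSON object with the requested fields.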
Full Changelog: v2.0.0...v2.0.1
v2.0.0: SynapseAI v1.15.0
The codebase is validated with SynapseAI 1.15.0 and optimum-habana 1.11.1.
Tested configurations
- Llama2 70B BF16 / FP8 on 8xGaudi2
Highlights
- Add support for FP8 precision
Full Changelog: v1.2.1...v2.0.0
v1.2.1: SynapseAI v1.14.0
The codebase is validated with SynapseAI 1.14.0 and optimum-habana 1.10.4.
Tested configuration
- Llama2 70B BF16 on 8xGaudi2
Highlights
- Add support for continuous batching on Intel Gaudi
- Add batch size bucketing
- Add sequence bucketing for prefill operation
- Optimize concatenate operation
- Add speculative scheduling
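The motivation for the bucketing items above is that Gaudi compiles a graph per tensor shape, so every new batch size or sequence length triggers a costly recompilation. Padding sizes up to a small, fixed set of buckets keeps shapes reused across requests. The sketch below is a minimal illustration of that idea; the function names and bucket steps are hypothetical, not taken from the tgi-gaudi code.

```python
def next_bucket(size: int, step: int) -> int:
    """Round `size` up to the nearest multiple of `step` (minimum one step)."""
    return max(step, ((size + step - 1) // step) * step)

def pad_batch(seqs, batch_step=8, seq_step=128, pad_id=0):
    """Pad a batch of token-ID lists to bucketed batch and sequence sizes,
    so repeated prefill calls see a small, fixed set of tensor shapes."""
    target_len = next_bucket(max(len(s) for s in seqs), seq_step)
    target_bs = next_bucket(len(seqs), batch_step)
    # Pad each sequence to the bucketed length, then pad the batch dimension.
    padded = [s + [pad_id] * (target_len - len(s)) for s in seqs]
    padded += [[pad_id] * target_len] * (target_bs - len(seqs))
    return padded
```

With `seq_step=128`, prompts of length 1..128 all compile one prefill graph, 129..256 a second one, and so on; the trade-off is wasted compute on the padding.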
Full Changelog: v1.2.0...v1.2.1