Releases · huggingface/optimum-tpu

19 Dec 10:43

v0.2.3

060b709

v0.2.3 Latest

Latest

Holidays season release! 🎄
This Optimum TPU release comes in with a larger support for models, in particular newer Llamas 🦙 for serving and fine-tuning, as well as initial support for the all recent TPU v6e and few fixes here and there.

What's Changed

fix(ci): correct TGI_VERSION definition in workflow by @tengomucho in #122
Fix nightlies again by @tengomucho in #123
⚙️ Fix Integration Test for TGI by @baptistecolle in #124
🔂 Fix repetition penalty by @tengomucho in #125
Allow sharding fine tuned misaligned models by @tengomucho in #126
🦙 Newer Llamas support by @tengomucho in #129
🦙 Add llama fine-tuning notebook example by @baptistecolle in #130
doc(v6e): mention initial v6e support by @tengomucho in #131
⚙️ Refactor TGI Dockerfile to support Google-Cloud-Containers as a target by @baptistecolle in #127
🐛 Fix the convergence of loss function for the llama fine tuning example by @baptistecolle in #132
chore: update version to v0.2.3 by @tengomucho in #133

Full Changelog: v0.2.1...v0.2.3

Contributors

tengomucho and baptistecolle

Assets 2

29 Nov 15:59

tengomucho

v0.2.1

a1919c2

v0.2.1

This is a release to further simplify TGI usage with Jetstream, making it the default backend, and correcting an environment variable usage. Finally, dependencies are updated to guarantee we use the latest features of the frameworks we are based on.

What's Changed

fix(test): correct nightly tests with random sampling by @tengomucho in #117
Jetstream by default by @tengomucho in #118
🧹 Cleanup of the batch size environment variables by @baptistecolle in #121
⬆️ Update dependencies by @tengomucho in #120

Full Changelog: v0.2.0...v0.2.1

Contributors

tengomucho and baptistecolle

Assets 2

20 Nov 13:06

tengomucho

v0.2.0

1fc59ce

v0.2.0

This is the first release of Optimum TPU that includes support for Jetstream Pytorch engine as backend for Test Generation Inference (TGI).
JetStream is a throughput and memory optimized engine for LLM inference on TPUs, and its Pytorch implementation allows for a seamless integration in the TGI code. The supported models (for now Llama 2 and Llama 3, Gemma 1 and Mixtral, and serving inference on these models resulted has given results close to 10x in terms of tokens/sec compared to the previously used backend (Pytorch XLA/transformers).
On top of that, it is possible to use quantization to serve using even less resources while maintaining a similar throughput and quality.
Details follow.

What's Changed

Update colab examples by @wenxindongwork in #86
ci(docker): update torch-xla to 2.4.0 by @tengomucho in #89
✈️ Introduce Jetstream/Pytorch in TGI by @tengomucho in #88
🦙 Llama3 on TGI - Jetstream Pytorch by @tengomucho in #90
☝️ Update Jetstream Pytorch revision by @tengomucho in #91
Correct extra token, start preparing docker image for TGI/Jetstream Pt by @tengomucho in #93
Fix generation using Jetstream Pytorch by @tengomucho in #94
Fix slow tests by @tengomucho in #95
🧹 Cleanup and fixes for TGI by @tengomucho in #96
Small TGI enhancements by @tengomucho in #97
fix(TGI Jetstream Pt): prefill should be done with max input size by @tengomucho in #98
💎 Gemma on TGI Jetstream Pytorch by @tengomucho in #99
Fix ci nightly jetstream by @tengomucho in #101
CI ephemeral TPUs by @tengomucho in #102
🍃 Added Mixtral on TGI / Jetstream Pytorch by @tengomucho in #103
Add CLI to install dependencies by @tengomucho in #104
⛰ CI: mount hub cache and fix issues with cli by @tengomucho in #106
fix(docker): correct jetstream installation in TGI docker image by @tengomucho in #107
✏️ docs: Add training guide and improve documentation consistency by @baptistecolle in #110
Quantization Jetstream Pytorch by @tengomucho in #111
fix: graceful shutdown was not working with entrypoint, exec launcher by @co42 in #112
fix(doc): correct link to deploy page by @tengomucho in #115
More Jetstream Pytorch fixes, prepare for release by @tengomucho in #116

New Contributors

@wenxindongwork made their first contribution in #86
@baptistecolle made their first contribution in #110
@co42 made their first contribution in #112

Full Changelog: v0.1.5...v0.2.0

Contributors

co42, tengomucho, and 2 other contributors

Assets 2

08 Aug 14:56

tengomucho

v0.1.5

426d7be

v0.1.5

This release is essentially the same as the previous one (v0.1.4), but it allows correct PyPI package publication.

Assets 2

23 Jul 16:20

tengomucho

v0.1.4

7f5b0cc

v0.1.4

These changes focus on improving support for instruct models and solve an issue appearing when using those models through the web ui interface with invalid settings.

What's Changed

Fix secret leak workflow by @tengomucho in #72
Handle selector exception by @tengomucho in #73
chore(tgi): update TGI base image by @tengomucho in #75
Fix instruct models UI issue by @tengomucho in #78

Full Changelog: v0.1.3...v0.1.4

Contributors

tengomucho

Assets 2

09 Jul 10:31

tengomucho

v0.1.3

e09a66b

v0.1.3

Cleanup of previous fixed and lower batch size to prevent memory issues on Inference Endpoints with some models.

What's Changed

Few more Inference Endpoints fixes by @tengomucho in #69
feat(cache): use optimized StaticCache class for XLA by @tengomucho in #70
Lower TGI IE batch size by @tengomucho in #71

Full Changelog: v0.1.2...v0.1.3

Contributors

tengomucho

Assets 2

08 Jul 08:31

tengomucho

v0.1.2

fd29591

v0.1.2

What's Changed

This Release contains only few small fixes, mainly for Inference Endpoints.

Several Inference Endpoint fixes by @tengomucho in #66
More Inference Endpoints features and fixes by @tengomucho in #68

Full Changelog: v0.1.1...v0.1.2

Contributors

tengomucho

Assets 2

25 Jun 14:20

tengomucho

v0.1.1

7050cf4

v0.1.1

TPU first release, allowing to have TPU Text Generation Inference and Inference Endpoints container images available.

What's Changed

Basic TGI server on XLA by @tengomucho in #1
Enable CI/CD by @tengomucho in #2
Fix TGI Dockerfile by @shub-kris in #3
Add static KV cache and test on Gemma-2B by @tengomucho in #4
Small optimizations by @tengomucho in #5
Enable compilation by @tengomucho in #6
Revert "fix: attention mask should be 1 or 0" by @tengomucho in #8
feat: use dynamic batching when generating by @tengomucho in #9
Repo layout by @tengomucho in #10
Add PyPI release workflow by @regisss in #11
Xla parallel proxy by @tengomucho in #12
Add documentation to the repository by @mfuntowicz in #13
Adopt naming convention of transformers API by @mfuntowicz in #14
Fix main doc build workflow by @regisss in #15
Improve readme by @mfuntowicz in #16
Fix layout in README by @mfuntowicz in #17
Fix rule and instructions for TGI by @mfuntowicz in #18
Fix typo in index.mdx by @mfuntowicz in #19
Added some links to Cloud TPU documentation by @mikegre-google in #20
Parallel sharding by @tengomucho in #21
Bump version to 0.1.0.dev1 by @mfuntowicz in #24
Bump version to 0.1.0.dev2 by @mfuntowicz in #25
Fix TGI missing import by @mfuntowicz in #27
Forward arguments from TGI launcher to the model by @mfuntowicz in #28
Fix optimum-tpu pip install instructions by @mfuntowicz in #29
Fix tests with do_sample=True by @tengomucho in #30
Sharding in tgi by @tengomucho in #31
Fix missing '=' to assign environment variables in the default case w… by @mfuntowicz in #33
Include two different stages for building TGI image: by @mfuntowicz in #34
Llama support by @tengomucho in #32
chore(ci): added workflow for nightly tests by @tengomucho in #35
fix(build): setup.py removed from build_dist dependencies by @tengomucho in #36
Try again to fix nightly builds by @tengomucho in #37
Basic Llama2 Tuning by @tengomucho in #39
Bug doc builder by @pagezyhf in #40
Fix typo ; Update llama_tuning.md by @furkanakkurt1335 in #42
Update to Pytorch 2.3.0 and transformers v4.40.2 by @tengomucho in #41
Fine tuning with FSDP v2 by @tengomucho in #44
Minor fix for mispelled stage in TGI dockerfile. by @thealmightygrant in #46
Align to Transformers 4.41.1 by @tengomucho in #45
chore(training): Allow training on torch xla > 2.3.0, add warning by @tengomucho in #48
fix(build): add missing setuptools_scm section by @tengomucho in #49
fix(logging): correct logging usage by @tengomucho in #50
fix(tests): fix decode sample expected outputs again by @tengomucho in #52
fix(doc): update server and port when serving TGI by @tengomucho in #53
fix(ci): correct secrets leak workflow check by @tengomucho in #55
Add Mistral support 💨 by @tengomucho in #54
Mistral nits by @tengomucho in #57
chore: bump to version v0.1.0a1 by @tengomucho in #60
feat(TGI): add release docker image build and push to registry workflow by @tengomucho in #62
chore: bump to version v0.1.1 by @tengomucho in #63

New Contributors

@tengomucho made their first contribution in #1
@shub-kris made their first contribution in #3
@regisss made their first contribution in #11
@mfuntowicz made their first contribution in #13
@mikegre-google made their first contribution in #20
@pagezyhf made their first contribution in #40
@furkanakkurt1335 made their first contribution in #42
@thealmightygrant made their first contribution in #46

Full Changelog: https://github.com/huggingface/optimum-tpu/commits/v0.1.1

Contributors

mfuntowicz, thealmightygrant, and 6 other contributors

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What's Changed

Contributors

What's Changed

Contributors

What's Changed

New Contributors

Contributors

What's Changed

Contributors

What's Changed

Contributors

What's Changed

Contributors

What's Changed

New Contributors

Contributors

Releases: huggingface/optimum-tpu

v0.2.3

What's Changed

Contributors

v0.2.1

What's Changed

Contributors

v0.2.0

What's Changed

New Contributors

Contributors

v0.1.5

v0.1.4

What's Changed

Contributors

v0.1.3

What's Changed

Contributors

v0.1.2

What's Changed

Contributors

v0.1.1

What's Changed

New Contributors

Contributors