Release v4.46.0 · huggingface/transformers

New model additions

Moshi

The Moshi model was proposed in Moshi: a speech-text foundation model for real-time dialogue by Alexandre Défossez,
Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.

Moshi is a speech-text foundation model that casts spoken dialogue as speech-to-speech generation. Starting from a
text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec,
while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of
explicit speaker turns, and the modeling of arbitrary conversational dynamics. Moshi also predicts time-aligned text
tokens as a prefix to audio tokens. This “Inner Monologue” method significantly improves the linguistic quality of
generated speech and provides streaming speech recognition and text-to-speech. As a result, Moshi is the first
real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice.

Moshi integration by @ylacombe in #33624

Zamba

Zamba-7B-v1 is a hybrid between state-space models (specifically Mamba) and transformers, and was trained using
next-token prediction. Zamba uses a shared transformer layer after every 6 Mamba blocks. It uses the Mistral
v0.1 tokenizer. This architecture was chosen after a series of ablations at small scales. Zamba-7B-v1 was
pre-trained on 1T tokens of text and code data.
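
As a rough illustration of the layout described above, the following sketch builds a layer schedule in which one shared transformer layer is reused after every 6 Mamba blocks. The function name, the labels and the total block count are hypothetical, chosen only for illustration; this is not the Zamba implementation.

```python
def hybrid_layer_schedule(num_mamba_blocks: int, period: int = 6) -> list[str]:
    """Return layer labels, placing a shared transformer layer after every `period` Mamba blocks."""
    schedule = []
    for index in range(1, num_mamba_blocks + 1):
        schedule.append("mamba")
        if index % period == 0:
            # The same (weight-shared) transformer layer is reused at each of these positions.
            schedule.append("shared_transformer")
    return schedule


# Hypothetical block count, used only to show the pattern.
schedule = hybrid_layer_schedule(num_mamba_blocks=12)
print(schedule)
```

Because the transformer layer is shared, the schedule grows in depth without adding a new set of attention weights at each insertion point.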

Add Zamba by @pglorio in #30950

GLM

The GLM Model was proposed in ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools by GLM Team,
THUDM & ZhipuAI.

The abstract from the paper starts with the following:

We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This
report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B.

add Glm by @Cyrilvallez in #33823

Idefics 3

The Idefics3 model was proposed in Building and better understanding vision-language models: insights and future directions by Hugo Laurençon, Andrés Marafioti, Victor Sanh, and Léo Tronchon.

Idefics3 is an adaptation of the Idefics2 model with three main differences:

It uses Llama3 for the text model.
It uses an updated processing logic for the images.
It removes the perceiver.

Add Idefics 3! by @andimarafioti in #32473

PhiMoE

The PhiMoE model was proposed in Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Microsoft.

This model is very similar to Mixtral. The main difference is Phi3LongRoPEScaledRotaryEmbedding, which is
used to extend the context of the rotary embeddings. The query, key and value projections are fused, and the MLP’s up
and gate projection layers are also fused.

PhiMoE by @garg-amit in #33363

Watermarking

This release adds SynthID, a novel state-of-the-art watermarking technique by Google DeepMind. SynthID has a low generation-time computational cost and can be configured to be nearly imperceptible (at the cost of harder watermarking detection). The release also comes with the code to train and run the corresponding detector, which is a machine learning model itself.

from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b', padding_side="left")
model = AutoModelForCausalLM.from_pretrained('google/gemma-2-2b')

# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57],
    ngram_len=5,
)

# Generation with watermarking
tokenized_prompts = tokenizer(["Once upon a time, "], return_tensors="pt", padding=True)
output_sequences = model.generate(
    **tokenized_prompts, watermarking_config=watermarking_config, do_sample=True, max_new_tokens=10
)
watermarked_text = tokenizer.batch_decode(output_sequences, skip_special_tokens=True)
print(watermarked_text)

Docs for applying SynthID watermarking: https://huggingface.co/docs/transformers/internal/generation_utils#transformers.SynthIDTextWatermarkLogitsProcessor
Docs for detecting SynthID watermarking: https://huggingface.co/docs/transformers/internal/generation_utils#transformers.SynthIDTextWatermarkDetector

Add SynthID (watermerking by Google DeepMind) by @gante in #34350

Quantization

BitNet

BitNet is an architecture introduced by Microsoft Research that uses extreme quantization, representing each parameter with only three values: -1, 0, and 1. This results in a model that uses just 1.58 bits per parameter, significantly reducing computational and memory requirements. It replaces traditional Linear layers in Multi-Head Attention and Feed-Forward Networks with specialized layers called BitLinears that use ternary precision (or even binary, in the initial version).
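
To make the arithmetic concrete: three possible values carry log2(3) ≈ 1.58 bits of information, which is where the figure above comes from. The sketch below also shows one way to map weights to -1, 0 and 1, assuming absmean scaling (divide by the mean absolute value, then round and clip). The helper name `ternarize` and the sample weights are illustrative assumptions, not the code added in the PR.

```python
import math

# Three states per parameter carry log2(3) bits of information.
bits_per_parameter = math.log2(3)
print(f"{bits_per_parameter:.2f} bits per parameter")


def ternarize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Quantize weights to {-1, 0, 1} using absmean scaling (an assumed scheme for illustration)."""
    # Scale by the mean absolute value so typical weights land near -1 or 1.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


quantized, scale = ternarize([0.9, -0.05, 0.4, -1.2])
print(quantized, scale)
```

Small weights collapse to 0 and large ones saturate at -1 or 1, with the scale kept alongside so outputs can be rescaled.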

FEAT : Adding BitNet quantization method to HFQuantizer by @MekkCyber in #33410

GGUF loading in transformers

More architectures are now supported in our GGUF loader; GGUF files saved with these architectures can now
be loaded directly in transformers to be fine-tuned. We recommend using tooling from llama.cpp to requantize
the models after further training has been done.
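
A minimal sketch of what loading looks like, assuming the `gguf_file` argument of `from_pretrained` and that the `gguf` and `torch` packages are installed. The wrapper name `load_gguf_checkpoint` is hypothetical, and the repository and file names in the usage comment are placeholders to replace with a real GGUF checkpoint of a supported architecture.

```python
def load_gguf_checkpoint(repo_id: str, gguf_file: str):
    """Load a tokenizer and a model from a GGUF file as regular transformers objects."""
    # Imported lazily so this snippet can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
    # The weights are dequantized on load, so the result can be fine-tuned like any other model.
    model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
    return tokenizer, model


# Usage (placeholder names, not real checkpoints):
# tokenizer, model = load_gguf_checkpoint("your-org/your-model-GGUF", "your-model.Q4_K_M.gguf")
```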

Add gguf support for bloom by @VladOS95-cyber in #33473
Add falcon gguf by @g-prz in #33437
Add gguf support for StableLM by @VladOS95-cyber in #33793
Add gguf support for gpt2 by @VladOS95-cyber in #34044
Add GGUF for starcoder2 by @VladOS95-cyber in #34094

Notable improvements and additions

Pipeline API synchronisation

We are pushing for a unified inference API across multiple libraries. As part of this, we are cleaning up the input and output signatures for our pipeline classes and deprecating some rarely-used arguments. This is still a work-in-progress, but when it's finished, transformers pipelines should exactly match workflows in deployment libraries like transformers.js or TGI, allowing you to seamlessly move from development to production.

Sync video classification pipeline with huggingface_hub spec by @Rocketknight1 in #34288
Image pipelines spec compliance by @Rocketknight1 in #33899
Make ASR pipeline compliant with Hub spec + add tests by @Rocketknight1 in #33769
Cleanup return_text and return_full_text options in TextGenerationPipeline by @Rocketknight1 in #33542
Make audio classification pipeline spec-compliant and add test by @Rocketknight1 in #33730
Sync QuestionAnsweringPipeline by @Rocketknight1 in #34039

Also, pipelines now fully support the Processor class, used by vision-language models. Expect full pipeline support for chatting with VLMs in the very near future!

Make pipeline able to load processor by @qubvel in #32514

ExecuTorch compatibility

ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch ecosystem and supports the deployment of PyTorch models with a focus on portability, productivity, and performance.

We are collaborating with the ExecuTorch team so that 🤗 Transformers models can be exported using torch.export. The goal of this integration is not only to enable export but also to ensure that the exported artifact can be further lowered and optimized to run efficiently in ExecuTorch, particularly for mobile and edge use cases.

Generate using exported model and enable gemma2-2b in ExecuTorch by @guangy10 in #33707
Qwen2.5 is ExecuTorch Compatible by @guangy10 in #34102
Olmo is ExecuTorch Compatible by @guangy10 in #34181
Llama3 and Llama2 are ExecuTorch compatible by @guangy10 in #34101

Gradient accumulation bugfix

Fix Gradient Accumulation issue by @ArthurZucker in #34191
Enable users to use their own loss functions + deal with prefetching for grad accum by @muellerzr in #34198
Enable Gradient Accumulation fix across all models + trainer fully in forward() by @muellerzr in #34283

Bugfixes and improvements

adding positional encoder changes and tests by @manuelsh in #32600
Uniformize kwargs for chameleon processor by @leloykun in #32181
[MllamaProcessor] Update errors and API with multiple image by @ArthurZucker in #33715
fix: use correct var names for check_tokenizers script by @niqodea in #33702
Fix docs and docstrings Omdet-Turbo by @yonigozlan in #33726
Fix position embeddings singular/plural by @molbap in #33678
Generate: can_generate() recursive check by @gante in #33718
clean_up_tokenization_spaces=False if unset by @itazap in #31938
fix: add docstring for image_size in Convnextv2 config by @lucianosrp in #33734
Fix modular model converter unable to generate Processor classes by @tonywu71 in #33737
fix trainer tr_loss add error by @Wang-Xiaodong1899 in #33651
Update Albumentations Versions by @vasqu in #33704
Doc and config mismatch for DeBERTa by @fkrasnov2 in #33713
[clean_up_tokenization_spaces] Pl bart was failing, updating by @ArthurZucker in #33735
[MllamaImageProcessing] Update doc by @ArthurZucker in #33747
Make siglip examples clearer and error free by @jbn in #33667
Paligemma support for multi-image by @zucchini-nlp in #33447
remove warning v2 by @itazap in #33761
Model addition timeline by @LysandreJik in #33762
Fix typing in load_balancing_loss_func function of modeling_mixtral.py. by @PhilipMay in #33641
Enable non-safetensor ser/deser for TorchAoConfig quantized model 🔴 by @jerryzh168 in #33456
Fix typo in documentation by @qgallouedec in #33805
Hqq serialization by @mobicham in #33141
Add Slow CI reminder bot by @ydshieh in #33506
[modular] fixes! by @ArthurZucker in #33820
Fix ViT-MAE decoder interpolate by @xenova in #33330
Fixes for issue #33763 in idefics2 model by @aroun-coumar in #33766
Fix link in gguf.md by @pogpog in #33768
minor typo fix by @a-r-r-o-w in #33784
Fix Mamba slow path bug with dtype mismatch. by @Adibvafa in #32691
Fix passing str dtype to static cache by @guangy10 in #33741
fix check for hidden size in text model for deepspeed zero3 auto entries by @winglian in #33829
post reminder comment only once by @ydshieh in #33848
Generate: move llama prepare_inputs_for_generation to GenerationMixin by @gante in #33677
Refactor image features selection in LlaVa by @kenza-bouzid in #33696
fix: skip dropout in eval for flash_attn in various models by @fdschmidt93 in #33844
add attention weight up-cast to float32 in chameleon by @francescortu in #33822
Workaround for bark issue in pipelines by @Rocketknight1 in #33824
Fix device mismatch errors by @zucchini-nlp in #33851
This PR contains additional changes for #33143 by @aroun-coumar in #33581
Raise accelerate dependency error in case of defaulting low_cpu_mem_usage=True by @kylesayrs in #33830
Validate the eval dataset in advance. by @jackyjinjing in #33743
Add include_loss_for_metrics by @Manalelaidouni in #33088
Avoid using context that is not accessable from external contributors by @ydshieh in #33866
fix: repair depth estimation multiprocessing by @niqodea in #33759
Move weight initilization deformabledetr by @g-prz in #33339
[Fix] ViViT interpolate_pos_encoding by @RUFFY-369 in #33815
Repo consistency fix after #33339 by @amyeroberts in #33873
Add support for custom inputs and batched inputs in ProcessorTesterMixin by @yonigozlan in #33711
Fix: typo by @TrickEye in #33880
Uniformize model processors by @molbap in #31368
Don't run reminder bot for now by @ydshieh in #33883
populate quantization_config for kv-cache-scheme only configs by @horheynm in #33874
Allow for nightly packages of compressed_tensors by @kylesayrs in #33828
Fix kwargs passed by AutoQuantizationConfig.from_pretrained by @kylesayrs in #33798
Add sdpa for DistilBert by @OmarManzoor in #33724
Trainer - deprecate tokenizer for processing_class by @amyeroberts in #32385
[Quantization] Switch to optimum-quanto by @SunMarc in #31732
Optim deformable detr by @yonigozlan in #33600
Handle Trainer tokenizer kwarg deprecation with decorator by @qubvel in #33887
rename all test_processing_.py to test_processor_.py by @yonigozlan in #33878
uniformize processor Mllama by @yonigozlan in #33876
Fix dt proj bias reassigned by @HofitBata in #33314
Update an keyerror on _save_check_point prevent confusion of missing … by @fadingNA in #33832
VLM Generate: tag test_static_cache_matches_dynamic as flaky by @gante in #33630
Migrate the CI runners to the new clusters by @glegendre01 in #33849
Fix module initialization for root module under Zero3 by @Ben-Schneider-code in #33632
Add SplinterTokenizer unit test by @ariepratama in #32652
Generate tests: modality-agnostic input preparation by @gante in #33685
Fix: use unidic-lite instead of ipadic as the tokenizer dictionary for Japanese by @KanTakahiro in #33372
[Tests] Diverse Whisper fixes by @ylacombe in #33665
[PEFT] Support low_cpu_mem_usage option for PEFT loading adapters by @BenjaminBossan in #33725
add setter for trainer processor by @ArthurZucker in #33911
Add support for weights_only flag when loading state_dict by @jerryzh168 in #32481
Config: lower save_pretrained exception to warning by @gante in #33906
Uniformize kwargs for Idefics/2 processors by @yonigozlan in #32568
Remove logits.float() by @ringohoffman in #33902
Minor error condition bug fix by @htahboub in #33781
Fix distil whisper segment computation by @ylacombe in #33920
[Doc]: Broken link in Kubernetes doc by @saldanhad in #33879
[i18n-ru] Fixes typo in the README_ru.md by @Artanias in #33882
Ignore keys on validate_rope by @zucchini-nlp in #33753
[PR run-slow] by @ArthurZucker in #33939
Add a section on writing tool templates to the chat template docs by @Rocketknight1 in #33924
Enables CPU AWQ model with IPEX version. by @jiqing-feng in #33460
🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution. by @abuelnasr0 in #33325
Removed unnecessary transpose in Switch Transformer Routing by @karan-uppal3 in #33582
Fix attn mask ignore logic in training-time trace by @zhenglongjiepheonix in #32613
hot fix self.position_embeddings->self.position_embedding by @ArthurZucker in #33958
fix red check-copies by @ArthurZucker in #33964
Cache: revert DynamicCache init for BC by @gante in #33861
Paligemma: fix static cache test by @zucchini-nlp in #33941
Updating char_to_token documentation to note behaviour when trim_offsets is True by @Craigacp in #33919
add test for Jamba with new model jamba-tiny-dev by @yecohn in #33863
Bug fix gguf qwen2moe by @VladOS95-cyber in #33940
[TF] Fix Tensorflow XLA Generation on limited seq_len models by @vasqu in #33903
[WIP] Add Tokenizer for MyT5 Model by @tomlimi in #31286
Add position ids in forward pass to opt model by @avishaiElmakies in #33121
Flash-attn performance: remove cuda sync during inference by @Cyrilvallez in #33570
[Docs] Improve VLM docs by @NielsRogge in #33393
[Docs] Add Developer Guide: How to Hack Any Transformers Model by @MagnusS0 in #33979
[Red CIs] Fix hub failures by @ArthurZucker in #34001
Fix Tensor + Embedding error in some cases when using SiglipVisionModel by @kaitolucifer in #33994
properly fix and RUN_SLOW by @ArthurZucker in #33965
Enable customized optimizer for DeepSpeed by @dataKim1201 in #32049
[pytes collection] Fix flax test collection by @ArthurZucker in #34004
Fix undefined default_config in configuration_utils.py by @mgoin in #33934
🌐 [i18n-KO] Translated gguf.md to Korean by @yijun-lee in #33764
🌐 [i18n-KO] Translated swinv2.md to Korean by @mreraser in #33566
🌐 [i18n-KO] Translated audio_utils.md to Korean by @yijun-lee in #33802
🌐 [i18n-KO] Translated esm.md to Korean by @yijun-lee in #33796
🌐 [i18n-KO] Translated time_series_utils.md to Korean by @yijun-lee in #33806
🌐 [i18n-KO] Translated pipelines_utils.md to Korean by @yijun-lee in #33809
🌐 [i18n-KO] Translated trainer.md to Korean by @yijun-lee in #33797
🌐 [i18n-KO] Translated chameleon.md to Korean by @yijun-lee in #33799
🌐 [i18n-KO] Translated logging.md to Korean by @chhaewxn in #33543
🌐 [i18n-KO] Translated auto.md to Korean by @boyunJang in #33590
🌐 [i18n-KO] Translated swin2sr.md to Korean by @mreraser in #33795
🌐 [i18n-KO] Translated vit.md to Korean by @mreraser in #33884
🌐 [i18n-KO] Translated gemma.md to Korean by @yijun-lee in #33936
Cache: slight change in naming by @zucchini-nlp in #32421
Add support for all and potentilly deleting functions by @ArthurZucker in #33859
Processors: don't default padding side by @zucchini-nlp in #33942
Add auto model for image-text-to-text by @yonigozlan in #32472
BatchFeature.to() supports non-tensor keys by @Rocketknight1 in #33918
Improve modular converter by @Cyrilvallez in #33991
Fixup DeepSpeed things by @muellerzr in #34007
Fix typing issue by @SunMarc in #34012
fix awq tests due to ipex backend by @SunMarc in #34011
Remove decoder_config=None by @SunMarc in #34014
Fix trainer_seq2seq.py's __init__ type annotations by @benglewis in #34021
🌐 [i18n-KO] Translated feature_extractor.md to Korean by @yijun-lee in #33775
🌐 [i18n-KO] Translated bertweet.md to Korean by @ahnjj in #33891
🌐 [i18n-KO] Translated gpt_neox_japanese.md to Korean by @ahnjj in #33894
🌐 [i18n-KO] Translated rag.md to Korean by @chhaewxn in #33989
🌐 [i18n-KO] Translated main_classes/quantization.md to Korean by @fabxoe in #33959
🌐 [i18n-KO] Translated main_classes/configuration.md to Korean by @fabxoe in #33952
🌐 [i18n-KO] Translated model_doc/mamba.md to Korean by @fabxoe in #33626
🌐 [i18n-KO] Translated model_doc/autoformer.md to Korean by @fabxoe in #33574
🌐 [i18n-KO] Translated model_doc/patchtsmixer.md to Korean by @fabxoe in #33587
🌐 [i18n-KO] Translated model_doc/clip.md to Korean by @fabxoe in #33610
🌐 [i18n-KO] Translated model_doc/paligemma.md to Korean by @fabxoe in #33612
🌐 [i18n-KO] Translated model_doc/llama3.md to Korean by @fabxoe in #33635
🌐 [i18n-KO] Translated model_doc/mistral.md to Korean by @fabxoe in #33648
🌐 [i18n-KO] Translated model_doc/cohere.md to Korean by @fabxoe in #33885
🌐 [i18n-KO] Translated model_doc/dbrx.md to Korean by @fabxoe in #33951
🌐 [i18n-KO] Translated model_doc/deberta-v2.md to Korean by @fabxoe in #33968
🌐 [i18n-KO] Translated main_classes/onnx.md to Korean by @fabxoe in #33601
🌐 [i18n-KO] Translated tokenization_utils.md to Korean by @yijun-lee in #33813
🌐 [i18n-KO] Translated swin.md to Korean by @mreraser in #33510
🌐 [i18n-KO] Translated file_utils.md to Korean by @yijun-lee in #33803
🌐 [i18n-KO] Translated openai-gpt.md to Korean by @yijun-lee in #33801
🌐 [i18n-KO] Translated biogpt.md to Korean by @yijun-lee in #33773
🌐 [i18n-KO] Translated blip.md to Korean by @cjfghk5697 in #33515
🌐 [i18n-KO] Translated output.md to Korean by @4N3MONE in #33607
🌐 [i18n-KO] Translated image_processing_utils.md to Korean by @yijun-lee in #33804
🌐 [i18n-KO] Translated modular_transformers.md to Korean by @yijun-lee in #33772
[Patch helper] update to not have to checkout main by @ArthurZucker in #34006
Fix Failed tests with mobile bert resize tokens embedding by @abuelnasr0 in #33950
Generate: remove most decoder-only LLMs prepare_inputs_for_generation by @gante in #33870
Mllama: fix tests by @zucchini-nlp in #34000
Fix PIL dep for tests by @muellerzr in #34028
🌐 [i18n-KO] Translated model_doc/bart.md to Korean by @fabxoe in #33893
🌐 [i18n-KO] Translated model_doc/deberta.md to Korean by @fabxoe in #33967
🌐 [i18n-KO] Translated main_classes/keras_callbacks.md to Korean by @fabxoe in #33955
🌐 [i18n-KO] Translated model_doc/mamba2.md to Korean by @fabxoe in #33629
🌐 [i18n-KO] Translated main_classes/model.md to Korean by @fabxoe in #33606
🌐 [i18n-KO] Translated model_doc/trajectory_transformer.md to Korean by @fabxoe in #33597
🌐 [i18n-KO] Translated model_doc/time_series_transformer.md to Korean by @fabxoe in #33596
🌐 [i18n-KO] Translated model_doc/informer.md to Korean by @fabxoe in #33585
🌐 [i18n-KO] Translated model_doc/graphormer.md to Korean by @fabxoe in #33569
🌐 [i18n-KO] Translated modeling_utils.md to Korean by @yijun-lee in #33808
🌐 [i18n-KO] Translated main_classes/data_collator.md to Korean by @fabxoe in #33954
🌐 [i18n-KO] Translated model_doc/patchtst.md to Korean by @fabxoe in #33589
🌐 [i18n-KO] Translated text_generation.md to Korean by @yijun-lee in #33777
🌐 [i18n-KO] Translated main_classes/callback.md to Korean by @Jwaminju in #33572
🌐 [i18n-KO] Translated generation_utils.md to Korean by @yijun-lee in #33818
Add Translate docs into Arabic - section files CONCEPTUAL GUIDES by @AhmedAlmaghz in #33982
add sdpa to OPT by @avishaiElmakies in #33298
Phi3: fix attn for sliding window by @zucchini-nlp in #33586
HfArgumentParser: allow for hyhenated field names in long-options by @djmarti in #33990
Fix pipelines tests by @qubvel in #34049
Specifying torch dtype in Qwen2VLForConditionalGeneration by @htahboub in #33953
Universal Assisted Generation: Assisted generation with any assistant model (by Intel Labs) by @danielkorat in #33383
check if eigenvalues of covariance matrix are complex. by @abuelnasr0 in #34037
[Docs] Update compressed_tensors.md by @mgoin in #33961
Fix data_seed unused by @MekkCyber in #33731
[TESTS] ASR pipeline by @ylacombe in #33925
Update Blip2 is_pipeline_test_to_skip method signature by @qubvel in #34067
provide trust_remote_code for search feat extractor in model config by @eaidova in #34036
Small Fix to modular converter by @MekkCyber in #34051
Default synced_gpus to True when using FullyShardedDataParallel by @ringohoffman in #33483
Idefics: fix position ids by @zucchini-nlp in #33907
Update SSH workflow file by @ydshieh in #34084
Tests: upcast logits to float() by @gante in #34042
Fix flax failures by @LysandreJik in #33912
Fix DAC slow tests by @ylacombe in #34088
Fix failing conversion by @LysandreJik in #34010
Fix PushToHubMixin when pusing to a PR revision by @Wauplin in #34090
avoid many failures for ImageGPT by @ydshieh in #34071
Fix NaNs in cost_matrix for mask2former by @ducha-aiki in #34074
Fix flaky tests by @zucchini-nlp in #34069
Generate: move prepare_inputs_for_generation in encoder-decoder llms by @gante in #34048
Avoid many test failures for LlavaNextVideoForConditionalGeneration by @ydshieh in #34070
refactor: benchmarks by @McPatate in #33896
fix(ci): benchmarks dashboard was failing due to missing quotations by @McPatate in #34100
Generate: Fix modern llm generate calls with synced_gpus by @gante in #34095
Mistral-related models for QnA by @vasqu in #34045
Fix a typo by @PengWeixuan in #34148
Fixed error message in mllama by @dmgcsilva in #34106
Specify that users should be careful with their own files by @LysandreJik in #34153
Add documentation for docker by @ArthurZucker in #33156
Update README.md with Enterprise Hub by @gary149 in #34150
Idefics: enable generation tests by @zucchini-nlp in #34062
Add sdpa for Vivit by @RUFFY-369 in #33757
Fix FSDP resume Initialization issue by @Itssshikhar in #34032
Fix default behaviour in TextClassificationPipeline for regression problem type by @subhalingamd in #34066
Generate: move logits to same device as input_ids by @gante in #34076
Add support for inheritance from class with different suffix in modular by @yonigozlan in #34077
Fix optuna ddp hp search by @SunMarc in #34073
[feat] LlavaNext add feature size check to avoid CUDA Runtime Error by @laurentd-lunit in #33608
🌐 [i18n-KO] Translated vivit.md to Korean by @mreraser in #33935
🌐 [i18n-KO] Translated gemma2.md to Korean by @yijun-lee in #33937
🌐 [i18n-KO] Translated trainer_utils.md to Korean by @yijun-lee in #33817
🌐 [i18n-KO] Translated blip-2.md to Korean by @cjfghk5697 in #33516
IDEFICS: support inputs embeds by @zucchini-nlp in #34043
[fix] fix token healing tests and usage errors by @alpertunga-bile in #33931
Revert accelerate error caused by 46d09af by @steveepreston in #34197
Fix wrong name for llava onevision and qwen2_vl in tokenization auto by @yonigozlan in #34177
Avoid using torch's Tensor or PIL's Image in chat template utils if not available by @RezaRahemtola in #34165
Revert "Fix FSDP resume Initialization issue" by @SunMarc in #34193
Update trainer._get_eval_sampler() to support group_by_length arg by @larin92 in #33514
Fix warning message for fp32_cpu_offloading in bitsandbytes configs by @amosyou in #34079
Ping team members for new failed tests in daily CI by @ydshieh in #34171
fix(Wav2Vec2ForCTC): torch export by @chrsmcgrr in #34023
Fix for tokenizer.apply_chat_template with continue_final_message=True by @schoennenbeck in #34214
removes decord by @vrnvu in #33987
Fix bus error when using GPT2 on M1 macs by @chanind in #34031
Generate: visit non-llm prepare_inputs_for_generation by @gante in #34199
Support Llama 3.2 conversion (text models) by @pcuenca in #33778
Fix-red-ci by @ArthurZucker in #34230
BLIP: fix input expansion logic by @zucchini-nlp in #34225
Fix broken test decorator require_torch_up_to_2_accelerators by @byi8220 in #34201
Informative 2 by @LysandreJik in #34154
Fix UDOP dtype issue by @Rocketknight1 in #34180
Only cast logits to float when computing loss by @ringohoffman in #34147
Generation tests: don't rely on main input name by @zucchini-nlp in #34228
Change Paligemma import logging to work with modular by @yonigozlan in #34211
Add DetrImageProcessorFast by @yonigozlan in #34063
Add a doc section on writing generation prompts by @Rocketknight1 in #34248
Fix method name which changes in tutorial by @andimarafioti in #34252
Attn implementation for composite models by @zucchini-nlp in #32238
VLM: add more modularity by @zucchini-nlp in #34175
T5 compile compatibilty by @zucchini-nlp in #34089
[docs] Fix GenerationConfig params by @stevhliu in #34299
Fix Korean doc _toctree.yml by @regisss in #34293
Update PR templates by @SunMarc in #34065
[RT-DETR] Fix onnx inference bug for Optype (Where) by @YHallouard in #33877
Fix FA2 attention for models supporting sliding window by @Cyrilvallez in #34093
Fix: tensor of examples of the same length triggers invalid stacking by @pbelcak in #34166
Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies by @alex-bene in #32550
Add option for running ffmpeg_microphone_live as a background process by @mikamerath in #32838
Feature: Add MLFLOW_MAX_LOG_PARAMS to MLflowCallback by @cecheta in #34279
Fix continue_final_message for image-text-to-text chat templates by @yonigozlan in #34236
fix error in _get_eval_sampler when group_by_length enabled by @akakakakakaa in #34237
[docs] fix typo by @faaany in #34235
🌐 [i18n-KO] Translated executorch.md to Korean by @ahnjj in #33888
🌐 [i18n-KO] Translated bert japanese.md to Korean by @ahnjj in #33890
🌐 [i18n-KO] Translated model_doc/bartpho.md to Korean by @Jwaminju in #33981
Example doc for token classification of Llama and Dependent/Copied Models by @h3110Fr13nd in #34139
[docs] Fix Korean toctree by @stevhliu in #34324
Added Deberta model type support by @FilipposVentirozos in #34308

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@manuelsh
- adding positional encoder changes and tests (#32600)
@ArthurZucker
- [MllamaProcessor] Update errors and API with multiple image (#33715)
- [clean_up_tokenization_spaces] Pl bart was failing, updating (#33735)
- [MllamaImageProcessing] Update doc (#33747)
- [modular] fixes! (#33820)
- add setter for trainer processor (#33911)
- [PR run-slow] (#33939)
- hot fix self.position_embeddings->self.position_embedding (#33958)
- fix red check-copies (#33964)
- [Red CIs] Fix hub failures (#34001)
- properly fix and RUN_SLOW (#33965)
- [pytes collection] Fix flax test collection (#34004)
- Add support for all and potentilly deleting functions (#33859)
- [Patch helper] update to not have to checkout main (#34006)
- Add documentation for docker (#33156)
- Fix Gradient Accumulation issue (#34191)
- Fix-red-ci (#34230)
@molbap
- Fix position embeddings singular/plural (#33678)
- Uniformize model processors (#31368)
@vasqu
- Update Albumentations Versions (#33704)
- [TF] Fix Tensorflow XLA Generation on limited seq_len models (#33903)
- Mistral-related models for QnA (#34045)
@VladOS95-cyber
- Add gguf support for bloom (#33473)
- Bug fix gguf qwen2moe (#33940)
- Add gguf support for StableLM (#33793)
- Add gguf support for gpt2 (#34044)
- Add GGUF for starcoder2 (#34094)
@ydshieh
- Add Slow CI reminder bot (#33506)
- post reminder comment only once (#33848)
- Avoid using context that is not accessable from external contributors (#33866)
- Don't run reminder bot for now (#33883)
- Update SSH workflow file (#34084)
- avoid many failures for ImageGPT (#34071)
- Avoid many test failures for LlavaNextVideoForConditionalGeneration (#34070)
- Ping team members for new failed tests in daily CI (#34171)
@amyeroberts
- Repo consistency fix after #33339 (#33873)
- Trainer - deprecate tokenizer for processing_class (#32385)
@ylacombe
- [Tests] Diverse Whisper fixes (#33665)
- Fix distil whisper segment computation (#33920)
- [TESTS] ASR pipeline (#33925)
- Fix DAC slow tests (#34088)
- Moshi integration (#33624)
@ringohoffman
- Remove logits.float() (#33902)
- Default synced_gpus to True when using FullyShardedDataParallel (#33483)
- Only cast logits to float when computing loss (#34147)
@garg-amit
- PhiMoE (#33363)
@pglorio
- Add Zamba (#30950)
@tomlimi
- [WIP] Add Tokenizer for MyT5 Model (#31286)
@yijun-lee
- 🌐 [i18n-KO] Translated gguf.md to Korean (#33764)
- 🌐 [i18n-KO] Translated audio_utils.md to Korean (#33802)
- 🌐 [i18n-KO] Translated esm.md to Korean (#33796)
- 🌐 [i18n-KO] Translated time_series_utils.md to Korean (#33806)
- 🌐 [i18n-KO] Translated pipelines_utils.md to Korean (#33809)
- 🌐 [i18n-KO] Translated trainer.md to Korean (#33797)
- 🌐 [i18n-KO] Translated chameleon.md to Korean (#33799)
- 🌐 [i18n-KO] Translated gemma.md to Korean (#33936)
- 🌐 [i18n-KO] Translated feature_extractor.md to Korean (#33775)
- 🌐 [i18n-KO] Translated tokenization_utils.md to Korean (#33813)
- 🌐 [i18n-KO] Translated file_utils.md to Korean (#33803)
- 🌐 [i18n-KO] Translated openai-gpt.md to Korean (#33801)
- 🌐 [i18n-KO] Translated biogpt.md to Korean (#33773)
- 🌐 [i18n-KO] Translated image_processing_utils.md to Korean (#33804)
- 🌐 [i18n-KO] Translated modular_transformers.md to Korean (#33772)
- 🌐 [i18n-KO] Translated modeling_utils.md to Korean (#33808)
- 🌐 [i18n-KO] Translated text_generation.md to Korean (#33777)
- 🌐 [i18n-KO] Translated generation_utils.md to Korean (#33818)
- 🌐 [i18n-KO] Translated gemma2.md to Korean (#33937)
- 🌐 [i18n-KO] Translated trainer_utils.md to Korean (#33817)
@fabxoe
- 🌐 [i18n-KO] Translated main_classes/quantization.md to Korean (#33959)
- 🌐 [i18n-KO] Translated main_classes/configuration.md to Korean (#33952)
- 🌐 [i18n-KO] Translated model_doc/mamba.md to Korean (#33626)
- 🌐 [i18n-KO] Translated model_doc/autoformer.md to Korean (#33574)
- 🌐 [i18n-KO] Translated model_doc/patchtsmixer.md to Korean (#33587)
- 🌐 [i18n-KO] Translated model_doc/clip.md to Korean (#33610)
- 🌐 [i18n-KO] Translated model_doc/paligemma.md to Korean (#33612)
- 🌐 [i18n-KO] Translated model_doc/llama3.md to Korean (#33635)
- 🌐 [i18n-KO] Translated model_doc/mistral.md to Korean (#33648)
- 🌐 [i18n-KO] Translated model_doc/cohere.md to Korean (#33885)
- 🌐 [i18n-KO] Translated model_doc/dbrx.md to Korean (#33951)
- 🌐 [i18n-KO] Translated model_doc/deberta-v2.md to Korean (#33968)
- 🌐 [i18n-KO] Translated main_classes/onnx.md to Korean (#33601)
- 🌐 [i18n-KO] Translated model_doc/bart.md to Korean (#33893)
- 🌐 [i18n-KO] Translated model_doc/deberta.md to Korean (#33967)
- 🌐 [i18n-KO] Translated main_classes/keras_callbacks.md to Korean (#33955)
- 🌐 [i18n-KO] Translated model_doc/mamba2.md to Korean (#33629)
- 🌐 [i18n-KO] Translated main_classes/model.md to Korean (#33606)
- 🌐 [i18n-KO] Translated model_doc/trajectory_transformer.md to Korean (#33597)
- 🌐 [i18n-KO] Translated model_doc/time_series_transformer.md to Korean (#33596)
- 🌐 [i18n-KO] Translated model_doc/informer.md to Korean (#33585)
- 🌐 [i18n-KO] Translated model_doc/graphormer.md to Korean (#33569)
- 🌐 [i18n-KO] Translated main_classes/data_collator.md to Korean (#33954)
- 🌐 [i18n-KO] Translated model_doc/patchtst.md to Korean (#33589)
@MekkCyber
- FEAT : Adding BitNet quantization method to HFQuantizer (#33410)
- Fix data_seed unused (#33731)
- Small Fix to modular converter (#34051)
@AhmedAlmaghz
- Add Translate docs into Arabic - section files CONCEPTUAL GUIDES (#33982)
@alex-bene
- Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550)
