Skip to content

v4.38.2

Compare
Choose a tag to compare
@ArthurZucker ArthurZucker released this 01 Mar 03:24
· 2496 commits to main since this release

Fix backward compatibility issues with Llama and Gemma:

We mostly made sure that performances are not affected by the new change of paradigm with ROPE. Fixed the ROPE computation (should always be in float32) and the causal_mask dtype was set to bool to take less RAM.

YOLOS had a regression, and Llama / T5Tokenizer had a warning popping for random reasons

  • FIX [Gemma] Fix bad rebase with transformers main (#29170)
  • Improve _update_causal_mask performance (#29210)
  • [T5 and Llama Tokenizer] remove warning (#29346)
  • [Llama ROPE] Fix torch export but also slow downs in forward (#29198)
  • RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)
  • Patch YOLOS and others (#29353)
  • Use torch.bool instead of torch.int64 for non-persistant causal mask buffer (#29241)