v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes
ORPO Trainer & Vision LLMs support for SFTTrainer, KTO fixes
This release includes two new trainers: ORPO from KAIST and CPO
The release also includes Vision LLM such as Llava support for SFTTrainer
, please see: https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details
ORPO Trainer
CPO Trainer
- Add CPOTrainer by @fe1ixxu in #1382
- Add
use_cache=False
in{ORPO,CPO}Trainer.concatenated_forward
by @alvarobartt in #1478 - [ORPO] Update NLL loss to use
input_ids
instead by @alvarobartt in #1516
VLLMs support for SFTTrainer
You can now use SFTTrainer
to fine-tune VLLMs such as Llava !
See: https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details
- Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in #1518
KTO Fixes
Many fixes were introduced for the KTOTrainer:
- Update KTO example to use better model and ChatML support by @lewtun in #1485
- [KTO] Use batching to speed up data processing by @lewtun in #1470
- Update KTO example with good dataset & chat format by @lewtun in #1481
- [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
- [KTO] fix metric logging by @claralp in #1514
10x PPO !
Other fixes
- set dev version by @younesbelkada in #1463
- Use the standard dataset for DPO CLI by @vwxyzjn in #1456
- [peft] Update test_reward_trainer.py to fix tests by @kashif in #1471
- Fix hyperparameters in KTO example by @lewtun in #1474
- docs: add missing Trainer classes and sort alphabetically by @anakin87 in #1479
- hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in #1488
- Ignore chat files by @lewtun in #1486
- Add DPO link in README by @qgallouedec in #1502
- Fix typo in how_to_train.md by @ftorres16 in #1503
- Fix DPO Unsloth example in Docs by @arnavgarg1 in #1494
- Correct ppo_epochs usage by @muhammed-shihebi in #1480
- Fix
RichProgressCallback
by @eggry in #1496 - Change the device index to device:index by @yuanwu2017 in #1490
- FIX: use kwargs for RMTrainer by @younesbelkada in #1515
- Allow streaming (datasets.IterableDataset) by @BramVanroy in #1468
- Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in #1520
- [DOC] Add data description for sfttrainer doc by @BramVanroy in #1521
- Release: v0.8.2 by @younesbelkada in #1522
New Contributors
- @fe1ixxu made their first contribution in #1382
- @anakin87 made their first contribution in #1479
- @galtay made their first contribution in #1488
- @qgallouedec made their first contribution in #1502
- @ftorres16 made their first contribution in #1503
- @arnavgarg1 made their first contribution in #1494
- @muhammed-shihebi made their first contribution in #1480
- @eggry made their first contribution in #1496
- @claralp made their first contribution in #1514
Full Changelog: v0.8.1...v0.8.2