Releases: huggingface/trl
v0.9.6 release
We are excited to introduce the new v0.9.6 release, with many new features and algorithms. The highlights are as follows:
- Support for SimPO by @fe1ixxu, a reference-free method that also regularizes output length. To use this loss, set `loss_type="simpo"` and `cpo_alpha=0` in the `CPOConfig` and use it with the `CPOTrainer` (see the sketch after this list).
- Added AlignProp by @mihirp1998, a method for fine-tuning Stable Diffusion models using reward gradients.
- Added Efficient Exact Optimization (EXO) by @haozheji
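For orientation, here is a minimal, hedged sketch of the SimPO setup described above. The model and dataset names are placeholders (not part of the release), and the dataset is assumed to have `prompt`, `chosen`, and `rejected` text columns:

```python
# Minimal sketch: SimPO via CPOTrainer (loss_type="simpo", cpo_alpha=0), per the note above.
# Model and dataset names are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "Qwen/Qwen1.5-0.5B-Chat"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder preference dataset with "prompt", "chosen" and "rejected" text columns
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = CPOConfig(
    output_dir="simpo-model",
    loss_type="simpo",  # reference-free, length-regularized preference loss
    cpo_alpha=0,        # disables the CPO NLL term, giving pure SimPO
)
trainer = CPOTrainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```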
We also included many important fixes and improvements such as fixing prints in the CLI with GCP containers by @alvarobartt. Enjoy the release!
What's Changed
- set dev version by @younesbelkada in #1710
- Add a variant of CPO, SimPO by @fe1ixxu in #1703
- [RPO] fix nll loss by @kashif in #1705
- fix yaml parser for derived config classes by @mnoukhov in #1713
- Fix default padding_value in dpo_config.py by @mnoukhov in #1692
- feat(ci): add trufflehog secrets detection by @McPatate in #1721
- ktotrainer: Refuse datasets which contain only one class of labels by @jetlime in #1724
- adds AOT by @imelnyk in #1701
- Workflow: Notify tests results on slack channel by @younesbelkada in #1744
- better trl parser with yaml config by @mnoukhov in #1739
- CI / core: Pin `numpy` to `!=2.0.0` for CI and to users by @younesbelkada in #1747
- `TrlParser`: Add ignore extra args option by @younesbelkada in #1748
- small KTO fixes by @kawine in #1734
- CPO / DPO: Fix red CI by @younesbelkada in #1749
- prepare deepspeed to accommodate fp16 and bf16 by @mnoukhov in #1728
- CI / `KTOTrainer`: Remove old tests by @younesbelkada in #1750
- change the `process` function in the example of DPO by @AIR-hl in #1753
- Integrate f-divergence to DPO (Follow up) by @1485840691 in #1610
- Support for returning past_key_values from the model by @idanshen in #1742
- Fix masking of response tokens by @mertsayar8 in #1718
- Support num_train_epochs by @vwxyzjn in #1743
- Fix: Add dataset_text_field in examples/scripts/sft.py by @scottsuk0306 in #1758
- New sentiment and descriptiveness dataset by @vwxyzjn in #1757
- Add CPO-SimPO method by @fe1ixxu in #1760
- Added Reward Backpropagation Support by @mihirp1998 in #1585
- MoE Models: option to add load balancing loss by @claralp in #1765
- `evaluation_strategy` to `eval_strategy` by @qgallouedec in #1771
- add Efficient Exact Optimization (EXO) by @haozheji in #1735
- Remove the leading space in the tldr preference dataset by @vwxyzjn in #1773
- Fix Documentation Overflow Issues for Long URLs in SFTConfig by @Mubin17 in #1774
- Visual DPO by @qgallouedec in #1647
- [DOCS] fix docs and cli example script by @kashif in #1780
- Fixed typo in SFT trainer docs by @detsutut in #1788
- [SFT] add model_init_kwargs to training_args by @kashif in #1787
- Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig by @noahlt in #1794
- Clean examples by @qgallouedec in #1791
- Remove extra print in reward_trainer.py by @mnoukhov in #1799
- Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI by @alvarobartt in #1807
- Fix `TRL_USE_RICH` environment variable handling by @alvarobartt in #1808
- 0.9.6 release by @vwxyzjn in #1816
New Contributors
- @McPatate made their first contribution in #1721
- @jetlime made their first contribution in #1724
- @imelnyk made their first contribution in #1701
- @AIR-hl made their first contribution in #1753
- @1485840691 made their first contribution in #1610
- @idanshen made their first contribution in #1742
- @mertsayar8 made their first contribution in #1718
- @scottsuk0306 made their first contribution in #1758
- @mihirp1998 made their first contribution in #1585
- @haozheji made their first contribution in #1735
- @Mubin17 made their first contribution in #1774
- @detsutut made their first contribution in #1788
- @noahlt made their first contribution in #1794
Full Changelog: v0.9.4...v0.9.6
v0.9.4
Mainly backward compatibility fixes with SFTTrainer.
What's Changed
- Fixed doc string and related docs for the SFTConfig update by @GuilhermeFreire in #1706
- SFTTrainer: Fix backward compatibility issue with `TrainingArguments` by @younesbelkada in #1707
- 0.9.4 release by @vwxyzjn in #1708
New Contributors
- @GuilhermeFreire made their first contribution in #1706
Full Changelog: v0.9.3...v0.9.4
v0.9.3: RLOO / PPOv2 Trainer, RM Visualization
We are excited to introduce the new v0.9.3 release, with many new features and algorithms. The highlights are as follows:
- RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started
- PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer which is more aligned with OpenAI's PPO implementation based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started
- Reward model visualization: reward model training now includes a visualization of reward predictions on the eval dataset
- New losses in the DPO Trainer: DPOTrainer now supports loss functions for Self-Play Preference Optimization (SPPO), Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment (see the sketch after this list)
- New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)
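As a rough illustration (not part of the release notes themselves), selecting one of these new preference losses is a matter of setting `loss_type` in `DPOConfig`. The model and dataset names below are placeholders, and the `loss_type` identifiers are the ones introduced by the PRs listed further down (e.g. `robust`, `sppo_hard`, `nca_pair`, `bco_pair`):

```python
# Minimal sketch: picking one of the new DPO losses via DPOConfig.
# Model/dataset names are placeholders; the dataset is assumed to have
# "prompt", "chosen" and "rejected" text columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="dpo-robust",
    beta=0.1,
    loss_type="robust",   # or "sppo_hard", "nca_pair", "bco_pair", ...
    label_smoothing=0.1,  # treated as the assumed label-noise rate by the robust loss
)
# With ref_model=None (and no PEFT adapter), the trainer builds a frozen reference copy itself.
trainer = DPOTrainer(model=model, ref_model=None, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```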
What's Changed
- set dev version by @younesbelkada in #1568
- fix add_special_tokens issue for data with template by @edixiong in #1509
- [DPO] add 'bco_pair' loss_type by @seanexp in #1524
- [DPO] DPOConfig class by @kashif in #1554
- [SFT] add SFT Trainer Config dataclass by @kashif in #1530
- FIX: Fix CI on transformers main by @younesbelkada in #1576
- [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by @younesbelkada in #1577
- Fix typo detoxifying doc by @qgallouedec in #1594
- Core: removed nonexistent `SftArgumentParser` by @younesbelkada in #1602
- [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by @seanexp in #1599
- [CLI] Use auto device map for model load by @lewtun in #1596
- Removing `tests/` from package data by @jamesbraza in #1607
- Docs: Fix build main documentation by @younesbelkada in #1604
- support loss function for Self-play Preference Optimization by @winglian in #1612
- Update HH dataset on helpful only subset by @vwxyzjn in #1613
- corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in #1615
- Fix ZeRO-3 generation context manager by @lewtun in #1617
- fixed adding bos and eos token unconditionally by @jasonyux in #1591
- visualize rm prediction by @vwxyzjn in #1636
- [ORPO] Correct label mask for pad tokens by @IlyaGusev in #1625
- Update sft_llama2.py to work with the latest API by @xianbaoqian in #1637
- Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in #1641
- Pairwise Noise Contrastive Alignment by @winglian in #1632
- don't cast the trainable lora layers to half precision by @pacman100 in #1644
- PPO / Reinforce Trainers by @vwxyzjn in #1540
- Apply deprecated `evaluation_strategy` by @muellerzr in #1559
- FEAT: Add support for training collator in PPOTrainer by @younesbelkada in #1658
- Correct Documentation for cDPO Usage by @AliBakly in #1655
- Fix inheritance order in PPOv2Config by @Nicolinho in #1659
- [DPO] Add 'robust' loss_type by @Abilityguy in #1653
- 🤫 TR-DPO implementation by @syrn1k in #1593
- Do not upcast adapters when using FSDP+QLoRA by @pacman100 in #1654
- [Tests] update eval_strategy API by @kashif in #1662
- Fix ppov2 test case by @vwxyzjn in #1661
- FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by @younesbelkada in #1664
- fix dataset load error by @sywangyi in #1670
- FIX / SFTTrainer: Fix SFTTrainer with `args=None` by @younesbelkada in #1678
- Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in #1588
- initial RPO loss by @kashif in #1686
- Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in #1690
- Skip packing validation by @alex-jw-brooks in #1673
- Fix typo in DPOTrainer's warnings by @qgallouedec in #1688
- Quick fix on GPT4-eval by @vwxyzjn in #1696
- Release 0.9.2 by @vwxyzjn in #1697
New Contributors
- @edixiong made their first contribution in #1509
- @seanexp made their first contribution in #1524
- @jamesbraza made their first contribution in #1607
- @winglian made their first contribution in #1612
- @angelahzyuan made their first contribution in #1615
- @jasonyux made their first contribution in #1591
- @IlyaGusev made their first contribution in #1625
- @xianbaoqian made their first contribution in #1637
- @bartoszzuk made their first contribution in #1641
- @muellerzr made their first contribution in #1559
- @AliBakly made their first contribution in #1655
- @Nicolinho made their first contribution in #1659
- @Abilityguy made their first contribution in #1653
- @syrn1k made their first contribution in #1593
- @alexisrozhkov made their first contribution in #1690
- @alex-jw-brooks made their first contribution in #1673
Full Changelog: v0.8.6...v0.9.2
v0.8.6: Fixes for CLI
What's Changed
- set dev version by @younesbelkada in #1556
- [CLI] Update init.py imports by @kashif in #1557
- CLI: Add warning when ignored params are passed + parse config file if config if passed by @younesbelkada in #1565
- Release: v0.8.6 by @younesbelkada in #1567
Full Changelog: v0.8.5...v0.8.6
v0.8.5: Important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1548
- FIX: make the train / test fields configurable by @younesbelkada in #1551
- enable multiple eos tokens by @lvwerra in #1553
- Release: v0.8.5 by @younesbelkada in #1555
Full Changelog: v0.8.4...v0.8.5
v0.8.4: CLI / CPO / KTO important fixes
This patch release includes important fixes for the CLI and KTO & CPO trainers
What's Changed
- set dev version by @younesbelkada in #1529
- [CPO] fix memory leak due to retained value by @kashif in #1531
- VSFT hotfix - adds gen prompt to template and processor to hub by @edbeeching in #1532
- save_model -> save_pretrained in ppo_trainer.mdx by @ejmejm in #1537
- [KTO] support to load the adapter twice by @claralp in #1542
- CLI: Set `dataset_text_field` to `None` to allow ChatML automatic template by @younesbelkada in #1545
- FIX: Fix slow test by @younesbelkada in #1546
- Fixed ref model not used in PPO generation by @ejmejm in #1534
- Release: v0.8.4 by @younesbelkada in #1547
New Contributors
Full Changelog: v0.8.3...v0.8.4
v0.8.3: Patch release for CLI
This is a patch release that includes an import fix for CLIs.
What's Changed
- set dev version by @younesbelkada in #1523
- [CLI] fix imports by @kashif in #1527
- Release: v0.8.3 by @younesbelkada in #1528
Full Changelog: v0.8.2...v0.8.3
v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes
This release includes two new trainers: ORPO from KAIST and CPO.
It also adds Vision LLM support (e.g. Llava) to `SFTTrainer`; please see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.
ORPO Trainer
CPO Trainer
- Add CPOTrainer by @fe1ixxu in #1382
- Add `use_cache=False` in `{ORPO,CPO}Trainer.concatenated_forward` by @alvarobartt in #1478
- [ORPO] Update NLL loss to use `input_ids` instead by @alvarobartt in #1516
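For orientation, here is a minimal, hedged usage sketch of the reference-free ORPO trainer. The model and dataset names are placeholders, and the `ORPOConfig`-style arguments follow the trainer's documented API rather than anything stated in these notes:

```python
# Minimal sketch: ORPOTrainer (reference-free preference optimization).
# Model and dataset names are placeholders; the dataset is assumed to have
# "prompt", "chosen" and "rejected" text columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = ORPOConfig(
    output_dir="orpo-opt",
    beta=0.1,             # weight of the odds-ratio term (lambda in the ORPO paper)
    max_length=1024,
    max_prompt_length=512,
)
trainer = ORPOTrainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```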
Vision LLM support for SFTTrainer
You can now use `SFTTrainer` to fine-tune Vision LLMs such as Llava!
See: https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details
- Adds VLM Training support to SFTTrainer + VSFT script by @edbeeching in #1518
KTO Fixes
Many fixes were introduced for the KTOTrainer:
- Update KTO example to use better model and ChatML support by @lewtun in #1485
- [KTO] Use batching to speed up data processing by @lewtun in #1470
- Update KTO example with good dataset & chat format by @lewtun in #1481
- [KTO] fix interleaving, reporting, and hanging bugs by @kawine and @claralp in #1499
- [KTO] fix metric logging by @claralp in #1514
10x PPO!
Other fixes
- set dev version by @younesbelkada in #1463
- Use the standard dataset for DPO CLI by @vwxyzjn in #1456
- [peft] Update test_reward_trainer.py to fix tests by @kashif in #1471
- Fix hyperparameters in KTO example by @lewtun in #1474
- docs: add missing Trainer classes and sort alphabetically by @anakin87 in #1479
- hacky update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in #1488
- Ignore chat files by @lewtun in #1486
- Add DPO link in README by @qgallouedec in #1502
- Fix typo in how_to_train.md by @ftorres16 in #1503
- Fix DPO Unsloth example in Docs by @arnavgarg1 in #1494
- Correct ppo_epochs usage by @muhammed-shihebi in #1480
- Fix `RichProgressCallback` by @eggry in #1496
- Change the device index to device:index by @yuanwu2017 in #1490
- FIX: use kwargs for RMTrainer by @younesbelkada in #1515
- Allow streaming (datasets.IterableDataset) by @BramVanroy in #1468
- Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in #1520
- [DOC] Add data description for sfttrainer doc by @BramVanroy in #1521
- Release: v0.8.2 by @younesbelkada in #1522
New Contributors
- @fe1ixxu made their first contribution in #1382
- @anakin87 made their first contribution in #1479
- @galtay made their first contribution in #1488
- @qgallouedec made their first contribution in #1502
- @ftorres16 made their first contribution in #1503
- @arnavgarg1 made their first contribution in #1494
- @muhammed-shihebi made their first contribution in #1480
- @eggry made their first contribution in #1496
- @claralp made their first contribution in #1514
Full Changelog: v0.8.1...v0.8.2
v0.8.1: Patch release for CLIs
This patch release includes some important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1454
- Fix chat CLI for model revisions by @lewtun in #1458
- [chat] add eos token to generate by @lvwerra in #1459
- Release: v0.8.1 by @younesbelkada in #1462
Full Changelog: v0.8.0...v0.8.1
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!
New Trainer: KTOTrainer
We recently introduced the KTOTrainer to run the KTO (Kahneman-Tversky Optimization) algorithm on LLMs! A minimal usage sketch follows the list of related PRs below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
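The sketch below is a hedged illustration assuming the `KTOConfig`-based API; the model and dataset names are placeholders, and the dataset is assumed to follow KTO's unpaired format with `prompt`, `completion`, and a boolean `label` column marking desirable completions:

```python
# Minimal sketch: KTOTrainer on an unpaired dataset of desirable/undesirable completions.
# Model and dataset names are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
ref_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
train_dataset = load_dataset("your-org/your-kto-dataset", split="train")

args = KTOConfig(
    output_dir="kto-opt",
    beta=0.1,
    desirable_weight=1.0,    # loss weight on desirable (label=True) examples
    undesirable_weight=1.0,  # loss weight on undesirable (label=False) examples
)
trainer = KTOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```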
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT: `trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb`
DPO: `trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf`
Chat: `trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat`
Read more about CLI in the relevant documentation section or use `--help` for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
`SFTTrainer` now supports FSDP + QLoRA; a minimal sketch follows the PR link below.
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
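Below is a minimal, hedged sketch of the QLoRA side (4-bit loading plus a LoRA adapter passed to `SFTTrainer`). FSDP itself is enabled by launching with `accelerate launch` and an FSDP accelerate config, which is not shown here; the model name and hyperparameters are placeholders:

```python
# Minimal sketch: 4-bit QLoRA loading for SFTTrainer.
# Model name and hyperparameters are placeholders; run under `accelerate launch`
# with an FSDP config to combine with FSDP.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # stores quantized weights in bf16 so FSDP can wrap them
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
train_dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model=model,
    args=TrainingArguments(output_dir="sft-qlora-fsdp", bf16=True, gradient_checkpointing=True),
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
    tokenizer=tokenizer,
)
trainer.train()
```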
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducible by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [`SFTTrainer`] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [`RewardModeling`] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before update the tr_loss, make sure tr_loss_step is in the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0