v0.9.3 RLOO / PPOv2 Trainer, RM Visualization
We are excited to introduce the new v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:
- RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs here to get started.
- PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer that is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs here to get started.
- Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.
(Screen recording: Screen.Recording.2024-05-09.at.2.37.44.PM.mov)
- New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment
- New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)
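
To make the "leave-one-out" idea behind RLOO concrete, here is a minimal sketch of the baseline computation. It is not TRL's implementation: the function name `rloo_advantages` is hypothetical, and the sketch assumes that `k` completions were sampled for a single prompt and that each one already has a scalar reward. Each completion's baseline is the mean reward of the other `k - 1` completions.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k sampled completions of ONE prompt.

    Hypothetical helper for illustration only; not part of the TRL API.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least 2 samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples,
    # so the advantage is r_i minus that leave-one-out mean.
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    print(rloo_advantages([1.0, 2.0, 6.0]))
```

Because every sample is compared against its siblings, the advantages for one prompt always sum to zero, and no separate value model is needed to estimate the baseline.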
What's Changed
- set dev version by @younesbelkada in #1568
- fix add_special_tokens issue for data with template by @edixiong in #1509
- [DPO] add 'bco_pair' loss_type by @seanexp in #1524
- [DPO] DPOConfig class by @kashif in #1554
- [SFT] add SFT Trainer Config dataclass by @kashif in #1530
- FIX: Fix CI on transformers main by @younesbelkada in #1576
- [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by @younesbelkada in #1577
- Fix typo detoxifying doc by @qgallouedec in #1594
- Core: removed unexisting `SftArgumentParser` by @younesbelkada in #1602
- [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by @seanexp in #1599
- [CLI] Use auto device map for model load by @lewtun in #1596
- Removing `tests/` from package data by @jamesbraza in #1607
- Docs: Fix build main documentation by @younesbelkada in #1604
- support loss function for Self-play Preference Optimization by @winglian in #1612
- Update HH dataset on helpful only subset by @vwxyzjn in #1613
- corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in #1615
- Fix ZeRO-3 generation context manager by @lewtun in #1617
- fixed adding bos and eos token unconditionally by @jasonyux in #1591
- visualize rm prediction by @vwxyzjn in #1636
- [ORPO] Correct label mask for pad tokens by @IlyaGusev in #1625
- Update sft_llama2.py to work with the latest API by @xianbaoqian in #1637
- Fixed wrong logs prefixes in KTOTrainer by @bartoszzuk in #1641
- Pairwise Noise Contrastive Alignment by @winglian in #1632
- don't cast the trainable lora layers to half precision by @pacman100 in #1644
- PPO / Reinforce Trainers by @vwxyzjn in #1540
- Apply deprecated `evaluation_strategy` by @muellerzr in #1559
- FEAT: Add support for training collator in PPOTrainer by @younesbelkada in #1658
- Correct Documentation for cDPO Usage by @AliBakly in #1655
- Fix inheritance order in PPOv2Config by @Nicolinho in #1659
- [DPO] Add 'robust' loss_type by @Abilityguy in #1653
- 🤫 TR-DPO implementation by @syrn1k in #1593
- Do not upcast adapters when using FSDP+QLoRA by @pacman100 in #1654
- [Tests] update eval_strategy API by @kashif in #1662
- Fix ppov2 test case by @vwxyzjn in #1661
- FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by @younesbelkada in #1664
- fix dataset load error by @sywangyi in #1670
- FIX / SFTTrainer: Fix SFTTrainer with `args=None` by @younesbelkada in #1678
- Fix max_completion_length for encoder_decoder models in KTO Trainer by @samuki in #1588
- intial RPO loss by @kashif in #1686
- Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by @alexisrozhkov in #1690
- Skip packing validation by @alex-jw-brooks in #1673
- Fix typo in DPOTrainer's warnings by @qgallouedec in #1688
- Quick fix on GPT4-eval by @vwxyzjn in #1696
- Release 0.9.2 by @vwxyzjn in #1697
New Contributors
- @edixiong made their first contribution in #1509
- @seanexp made their first contribution in #1524
- @jamesbraza made their first contribution in #1607
- @winglian made their first contribution in #1612
- @angelahzyuan made their first contribution in #1615
- @jasonyux made their first contribution in #1591
- @IlyaGusev made their first contribution in #1625
- @xianbaoqian made their first contribution in #1637
- @bartoszzuk made their first contribution in #1641
- @muellerzr made their first contribution in #1559
- @AliBakly made their first contribution in #1655
- @Nicolinho made their first contribution in #1659
- @Abilityguy made their first contribution in #1653
- @syrn1k made their first contribution in #1593
- @alexisrozhkov made their first contribution in #1690
- @alex-jw-brooks made their first contribution in #1673
Full Changelog: v0.8.6...v0.9.2