
v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more

@BenjaminBossan released this 06 Dec 16:13 · 2665f80

Highlights

  • Orthogonal Fine-Tuning (OFT): a new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! A minimal usage sketch follows this list. By @okotaku in #1160
  • Support for parallel linear LoRA layers using Megatron. This should lead to a speed-up when using LoRA with Megatron. By @zhangsheng377 in #1092
  • LoftQ provides a new method to initialize the LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen so as to minimize the quantization error, as described in https://arxiv.org/abs/2310.08659. A second sketch after this list shows the basic usage. By @yxli2123 in #1150.
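
For a quick impression of OFT, here is a minimal sketch. The base checkpoint (facebook/opt-125m), the target module names, and the rank are illustrative assumptions, not recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import OFTConfig, get_peft_model

# Any base model works; this small causal LM is just a placeholder.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# OFT is configured much like LoRA: choose the modules to adapt and a rank
# (for OFT, r controls the number of orthogonal blocks per adapted layer).
oft_config = OFTConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    module_dropout=0.0,
)

peft_model = get_peft_model(base_model, oft_config)
peft_model.print_trainable_parameters()
```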
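
LoftQ initialization is requested through LoraConfig together with a LoftQConfig. A minimal sketch, again with placeholder model and hyperparameters; note that the base model is loaded unquantized so that LoftQ can compute the initialization from the original weights:

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Load the base model in full precision; LoftQ needs the original weights
# to pick LoRA weights that minimize the quantization error.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

loftq_config = LoftQConfig(loftq_bits=4)  # target 4-bit quantization

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="loftq",
    loftq_config=loftq_config,
)

peft_model = get_peft_model(base_model, lora_config)
```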

Other notable additions

  • It is now possible to choose which adapters are merged when calling merge (#1132)
  • IA³ now supports adapter deletion, by @alexrs (#1153)
  • A new initialization method for LoRA, "gaussian", has been added (#1189); see the sketch after this list
  • When training PEFT models where new tokens have been added to the embedding layer, the embedding layer is now saved by default (#1147)
  • It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
  • We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help from the community is always welcome. Check out this issue to get going.
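
The "gaussian" initialization mentioned above is selected via init_lora_weights on LoraConfig. A minimal sketch (target modules and hyperparameters are placeholders):

```python
from peft import LoraConfig

# The default initializes LoRA weight A with a Kaiming-uniform distribution and
# B with zeros; "gaussian" draws A from a normal distribution scaled by the rank
# instead (B still starts at zero, so training begins from the base model's behavior).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="gaussian",
)
```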

Migration to v0.7.0

  • Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code; PEFT takes care of everything -- just be aware that instead of creating a file adapter_model.bin, calling save_pretrained now creates adapter_model.safetensors. Safetensors have numerous advantages over pickle files (PyTorch's default serialization format) and are well supported on the Hugging Face Hub. A short sketch follows this list.
  • When merging multiple LoRA adapter weights together using add_weighted_adapter with the option combination_type="linear", the scaling of the adapter weights is now performed differently, leading to improved results.
  • There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code relies on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. peft.lora.Linear is no longer a subclass of nn.Linear, so isinstance checks may need updating). Also, to retrieve the original weight of an adapted layer, now use self.get_base_layer().weight, not self.weight (same for bias); a sketch of the new access pattern also follows this list.
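
Two small sketches for the migration notes above. First, saving: the call does not change, only the file written to disk does. The safe_serialization keyword shown for opting back into the pickle format is an assumption borrowed from the usual Hugging Face convention; check the save_pretrained signature of your installed version before relying on it.

```python
# Writes adapter_model.safetensors (previously adapter_model.bin).
peft_model.save_pretrained("my-adapter")

# If downstream tooling still expects the pickle-based file (assumed keyword):
peft_model.save_pretrained("my-adapter-bin", safe_serialization=False)
```

Second, code that previously read self.weight from an adapter layer should now go through get_base_layer(). A hedged sketch that walks the model and uses BaseTunerLayer to find adapted modules:

```python
from peft.tuners.tuners_utils import BaseTunerLayer

for name, module in peft_model.named_modules():
    if isinstance(module, BaseTunerLayer):
        # Before v0.7.0: module.weight; the base module is now wrapped.
        base_weight = module.get_base_layer().weight
```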

What's Changed

As always, this release also includes many small improvements, bug fixes, and documentation updates. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.

Full Changelog: v0.6.2...v0.7.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@alexrs

@callanwu

@elyxlz

  • Fix issue where you cannot call PeftModel.from_pretrained with a private adapter by @elyxlz in #1076

@lukaskuhn-lku

@okotaku

@yxli2123

  • LoftQ: Add LoftQ method integrated into LoRA. Add example code for LoftQ usage. by @yxli2123 in #1150

@zhangsheng377