Releases: huggingface/peft
Patch Release v0.8.1
This is a small patch release of PEFT that:
- Fixes a breaking change related to support for saving resized embedding layers and Diffusers models. Contributed by @younesbelkada in #1414
What's Changed
- Release 0.8.1.dev0 by @pacman100 in #1412
- Fix breaking change by @younesbelkada in #1414
- Patch Release v0.8.1 by @pacman100 in #1415
Full Changelog: v0.8.0...v0.8.1
v0.8.0: Poly PEFT method, LoRA improvements, Documentation improvements and more
Highlights
Poly PEFT method
Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (`Poly`) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. Put simply, you can think of it as a mixture of expert adapters.
`MHR` (Multi-Head Routing) combines subsets of adapter parameters and outperforms `Poly` under a comparable parameter budget; by only fine-tuning the routing function and not the adapters (`MHR`-z), it achieves competitive performance with extreme parameter efficiency.
- Add Poly by @TaoSunVoyage in #1129
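As a rough sketch (the checkpoint and the `PolyConfig` values below are illustrative, not prescriptive), configuring Poly looks similar to other PEFT methods:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PolyConfig, TaskType, get_peft_model

# Hypothetical setup: a seq2seq base model pre-trained on 8 tasks with a small skill inventory
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
poly_config = PolyConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    poly_type="poly",   # routing variant; "poly" is the basic Polytropon routing
    r=8,                # rank of each LoRA "skill"
    n_tasks=8,          # number of tasks in the multi-task training set
    n_skills=2,         # size of the adapter (skill) inventory
    n_splits=4,         # number of splits; values > 1 correspond to multi-head routing (MHR)
)
model = get_peft_model(base_model, poly_config)
model.print_trainable_parameters()
# During training and inference, a `task_ids` tensor is passed along with the batch
# so the router knows which task each example belongs to.
```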
LoRA improvements
Now, you can specify `all-linear` for the `target_modules` param of `LoraConfig` to target all the linear layers, which the QLoRA paper has shown to perform better than targeting only the query and value attention layers.
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
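For illustration, a minimal sketch of targeting all linear layers (the base checkpoint is just an example):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # example checkpoint
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # target every linear layer instead of only q/v projections
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```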
Embedding layers of base models are now automatically saved when they have been resized during fine-tuning with PEFT approaches like LoRA. This enables extending the tokenizer vocabulary with special tokens, which is a common use case when doing the following:
- Instruction finetuning with new tokens such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s> being added to properly format the conversations
- Finetuning on a specific language wherein language-specific tokens are added, e.g., Korean tokens being added to the vocabulary for finetuning an LLM on Korean datasets.
- Instruction finetuning to return outputs in a certain format to enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
A good blog post to learn more about this: https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
- save the embeddings even when they aren't targetted but resized by @pacman100 in #1383
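A minimal sketch of the workflow this enables (the checkpoint name and special tokens are illustrative); once the embedding layer has been resized, `save_pretrained` stores it alongside the adapter weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"  # example base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# extend the vocabulary with chat-formatting tokens
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})
model.resize_token_embeddings(len(tokenizer))

peft_model = get_peft_model(model, LoraConfig(target_modules="all-linear"))
# ... fine-tune ...
peft_model.save_pretrained("my-adapter")  # resized embedding layers are saved automatically
```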
New option `use_rslora` in `LoraConfig`. Use it for ranks greater than 32 and see the increase in fine-tuning performance (same or better performance for ranks lower than 32 as well).
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
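A brief sketch of enabling the rank-stabilized scaling factor, which changes the LoRA scaling from `lora_alpha / r` to `lora_alpha / sqrt(r)`:

```python
from peft import LoraConfig

config = LoraConfig(
    r=64,             # higher ranks are where rsLoRA helps the most
    lora_alpha=16,
    use_rslora=True,  # scale adapters by lora_alpha / sqrt(r) instead of lora_alpha / r
)
```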
Documentation improvements
- Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
- Improving task guides to focus more on how to use different PEFT methods and related nuances instead of focusing on different types of tasks. It condenses the individual guides into a single one to highlight the commonalities and differences, and to refer to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- [docs] Docstring link by @stevhliu in #1356
- QOL improvements and doc updates by @pacman100 in #1318
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Improve target modules description by @BenjaminBossan in #1290
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- Improve documentation for the `all-linear` flag by @SumanthRH in #1357
- Fix various typos in LoftQ docs. by @arnavgarg1 in #1408
What's Changed
- Bump version to 0.7.2.dev0 post release by @BenjaminBossan in #1258
- FIX Error in log_reports.py by @BenjaminBossan in #1261
- Fix ModulesToSaveWrapper getattr by @zhangsheng377 in #1238
- TST: Revert device_map for AdaLora 4bit GPU test by @BenjaminBossan in #1266
- remove a duplicated description in peft BaseTuner by @butyuhao in #1271
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
- feat: add apple silicon GPU acceleration by @NripeshN in #1217
- LoftQ: Allow quantizing models loaded on the CPU for LoftQ initialization by @hiyouga in #1256
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- TST: Extend LoftQ tests to check CPU initialization by @BenjaminBossan in #1274
- Refactor and a couple of fixes for adapter layer updates by @BenjaminBossan in #1268
- [`Tests`] Add bitsandbytes installed from source on new docker images by @younesbelkada in #1275
- TST: Enable LoftQ 8bit tests by @BenjaminBossan in #1279
- [`bnb`] Add bnb nightly workflow by @younesbelkada in #1282
- Fixed several errors in StableDiffusion adapter conversion script by @kovalexal in #1281
- [docs] Concept guides by @stevhliu in #1269
- DOC: Improve target modules description by @BenjaminBossan in #1290
- [`bnb-nightly`] Address final comments by @younesbelkada in #1287
- [BNB] Fix bnb dockerfile for latest version by @SunMarc in #1291
- fix fsdp auto wrap policy by @pacman100 in #1302
- [BNB] fix dockerfile for single gpu by @SunMarc in #1305
- Fix bnb lora layers not setting active adapter by @tdrussell in #1294
- Mistral IA3 config defaults by @pacman100 in #1316
- fix the embedding saving for adaption prompt by @pacman100 in #1314
- fix diffusers tests by @pacman100 in #1317
- FIX Use torch.long instead of torch.int in LoftQ for PyTorch versions <2.x by @BenjaminBossan in #1320
- Extend merge_and_unload to offloaded models by @blbadger in #1190
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
- Refactor dispatching logic of LoRA layers by @BenjaminBossan in #1319
- Fix bug when load the prompt tuning in inference. by @yileld in #1333
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- ENH: Add attribute to show targeted module names by @BenjaminBossan in #1330
- fix some args desc by @zspo in #1338
- Fix logic in target module finding by @s-k-yx in #1263
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- fix `prepare_inputs_for_generation` logic for Prompt Learning methods by @pacman100 in #1352
- QOL improvements and doc updates by @pacman100 in #1318
- New transformers caching ETA now v4.38 by @BenjaminBossan in #1348
- FIX Setting active adapter for quantized layers by @BenjaminBossan in #1347
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- Add Poly by @TaoSunVoyage in #1129
- [docs] Docstring link by @stevhliu in #1356
- Added missing getattr dunder methods for mixed model by @kovalexal in #1365
- Handle resizing of embedding layers for AutoPeftModel by @pacman100 in #1367
- account for the new merged/unmerged weight to perform the quantization again by @pacman100 in #1370
- add mixtral in LoRA mapping by @younesbelkada in https://github.com/h...
v0.7.1 patch release
This is a small patch release of PEFT that should handle:
- Issues with loading multiple adapters when using quantized models (#1243)
- Issues with transformers v4.36 and some prompt learning methods (#1252)
What's Changed
- [docs] OFT by @stevhliu in #1221
- Bump version to 0.7.1.dev0 post release by @BenjaminBossan in #1227
- Don't set config attribute on custom models by @BenjaminBossan in #1200
- TST: Run regression test in nightly test runner by @BenjaminBossan in #1233
- Lazy import of bitsandbytes by @BenjaminBossan in #1230
- FIX: Pin bitsandbytes to <0.41.3 temporarily by @BenjaminBossan in #1234
- [docs] PeftConfig and PeftModel by @stevhliu in #1211
- TST: Add tolerance for regression tests by @BenjaminBossan in #1241
- Bnb integration test tweaks by @Titus-von-Koeller in #1242
- [docs] PEFT integrations by @stevhliu in #1224
- Revert "FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)" by @Titus-von-Koeller in #1250
- Fix model argument issue (#1198) by @ngocbh in #1205
- TST: Add tests for 4bit LoftQ by @BenjaminBossan in #1208
- [docs] Quantization by @stevhliu in #1236
- FIX: Truncate slack message to not exceed 3000 chars by @BenjaminBossan in #1251
- Issue with transformers 4.36 by @BenjaminBossan in #1252
- Fix: Multiple adapters with bnb layers by @BenjaminBossan in #1243
- Release: 0.7.1 by @BenjaminBossan in #1257
New Contributors
- @Titus-von-Koeller made their first contribution in #1242
- @ngocbh made their first contribution in #1205
Full Changelog: v0.7.0...v0.7.1
v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more
Highlights
- Orthogonal Fine-Tuning (OFT): A new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! By @okotaku in #1160
- Support for parallel linear LoRA layers using Megatron. This should lead to a speed up when using LoRA with Megatron. By @zhangsheng377 in #1092
- LoftQ provides a new method to initialize LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen in a way to minimize the quantization error, as described here: https://arxiv.org/abs/2310.08659. By @yxli2123 in #1150.
Other notable additions
- It is now possible to choose which adapters are merged when calling `merge` (#1132)
- IA³ now supports adapter deletion, by @alexrs (#1153)
- A new initialization method for LoRA has been added, `"gaussian"` (#1189)
- When training PEFT models with new tokens being added to the embedding layers, the embedding layer is now saved by default (#1147)
- It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
- We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help from the community is always welcome. Check out this issue to get going.
Migration to v0.7.0
- Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code, PEFT takes care of everything -- just be aware that instead of creating a file `adapter_model.bin`, calling `save_pretrained` now creates `adapter_model.safetensors`. Safetensors have numerous advantages over pickle files (which is the PyTorch default format) and are well supported on the Hugging Face Hub.
- When merging multiple LoRA adapter weights together using `add_weighted_adapter` with the option `combination_type="linear"`, the scaling of the adapter weights is now performed differently, leading to improved results.
- There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code relies on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. `peft.lora.Linear` is no longer a subclass of `nn.Linear`, so `isinstance` checks may need updating). Also, to retrieve the original weight of an adapted layer, now use `self.get_base_layer().weight`, not `self.weight` (same for `bias`).
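For code that touched PEFT internals, the required change looks roughly like this (a sketch, assuming `peft_model` is an existing LoRA-adapted PEFT model; `LoraLayer` is the common base class of the LoRA adapter layers):

```python
from peft.tuners.lora import LoraLayer

for name, module in peft_model.named_modules():
    # Before v0.7.0, isinstance(module, torch.nn.Linear) also matched peft.lora.Linear
    if isinstance(module, LoraLayer):
        base_weight = module.get_base_layer().weight  # previously: module.weight
        base_bias = module.get_base_layer().bias      # previously: module.bias
```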
What's Changed
As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.
- After release: Bump version to 0.7.0.dev0 by @BenjaminBossan in #1074
- FIX: Skip adaption prompt tests with new transformers versions by @BenjaminBossan in #1077
- FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) by @younesbelkada in #1084
- Improve documentation for IA³ by @SumanthRH in #984
- [`Docker`] Update Dockerfile to force-use transformers main by @younesbelkada in #1085
- Update the release checklist by @BenjaminBossan in #1075
- fix-gptq-training by @SunMarc in #1086
- fix the failing CI tests by @pacman100 in #1094
- Fix f-string in import_utils by @KCFindstr in #1091
- Fix IA3 config for Falcon models by @SumanthRH in #1007
- FIX: Failing nightly CI tests due to IA3 config by @BenjaminBossan in #1100
- [`core`] Fix safetensors serialization for shared tensors by @younesbelkada in #1101
- Change to 0.6.1.dev0 by @younesbelkada in #1102
- Release: 0.6.1 by @younesbelkada in #1103
- set dev version by @younesbelkada in #1104
- avoid unnecessary import by @winglian in #1109
- Refactor adapter deletion by @BenjaminBossan in #1105
- Added num_dataloader_workers arg to fix Windows issue by @lukaskuhn-lku in #1107
- Fix import issue transformers with `id_tensor_storage` by @younesbelkada in #1116
- Correctly deal with `ModulesToSaveWrapper` when using Low-level API by @younesbelkada in #1112
- fix doc typo by @coding-famer in #1121
- Release: v0.6.2 by @pacman100 in #1125
- Release: v0.6.3.dev0 by @pacman100 in #1128
- FIX: Adding 2 adapters when target_modules is a str fails by @BenjaminBossan in #1111
- Prompt tuning: Allow to pass additional args to AutoTokenizer.from_pretrained by @BenjaminBossan in #1053
- Fix: TorchTracemalloc ruins Windows performance by @lukaskuhn-lku in #1126
- TST: Improve requires grad testing: by @BenjaminBossan in #1131
- FEAT: Make safe serialization the default one by @younesbelkada in #1088
- FEAT: Merging only specified `adapter_names` when calling `merge` by @younesbelkada in #1132
- Refactor base layer pattern by @BenjaminBossan in #1106
- [`Tests`] Fix daily CI by @younesbelkada in #1136
- [`core` / `LoRA`] Add `adapter_names` in bnb layers by @younesbelkada in #1139
- [`Tests`] Do not stop tests if a job failed by @younesbelkada in #1141
- CI Add Python 3.11 to test matrix by @BenjaminBossan in #1143
- FIX: A few issues with AdaLora, extending GPU tests by @BenjaminBossan in #1146
- Use `huggingface_hub.file_exists` instead of custom helper by @Wauplin in #1145
- Delete IA3 adapter by @alexrs in #1153
- [Docs fix] Relative path issue by @mishig25 in #1157
- Dataset was loaded twice in 4-bit finetuning script by @lukaskuhn-lku in #1164
- fix `add_weighted_adapter` method by @pacman100 in #1169
- (minor) correct type annotation by @vwxyzjn in #1166
- Update release checklist about release notes by @BenjaminBossan in #1170
- [docs] Migrate doc files to Markdown by @stevhliu in #1171
- Fix dockerfile build by @younesbelkada in #1177
- FIX: Wrong use of base layer by @BenjaminBossan in #1183
- [`Tests`] Migrate to AWS runners by @younesbelkada in #1185
- Fix code example in quicktour.md by @merveenoyan in #1181
- DOC Update a few places in the README by @BenjaminBossan in #1152
- Fix issue where you cannot call PeftModel.from_pretrained with a private adapter by @elyxlz in #1076
- Added lora support for phi by @umarbutler in #1186
- add options to save or push model by @callanwu in #1159
- ENH: Different initialization methods for LoRA by @BenjaminBossan in #1189
- Training PEFT models with new tokens being added to the embedding layers and tokenizer by @pacman100 in #1147
- LoftQ: Add LoftQ method integrated into LoRA. Add example code for LoftQ usage. by @yxli2123 in #1150
- Parallel linear Lora by @zhangsheng377 in #1092
- [Feature] Support OFT by @okotaku in #1160
- Mixed adapter models by @BenjaminBossan in #1163
- [DOCS] README.md by @Akash190104 in #1054
- Fix parallel linear lora by @zhangsheng377 in #1202
- ENH: Enable OFT adapter for mixed adapter models by @BenjaminBossan in #1204
- DOC: Update & improve docstrings and type annotations for common methods and classes by @BenjaminBossan in https://g...
v0.6.2 Patch Release: Refactor of adapter deletion API and fixes to `ModulesToSaveWrapper` when using Low-level API
This patch release refactors the adapter deletion API and fixes `ModulesToSaveWrapper` when using the low-level API.
Refactor adapter deletion
- Refactor adapter deletion by @BenjaminBossan in #1105
Fix `ModulesToSaveWrapper` when using the low-level API
- Correctly deal with `ModulesToSaveWrapper` when using Low-level API by @younesbelkada in #1112
What's Changed
- Release: 0.6.1 by @younesbelkada in #1103
- set dev version by @younesbelkada in #1104
- avoid unnecessary import by @winglian in #1109
- Refactor adapter deletion by @BenjaminBossan in #1105
- Added num_dataloader_workers arg to fix Windows issue by @lukaskuhn-lku in #1107
- Fix import issue transformers with `id_tensor_storage` by @younesbelkada in #1116
- Correctly deal with `ModulesToSaveWrapper` when using Low-level API by @younesbelkada in #1112
- fix doc typo by @coding-famer in #1121
New Contributors
- @winglian made their first contribution in #1109
- @lukaskuhn-lku made their first contribution in #1107
- @coding-famer made their first contribution in #1121
Full Changelog: v0.6.1...v0.6.2
0.6.1 Patch Release: compatibility of Adaptation Prompt with transformers 4.35.0
This patch release fixes the compatibility issues with Adaptation Prompt that users faced with transformers 4.35.0. Moreover, it fixes an issue with token classification PEFT models when saving them using safetensors.
Adaptation prompt fixes
- FIX: Skip adaption prompt tests with new transformers versions by @BenjaminBossan in #1077
- FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) by @younesbelkada in #1084
Safetensors fixes:
- [`core`] Fix safetensors serialization for shared tensors by @younesbelkada in #1101
What's Changed
- After release: Bump version to 0.7.0.dev0 by @BenjaminBossan in #1074
- Improve documentation for IA³ by @SumanthRH in #984
- [`Docker`] Update Dockerfile to force-use transformers main by @younesbelkada in #1085
- Update the release checklist by @BenjaminBossan in #1075
- fix-gptq-training by @SunMarc in #1086
- fix the failing CI tests by @pacman100 in #1094
- Fix f-string in import_utils by @KCFindstr in #1091
- Fix IA3 config for Falcon models by @SumanthRH in #1007
- FIX: Failing nightly CI tests due to IA3 config by @BenjaminBossan in #1100
- Change to 0.6.1.dev0 by @younesbelkada in #1102
New Contributors
- @KCFindstr made their first contribution in #1091
Full Changelog: v0.6.0...v0.6.1
🧨 Diffusers now uses 🤗 PEFT, new tuning methods, better quantization support, higher flexibility and more
Highlights
Integration with diffusers
🧨 Diffusers now leverages PEFT as a backend for LoRA inference for Stable Diffusion models (#873, #993, #961). Relevant PRs on 🧨 Diffusers are huggingface/diffusers#5058, huggingface/diffusers#5147, huggingface/diffusers#5151 and huggingface/diffusers#5359. This helps unlock a vast number of practically demanding use cases around adapter-based inference 🚀. Now you can do the following with easy-to-use APIs, with support for different checkpoint formats (Diffusers format, Kohya format, ...):
- use multiple LoRAs
- switch between them instantaneously
- scale and combine them
- merge/unmerge
- enable/disable
For details, refer to the documentation at Inference with PEFT.
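For example (the base checkpoint and adapter paths below are placeholders), the PEFT backend in Diffusers lets you do something like:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# load two LoRAs (possibly in different checkpoint formats) under distinct adapter names
pipe.load_lora_weights("path/to/style_lora", adapter_name="style")
pipe.load_lora_weights("path/to/subject_lora", adapter_name="subject")

# combine and scale them, then switch or disable at will
pipe.set_adapters(["style", "subject"], adapter_weights=[0.7, 0.8])
image = pipe("a photo in the custom style").images[0]

pipe.set_adapters(["subject"])  # instantaneous switch to a single adapter
pipe.disable_lora()             # turn all adapters off
```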
New tuning methods
- Multitask Prompt Tuning: Thanks @mayank31398 for implementing this method from https://arxiv.org/abs/2303.02861 (#400)
- LoHa (low-rank Hadamard product): @kovalexal did a great job adding LoHa from https://arxiv.org/abs/2108.06098 (#956)
- LoKr (Kronecker Adapter): Not happy with just one new adapter, @kovalexal also added LoKr from https://arxiv.org/abs/2212.10650 to PEFT (#978)
Other notable additions
- Allow merging of LoRA weights when using 4bit and 8bit quantization (bitsandbytes), thanks to @jiqing-feng (#851, #875)
- IA³ now supports 4bit quantization thanks to @His-Wardship (#864)
- We increased the speed of adapter layer initialization: This should be most notable when creating a PEFT LoRA model on top of a large base model (#887, #915, #994)
- More fine-grained control when configuring LoRA: It is now possible to have different ranks and alpha values for different layers (#873)
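Regarding the last point, a sketch of per-layer ranks and alphas (the module names in the patterns are hypothetical and depend on the base architecture):

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    # override rank/alpha for specific layers, matched by name pattern
    rank_pattern={"model.layers.0.self_attn.q_proj": 32},
    alpha_pattern={"model.layers.0.self_attn.q_proj": 64},
)
```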
Experimental features
- For some adapters like LoRA, it is now possible to activate multiple adapters at the same time (#873)
Breaking changes
- It is no longer allowed to create a LoRA adapter with rank 0 (`r=0`). This used to be possible, in which case the adapter was ignored.
What's Changed
As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.
- Fixed typos in custom_models.mdx by @Psancs05 in #847
- Release version 0.6.0.dev0 by @pacman100 in #849
- DOC: Add a contribution guide by @BenjaminBossan in #848
- clarify the new model size by @stas00 in #839
- DOC: Remove backlog section from README.md by @BenjaminBossan in #853
- MNT: Refactor tuner forward methods for simplicity by @BenjaminBossan in #833
- 🎉 Add Multitask Prompt Tuning by @mayank31398 in #400
- Fix typos in ia3.py by @metaprotium in #844
- Support merge lora module for 4bit and 8bit linear by @jiqing-feng in #851
- Fix seq2seq prompt tuning (#439) by @glerzing in #809
- MNT: Move tuners to subpackages by @BenjaminBossan in #807
- FIX: Error in forward of 4bit linear lora layer by @BenjaminBossan in #878
- MNT: Run tests that were skipped previously by @BenjaminBossan in #884
- FIX: PeftModel save_pretrained Doc (#881) by @houx15 in #888
- Upgrade docker actions to higher versions by @younesbelkada in #889
- Fix error using deepspeed zero2 + load_in_8bit + lora by @tmm1 in #874
- Fix doc for semantic_segmentation_lora by @raghavanone in #891
- fix_gradient_accumulation_steps_in_examples by @zspo in #898
- FIX: linting issue in example by @BenjaminBossan in #908
- ENH Remove redundant initialization layer calls by @BenjaminBossan in #887
- [docs] Remove duplicate section by @stevhliu in #911
- support prefix tuning for starcoder models by @pacman100 in #913
- Merge lora module to 8bit model by @jiqing-feng in #875
- DOC: Section on common issues encountered with PEFT by @BenjaminBossan in #909
- Enh speed up init emb conv2d by @BenjaminBossan in #915
- Make base_model.peft_config single source of truth by @BenjaminBossan in #921
- Update accelerate dependency version by @rohithkrn in #892
- fix lora layer init by @SunMarc in #928
- Fixed LoRA conversion for kohya_ss by @kovalexal in #916
- [`CI`] Pin diffusers by @younesbelkada in #936
- [`LoRA`] Add scale_layer / unscale_layer by @younesbelkada in #935
- TST: Add GH action to run unit tests with torch.compile by @BenjaminBossan in #943
- FIX: torch compile gh action installs pytest by @BenjaminBossan in #944
- Fix NotImplementedError for no bias. by @Datta0 in #946
- TST: Fix some tests that would fail with torch.compile by @BenjaminBossan in #949
- ENH Allow compile GH action to run on torch nightly by @BenjaminBossan in #952
- Install correct PyTorch nightly in GH action by @BenjaminBossan in #954
- support multiple ranks and alphas for LoRA by @pacman100 in #873
- feat: add type hints by @SauravMaheshkar in #858
- FIX: setting requires_grad on adapter layers by @BenjaminBossan in #905
- [`tests`] add transformers & diffusers integration tests by @younesbelkada in #962
- Fix integrations_tests.yml by @younesbelkada in #965
- Add 4-bit support to IA3 - Outperforms QLoRA in both speed and memory consumption by @His-Wardship in #864
- Update integrations_tests.yml by @younesbelkada in #966
- add the lora target modules for Mistral Models by @pacman100 in #974
- TST: Fix broken save_pretrained tests by @BenjaminBossan in #969
- [tests] add multiple active adapters tests by @pacman100 in #961
- Fix missing tokenizer attribute in test by @BenjaminBossan in #977
- Add implementation of LyCORIS LoHa (FedPara-like adapter) for SD&SDXL models by @kovalexal in #956
- update BibTeX by @pacman100 in #989
- FIX: issues with (un)merging multiple LoRA and IA³ adapters by @BenjaminBossan in #976
- add lora target modules for stablelm models by @kbulutozler in #982
- Correct minor errors in example notebooks for causal language modelling by @SumanthRH in #926
- Fix typo in custom_models.mdx by @Pairshoe in #964
- Add base model metadata to model card by @BenjaminBossan in #975
- MNT Make .merged a property by @BenjaminBossan in #979
- Fix lora creation by @pacman100 in #993
- TST: Comment out flaky LoHA test by @BenjaminBossan in #1002
- ENH Support Conv2d layers for IA³ by @BenjaminBossan in #972
- Fix word_embeddings match for deepspeed wrapped model by @mayank31398 in #1000
- FEAT: Add `safe_merge` option in `merge` by @younesbelkada in #1001
- [`core` / `LoRA`] Add `safe_merge` to bnb layers by @younesbelkada in #1009
- ENH: Refactor LoRA bnb layers for faster initialization by @BenjaminBossan in #994
- FIX Don't assume model_config contains the key model_type by @BenjaminBossan in #1012
- FIX stale.py uses timezone-aware datetime by @BenjaminBossan in #1016
- FEAT: Add fp16 + cpu merge support by @younesbelkada in #1017
- fix lora scaling and unscaling by @pacman100 in #1027
- [`LoRA`] Revert original behavior for scale / unscale by @younesbelkada in #1029
- [`LoRA`] Raise error when adapter name not found in `set_scale` by @you...
GPTQ Quantization, Low-level API
GPTQ Integration
Now, you can finetune GPTQ quantized models using PEFT. Here are some examples of how to use PEFT with a GPTQ model: a Colab notebook and a finetuning script.
Low-level API
This enables users and developers to use PEFT as a utility library, at least for injectable adapters (LoRA, IA3, AdaLoRA). It exposes an API to modify the model in place and inject the new layers into it; see the sketch after the following list.
- [`core`] PEFT refactor + introducing `inject_adapter_in_model` public method by @younesbelkada in #749
- [`Low-level-API`] Add docs about LLAPI by @younesbelkada in #836
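A minimal sketch of the low-level API on a toy module (the module and its layer name are made up for illustration):

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["linear"])
model = inject_adapter_in_model(config, TinyModel())  # LoRA layers are injected in place
print(model)  # `linear` is now wrapped with LoRA A/B matrices
```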
Support for XPU and NPU devices
Leverage the support for more devices for loading and fine-tuning PEFT adapters.
- Support XPU adapter loading by @abhilash1910 in #737
- Support Ascend NPU adapter loading by @statelesshz in #772
Mix-and-match LoRAs
Stable support and new ways of merging multiple LoRAs. There are currently three supported ways of merging LoRAs: `linear`, `svd`, and `cat`.
- Added additional parameters to mixing multiple LoRAs through SVD, added ability to mix LoRAs through concatenation by @kovalexal in #817
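For instance (assuming `model` is a PeftModel with two LoRA adapters already attached), merging works roughly like this:

```python
# "adapter_a" and "adapter_b" are assumed to be LoRA adapters already loaded on `model`
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="merged",
    combination_type="svd",  # also supported: "linear", "cat"
)
model.set_adapter("merged")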
What's Changed
- Release version 0.5.0.dev0 by @pacman100 in #717
- Fix subfolder issue by @younesbelkada in #721
- Add falcon to officially supported LoRA & IA3 modules by @younesbelkada in #722
- revert change by @pacman100 in #731
- fix(pep561): include packaging type information by @aarnphm in #729
- [`Llama2`] Add disabling TP behavior by @younesbelkada in #728
- [`Patch`] patch trainable params for 4bit layers by @younesbelkada in #733
- FIX: Warning when initializing prompt encoder by @BenjaminBossan in #716
- ENH: Warn when disabling adapters and bias != 'none' by @BenjaminBossan in #741
- FIX: Disabling adapter works with modules_to_save by @BenjaminBossan in #736
- Updated Example in Class:LoraModel by @TianyiPeng in #672
- [`AdaLora`] Fix adalora inference issue by @younesbelkada in #745
- Add btlm to officially supported LoRA by @Trapper4888 in #751
- [`ModulesToSave`] add correct hook management for modules to save by @younesbelkada in #755
- Example notebooks for LoRA with custom models by @BenjaminBossan in #724
- Add tests for AdaLoRA, fix a few bugs by @BenjaminBossan in #734
- Add progressbar unload/merge by @BramVanroy in #753
- Support XPU adapter loading by @abhilash1910 in #737
- Support Ascend NPU adapter loading by @statelesshz in #772
- Allow passing inputs_embeds instead of input_ids by @BenjaminBossan in #757
- [`core`] PEFT refactor + introducing `inject_adapter_in_model` public method by @younesbelkada in #749
- Add adapter error handling by @BenjaminBossan in #800
- add lora default target module for codegen by @sywangyi in #787
- DOC: Update docstring of PeftModel.from_pretrained by @BenjaminBossan in #799
- fix crash when using torch.nn.DataParallel for LORA inference by @sywangyi in #805
- Peft model signature by @kiansierra in #784
- GPTQ Integration by @SunMarc in #771
- Only fail quantized Lora unload when actually merging by @BlackHC in #822
- Added additional parameters to mixing multiple LoRAs through SVD, added ability to mix LoRAs through concatenation by @kovalexal in #817
- TST: add test about loading custom models by @BenjaminBossan in #827
- Fix unbound error in ia3.py by @His-Wardship in #794
- [`Docker`] Fix gptq dockerfile by @younesbelkada in #835
- [`Tests`] Add 4bit slow training tests by @younesbelkada in #834
- [`Low-level-API`] Add docs about LLAPI by @younesbelkada in #836
- Type annotation fix by @vwxyzjn in #840
New Contributors
- @TianyiPeng made their first contribution in #672
- @Trapper4888 made their first contribution in #751
- @abhilash1910 made their first contribution in #737
- @statelesshz made their first contribution in #772
- @kiansierra made their first contribution in #784
- @BlackHC made their first contribution in #822
- @His-Wardship made their first contribution in #794
- @vwxyzjn made their first contribution in #840
Full Changelog: v0.4.0...v0.5.0
QLoRA, IA3 PEFT method, support for QA and Feature Extraction tasks, AutoPeftModelForxxx for simplified UX, LoRA for custom models, and new LoRA utilities
QLoRA Support:
QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During finetuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters. The LoRA layers are the only parameters being updated during training. For more details read the blog Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #476
- [`core`] Protect 4bit import by @younesbelkada in #480
- [`core`] Raise warning on using `prepare_model_for_int8_training` by @younesbelkada in #483
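Putting it together, a condensed sketch of the QLoRA recipe using the current transformers/PEFT APIs (the checkpoint and target modules are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # gradient checkpointing, dtype casting, etc.

lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_config)  # only the LoRA parameters are trainable
model.print_trainable_parameters()
```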
New PEFT methods: IA3 from T-Few paper
To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) rescales inner activations with learned vectors. These learned vectors are injected into the attention and feedforward modules in a typical transformer-based architecture. These learned vectors are the only trainable parameters during fine-tuning, and thus the original weights remain frozen. Dealing with learned vectors (as opposed to learned low-rank updates to a weight matrix like LoRA) keeps the number of trainable parameters much smaller. For more details, read the paper Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
- Add functionality to support IA3 by @SumanthRH in #578
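A short sketch (the base model is assumed to be a transformers model whose architecture PEFT knows default IA³ target modules for):

```python
from transformers import AutoModelForCausalLM
from peft import IA3Config, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # example checkpoint
ia3_config = IA3Config(task_type=TaskType.CAUSAL_LM)  # target/feedforward modules inferred for known architectures
model = get_peft_model(base_model, ia3_config)
model.print_trainable_parameters()  # only the learned vectors are trainable
```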
Support for new tasks: QA and Feature Extraction
Addition of `PeftModelForQuestionAnswering` and `PeftModelForFeatureExtraction` classes to support QA and Feature Extraction tasks, respectively. This enables exciting new use-cases with PEFT, e.g., LoRA for semantic similarity tasks.
- feat: Add PeftModelForQuestionAnswering by @sjrl in #473
- add support for Feature Extraction using PEFT by @pacman100 in #647
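For example, a hedged sketch of feature extraction (e.g., for semantic similarity); the checkpoint and module names are illustrative:

```python
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")  # example encoder
config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,  # uses PeftModelForFeatureExtraction under the hood
    r=8,
    target_modules=["query", "value"],      # BERT-style attention projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```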
AutoPeftModelForxxx for a better and simplified UX
Introduces a new paradigm, `AutoPeftModelForxxx`, intended for users who want to rapidly load and run PEFT models:
from peft import AutoPeftModelForCausalLM
peft_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
- Introducing `AutoPeftModelForxxx` by @younesbelkada in #694
LoRA for custom models
Not a transformers model? No problem, we have got you covered. PEFT now enables the usage of LoRA with custom models.
- FEAT: Make LoRA work with custom models by @BenjaminBossan in #676
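A toy example of applying LoRA to a plain `nn.Module` (the module below is made up for illustration); target modules are matched by their names:

```python
import torch
from peft import LoraConfig, get_peft_model

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = torch.nn.Sequential(
            torch.nn.Linear(64, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 2),
        )

    def forward(self, x):
        return self.seq(x)

config = LoraConfig(r=8, lora_alpha=16, target_modules=["seq.0", "seq.2"])
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```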
New LoRA utilities
Improvements to the `add_weighted_adapter` method to support SVD for combining multiple LoRAs when creating a new LoRA.
New utils such as `unload` and `delete_adapter` give users much better control over how they deal with adapters.
- [Core] Enhancements and refactoring of LoRA method by @pacman100 in #695
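For instance (assuming `model` is a PeftModel with LoRA adapters attached):

```python
# remove an adapter that is no longer needed
model.delete_adapter("old_adapter")

# get back the original base model, discarding the adapter layers without merging them
base_model = model.unload()
```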
PEFT and Stable Diffusion
PEFT is very extensible and easy to use for performing DreamBooth with Stable Diffusion. The community has added conversion scripts so that PEFT models can be used with the Civitai/webui format and vice versa.
- LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT by @kovalexal in #461
- Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui by @kovalexal in #596
- [Bugfix] Fixed LoRA conv2d merge by @kovalexal in #637
- Fixed LoraConfig alpha modification on add_weighted_adapter by @kovalexal in #654
What's Changed
- Release: v0.4.0.dev0 by @pacman100 in #391
- do not use self.device. In FSDP cpu offload mode. self.device is "CPU… by @sywangyi in #352
- add accelerate example for DDP and FSDP in sequence classification fo… by @sywangyi in #358
- [`CI`] Fix CI - pin urlib by @younesbelkada in #402
- [docs] Fix index by @stevhliu in #397
- Fix documentation links on index page by @mikeorzel in #406
- Zero 3 init ReadME update by @dumpmemory in #399
- [`Tests`] Add soundfile to docker images by @younesbelkada in #401
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #476
- [`core`] Protect 4bit import by @younesbelkada in #480
- [`core`] Raise warning on using `prepare_model_for_int8_training` by @younesbelkada in #483
- Remove merge_weights by @Atry in #392
- [`core`] Add gradient checkpointing check by @younesbelkada in #404
- [docs] Fix LoRA image classification docs by @stevhliu in #524
- [docs] Prettify index by @stevhliu in #478
- change comment in tuners.lora, lora_alpha float to int by @codingchild2424 in #448
- [`LoRA`] Allow applying LoRA at different stages by @younesbelkada in #429
- Enable PeftConfig & PeftModel to load from revision by @lewtun in #433
- [`Llama-Adapter`] fix half precision inference + add tests by @younesbelkada in #456
- fix merge_and_unload when LoRA targets embedding layer by @0x000011b in #438
- return load_result when load_adapter by @dkqkxx in #481
- Fixed problem with duplicate same code. by @hotchpotch in #517
- Add starcoder model to target modules dict by @mrm8488 in #528
- Fix a minor typo where a non-default token_dim would crash prompt tuning by @thomas-schillaci in #459
- Remove device_map when training 4,8-bit model. by @SunMarc in #534
- add library name to model card by @pacman100 in #549
- Add thousands separator in print_trainable_parameters by @BramVanroy in #443
- [doc build] Use secrets by @mishig25 in #556
- improve readability of LoRA code by @martin-liu in #409
- [`core`] Add safetensors integration by @younesbelkada in #553
- [`core`] Fix config kwargs by @younesbelkada in #561
- Fix typo and url to `openai/whisper-large-v2` by @alvarobartt in #563
- feat: add type hint to `get_peft_model` by @samsja in #566
- Add issues template by @younesbelkada in #562
- [BugFix] Set alpha and dropout defaults in LoraConfig by @apbard in #390
- enable lora for mpt by @sywangyi in #576
- Fix minor typo in bug-report.yml by @younesbelkada in #582
- [`core`] Correctly passing the kwargs all over the place by @younesbelkada in #575
- Fix adalora device mismatch issue by @younesbelkada in #583
- LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT by @kovalexal in #461
- Fix typo at peft_model.py by @Beomi in #588
- [`test`] Adds more CI tests by @younesbelkada in #586
- when from_pretrained is called in finetune case of lora with flag "… by @sywangyi in #591
- feat: Add PeftModelForQuestionAnswering by @sjrl in #473
- Improve the README when using PEFT by @pacman100 in #594
- [`tests`] Fix dockerfile by @younesbelkada in #608
- Fix final failing slow tests by @younesbelkada in #609
- [`core`] Add `adapter_name` in `get_peft_model` by @younesbelkada in #610
- [`core`] Stronger import of bnb by @younesbelkada in #605
- Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui by @kovalexal in #596
- update whisper test by @pacman100 in #617
- Update README.md, citation by @pminervini in #616
- Update train_dreambooth.py by @nafiturgut in #624
- [`Adalora`] Add adalora 4bit by @younesbelkada in #598
- [`AdaptionPrompt`] Add 8bit + 4bit support for...
Docs, Testing Suite, Multi Adapter Support, New methods and examples
Brand new Docs
With task guides, conceptual guides, integration guides, and code references all available at your fingertips, 🤗 PEFT's docs (found at https://huggingface.co/docs/peft) provide an insightful and easy-to-follow resource for anyone looking to learn how to use 🤗 PEFT. Whether you're a seasoned pro or just starting out, PEFT's documentation will help you get the most out of it.
- [WIP-docs] Accelerate scripts by @stevhliu in #355
- [docs] Quicktour update by @stevhliu in #346
- [docs] Conceptual overview of prompting methods by @stevhliu in #339
- [docs] LoRA for token classification by @stevhliu in #302
- [docs] int8 training by @stevhliu in #332
- [docs] P-tuning for sequence classification by @stevhliu in #281
- [docs] Prompt tuning for CLM by @stevhliu in #264
- [docs] Prefix tuning for Seq2Seq by @stevhliu in #272
- [docs] Add API references by @stevhliu in #241
- [docs] Build notebooks from Markdown by @stevhliu in #240
- [docs] Supported models tables by @MKhalusova in #364
- [docs] Task guide with Dreambooth LoRA example by @MKhalusova in #330
- [docs] LoRA conceptual guide by @MKhalusova in #331
- [docs] Task guide for semantic segmentation with LoRA by @MKhalusova in #307
- Move image classification example to the docs by @MKhalusova in #239
Comprehensive Testing Suite
Comprised of both unit and integration tests, it rigorously tests core features, examples, and various models on different setups, including single and multiple GPUs. This commitment to testing helps ensure that PEFT maintains the highest levels of correctness, usability, and performance, while continuously improving in all areas.
- [`CI`] Add ci tests by @younesbelkada in #203
- Fix CI tests by @younesbelkada in #210
- [`CI`] Add more ci tests by @younesbelkada in #223
- [`tests`] Adds more tests + fix failing tests by @younesbelkada in #238
- [`tests`] Adds GPU tests by @younesbelkada in #256
- [`tests`] add slow tests to GH workflow by @younesbelkada in #304
- [`core`] Better log messages by @younesbelkada in #366
Multi Adapter Support
PEFT just got even more versatile with its new Multi Adapter Support! Now you can train and infer with multiple adapters, or even combine multiple LoRA adapters in a weighted combination. This is especially handy for RLHF training, where you can save memory by using a single base model with multiple adapters for actor, critic, reward, and reference. And the icing on the cake? Check out the LoRA Dreambooth inference example notebook to see this feature in action.
- Multi Adapter support by @pacman100 in #263
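A sketch of what multi-adapter usage looks like (the base model and adapter repo IDs are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # example base model
model = PeftModel.from_pretrained(base_model, "user/actor-lora", adapter_name="actor")
model.load_adapter("user/critic-lora", adapter_name="critic")

model.set_adapter("actor")   # forward passes now go through the "actor" adapter
# ...
model.set_adapter("critic")  # switch adapters without reloading the base model
```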
New PEFT methods: AdaLoRA and Adaption Prompt
PEFT just got even better, thanks to the contributions of the community! The AdaLoRA method is one of the exciting new additions. It takes the highly regarded LoRA method and improves it by allocating trainable parameters across the model to maximize performance within a given parameter budget. Another standout is the Adaption Prompt method, which enhances the already popular Prefix Tuning by introducing zero init attention.
- The Implementation of AdaLoRA (ICLR 2023) by @QingruZhang in #233
- Implement adaption prompt from Llama-Adapter paper by @yeoedward in #268
New LoRA utilities
Good news for LoRA users! PEFT now allows you to merge LoRA parameters into the base model's parameters, giving you the freedom to remove the PEFT wrapper and apply downstream optimizations related to inference and deployment. Plus, you can use all the features that are compatible with the base model without any issues.
- [`utils`] add merge_lora utility function by @younesbelkada in #227
- Add nn.Embedding Support to Lora by @Splo2t in #337
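For example (assuming `model` is a PeftModel with a trained LoRA adapter):

```python
# fold the LoRA weights into the base weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged-model")  # a plain transformers model, ready for deployment
```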
What's Changed
- release v0.3.0.dev0 by @pacman100 in #166
- fixing merged_linear lora issues by @pacman100 in #172
- Replace base_model's function temporarily by @PanQiWei in #170
- Support for LLaMA models by @zphang in #160
- [`core`] Fix peft multi-gpu issue by @younesbelkada in #145
- Update README.md by @dumpmemory in #167
- ChatGLM support by @mymusise in #180
- [`CI`] Add ci tests by @younesbelkada in #203
- Fix CI tests by @younesbelkada in #210
- Update train_dreambooth.py by @haofanwang in #204
- Fix failing test on `main` by @younesbelkada in #224
- Causal LM generation fix for prefix tuning: GPT2 model by @vineetm in #222
- [`CI`] Add more ci tests by @younesbelkada in #223
- Show CONFIG_NAME instead of "config.json" by @aitor-gamarra in #231
- add docs by @pacman100 in #214
- [`utils`] add merge_lora utility function by @younesbelkada in #227
- Have fix typo in README by @guspan-tanadi in #243
- Move image classification example to the docs by @MKhalusova in #239
- [docs] Add API references by @stevhliu in #241
- [docs] Build notebooks from Markdown by @stevhliu in #240
- [`core`] Fix offload issue by @younesbelkada in #248
- [`Automation`] Add stale bot by @younesbelkada in #247
- [resources] replace pdf links with abs links by @stas00 in #255
- [`Automation`] Update stale.py by @younesbelkada in #254
- docs: have fix bit typo README by @guspan-tanadi in #252
- Update other.py by @tpoisonooo in #250
- Fixing a bug where a wrong parameter name is used for the offload_folder by @toncho11 in #257
- [`tests`] Adds more tests + fix failing tests by @younesbelkada in #238
- The Implementation of AdaLoRA (ICLR 2023) by @QingruZhang in #233
- Add BLIP2 Example by @younesbelkada in #260
- Multi Adapter support by @pacman100 in #263
- Fix typo in examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py by @rmill040 in #277
- [`tests`] Adds GPU tests by @younesbelkada in #256
- Fix half precision forward by @younesbelkada in #261
- fix trainable params setting by @pacman100 in #283
- [docs] Prefix tuning for Seq2Seq by @stevhliu in #272
- Fix lora_dropout operator type when dropout=0 by @bigeagle in #288
- [`test`] Add Dockerfile by @younesbelkada in #278
- fix and update examples and readme by @pacman100 in #295
- [docs] Prompt tuning for CLM by @stevhliu in #264
- Change gather for gather_for_metrics in eval. by @JulesGM in #296
- Fix: unexpected keyword argument 'has_fp16_weights' by @cyberfox in #299
- [`tests`] add CI training tests by @younesbelkada in #311
- [docs] Task guide for semantic segmentation with LoRA by @MKhalusova in #307
- [docs] P-tuning for sequence classification by @stevhliu in #281
- Fix `merge_and_unload` when having additional trainable modules by @pacman100 in #322
- feat(ci): add `pip` caching to CI by @SauravMaheshkar in #314
- Fix eval for causal language modeling example by @BabyChouSr in #327
- [docs] LoRA for token classification by @stevhliu in #302
- [docs] int8 training by @stevhliu in #332
- fix lora modules_to_save issue by @pacman100 in #343
- [docs] Task guide with Dreamboo...