
Releases: huggingface/peft

Patch Release v0.8.1

30 Jan 10:48
5e4aa7e

This is a small patch release of PEFT that:

  • Fixes a breaking change related to support for saving resized embedding layers and Diffusers models. Contributed by @younesbelkada in #1414

What's Changed

Full Changelog: v0.8.0...v0.8.1

v0.8.0: Poly PEFT method, LoRA improvements, Documentation improvements and more

30 Jan 06:59
30889ef

Highlights

Poly PEFT method

Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (𝙿𝚘𝚕𝚢) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. Put simply, you can think of it as a mixture of expert adapters.
𝙼𝙷𝚁 (Multi-Head Routing) combines subsets of adapter parameters and outperforms 𝙿𝚘𝚕𝚢 under a comparable parameter budget; by fine-tuning only the routing function and not the adapters (𝙼𝙷𝚁-z), it achieves competitive performance with extreme parameter efficiency.
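
A minimal sketch of how this might look with the new `PolyConfig`, assuming a seq2seq base model; the field values here are illustrative, see the docs for defaults:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PolyConfig, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# An inventory of n_skills adapters shared across n_tasks tasks,
# with the routing function learned jointly.
poly_config = PolyConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    poly_type="poly",
    r=8,
    n_tasks=8,
    n_skills=2,
    n_splits=4,
)
model = get_peft_model(base_model, poly_config)
model.print_trainable_parameters()
# During training and inference, each batch additionally carries task ids
# so the router knows which adapter subset to select.
```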

LoRA improvements

Now you can pass `all-linear` to the `target_modules` parameter of `LoraConfig` to target all linear layers, which the QLoRA paper has shown to perform better than targeting only the query and value attention layers

  • Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
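
For illustration, a minimal sketch:

```python
from peft import LoraConfig

# "all-linear" targets every linear layer of the base model (excluding the
# output head) instead of only the query/value projections.
config = LoraConfig(target_modules="all-linear")
```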

Embedding layers of base models are now automatically saved when they are resized during fine-tuning with PEFT approaches like LoRA. This enables extending the tokenizer's vocabulary with special tokens, a common use case when doing the following:

  1. Instruction fine-tuning with new tokens such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s> being added to properly format the conversations
  2. Fine-tuning on a specific language where language-specific tokens are added, e.g., Korean tokens being added to the vocabulary for fine-tuning an LLM on Korean datasets.
  3. Instruction fine-tuning to return outputs in a certain format to enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
    A good blog post to learn more about this is https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
  • save the embeddings even when they aren't targeted but resized by @pacman100 in #1383
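
A hedged sketch of the workflow, using an illustrative base model id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_id = "facebook/opt-350m"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Extend the vocabulary with chat-format tokens and resize the embeddings.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|user|>", "<|assistant|>"]})
model.resize_token_embeddings(len(tokenizer))

peft_model = get_peft_model(model, LoraConfig(task_type=TaskType.CAUSAL_LM))

# ... training loop ...

# Because the embedding layers were resized, save_pretrained now stores
# them together with the adapter weights automatically.
peft_model.save_pretrained("opt-350m-chat-lora")
```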

A new option `use_rslora` in `LoraConfig`. Use it for ranks greater than 32 and see an increase in fine-tuning performance (the same or better performance for ranks lower than 32 as well).
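
A minimal sketch; rsLoRA scales the LoRA update by lora_alpha / sqrt(r) instead of lora_alpha / r, which stabilizes training at higher ranks:

```python
from peft import LoraConfig

# Rank-stabilized LoRA: recommended for higher ranks like r=64.
config = LoraConfig(r=64, lora_alpha=16, use_rslora=True)
```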

Documentation improvements

  • Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
  • Improving the task guides to focus more on how to use different PEFT methods and related nuances instead of on different types of tasks. The individual guides are condensed into a single one to highlight the commonalities and differences, and to refer to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
  • DOC: Update docstring for the config classes by @BenjaminBossan in #1343
  • LoftQ: edit README.md and example files by @yxli2123 in #1276
  • [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
  • DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
  • [docs] Docstring link by @stevhliu in #1356
  • QOL improvements and doc updates by @pacman100 in #1318
  • Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
  • DOC: Improve target modules description by @BenjaminBossan in #1290
  • DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
  • DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
  • Improve documentation for the all-linear flag by @SumanthRH in #1357
  • Fix various typos in LoftQ docs. by @arnavgarg1 in #1408

What's Changed


v0.7.1 patch release

12 Dec 17:22
67a0800

This is a small patch release of PEFT that should handle:

  • Issues with loading multiple adapters when using quantized models (#1243)
  • Issues with transformers v4.36 and some prompt learning methods (#1252)

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1

v0.7.0: Orthogonal Fine-Tuning, Megatron support, better initialization, safetensors, and more

06 Dec 16:13
2665f80

Highlights

  • Orthogonal Fine-Tuning (OFT): A new adapter that is similar to LoRA and shows a lot of promise for Stable Diffusion, especially with regard to controllability and compositionality. Give it a try! By @okotaku in #1160
  • Support for parallel linear LoRA layers using Megatron. This should lead to a speed up when using LoRA with Megatron. By @zhangsheng377 in #1092
  • LoftQ provides a new method to initialize LoRA layers of quantized models. The big advantage is that the LoRA layer weights are chosen in a way to minimize the quantization error, as described here: https://arxiv.org/abs/2310.08659. By @yxli2123 in #1150.
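
A minimal sketch following the pattern described in the LoftQ docs; the base model id is illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # illustrative

# Choose LoRA initial weights so that the 4-bit quantization error is minimized.
loftq_config = LoftQConfig(loftq_bits=4)
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    init_lora_weights="loftq",
    loftq_config=loftq_config,
)
model = get_peft_model(base_model, lora_config)
```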

Other notable additions

  • It is now possible to choose which adapters are merged when calling merge (#1132)
  • IA³ now supports adapter deletion, by @alexrs (#1153)
  • A new initialization method for LoRA has been added, "gaussian" (#1189)
  • When training PEFT models with new tokens being added to the embedding layers, the embedding layer is now saved by default (#1147)
  • It is now possible to mix certain adapters like LoRA and LoKr in the same model, see the docs (#1163)
  • We started an initiative to improve the documentation, some of which should already be reflected in the current docs. Still, help from the community is always welcome. Check out this issue to get going.
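
As an illustration of the new initialization option mentioned above, a minimal sketch:

```python
from peft import LoraConfig

# Initialize LoRA matrix A from a Gaussian instead of the default
# Kaiming-uniform; B remains zero-initialized, so training still starts
# from the unmodified base model.
config = LoraConfig(init_lora_weights="gaussian")
```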

Migration to v0.7.0

  • Safetensors are now the default format for PEFT adapters. In practice, users should not have to change anything in their code; PEFT takes care of everything -- just be aware that instead of creating a file adapter_model.bin, calling save_pretrained now creates adapter_model.safetensors. Safetensors have numerous advantages over pickle files (the PyTorch default format) and are well supported on the Hugging Face Hub.
  • When merging multiple LoRA adapter weights together using add_weighted_adapter with the option combination_type="linear", the scaling of the adapter weights is now performed differently, leading to improved results.
  • There was a big refactor of the inner workings of some PEFT adapters. For the vast majority of users, this should not make any difference (except making some code run faster). However, if your code is relying on PEFT internals, be aware that the inheritance structure of certain adapter layers has changed (e.g. peft.lora.Linear is no longer a subclass of nn.Linear, so isinstance checks may need updating). Also, to retrieve the original weight of an adapted layer, now use self.get_base_layer().weight, not self.weight (same for bias).
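
A sketch of how affected code might be updated; the helper function here is hypothetical:

```python
from peft.tuners.lora import LoraLayer

# peft.tuners.lora.Linear is no longer an nn.Linear subclass, and the
# original weight now lives on the wrapped base layer.
def get_original_weight(module):
    if isinstance(module, LoraLayer):
        return module.get_base_layer().weight  # was: module.weight
    return module.weight
```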

What's Changed

As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.


v0.6.2 Patch Release: Refactor of adapter deletion API and fixes to `ModulesToSaveWrapper` when using Low-level API

14 Nov 05:55

This patch release refactors the adapter deletion API and fixes `ModulesToSaveWrapper` when using the low-level API.

Refactor adapter deletion

Fix ModulesToSaveWrapper when using Low-level API

What's Changed


New Contributors

Full Changelog: v0.6.1...v0.6.2

0.6.1 Patch Release: compatibility of Adaptation Prompt with transformers 4.35.0

09 Nov 14:49

This patch release fixes the compatibility issues with Adaptation Prompt that users faced with transformers 4.35.0. Moreover, it fixes an issue with token classification PEFT models when saving them using safetensors.

Adaptation prompt fixes

  • FIX: Skip adaption prompt tests with new transformers versions by @BenjaminBossan in #1077
  • FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) by @younesbelkada in #1084

Safetensors fixes

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.6.1

🧨 Diffusers now uses 🤗 PEFT, new tuning methods, better quantization support, higher flexibility and more

03 Nov 09:42
02f0a4c

Highlights

Integration with diffusers

🧨 Diffusers now leverages PEFT as a backend for LoRA inference with Stable Diffusion models (#873, #993, #961). Relevant PRs on 🧨 Diffusers are huggingface/diffusers#5058, huggingface/diffusers#5147, huggingface/diffusers#5151 and huggingface/diffusers#5359. This helps unlock a vast number of practically demanding use cases around adapter-based inference 🚀. Now you can do the following with easy-to-use APIs, with support for different checkpoint formats (Diffusers format, Kohya format ...):

  1. use multiple LoRAs
  2. switch between them instantaneously
  3. scale and combine them
  4. merge/unmerge
  5. enable/disable

For details, refer to the documentation at Inference with PEFT.
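
For illustration, a hedged sketch of the multi-LoRA workflow on the Diffusers side; the LoRA paths and adapter names are hypothetical, and the exact pipeline methods may differ across Diffusers versions:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs (paths and adapter names are hypothetical).
pipe.load_lora_weights("path/to/toy-lora", adapter_name="toy")
pipe.load_lora_weights("path/to/pixel-lora", adapter_name="pixel")

# Scale and combine them, or switch between them instantaneously.
pipe.set_adapters(["toy", "pixel"], adapter_weights=[0.7, 0.3])
image = pipe("a toy robot in pixel art style").images[0]

# Disable all LoRAs to recover the base model's behaviour.
pipe.disable_lora()
```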

New tuning methods

Other notable additions

  • Allow merging of LoRA weights when using 4bit and 8bit quantization (bitsandbytes), thanks to @jiqing-feng (#851, #875)
  • IA³ now supports 4bit quantization thanks to @His-Wardship (#864)
  • We increased the speed of adapter layer initialization: This should be most notable when creating a PEFT LoRA model on top of a large base model (#887, #915, #994)
  • More fine-grained control when configuring LoRA: It is now possible to have different ranks and alpha values for different layers (#873)
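
As a sketch of the more fine-grained control, assuming a model with layers named like the hypothetical patterns below:

```python
from peft import LoraConfig

# Default rank 8, but a higher rank and alpha for specific layers
# (the layer names are illustrative and model-dependent).
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    rank_pattern={"model.layers.0.self_attn.q_proj": 16},
    alpha_pattern={"model.layers.0.self_attn.q_proj": 32},
)
```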

Experimental features

  • For some adapters like LoRA, it is now possible to activate multiple adapters at the same time (#873)

Breaking changes

  • It is no longer allowed to create a LoRA adapter with rank 0 (r=0). This used to be possible, in which case the adapter was ignored.

What's Changed

As always, a bunch of small improvements, bug fixes and doc improvements were added. We thank all the external contributors, both new and recurring. Below is the list of all changes since the last release.


GPTQ Quantization, Low-level API

22 Aug 11:22

GPTQ Integration

Now you can fine-tune GPTQ quantized models using PEFT. Here are some examples of how to use PEFT with a GPTQ model: colab notebook and finetuning script.
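
A hedged sketch, assuming an already GPTQ-quantized checkpoint (the repo id is illustrative) and that the GPTQ runtime dependencies are installed:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load an already GPTQ-quantized checkpoint (illustrative repo id).
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ", device_map="auto"
)

# Freeze the quantized base weights and prepare the model for training.
model = prepare_model_for_kbit_training(model)
config = LoraConfig(task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()
```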

Low-level API

This enables users and developers to use PEFT as a utility library, at least for injectable adapters (LoRA, IA3, AdaLoRA). It exposes an API to modify the model in place, injecting the new adapter layers into it.
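
A minimal sketch of injecting a LoRA adapter into a plain PyTorch module:

```python
import torch
import torch.nn as nn
from peft import LoraConfig, inject_adapter_in_model

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

# Modify the model in place: matching layers are replaced by LoRA layers,
# without wrapping the model in a PeftModel.
config = LoraConfig(target_modules=["linear"])
model = inject_adapter_in_model(config, MyNet())
print(model(torch.randn(2, 10)).shape)
```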

Support for XPU and NPU devices

PEFT adapters can now be loaded and fine-tuned on more device types, including XPU and NPU.

Mix-and-match LoRAs

Stable support and new ways of merging multiple LoRAs. Three merging methods are currently supported: `linear`, `svd`, and `cat`.

  • Added additional parameters to mixing multiple LoRAs through SVD, added ability to mix LoRAs through concatenation by @kovalexal in #817
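
A minimal sketch, assuming a `PeftModel` that already holds two LoRA adapters named `adapter_a` and `adapter_b` (the names are hypothetical):

```python
# Combine two existing LoRA adapters into a new one.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="merged",
    combination_type="svd",  # also supported: "linear", "cat"
)
model.set_adapter("merged")
```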

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.5.0

QLoRA, IA3 PEFT method, support for QA and Feature Extraction tasks, AutoPeftModelForxxx for simplified UX, LoRA for custom models with newly added utils

18 Jul 08:12

QLoRA Support

QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During finetuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters. The LoRA layers are the only parameters being updated during training. For more details read the blog Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
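
A hedged sketch of the typical QLoRA setup; the base model id is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the 4-bit base weights and prepare the model for training;
# only the LoRA layers will be updated.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
model.print_trainable_parameters()
```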

New PEFT methods: IA3 from T-Few paper

To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) rescales inner activations with learned vectors. These learned vectors are injected into the attention and feedforward modules in a typical transformer-based architecture. These learned vectors are the only trainable parameters during fine-tuning, and thus the original weights remain frozen. Dealing with learned vectors (as opposed to learned low-rank updates to a weight matrix like LoRA) keeps the number of trainable parameters much smaller. For more details, read the paper Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
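
A minimal sketch with `IA3Config`; note that the module names are model-specific (the ones below fit OPT, used here as an illustrative base model):

```python
from transformers import AutoModelForCausalLM
from peft import IA3Config, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Learned vectors rescale the key/value projections and the second
# feedforward projection.
config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "fc2"],
    feedforward_modules=["fc2"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```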

Support for new tasks: QA and Feature Extraction

Addition of PeftModelForQuestionAnswering and PeftModelForFeatureExtraction classes to support QA and Feature Extraction tasks, respectively. This enables exciting new use-cases with PEFT, e.g., LoRA for semantic similarity tasks.

  • feat: Add PeftModelForQuestionAnswering by @sjrl in #473
  • add support for Feature Extraction using PEFT by @pacman100 in #647
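
For illustration, a minimal sketch for the feature extraction task; the base model id is illustrative:

```python
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# TaskType.FEATURE_EXTRACTION wraps the model in PeftModelForFeatureExtraction,
# e.g. for LoRA fine-tuning on semantic similarity.
config = LoraConfig(task_type=TaskType.FEATURE_EXTRACTION)
model = get_peft_model(base_model, config)
```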

AutoPeftModelForxxx for a better and simplified UX

Introduces a new paradigm, AutoPeftModelForxxx, intended for users who want to rapidly load and run PEFT models.

from peft import AutoPeftModelForCausalLM

peft_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")

LoRA for custom models

Not a transformers model? No problem, we have got you covered. PEFT now enables the usage of LoRA with custom models.

New LoRA utilities

Improvements to the add_weighted_adapter method to support SVD for combining multiple LoRAs when creating a new LoRA.
New utils such as unload and delete_adapter give users much better control over how they deal with adapters.

  • [Core] Enhancements and refactoring of LoRA method by @pacman100 in #695

PEFT and Stable Diffusion

PEFT is very extensible and easy to use for performing DreamBooth with Stable Diffusion. The community has added conversion scripts to be able to use PEFT models with the Civitai/webui format and vice versa.

  • LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT by @kovalexal in #461
  • Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui by @kovalexal in #596
  • [Bugfix] Fixed LoRA conv2d merge by @kovalexal in #637
  • Fixed LoraConfig alpha modification on add_weighted_adapter by @kovalexal in #654

What's Changed


Docs, Testing Suite, Multi Adapter Support, New methods and examples

03 May 22:19

Brand new Docs

With task guides, conceptual guides, integration guides, and code references all available at your fingertips, 🤗 PEFT's docs (found at https://huggingface.co/docs/peft) provide an insightful and easy-to-follow resource for anyone looking to learn how to use 🤗 PEFT. Whether you're a seasoned pro or just starting out, PEFT's documentation will help you get the most out of it.

Comprehensive Testing Suite

Comprising both unit and integration tests, the suite rigorously tests core features, examples, and various models on different setups, including single and multiple GPUs. This commitment to testing helps ensure that PEFT maintains the highest levels of correctness, usability, and performance, while continuously improving in all areas.

Multi Adapter Support

PEFT just got even more versatile with its new Multi Adapter Support! Now you can train and infer with multiple adapters, or even combine multiple LoRA adapters in a weighted combination. This is especially handy for RLHF training, where you can save memory by using a single base model with multiple adapters for actor, critic, reward, and reference. And the icing on the cake? Check out the LoRA Dreambooth inference example notebook to see this feature in action.

New PEFT methods: AdaLoRA and Adaption Prompt

PEFT just got even better, thanks to the contributions of the community! The AdaLoRA method is one of the exciting new additions. It takes the highly regarded LoRA method and improves it by allocating trainable parameters across the model to maximize performance within a given parameter budget. Another standout is the Adaption Prompt method, which enhances the already popular Prefix Tuning by introducing zero init attention.

New LoRA utilities

Good news for LoRA users! PEFT now allows you to merge LoRA parameters into the base model's parameters, giving you the freedom to remove the PEFT wrapper and apply downstream optimizations related to inference and deployment. Plus, you can use all the features that are compatible with the base model without any issues.
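
A minimal sketch, assuming `peft_model` is a `PeftModel` with a trained LoRA adapter:

```python
# merge_and_unload folds the adapter weights into the base weights and
# returns the plain base model, with the PEFT wrapper removed, so it can
# be served without PEFT installed.
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("merged-model")
```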

What's Changed
