v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA
Highlights
Support for QLoRA with DeepSpeed ZeRO3 and FSDP
We made several changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, and trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.
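Before starting a long QLoRA run, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch using only the standard library. The helper names (`release_tuple`, `satisfies`, `check_environment`) are hypothetical and not part of PEFT, and the version comparison is deliberately simplified: it looks only at the leading numeric components and ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum versions quoted in the release notes: package -> (operator, version).
REQUIREMENTS = {
    "bitsandbytes": (">=", "0.43.0"),
    "accelerate": (">=", "0.28.0"),
    "transformers": (">", "4.38.2"),
    "trl": (">", "0.7.11"),
}


def release_tuple(version):
    """Return the leading numeric components of a version string as a tuple.

    Simplification (assumption): suffixes such as ".dev0" or "rc1" are ignored.
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, op, required):
    """Check `installed op required` for op in {'>=', '>'}."""
    a, b = release_tuple(installed), release_tuple(required)
    # Pad with zeros so that "0.43" and "0.43.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a > b


def check_environment():
    """Return {package: problem} for every unmet requirement (empty if all are met)."""
    problems = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not satisfies(installed, op, required):
            problems[name] = f"found {installed}, need {op}{required}"
    return problems


if __name__ == "__main__":
    for package, problem in check_environment().items():
        print(f"{package}: {problem}")
```

For production use, a proper version parser such as the one in the `packaging` library is a safer choice than this hand-rolled comparison.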
Layer replication
First-time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory, but it can noticeably improve model performance. Find out more in our docs.
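To make the idea concrete, here is a small sketch of how a list of layer ranges expands into a new layer stack. It assumes half-open `[start, end)` ranges, which is our understanding of the convention used by the `layer_replication` option of `LoraConfig`. The function name `expand_layer_ranges` is hypothetical and only illustrates the index arithmetic; it does not touch a real model.

```python
def expand_layer_ranges(ranges):
    """Expand half-open [start, end) ranges into an ordered list of original layer indices."""
    order = []
    for start, end in ranges:
        if not 0 <= start < end:
            raise ValueError(f"invalid range: ({start}, {end})")
        order.extend(range(start, end))
    return order


# A 5-layer base model (layers 0..4), rebuilt from two overlapping ranges.
ranges = [(0, 4), (2, 5)]
order = expand_layer_ranges(ranges)
print(order)  # [0, 1, 2, 3, 2, 3, 4]

# Layers 2 and 3 appear twice: the duplicates share base weights with the
# originals, so only their LoRA adapters add new parameters.
duplicated = len(order) - len(set(order))
print(duplicated)  # 2
```

In this example the resulting model has seven layers but stores the base weights of only five, which is why the memory overhead stays small.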
Improving DoRA
Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
Mixed LoRA adapter batches
If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
Here, "adapter1" and "adapter2" should match the names of your corresponding LoRA adapters, and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.
Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter, which is inefficient and inconvenient. We therefore recommend using this new, faster method from now on in this scenario.
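When the adapter choice comes from per-request metadata, the adapter_names list has to be built with one entry per sample. The sketch below shows one way to do that. The helper `build_adapter_names` and the convention that `None` means "no adapter" are assumptions made for illustration; only the "__base__" name and the `adapter_names` argument come from the text above.

```python
BASE_MODEL = "__base__"  # special name: run the sample through the base model only


def build_adapter_names(requested, available):
    """Map per-sample adapter requests (None = base model) to an adapter_names list."""
    names = []
    for index, name in enumerate(requested):
        if name is None:
            names.append(BASE_MODEL)
        elif name in available:
            names.append(name)
        else:
            raise ValueError(f"sample {index}: unknown adapter {name!r}")
    return names


# One entry per sample in the batch, in batch order.
adapter_names = build_adapter_names(
    ["adapter1", "adapter2", None],
    available={"adapter1", "adapter2"},
)
print(adapter_names)  # ['adapter1', 'adapter2', '__base__']

# With a PEFT model and a tokenized batch of three samples, the forward call
# from the release notes would then be:
#     output = model(**inputs, adapter_names=adapter_names)
```

Validating names up front gives a clear error message that points at the offending sample before the batch reaches the model.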
New LoftQ initialization function
We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.
Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
Deprecations
The function prepare_model_for_int8_training had been deprecated for quite some time and has now been removed completely. Use prepare_model_for_kbit_training instead.
What's Changed
Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.
- Bump version to 0.9.1.dev0 by @BenjaminBossan in #1517
- Fix for "leaf Variable that requires grad" Error in In-Place Operation by @DopeorNope-Lee in #1372
- FIX [CI/Docker] Follow up from #1481 by @younesbelkada in #1487
- CI: temporary disable workflow by @younesbelkada in #1534
- FIX [Docs/bnb/DeepSpeed] Add clarification on bnb + PEFT + DS compatibilities by @younesbelkada in #1529
- Expose bias attribute on tuner layers by @BenjaminBossan in #1530
- docs: highlight difference between num_parameters() and get_nb_trainable_parameters() in PEFT by @kmehant in #1531
- fix: fail when required args not passed when prompt_tuning_init==TEXT by @kmehant in #1519
- Fixed minor grammatical and code bugs by @gremlin97 in #1542
- Optimize levenshtein_distance algorithm in peft_lora_seq2seq_accelera… by @SUNGOD3 in #1527
- Update prompt_based_methods.md by @insist93 in #1548
- FIX Allow AdaLoRA rank to be 0 by @BenjaminBossan in #1540
- FIX: Make adaptation prompt CI happy for transformers 4.39.0 by @younesbelkada in #1551
- MNT: Use BitsAndBytesConfig as load_in_* is deprecated by @BenjaminBossan in #1552
- Add Support for Mistral Model in Llama-Adapter Method by @PrakharSaxena24 in #1433
- Add support for layer replication in LoRA by @siddartha-RE in #1368
- QDoRA: Support DoRA with BnB quantization by @BenjaminBossan in #1518
- Feat: add support for Conv2D DoRA by @sayakpaul in #1516
- TST Report slowest tests by @BenjaminBossan in #1556
- Changes to support fsdp+qlora and dsz3+qlora by @pacman100 in #1550
- Update style with ruff 0.2.2 by @BenjaminBossan in #1565
- FEAT Mixing different LoRA adapters in same batch by @BenjaminBossan in #1558
- FIX [CI] Fix test docker CI by @younesbelkada in #1535
- Fix LoftQ docs and tests by @BenjaminBossan in #1532
- More convenient way to initialize LoftQ by @BenjaminBossan in #1543
New Contributors
- @DopeorNope-Lee made their first contribution in #1372
- @kmehant made their first contribution in #1531
- @gremlin97 made their first contribution in #1542
- @SUNGOD3 made their first contribution in #1527
- @insist93 made their first contribution in #1548
- @PrakharSaxena24 made their first contribution in #1433
- @siddartha-RE made their first contribution in #1368
Full Changelog: v0.9.0...v0.10.0