
Is it possible to add LoRA on specific head? #2293

Open
SpeeeedLee opened this issue Dec 22, 2024 · 5 comments

Comments

@SpeeeedLee

Feature request

Could I add LoRA to only some selected heads of the model?
I read some documentation here, but am still not sure how to implement my goal.

Motivation

The current LoRA config lets users decide which matrices to add LoRA to; more fine-grained control over which heads to add LoRA to would be beneficial for developers.

Your contribution

I would appreciate some tips on how to implement this.

@d-kleine
Contributor

d-kleine commented Dec 23, 2024

This can be done with target_modules; it's described here: https://huggingface.co/docs/peft/developer_guides/custom_models#multilayer-perceptron

from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],  # use the layer names according to the model you are using
    modules_to_save=["seq.4"],
)

You can retrieve the name and type of each layer of your model with this code:

for n, m in base_model.named_modules(): # replace `base_model` with the variable your pretrained model is stored in
    print((n, type(m)))

For example, for Llama 3.2 1B (a full sketch follows the list):

  • for specific heads, something like:
    target_modules=["layers.0.self_attn.q_proj", "layers.0.self_attn.v_proj"],  # q and v for self_attn layer 0
  • for a group of layers:
    target_modules=["q_proj", "v_proj"],  # q and v for all self_attn layers

@SpeeeedLee
Author

SpeeeedLee commented Dec 25, 2024

Hi @d-kleine, thanks for the reply. I am thinking more about adding LoRA to only some attention heads, which means my target might be:
the first head_dim columns of layers.0.self_attn.q_proj, i.e., the first attention head of the query matrix. Besides adding LoRA, I am also curious about how I can freeze some heads and fine-tune only selected ones.

@BenjaminBossan
Member

It is not possible to target specific heads. The issue is that the weights of all heads are combined into a single nn.Linear weight, so if we apply LoRA to it, it will affect all the weights.
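
To illustrate (a small sketch with made-up dimensions, not tied to a specific checkpoint): the query projection of one layer is a single nn.Linear whose output rows are the concatenation of all heads, and a LoRA update delta_W = B @ A spans all of those rows.

import torch

hidden_size, num_heads, r = 2048, 32, 8
head_dim = hidden_size // num_heads

# q_proj fuses all heads: weight shape is (num_heads * head_dim, hidden_size)
q_proj = torch.nn.Linear(hidden_size, num_heads * head_dim, bias=False)

# a LoRA update has the same shape, so every head's rows receive a non-zero update
A = torch.randn(r, hidden_size)
B = torch.randn(num_heads * head_dim, r)
delta_W = B @ A
print(delta_W.shape)  # torch.Size([2048, 2048]) -> covers the rows of all 32 heads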

@SpeeeedLee
Author

Thanks for the prompt reply, @BenjaminBossan.
Yes, I understand this issue, and I am wondering whether it is possible for me to write custom code to separate each head into its own nn.Linear weight. If so, I could then selectively fine-tune some heads or add LoRA to them.

I found some possible approaches like this previous issue, where SAM's Q, K, and V are successfully separated. Could this be used in my case, where I want to separate each head out?

@BenjaminBossan
Member

wondering whether it is possible for me to write custom code to separate each head into its own nn.Linear weight. If so, I could then selectively fine-tune some heads or add LoRA to them.

That is possible, but it means that you have to implement the whole transformer attention module yourself, and you might be missing out on some optimizations (flash attention, caching).

Alternatively, you might be able to write a custom LoRA layer that, say, masks out the heads that should not be touched, and register it with the PEFT dispatcher to be applied to the whole attention module, e.g. LlamaAttention if that's what your model is using.
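
As a rough illustration of the masking idea (a minimal sketch, not the PEFT dispatcher route: it just swaps a single projection module in place; names such as HeadMaskedLoRALinear and trainable_heads are made up for this example, and it assumes the output rows of q_proj are laid out head by head):

import torch
import torch.nn as nn

class HeadMaskedLoRALinear(nn.Module):
    """LoRA on top of a frozen nn.Linear, with the update masked so that
    only the output rows of the selected attention heads are changed."""

    def __init__(self, base_linear, num_heads, trainable_heads, r=8, alpha=16):
        super().__init__()
        self.base = base_linear
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        head_dim = base_linear.out_features // num_heads

        self.lora_A = nn.Parameter(torch.randn(r, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, r))
        self.scaling = alpha / r

        # 1 for rows belonging to trainable heads, 0 everywhere else
        mask = torch.zeros(base_linear.out_features, 1)
        for h in trainable_heads:
            mask[h * head_dim:(h + 1) * head_dim] = 1.0
        self.register_buffer("head_mask", mask)

    def forward(self, x):
        # mask the low-rank update so frozen heads see no change
        delta_w = (self.lora_B @ self.lora_A) * self.head_mask * self.scaling
        return self.base(x) + nn.functional.linear(x, delta_w)

# hypothetical usage: adapt only heads 0 and 1 of layer 0's query projection
attn = base_model.model.layers[0].self_attn
attn.q_proj = HeadMaskedLoRALinear(
    attn.q_proj,
    num_heads=base_model.config.num_attention_heads,
    trainable_heads=[0, 1],
)

The same masking could in principle live inside a custom LoRA layer registered for the whole attention module, as suggested above; the sketch keeps it at the level of a single nn.Linear for brevity.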
