rslora guidance for alpha value? #1387
Replies: 2 comments 4 replies
-
Reading the rslora paper might give you some starting points for choosing good alpha values.
Yes, that would be possible, this is mostly a convenience feature. Note that by using …
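For concreteness, here is a minimal sketch of the convenience flag being referred to, assuming a peft release that exposes `use_rslora` in `LoraConfig`; the model name and target modules below are placeholders, not recommendations.

```python
# Minimal sketch: enabling rank-stabilized LoRA scaling in peft.
# Assumes a peft version that includes the `use_rslora` flag in LoraConfig;
# model name and target modules are placeholders only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=64,                                 # LoRA rank
    lora_alpha=16,                        # alpha itself is unchanged by rslora ...
    use_rslora=True,                      # ... but scaling becomes alpha / sqrt(r) instead of alpha / r
    target_modules=["q_proj", "v_proj"],  # placeholder set of target modules
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```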
-
Had not seen this discussion previously; BenjaminBossan is exactly correct.
In my experience so far, depending on the finetuning dataset and downstream evaluation, some amount of forgetting/overfitting can saturate performance once the rank gets large enough; in that case, rsLoRA with some rank less than the model dimension works better than full model fine-tuning. Otherwise, use the largest rank you can fit in your time and memory budget. Performance in my experience is always better than regular LoRA, since the best-performing ranks only saturate for rank >> 16. For models with model dimension 4096 (e.g. common ~7B models like Llama, Mistral, etc.), I find rank 256 is a pretty good balance of quickness and thickness (see this blog post for example). Sometimes I use up to rank 2048.
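To illustrate the mechanism behind that advice, the sketch below (not from the thread; alpha = 16 is just an example value) compares the adapter scaling factors: vanilla LoRA's alpha / r shrinks quickly as rank grows, while rsLoRA's alpha / sqrt(r) decays much more slowly, which is what lets large ranks like 256 or 2048 keep contributing.

```python
# Compare the LoRA and rsLoRA scaling factors as rank grows.
# alpha = 16 is an arbitrary example value, not a recommendation.
import math

alpha = 16
for r in (16, 64, 256, 2048):
    lora_scale = alpha / r                # scaling used by regular LoRA
    rslora_scale = alpha / math.sqrt(r)   # scaling used by rsLoRA
    print(f"r={r:>4}  lora: {lora_scale:.4f}  rslora: {rslora_scale:.4f}")
```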
-
Since the flag to enable rslora was merged, I'm curious if there is any guidance for a sane default for the alpha value when using rslora.
For regular LoRA, I've seen somewhere that alpha = 2 * rank is a good default choice. Does the same hold true for rslora?
One other thing I don't understand: if alpha is arbitrary, and rslora just changes the scaling from alpha/rank to alpha/sqrt(rank), couldn't I achieve the exact same outcome by just modifying alpha to produce the same ratio? Or am I oversimplifying what rslora actually does?
Thanks!!
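On the "couldn't I just change alpha" point, the arithmetic works out as asked: for one fixed rank r, setting alpha' = alpha * sqrt(r) in plain LoRA reproduces the rsLoRA scaling exactly, which is why the reply above calls the flag mostly a convenience feature; the difference shows up when you sweep rank and want alpha to keep the same meaning across ranks. A quick check, with example values only:

```python
# Sketch of the equivalence at a single fixed rank (example values, not recommendations).
import math

r = 256
alpha = 16

rslora_scale = alpha / math.sqrt(r)        # scaling with use_rslora=True
equivalent_alpha = alpha * math.sqrt(r)    # alpha you'd set to mimic it in plain LoRA
plain_scale = equivalent_alpha / r         # plain LoRA scaling with that alpha

assert math.isclose(rslora_scale, plain_scale)
print(rslora_scale, plain_scale)           # both 1.0 for these example values
```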