rslora guidance for alpha value? #1387
Replies: 2 comments 4 replies
-
Reading the rslora paper might give you some starting points for choosing good alpha values.
Yes, that would be possible, this is mostly a convenience feature. Note that by using …
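For concreteness, here is a minimal sketch of the convenience flag being referred to, assuming a peft release that exposes `use_rslora` in `LoraConfig`; the model name and target modules below are placeholders, not recommendations.

```python
# Minimal sketch: enabling rank-stabilized LoRA scaling in peft.
# Assumes a peft version that includes the `use_rslora` flag in LoraConfig;
# model name and target modules are placeholders only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=64,                                 # LoRA rank
    lora_alpha=16,                        # alpha itself is unchanged by rslora ...
    use_rslora=True,                      # ... but scaling becomes alpha / sqrt(r) instead of alpha / r
    target_modules=["q_proj", "v_proj"],  # placeholder set of target modules
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```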
-
Had not seen this discussion previously; BenjaminBossan is exactly correct.
In my experience so far, depending on the finetuning dataset and downstream evaluation, some amount of forgetting/overfitting can saturate performance once the rank gets large enough; in that case, rsLoRA with some rank less than the model dimension works better than full model fine-tuning. Otherwise, use the largest rank you can fit in your time and memory budget. Performance in my experience is always better than regular LoRA, since the best-performing ranks only saturate for rank >> 16. For models with model dimension 4096 (e.g. common ~7B models like Llama, Mistral, etc.), I find rank 256 is a pretty good balance of quickness and thickness (see this blog post for example). Sometimes I use up to rank 2048.
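To illustrate the mechanism behind that advice, the sketch below (not from the thread; alpha = 16 is just an example value) compares the adapter scaling factors: vanilla LoRA's alpha / r shrinks quickly as rank grows, while rsLoRA's alpha / sqrt(r) decays much more slowly, which is what lets large ranks like 256 or 2048 keep contributing.

```python
# Compare the LoRA and rsLoRA scaling factors as rank grows.
# alpha = 16 is an arbitrary example value, not a recommendation.
import math

alpha = 16
for r in (16, 64, 256, 2048):
    lora_scale = alpha / r                # scaling used by regular LoRA
    rslora_scale = alpha / math.sqrt(r)   # scaling used by rsLoRA
    print(f"r={r:>4}  lora: {lora_scale:.4f}  rslora: {rslora_scale:.4f}")
```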
-
Since the flag to enable rslora was merged, I'm curious if there is any guidance for a sane default for the alpha value when using rslora.
For regular LoRA, I've seen somewhere that alpha = 2 * rank is a good default choice. Does the same hold true for rslora?
One other thing I don't understand: if alpha is arbitrary, and rslora just changes the scaling from alpha/rank to alpha/sqrt(rank), couldn't I achieve the exact same outcome by just modifying alpha to produce the same ratio? Or am I oversimplifying what rslora actually does?
Thanks!!
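On the "couldn't I just change alpha" point, the arithmetic works out as asked: for one fixed rank r, setting alpha' = alpha * sqrt(r) in plain LoRA reproduces the rsLoRA scaling exactly, which is why the reply above calls the flag mostly a convenience feature; the difference shows up when you sweep rank and want alpha to keep the same meaning across ranks. A quick check, with example values only:

```python
# Sketch of the equivalence at a single fixed rank (example values, not recommendations).
import math

r = 256
alpha = 16

rslora_scale = alpha / math.sqrt(r)        # scaling with use_rslora=True
equivalent_alpha = alpha * math.sqrt(r)    # alpha you'd set to mimic it in plain LoRA
plain_scale = equivalent_alpha / r         # plain LoRA scaling with that alpha

assert math.isclose(rslora_scale, plain_scale)
print(rslora_scale, plain_scale)           # both 1.0 for these example values
```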