PEFT LoRA training on multiple datasets in parallel #1061
Replies: 1 comment
-
To make sure: you want to train the different LoRA adapters all independently of each other, so not LoRA 5 on top of LoRA 4 on top of ... (incremental training)? If that is so, it would seem to me that the "naive" approach 2 should be both the easiest (less custom code, less debugging) and the most efficient. I would only go for approach 1 if either (1) the model doesn't fit on one GPU or (2) you want some kind of incremental training. However, there is no perfect answer that always applies, as it can depend on so many factors. I would suggest following the general advice for training neural nets; I don't think the fact that LoRA is being used here changes the general wisdom.
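A minimal sketch of that naive route, assuming each LoRA run fits on a single A100: launch N completely independent training processes, each pinned to its own GPU via `CUDA_VISIBLE_DEVICES`, so no changes to `Trainer` are needed. The script name `train_lora.py`, its `--dataset`/`--output_dir` flags, and the dataset list are hypothetical placeholders, not anything from this thread.

```python
import os
import subprocess

# Hypothetical per-task datasets; one independent LoRA run per GPU (N = 5).
DATASETS = ["task_0", "task_1", "task_2", "task_3", "task_4"]

procs = []
for gpu_id, dataset in enumerate(DATASETS):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # pin this job to one A100
    # train_lora.py is a hypothetical single-GPU PEFT training script
    # that takes a dataset name and an output directory.
    cmd = [
        "python", "train_lora.py",
        "--dataset", dataset,
        "--output_dir", f"lora_adapter_{dataset}",
    ]
    procs.append(subprocess.Popen(cmd, env=env))

# Wait for all N runs to finish; each writes its own adapter checkpoint.
for p in procs:
    p.wait()
```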
-
Hello, I want to find the most efficient way to train N different LoRA weight adapters separately on N different datasets/tasks. I have access to up to 8 A100 GPUs with 40 GB VRAM each and want to optimize for speed.
Right now, training on just one dataset (dolly-15k) using PEFT LoRA on a single GPU is going well; we are using around 80% of one GPU's memory.
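For reference, a single-GPU PEFT LoRA run along these lines looks roughly like the sketch below. The base model, LoRA hyperparameters, prompt formatting, and training arguments are assumptions for illustration, not my actual configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model

base_model = "openlm-research/open_llama_7b"  # assumed base model, for illustration only

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Wrap the base model with a LoRA adapter; only the adapter weights are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed target modules for a Llama-style model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# dolly-15k: concatenate instruction and response into a single training text.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def tokenize(example):
    text = example["instruction"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora_dolly", per_device_train_batch_size=4,
        num_train_epochs=1, learning_rate=2e-4, fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora_dolly")  # writes only the adapter weights
```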
To train N different sets of LoRA adapters on N different datasets as efficiently as possible (optimizing for speed), there are a few approaches I was considering:
For this, I am assuming N=5.
For both of these approaches, would I need to edit Trainer directly? How would you approach this? Thank you!