TypeError when running inference with different LoRA adapters in the same batch #2283
The code is as follows:
What PyTorch version are you using?
1.13.1+cu117
That's the reason: your torch version is really old and does not support this argument yet. Would it be possible for you to upgrade to a newer torch version?
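For context, PEFT routes the per-sample adapter names through module forward pre-hooks, and the `with_kwargs` flag of `register_forward_pre_hook` only exists since PyTorch 2.0. The thread does not spell out which argument fails, so take this only as a plausible illustration of the version dependence:

```python
import torch.nn as nn

def pre_forward(module, args, kwargs):
    # With with_kwargs=True the hook also receives (and may rewrite) the
    # keyword arguments, which is how per-sample adapter names can be
    # intercepted before the forward pass.
    return args, kwargs

layer = nn.Linear(8, 8)
# Works on torch >= 2.0; on torch 1.13 the keyword argument does not exist
# and registering the hook raises a TypeError.
handle = layer.register_forward_pre_hook(pre_forward, with_kwargs=True)
```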
Thanks for the suggestion. After upgrading to torch 2.4.1, the error during normal generation has been resolved. However, when using beam search during inference, I encountered a new ValueError. I suspect this is because num_beams is set to 20, which causes the error. How can I adapt the model to handle beam search during inference? Additionally, I'm curious whether the inference for different adapters in a batch runs serially or in parallel. Since the different adapters share the same base model, they could in theory run in parallel. However, the current implementation seems to run serially, judging by the execution time. Is optimizing for parallel execution feasible, and are there any plans to support this functionality in the future?
See huggingface#2283. Right now, using mixed adapter batches with beam search generation does not work. This is because users need to pass the adapter names associated with each sample, i.e. the number of adapter names should be identical to the number of samples in the input. When applying beam search, transformers internally repeats the samples once per beam (or so it looks like). Therefore, we have more samples during generation than samples in the input. Consequently, the adapter names have to be extended accordingly. This is now taken care of. Unfortunately, this does not work for encoder-decoder models yet. With these models, there is always a size mismatch, whether adapter names are extended or not. What I suspect is happening is that only the decoder needs to be extended, but right now I don't see a way to implement this distinction in PEFT. Therefore, encoder-decoder + beam search generation is not supported for the time being.
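A simplified illustration of the fix described above (not the actual PEFT code): repeat each sample's adapter name once per beam so the list stays aligned with the expanded batch.

```python
def extend_adapter_names(adapter_names, num_beams):
    # transformers expands each input row num_beams times for beam search,
    # so the per-sample adapter names must be repeated in the same way.
    return [name for name in adapter_names for _ in range(num_beams)]

# 3 input samples with num_beams=2 become 6 rows during generation:
print(extend_adapter_names(["__base__", "lora_1", "lora_2"], num_beams=2))
# ['__base__', '__base__', 'lora_1', 'lora_1', 'lora_2', 'lora_2']
```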
@yuxiang-guo Thanks for reporting back. Indeed, beam search is currently not supported but I created a PR that should enable it: #2287. If you have time, you could try installing PEFT from that branch and see if it fixes your issue.
You are right that PEFT does not parallelize here. This is somewhat out of scope for PEFT, as there are already many parallelization methods in the torch ecosystem (DDP, FSDP, DeepSpeed, etc.). If we added our own parallelization into PEFT, it would most likely interfere with these existing methods and hamper performance.
@BenjaminBossan Many thanks! I will try it.
In the current implementation, are the different adapters loaded into the base model and then unloaded in turn, so that inference within a batch runs serially? I'm wondering if it's possible to use multiple adapters to perform inference in parallel, where the output of each adapter is then added to the base model's output. That is, inference could be conducted without merging the LoRA adapters into the base model. In this way, the inference time wouldn't increase linearly with the number of adapters within a batch.
The best way to achieve that would be to merge those adapters into the base model using the
I don't understand this part. If you don't merge the weights, it means that there is always a LoRA overhead.
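For reference, merging a LoRA adapter into the base weights removes that overhead entirely. A minimal sketch using PEFT's `merge_and_unload()`; the adapter path is a placeholder, and the truncated suggestion above may refer to a different helper:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
peft_model = PeftModel.from_pretrained(base, "path/to/lora_adapter")  # placeholder path

# Fold the LoRA deltas into the base weights and drop the LoRA modules,
# so inference runs at plain base-model speed.
merged_model = peft_model.merge_and_unload()
```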
System Info
transformers 4.41.0
peft 0.13.2
Who can help?
@BenjaminBossan
I tried to apply [Inference with different LoRA adapters in the same batch] to an encoder-decoder T5 model.
Specifically, I load the base model along with a first and a second LoRA adapter, and perform inference with these three in the same batch. However, some errors occurred.
BTW, does [inference with different LoRA adapters in the same batch] support beam search when using generate()?
Information
Tasks
An officially supported task in the examples folder
Reproduction
Code:
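A sketch along the lines described above, with a placeholder T5 checkpoint and placeholder adapter paths rather than the exact ones used:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("t5-base")

# Load two LoRA adapters on top of the same base model.
model = PeftModel.from_pretrained(base_model, "path/to/lora_1", adapter_name="lora_1")
model.load_adapter("path/to/lora_2", adapter_name="lora_2")

inputs = tokenizer(
    ["first input text", "second input text", "third input text"],
    return_tensors="pt",
    padding=True,
)

# One adapter name per sample; "__base__" routes that sample through the base model only.
adapter_names = ["__base__", "lora_1", "lora_2"]

# Mixed-adapter generation call:
out = model.generate(**inputs, adapter_names=adapter_names, max_new_tokens=20)

# Variant with beam search, as asked about above:
out_beam = model.generate(**inputs, adapter_names=adapter_names, num_beams=20, max_new_tokens=20)

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```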
The error message:
Expected behavior
I expect the existing functionality for [inference with different LoRA adapters in the same batch] to support T5 with LoRA adapters and to work in my beam search experiments during generation.