System Info
Hardware: Amazon Linux EC2 instance with 8× NVIDIA A10G GPUs (23 GB each).
Who can help?
@muellerz @SunMarc @MekkCyber
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
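The problem appears when loading a quantized model with device_map="auto". Below is a minimal sketch of that kind of call; the checkpoint and quantization settings are assumptions for illustration, not the original report's exact script:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed quantization settings; the report does not show the exact config.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint (assumption)
    quantization_config=quantization_config,
    device_map="auto",  # should shard the model across all 8 GPUs
    torch_dtype=torch.float16,
)

# In the buggy case, most modules end up on the last GPU instead of a balanced split.
print(model.hf_device_map)
```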
However, if I load the model without the quantization_config, there is no issue at all:
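For comparison, a sketch of the same call without the quantization config (again using the same assumed placeholder checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM

# Without quantization_config, device_map="auto" balances the model as expected.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint (assumption)
    device_map="auto",
    torch_dtype=torch.float16,
)
print(model.hf_device_map)  # layers spread across GPUs 0-7
```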
Expected behavior
The model is (mostly) loaded onto the last GPU, whereas I would expect it to be spread across all of the GPUs. Moreover, infer_auto_device_map does not seem to be working. I have run into a very similar issue on different hardware.
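For reference, a sketch of how a device map is typically computed with accelerate's infer_auto_device_map; the checkpoint and the per-GPU memory budget here are assumptions based on the 23 GB A10Gs, not values from the report:

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Instantiate the model on the meta device so no real memory is allocated.
config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint (assumption)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Ask accelerate for a placement that respects a per-GPU memory budget.
device_map = infer_auto_device_map(
    model,
    max_memory={i: "20GiB" for i in range(8)},  # assumed headroom on each 23 GB A10G
)
print(device_map)  # expected: modules distributed over all 8 GPUs, not piled onto the last one
```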
Hi @guillemram97, thanks for reporting this issue 😊. Indeed, it seems to be a bug in how quantized models are loaded on the accelerate side. We are currently working on a fix to improve these edge cases. You can refer to the PR linked to this issue if you want to understand the details.