Hello, I have some questions about what accelerate.prepare() does to the model, and I hope you can help me understand it.
During my experiments, I noticed the following:
from transformers import AutoModelForCausalLM
import accelerate
accelerator = accelerate.Accelerator()
model = AutoModelForCausalLM.from_pretrained("llama2-7b-hf").half()
model = accelerator.prepare(model)
print(model)
My hardware is an NVIDIA A5000 GPU with 24 GB of VRAM. In theory, a 7B model in half precision should only need around 14 GB of VRAM (7B parameters × 2 bytes per parameter).
However, I encountered an out-of-memory error when executing model = accelerator.prepare(model), while using model.to(accelerator.device) did not result in an error.
This outcome is quite puzzling to me. I don't understand why it happens, or how I can use accelerate to run multi-GPU bf16 inference for llama2-7b-hf with this hardware.
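For reference, this is the kind of bf16 inference setup I would expect to fit in 24 GB. It is only a sketch, and it assumes that loading the weights directly in bf16 via torch_dtype (instead of calling .half() on an fp32 model) and moving them with .to() avoids whatever extra allocation prepare() triggers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import accelerate

accelerator = accelerate.Accelerator()

# Load the weights directly in bf16 so a full fp32 copy is never materialized
# (7B parameters x 2 bytes ≈ 14 GB, which should fit on a 24 GB A5000).
model = AutoModelForCausalLM.from_pretrained("llama2-7b-hf", torch_dtype=torch.bfloat16)
model.eval()

# Move the model with .to() for inference; prepare() is the step that runs out of memory for me.
model = model.to(accelerator.device)

tokenizer = AutoTokenizer.from_pretrained("llama2-7b-hf")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(accelerator.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The torch_dtype argument and the .to() call are just my guess at a lighter path; the original question of why accelerator.prepare() allocates more memory still stands.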