
Large difference between the memory estimated by Accelerate and the actual memory consumed during inference #3297

clemente0731 opened this issue Dec 16, 2024 · 1 comment

@clemente0731

System Info

1. Memory estimate:
   trust_remote_code=True accelerate estimate-memory neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 --trust_remote_code
2. Runtime inference memory:
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size 1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

  1. Memory estimate:
    trust_remote_code=True accelerate estimate-memory neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 --trust_remote_code
  2. Runtime inference memory (see the monitoring sketch after this list):
    lm_eval \
      --model vllm \
      --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
      --tasks mmlu \
      --num_fewshot 5 \
      --batch_size 1
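
One way to capture the actual peak GPU usage during the lm_eval run is to poll NVML from a separate process while the evaluation is in flight. This is a sketch, not part of the original report; it assumes the pynvml package is installed and the model occupies GPU 0:

```python
# poll_gpu_mem.py -- sketch: run alongside lm_eval to record peak GPU memory.
# Assumptions: pynvml is installed (pip install pynvml) and the model runs on
# GPU 0; stop with Ctrl+C once the evaluation finishes.
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU index 0 is an assumption

peak = 0
try:
    while True:
        used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
        peak = max(peak, used)
        print(f"current {used / 2**30:.2f} GiB | peak {peak / 2**30:.2f} GiB", end="\r")
        time.sleep(0.5)
except KeyboardInterrupt:
    print(f"\npeak GPU memory observed: {peak / 2**30:.2f} GiB")
finally:
    pynvml.nvmlShutdown()
```

Keep in mind that vLLM pre-allocates GPU memory for its KV cache up front (controlled by gpu_memory_utilization, 0.9 by default), so the number observed this way will normally sit far above the weights-only figure from accelerate estimate-memory regardless of the model size.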

Expected behavior

The memory estimate from accelerate estimate-memory should not differ much from the memory actually consumed during inference.

@BenjaminBossan (Member)

Could you please report what number you're getting and how much memory is actually being used? Remember that this calculation is for the minimum amount of memory used and does not account for the total required memory (e.g. from hidden states/activations).
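
For reference, the weights-only number that accelerate estimate-memory reports can be approximated by materializing the model on the meta device (no real allocation happens) and summing parameter bytes. A rough sketch; note the dtype follows the config's torch_dtype, which for this FP8 checkpoint may not match the on-disk quantized size:

```python
# Sketch: approximate the weights-only estimate without allocating GPU memory.
# init_empty_weights places all parameters on the meta device, so nothing is
# actually loaded; KV cache and activations are deliberately not counted.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(
    "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8", trust_remote_code=True
)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

total = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"weights only: {total / 2**30:.2f} GiB")
```

Comparing this figure against the polled peak above should make clear how much of the gap comes from the KV cache and activations rather than the weights themselves.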
