System Info
Information
Tasks
One of the scripts in the examples folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
1. Estimated memory:

   trust_remote_code=True accelerate estimate-memory neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 --trust_remote_code

2. Runtime memory at inference:

   lm_eval \
     --model vllm \
     --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
     --tasks mmlu \
     --num_fewshot 5 \
     --batch_size 1
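For context, a back-of-envelope check of the weights-only footprint (a minimal sketch; the ~8.03B parameter count for Llama-3.1-8B and 1 byte per FP8 weight are assumptions, and this deliberately ignores activations and KV cache):

```python
# Rough lower bound: weight memory for an FP8-quantized ~8B-parameter model.
# Assumptions: ~8.03e9 parameters (Llama-3.1-8B) at 1 byte per FP8 weight.
num_params = 8.03e9
bytes_per_param = 1  # FP8 (E4M3/E5M2) stores one byte per weight

weight_gib = num_params * bytes_per_param / 1024**3
print(f"weights only: ~{weight_gib:.1f} GiB")  # ~7.5 GiB
```

Note that vLLM additionally pre-allocates KV-cache blocks up to its `gpu_memory_utilization` setting (0.9 by default), so memory observed during the `lm_eval` run will be far higher than this weights-only figure even at batch size 1.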
Expected behavior
The estimated memory should not differ much from the memory actually used at inference.
Reply:

Could you please report what number you're getting and how much memory is actually being used? Remember that this calculation is for the minimum amount of memory used and does not account for the total required memory (e.g. from hidden states/activations).
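If it helps when reporting that number, here is one way to sample device-wide usage while the `lm_eval` run is active (a minimal sketch assuming PyTorch with a CUDA device; `nvidia-smi` reports the same figure):

```python
# Sample device-wide GPU memory while the vLLM/lm_eval run is in progress.
# torch.cuda.mem_get_info wraps cudaMemGetInfo, so it also reflects memory
# held by other processes (e.g. a vLLM worker) on the same device.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()
used_gib = (total_bytes - free_bytes) / 1024**3
print(f"GPU memory in use: {used_gib:.1f} / {total_bytes / 1024**3:.1f} GiB")
```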