System Info
Information
Tasks
One of the scripts in the examples folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
1. Estimated memory:

   trust_remote_code=True accelerate estimate-memory neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 --trust_remote_code

2. Runtime memory at inference:

   lm_eval \
     --model vllm \
     --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
     --tasks mmlu \
     --num_fewshot 5 \
     --batch_size 1
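For context, a back-of-envelope check of the weights-only footprint (a minimal sketch; the ~8.03B parameter count for Llama-3.1-8B and 1 byte per FP8 weight are assumptions, and this deliberately ignores activations and KV cache):

```python
# Rough lower bound: weight memory for an FP8-quantized ~8B-parameter model.
# Assumptions: ~8.03e9 parameters (Llama-3.1-8B) at 1 byte per FP8 weight.
num_params = 8.03e9
bytes_per_param = 1  # FP8 (E4M3/E5M2) stores one byte per weight

weight_gib = num_params * bytes_per_param / 1024**3
print(f"weights only: ~{weight_gib:.1f} GiB")  # ~7.5 GiB
```

Note that vLLM additionally pre-allocates KV-cache blocks up to its `gpu_memory_utilization` setting (0.9 by default), so memory observed during the `lm_eval` run will be far higher than this weights-only figure even at batch size 1.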
Expected behavior
The estimated memory should not differ much from the memory actually used at inference.
Reply:

Could you please report what number you're getting and how much memory is actually being used? Remember that this calculation is for the minimum amount of memory used and does not account for the total required memory (e.g. from hidden states/activations).
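If it helps when reporting that number, here is one way to sample device-wide usage while the `lm_eval` run is active (a minimal sketch assuming PyTorch with a CUDA device; `nvidia-smi` reports the same figure):

```python
# Sample device-wide GPU memory while the vLLM/lm_eval run is in progress.
# torch.cuda.mem_get_info wraps cudaMemGetInfo, so it also reflects memory
# held by other processes (e.g. a vLLM worker) on the same device.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()
used_gib = (total_bytes - free_bytes) / 1024**3
print(f"GPU memory in use: {used_gib:.1f} / {total_bytes / 1024**3:.1f} GiB")
```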