Loading multiple models on a single GPU #2708

Open · 1 of 3 tasks
luckfu opened this issue Dec 26, 2024 · 0 comments

luckfu commented Dec 26, 2024

System Info

CUDA 12.2

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

1.1.0

The command used to start Xinference

xinference  launch --model-name qwen2.5-instruct \
>   --model-type LLM \
>   --model-uid Qwen1_5B \
>   --model_path /models/Qwen/Qwen2___5-1___5B-Instruct \
>   --model-engine 'vllm' \
>   --model-format 'pytorch' \
>   --quantization None \
>   --n-gpu 1\
>   --gpu-idx "0" \
>   --tensor_parallel_size 1 \
>   --gpu_memory_utilization 0.30 \
>   --max_model_len 4096
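
For reference, the same launch can also be attempted through the Python client rather than the CLI. This is a minimal sketch, assuming the supervisor is at the default endpoint http://127.0.0.1:9997 and that launch_model forwards the extra keyword arguments (tensor_parallel_size, gpu_memory_utilization, max_model_len) to the vLLM engine the same way the CLI does:

from xinference.client import Client

# Assumed default endpoint; adjust to the actual host/port of the supervisor.
client = Client("http://127.0.0.1:9997")

# Mirrors the CLI invocation above: GPU index 0, vLLM engine,
# and a 30% gpu_memory_utilization cap so other models could share the card.
model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_type="LLM",
    model_uid="Qwen1_5B",
    model_engine="vllm",
    model_format="pytorch",
    model_path="/models/Qwen/Qwen2___5-1___5B-Instruct",
    n_gpu=1,
    gpu_idx=0,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.30,
    max_model_len=4096,
)
print(model_uid)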

Reproduction

xinference  launch --model-name qwen2.5-instruct \
>   --model-type LLM \
>   --model-uid Qwen1_5B \
>   --model_path /models/Qwen/Qwen2___5-1___5B-Instruct \
>   --model-engine 'vllm' \
>   --model-format 'pytorch' \
>   --quantization None \
>   --n-gpu 1\
>   --gpu-idx "0" \
>   --tensor_parallel_size 1 \
>   --gpu_memory_utilization 0.30 \
>   --max_model_len 4096
Launch model name: qwen2.5-instruct with kwargs: {'model_path': '/models/Qwen/Qwen2___5-1___5B-Instruct', 'tensor_parallel_size': 1, 'gpu_memory_utilization': 0.3, 'max_model_len': 4096}
Traceback (most recent call last):
  File "/usr/local/bin/xinference", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 908, in model_launch
    model_uid = client.launch_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 999, in launch_model
    raise RuntimeError(
RuntimeError: Failed to launch model, detail: [address=0.0.0.0:26194, pid=237] User specified GPU index 0 has been occupied with a vLLM model: Qwen0_5B-0, therefore cannot allocate GPU memory for a new model.
[Screenshot 2024-12-26 14:51:37] In fact, there is still free GPU memory on the card.
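
To back up the "memory is free" observation in the screenshot, a quick check from inside the container could look like the following; a minimal sketch, assuming PyTorch with CUDA support is installed (nvidia-smi should report the same numbers):

import torch

# torch.cuda.mem_get_info returns (free_bytes, total_bytes) for the given device.
free, total = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free / 1024**3:.1f} GiB free of {total / 1024**3:.1f} GiB")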

Expected behavior

As long as GPU memory allows, a single GPU should be able to load multiple models.
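
In other words, two vLLM models with non-overlapping gpu_memory_utilization budgets (e.g. 0.30 each) should be able to coexist on GPU 0. A minimal sketch of the expected flow, assuming the 0.5B model referenced in the error (Qwen0_5B-0) was launched the same way; its model path below is hypothetical:

from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # assumed default endpoint

# First model: already running on GPU 0 per the error message (Qwen0_5B).
# Second model: the launch from this issue, capped at 30% of GPU 0's memory.
# Expectation: both launches succeed as long as their budgets fit on the card.
for uid, path in [
    ("Qwen0_5B", "/models/Qwen/Qwen2___5-0___5B-Instruct"),  # hypothetical path
    ("Qwen1_5B", "/models/Qwen/Qwen2___5-1___5B-Instruct"),
]:
    client.launch_model(
        model_name="qwen2.5-instruct",
        model_type="LLM",
        model_uid=uid,
        model_engine="vllm",
        model_format="pytorch",
        model_path=path,
        n_gpu=1,
        gpu_idx=0,
        gpu_memory_utilization=0.30,
        max_model_len=4096,
    )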

@XprobeBot added the gpu label Dec 26, 2024
@XprobeBot added this to the v1.x milestone Dec 26, 2024