
Cannot launch 3 or more replicas on a specified GPU in a multi-GPU environment #2689

Open
epic1219 opened this issue Dec 19, 2024 · 2 comments

@epic1219

System Info

Ubuntu 22.04.5 LTS

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xprobe/xinference:v1.0.1
sha256:5934a612a67108569a576ec1546e1d6ad17e510bd5624b95eaebb981400fd12f

The command used to start Xinference

docker run -e XINFERENCE_MODEL_SRC=huggingface -e HF_ENDPOINT=https://hf-mirror.com -p 9997:9997 --gpus all xprobe/xinference:v1.0.1 xinference-local --host 0.0.0.0 --port 9997

Reproduction

  1. Start the image and enter the container
  2. xinference launch -n bge-large-zh-v1.5 -t embedding -r 3 --n-gpu 1 --gpu-idx 0,0,0 -u bge-large-zh-v1.5
  3. Error: RuntimeError: Failed to launch model, detail: [address=0.0.0.0:33372, pid=53129] Model not found in the model list, uid: bge-large-zh-v1.5-2
  4. Preliminary investigation suggests the cause lies in the function allocate_devices_with_gpu_idx (xinference/core/worker.py:L466): when it checks whether a vLLM model is already deployed on the selected GPU devices, it calls the supervisor's get_model function, which raises the error above (see the sketch after this list).
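
The following is a hypothetical, simplified reconstruction of the suspected failure path, not the actual Xinference source: the function and attribute names loosely mirror xinference/core/worker.py, but the bodies are illustrative assumptions. It shows how a supervisor lookup for sibling replicas that are still mid-registration would abort the allocation for the third replica.

import asyncio

class FakeSupervisor:
    """Stands in for the supervisor actor; only fully registered
    models appear in its model list (assumption for illustration)."""

    def __init__(self):
        self.model_list = {}

    async def get_model(self, uid):
        if uid not in self.model_list:
            raise RuntimeError(f"Model not found in the model list, uid: {uid}")
        return self.model_list[uid]

async def allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, gpu_idx):
    # Before allocating, the worker inspects every model already placed on
    # the requested GPUs to decide whether one of them is a vLLM model that
    # needs the device exclusively.
    for dev in gpu_idx:
        for existing_uid in gpu_to_model_uids.get(dev, set()):
            # Sibling replicas from the same launch request may not have
            # finished registering with the supervisor, so this lookup
            # raises and aborts the launch of the remaining replicas.
            await supervisor.get_model(existing_uid)
    return list(gpu_idx)

async def main():
    supervisor = FakeSupervisor()
    # Replicas 0 and 1 were already placed on GPU 0 but are not yet in the
    # supervisor's model list when replica 2 asks for devices.
    gpu_to_model_uids = {0: {"bge-large-zh-v1.5-0", "bge-large-zh-v1.5-1"}}
    await allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, [0])

asyncio.run(main())  # -> RuntimeError: Model not found in the model list, ...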

Expected behavior

The model instances launch normally.

XprobeBot added the gpu label Dec 19, 2024
XprobeBot added this to the v1.x milestone Dec 19, 2024
@qinxuye
Contributor

qinxuye commented Dec 20, 2024

gpu idx combined with replica is probably not well supported yet; if you have a fix, feel free to submit a PR.
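
One untested direction for such a fix, sketched against the hypothetical reconstruction above rather than the real worker code: treat a failed supervisor lookup as "replica still launching" and skip the vLLM exclusivity check for that uid instead of letting the exception abort the whole allocation.

async def allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, gpu_idx):
    for dev in gpu_idx:
        for existing_uid in gpu_to_model_uids.get(dev, set()):
            try:
                await supervisor.get_model(existing_uid)
            except RuntimeError:
                # The uid belongs to a sibling replica that has not finished
                # registering yet; it cannot be a running vLLM model, so it
                # should be safe to skip the exclusivity check for it.
                continue
    return list(gpu_idx)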


This issue is stale because it has been open for 7 days with no activity.

github-actions bot added the stale label Dec 27, 2024