
Cannot launch 3 or more replicas on a specified GPU in a multi-GPU environment #2689

Open
epic1219 opened this issue Dec 19, 2024 · 2 comments

@epic1219

System Info

Ubuntu 22.04.5 LTS

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xprobe/xinference:v1.0.1
sha256:5934a612a67108569a576ec1546e1d6ad17e510bd5624b95eaebb981400fd12f

The command used to start Xinference

docker run -e XINFERENCE_MODEL_SRC=huggingface -e HF_ENDPOINT=https://hf-mirror.com -p 9997:9997 --gpus all xprobe/xinference:v1.0.1 xinference-local --host 0.0.0.0 --port 9997

Reproduction

  1. Start the image and enter the container
  2. xinference launch -n bge-large-zh-v1.5 -t embedding -r 3 --n-gpu 1 --gpu-idx 0,0,0 -u bge-large-zh-v1.5
  3. Error: RuntimeError: Failed to launch model, detail: [address=0.0.0.0:33372, pid=53129] Model not found in the model list, uid: bge-large-zh-v1.5-2
  4. Preliminary investigation suggests the cause lies in the function allocate_devices_with_gpu_idx (xinference/core/worker.py:L466): when it checks whether a vLLM model is already deployed on the selected GPU devices, it calls the supervisor's get_model function, which raises the error above (see the sketch after this list).
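
The following is a hypothetical, simplified reconstruction of the suspected failure path, not the actual Xinference source: the function and attribute names loosely mirror xinference/core/worker.py, but the bodies are illustrative assumptions. It shows how a supervisor lookup for sibling replicas that are still mid-registration would abort the allocation for the third replica.

import asyncio

class FakeSupervisor:
    """Stands in for the supervisor actor; only fully registered
    models appear in its model list (assumption for illustration)."""

    def __init__(self):
        self.model_list = {}

    async def get_model(self, uid):
        if uid not in self.model_list:
            raise RuntimeError(f"Model not found in the model list, uid: {uid}")
        return self.model_list[uid]

async def allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, gpu_idx):
    # Before allocating, the worker inspects every model already placed on
    # the requested GPUs to decide whether one of them is a vLLM model that
    # needs the device exclusively.
    for dev in gpu_idx:
        for existing_uid in gpu_to_model_uids.get(dev, set()):
            # Sibling replicas from the same launch request may not have
            # finished registering with the supervisor, so this lookup
            # raises and aborts the launch of the remaining replicas.
            await supervisor.get_model(existing_uid)
    return list(gpu_idx)

async def main():
    supervisor = FakeSupervisor()
    # Replicas 0 and 1 were already placed on GPU 0 but are not yet in the
    # supervisor's model list when replica 2 asks for devices.
    gpu_to_model_uids = {0: {"bge-large-zh-v1.5-0", "bge-large-zh-v1.5-1"}}
    await allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, [0])

asyncio.run(main())  # -> RuntimeError: Model not found in the model list, ...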

Expected behavior

The model instances launch normally.

XprobeBot added the gpu label Dec 19, 2024
XprobeBot added this to the v1.x milestone Dec 19, 2024
@qinxuye
Contributor

qinxuye commented Dec 20, 2024

gpu idx combined with replica is probably not well supported yet; if you have a fix, feel free to submit a PR.
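
One untested direction for such a fix, sketched against the hypothetical reconstruction above rather than the real worker code: treat a failed supervisor lookup as "replica still launching" and skip the vLLM exclusivity check for that uid instead of letting the exception abort the whole allocation.

async def allocate_devices_with_gpu_idx(supervisor, gpu_to_model_uids, gpu_idx):
    for dev in gpu_idx:
        for existing_uid in gpu_to_model_uids.get(dev, set()):
            try:
                await supervisor.get_model(existing_uid)
            except RuntimeError:
                # The uid belongs to a sibling replica that has not finished
                # registering yet; it cannot be a running vLLM model, so it
                # should be safe to skip the exclusivity check for it.
                continue
    return list(gpu_idx)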


This issue is stale because it has been open for 7 days with no activity.

github-actions bot added the stale label Dec 27, 2024