Xinference cannot detect whether the service is running normally. #2685

liuzhenghua · 2024-12-18T02:02:32Z

System Info

Running Xinference with Docker?

docker
pip install
installation from source

Version info

0.16.1

The command used to start Xinference

xinference launch --model-name bge-m3 --model-type embedding --replica 2 --n-gpu 1

Reproduction

Use the command above to launch two replicas of the bge-m3 model. Occasionally, one replica encounters CUDA-related errors when an interface is called.

Expected behavior

Xinference should automatically detect problematic nodes and restart them.

qinxuye · 2024-12-18T08:10:16Z

Maybe we can add a health_check for each worker model actor to query its status.
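
One possible shape for such a health check, sketched in Python. This is only an illustration of the idea, not Xinference code: `Replica`, `HealthChecker`, `probe`, `restart`, and `max_failures` are all hypothetical names, and the probe is assumed to be something cheap, such as a tiny embedding request, that raises when the replica is broken.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Replica:
    """Hypothetical handle for one model replica (not an Xinference class)."""
    name: str
    probe: Callable[[], None]    # assumed to raise an exception when unhealthy
    restart: Callable[[], None]  # assumed to relaunch the replica
    failures: int = 0            # consecutive failed probes so far


class HealthChecker:
    """Restart a replica after `max_failures` consecutive failed probes."""

    def __init__(self, replicas: List[Replica], max_failures: int = 3):
        self.replicas = replicas
        self.max_failures = max_failures

    def check_once(self) -> List[str]:
        """Probe every replica once and return the names of those restarted."""
        restarted = []
        for replica in self.replicas:
            try:
                replica.probe()
                replica.failures = 0  # a healthy probe resets the counter
            except Exception:
                replica.failures += 1
                if replica.failures >= self.max_failures:
                    replica.restart()
                    replica.failures = 0
                    restarted.append(replica.name)
        return restarted
```

In practice `check_once` would run on a timer. Requiring several consecutive failures before restarting avoids reloading a model because of a single transient error.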

github-actions · 2024-12-25T19:03:52Z

This issue is stale because it has been open for 7 days with no activity.

XprobeBot added the gpu label Dec 18, 2024

XprobeBot added this to the v1.x milestone Dec 18, 2024

github-actions bot added the stale label Dec 25, 2024
