
Xin cannot perceive whether the service is running normally. #2685

Open
1 of 3 tasks
liuzhenghua opened this issue Dec 18, 2024 · 2 comments

liuzhenghua (Contributor) commented Dec 18, 2024

System Info / 系統信息

[screenshot of system info attached]

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • docker / docker
  • pip install / 通过 pip install 安装
  • installation from source / 从源码安装

Version info / 版本信息

0.16.1

The command used to start Xinference / 用以启动 xinference 的命令

xinference launch --model-name bge-m3 --model-type embedding --replica 2 --n-gpu 1

Reproduction / 复现过程

Use the above command to launch the bge-m3 model with two replicas. Occasionally, one replica runs into CUDA-related errors when its API is called.
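
For reference, a request like the one below is what hits the failing replica. This is only a minimal sketch using the Xinference Python client; the supervisor endpoint and the model UID are assumptions about this particular deployment.

# Minimal sketch of a request that surfaces the CUDA error on a broken replica.
# Endpoint and model UID are assumptions; substitute the values of the actual deployment.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")             # assumed default supervisor endpoint
model = client.get_model("bge-m3")                    # assumed model UID; requests are spread across the 2 replicas
result = model.create_embedding("health probe text")  # raises if the chosen replica's CUDA context is broken
print(len(result["data"][0]["embedding"]))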

Expected behavior / 期待表现

Xinference should be able to automatically detect problematic nodes and restart them.

@XprobeBot XprobeBot added the gpu label Dec 18, 2024
@XprobeBot XprobeBot added this to the v1.x milestone Dec 18, 2024
qinxuye (Contributor) commented Dec 18, 2024

Maybe we can add a health_check for each worker model actor to query its status.
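
A rough illustration of that idea follows. It is a hypothetical sketch only: the hooks get_model_ref, relaunch_replica, and health_check are made-up names, not existing Xinference APIs, and the intervals are illustrative.

import asyncio

PROBE_INTERVAL = 30   # seconds between probes (illustrative value)
PROBE_TIMEOUT = 10    # seconds before a hung replica counts as dead (illustrative value)

async def monitor_replica(get_model_ref, relaunch_replica):
    # get_model_ref() -> actor handle exposing an async health_check()
    # relaunch_replica() -> tears down and recreates the replica, returning a new handle
    # Both hooks are hypothetical; the real worker/supervisor actors would own this loop.
    model_ref = get_model_ref()
    while True:
        try:
            await asyncio.wait_for(model_ref.health_check(), timeout=PROBE_TIMEOUT)
        except Exception:
            # CUDA context broken or actor unresponsive: replace the replica.
            model_ref = await relaunch_replica()
        await asyncio.sleep(PROBE_INTERVAL)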

github-actions bot commented Dec 25, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Dec 25, 2024