Cluster-mode deployment: if the supervisor is restarted, do all workers have to be restarted too? #2402
Comments
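I ran into this as well. The supervisor has to be started first and the workers afterwards, and the connection must not drop after that. Otherwise, even after the supervisor has restarted successfully, the workers keep reporting that they cannot connect to the supervisor's IP address.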
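If you specify supervisor-port when starting the supervisor, workers are able to reconnect after the supervisor restarts (because the supervisor port is then fixed). If the supervisor could be made stateless (for example, by sharing state through redis), that would also solve the current supervisor single-point-of-failure problem.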
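To illustrate why a fixed port matters: once the supervisor address is stable, a client can simply keep retrying the same host and port until the restarted process is listening again. The following is only a minimal sketch of that idea using the Python standard library. It is not how Xinference workers actually reconnect, and the function name `wait_for_supervisor` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_supervisor(host: str, port: int, timeout: float = 30.0,
                        initial_delay: float = 0.1,
                        max_delay: float = 2.0) -> bool:
    """Poll a fixed (host, port) until a TCP connection succeeds.

    Hypothetical helper, not part of Xinference. Returns True as soon as
    the port accepts a connection, or False once `timeout` seconds have
    elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # A successful connect means something is listening on the
            # fixed port again; close the probe connection right away.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            # Back off exponentially, capped at max_delay, and never
            # sleep past the overall deadline.
            time.sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)
```

This only works if the port stays the same across restarts. With a randomly chosen port, the address a client retries would no longer be valid after the supervisor comes back.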
I've hit this problem too. Unless it is fixed, there is no way to really use this as a cluster.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
System Info
Docker
Running Xinference with Docker?
Version info
0.15.2
The command used to start Xinference
Distributed setup, with the supervisor and worker started normally
The supervisor was started with supervisor-port specified
A model, e.g. bge-m3, was launched on the worker
Reproduction
Restart the supervisor. The web UI can no longer show the running model bge-m3, and the model service is unavailable.
Expected behavior