
xinference model answers normally right after each restart, but after a while chat requests spin forever #2674

Open
1 of 3 tasks
jiusi9 opened this issue Dec 16, 2024 · 6 comments

@jiusi9

jiusi9 commented Dec 16, 2024

System Info

sentence-transformers 3.0.1
transformers 4.42.3
transformers-stream-generator 0.0.5

nvidia-cublas-cu12 12.4.5.8 (upgraded separately to support the H20 GPU)
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference 1.0.1

The command used to start Xinference

xinference-local --host 0.0.0.0 --port 8080 --log-level debug && xinference register -t LLM -f /opt/xinference/configs/Qwen2.5-Coder-32B-Instruct.json -p && xinference launch --model-name sparrowx-Qwen2.5-Coder-32B-Instruct --model-engine transformers --quantization 4-bit
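The debug log below shows the parameters dify sends per request (max_tokens=512, temperature=1.0, stream=True). As a hypothetical sketch (the model name and values are taken from this report, not from xinference docs), this is roughly the JSON body an OpenAI-style client would POST to xinference's `/v1/chat/completions` endpoint for the model launched above:

```python
import json

# Hypothetical client-side sketch: build the request body an OpenAI-style
# client (such as dify) would POST to xinference. Values mirror the ones
# visible in the debug log; nothing is sent over the network here.
def build_chat_request(stream: bool) -> str:
    payload = {
        "model": "sparrowx-Qwen2.5-Coder-32B-Instruct",
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 512,
        "temperature": 1.0,
        "stream": stream,
    }
    return json.dumps(payload)

print(build_chat_request(True))
```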

Reproduction

The model answers normally every time right after it starts, but when it is called again a while later:

  1. Calling through dify (stream=true) gets no answer, although the xinference log seems to contain a response?
2024-12-16 10:44:19,407 xinference.core.supervisor 111 DEBUG    [request a60b20ac-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs: 
2024-12-16 10:44:19,407 xinference.core.worker 111 DEBUG    Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16 10:44:19,407 xinference.core.worker 111 DEBUG    Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,407 xinference.core.supervisor 111 DEBUG    [request a60b20ac-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,436 xinference.core.supervisor 111 DEBUG    [request a60fa550-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs: 
2024-12-16 10:44:19,436 xinference.core.worker 111 DEBUG    Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16 10:44:19,436 xinference.core.worker 111 DEBUG    Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,436 xinference.core.supervisor 111 DEBUG    [request a60fa550-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,448 xinference.core.supervisor 111 DEBUG    [request a6116b42-bb57-11ef-8f84-264d7caea4aa] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs: 
2024-12-16 10:44:19,448 xinference.core.worker 111 DEBUG    Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16 10:44:19,448 xinference.core.worker 111 DEBUG    Leave get_model, elapsed time: 0 s
2024-12-16 10:44:19,448 xinference.core.supervisor 111 DEBUG    [request a6116b42-bb57-11ef-8f84-264d7caea4aa] Leave get_model, elapsed time: 0 s
2024-12-16 10:44:19,449 xinference.core.supervisor 111 DEBUG    [request a6118f96-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs: 
2024-12-16 10:44:19,449 xinference.core.worker 111 DEBUG    Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16 10:44:19,449 xinference.core.worker 111 DEBUG    Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,449 xinference.core.supervisor 111 DEBUG    [request a6118f96-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,451 xinference.core.model 323 DEBUG    Request chat, current serve request count: -67, request limit: inf for the model sparrowx-Qwen2.5-Coder-32B-Instruct
2024-12-16 10:44:19,451 xinference.core.model 323 DEBUG    [request a611eb58-bb57-11ef-a3c1-264d7caea4aa] Enter chat, args: ModelActor(sparrowx-Qwen2.5-Coder-32B-Instruct-0),[{'role': 'system', 'content': "Use the following context as your learned knowledge, inside <context...,{'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'temperature': 1.0, 'top_p': ..., kwargs: raw_params={'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'stream': True, 'temperature'...
2024-12-16 10:44:19,452 xinference.core.model 323 DEBUG    [request a611eb58-bb57-11ef-a3c1-264d7caea4aa] Leave chat, elapsed time: 0 s
2024-12-16 10:44:19,452 xinference.core.model 323 DEBUG    After request chat, current serve request count: -67 for the model sparrowx-Qwen2.5-Coder-32B-Instruct
2024-12-16 10:44:21,584 xinference.model.llm.transformers.utils 323 DEBUG    Average throughput for a step: 8.001524374163832 token/s.
  2. Calling the xinference API directly (stream=false) returns an error:
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

GPU memory monitoring shows the model did not OOM or restart.
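`RemoteDisconnected` means the server closed the socket before sending any response, which is distinct from a slow generation that eventually times out. A minimal diagnostic sketch (the `call` parameter is a hypothetical stand-in for whatever client function issues the chat request) that retries and surfaces this case separately:

```python
import http.client

# Hypothetical diagnostic wrapper: retry a chat call a few times so that a
# dropped connection (RemoteDisconnected) can be told apart from a hang.
# `call` is any zero-argument function that performs the actual request.
def call_with_retry(call, attempts=3):
    last = None
    for _ in range(attempts):
        try:
            return call()
        except http.client.RemoteDisconnected as exc:
            # Server closed the connection without a response; try again.
            last = exc
    raise RuntimeError(f"gave up after {attempts} attempts") from last
```

If every attempt fails this way while the GPU shows no OOM, the problem is likely in the serving layer rather than the model process itself, which matches the negative "serve request count: -67" seen in the log above.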

Expected behavior

The model can be called for chat normally.

@XprobeBot XprobeBot added the gpu label Dec 16, 2024
@XprobeBot XprobeBot added this to the v1.x milestone Dec 16, 2024
@qinxuye
Contributor

qinxuye commented Dec 16, 2024

Where are you chatting?

@jiusi9
Author

jiusi9 commented Dec 16, 2024

> Where are you chatting?

Chatting in dify just spins forever.
Calling the xinference /v1/chat/complete endpoint times out with an error.

@wanfade

wanfade commented Dec 18, 2024

+1
After registering the LLM in dify in OpenAI format, chat spins forever.
Registering the LLM in xinference format, chat works normally.

@qinxuye
Contributor

qinxuye commented Dec 18, 2024

dify already has an xinf provider; why use the OpenAI format?

@wanfade

wanfade commented Dec 18, 2024

Another tool called dify's model-registration API; during registration we did not select xinference but registered the model in OpenAI format. Switching the provider to xinference fixed it.


This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Dec 25, 2024