System Info
sentence-transformers 3.0.1
transformers 4.42.3
transformers-stream-generator 0.0.5
nvidia-cublas-cu12 12.4.5.8 (upgraded separately to support the H20 GPU)
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.560.30
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
Version info
xinference 1.0.1
The command used to start Xinference
xinference-local --host 0.0.0.0 --port 8080 --log-level debug && xinference register -t LLM -f /opt/xinference/configs/Qwen2.5-Coder-32B-Instruct.json -p && xinference launch --model-name sparrowx-Qwen2.5-Coder-32B-Instruct --model-engine transformers --quantization 4-bit
Reproduction
Each time the model is started it answers normally at first, but when it is called again after a while the request fails, and the logs show:
2024-12-16 10:44:19,407 xinference.core.supervisor 111 DEBUG [request a60b20ac-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs:
2024-12-16 10:44:19,407 xinference.core.worker 111 DEBUG Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16T02:44:19.407386140Z 2024-12-16 10:44:19,407 xinference.core.worker 111 DEBUG Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,407 xinference.core.supervisor 111 DEBUG [request a60b20ac-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,436 xinference.core.supervisor 111 DEBUG [request a60fa550-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs:
2024-12-16T02:44:19.436893790Z 2024-12-16 10:44:19,436 xinference.core.worker 111 DEBUG Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16 10:44:19,436 xinference.core.worker 111 DEBUG Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,436 xinference.core.supervisor 111 DEBUG [request a60fa550-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,448 xinference.core.supervisor 111 DEBUG [request a6116b42-bb57-11ef-8f84-264d7caea4aa] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs:
2024-12-16T02:44:19.448512062Z 2024-12-16 10:44:19,448 xinference.core.worker 111 DEBUG Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16T02:44:19.448549747Z 2024-12-16 10:44:19,448 xinference.core.worker 111 DEBUG Leave get_model, elapsed time: 0 s
2024-12-16 10:44:19,448 xinference.core.supervisor 111 DEBUG [request a6116b42-bb57-11ef-8f84-264d7caea4aa] Leave get_model, elapsed time: 0 s
2024-12-16T02:44:19.449313425Z 2024-12-16 10:44:19,449 xinference.core.supervisor 111 DEBUG [request a6118f96-bb57-11ef-8f84-264d7caea4aa] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7f60aead57c0>,sparrowx-Qwen2.5-Coder-32B-Instruct, kwargs:
2024-12-16 10:44:19,449 xinference.core.worker 111 DEBUG Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x7f60ae8c5540>, kwargs: model_uid=sparrowx-Qwen2.5-Coder-32B-Instruct-0
2024-12-16T02:44:19.449442086Z 2024-12-16 10:44:19,449 xinference.core.worker 111 DEBUG Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,449 xinference.core.supervisor 111 DEBUG [request a6118f96-bb57-11ef-8f84-264d7caea4aa] Leave describe_model, elapsed time: 0 s
2024-12-16 10:44:19,451 xinference.core.model 323 DEBUG Request chat, current serve request count: -67, request limit: inf for the model sparrowx-Qwen2.5-Coder-32B-Instruct
2024-12-16 10:44:19,451 xinference.core.model 323 DEBUG [request a611eb58-bb57-11ef-a3c1-264d7caea4aa] Enter chat, args: ModelActor(sparrowx-Qwen2.5-Coder-32B-Instruct-0),[{'role': 'system', 'content': "Use the following context as your learned knowledge, inside <context...,{'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'temperature': 1.0, 'top_p': ..., kwargs: raw_params={'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'stream': True, 'temperature'...
2024-12-16 10:44:19,452 xinference.core.model 323 DEBUG [request a611eb58-bb57-11ef-a3c1-264d7caea4aa] Leave chat, elapsed time: 0 s
2024-12-16T02:44:19.452588058Z 2024-12-16 10:44:19,452 xinference.core.model 323 DEBUG After request chat, current serve request count: -67 for the model sparrowx-Qwen2.5-Coder-32B-Instruct
2024-12-16 10:44:21,584 xinference.model.llm.transformers.utils 323 DEBUG Average throughput for a step: 8.001524374163832 token/s.
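Note the `current serve request count: -67` in the log above: the counter has drifted negative, which suggests that some code path decrements it without having performed the matching increment (for example, when a request errors out before it is counted). A toy sketch of how such a counter goes negative (hypothetical illustration, not Xinference's actual code):

```python
# Toy illustration of an unbalanced request counter (hypothetical code,
# NOT taken from Xinference): the decrement in `finally` always runs,
# even when the increment was skipped because the handler failed early.
class RequestCounter:
    def __init__(self):
        self.count = 0

    def handle(self, fail_before_increment):
        try:
            if fail_before_increment:
                raise RuntimeError("request aborted before accounting")
            self.count += 1          # increment only on the happy path
        except RuntimeError:
            pass                     # error swallowed...
        finally:
            self.count -= 1          # ...but the decrement always runs

c = RequestCounter()
for _ in range(3):
    c.handle(fail_before_increment=True)
print(c.count)  # -3: three decrements, zero increments
```

If each failed or disconnected streaming request leaves the real counter one lower, a long-running server accumulates exactly the kind of large negative value seen in the log.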
The client-side call then fails with:

    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
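For reference, the failing call can be sketched with only the standard library. Assumptions: the server runs at localhost:8080 as launched above, the OpenAI-compatible `/v1/chat/completions` route is used, and the helper name `build_chat_request` is hypothetical:

```python
# Sketch of the request that triggers the error (assumes a local Xinference
# server on port 8080; `build_chat_request` is a hypothetical helper).
import json

def build_chat_request(model, prompt, stream=True):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "max_tokens": 512,
    }

payload = build_chat_request("sparrowx-Qwen2.5-Coder-32B-Instruct", "hello")
body = json.dumps(payload)

# Actually sending it (not executed here) is where the failure surfaces:
#   import http.client
#   conn = http.client.HTTPConnection("localhost", 8080, timeout=60)
#   conn.request("POST", "/v1/chat/completions", body,
#                {"Content-Type": "application/json"})
#   resp = conn.getresponse()  # raises http.client.RemoteDisconnected
```

The same failure mode appears regardless of client library, since the server closes the connection before writing any response bytes.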
GPU memory monitoring shows the model did not OOM or restart.
Expected behavior
Chat calls to the model should continue to succeed.
Where are you chatting with the model?
In Dify the chat just spins forever. When I call Xinference's /v1/chat/complete endpoint directly, the request times out with an error.
+1. After registering the LLM in Dify using the OpenAI-compatible format, Q&A spins forever. Registering it as an xinference-type LLM works fine.
Dify already has an xinference provider, so why use the OpenAI format?
I had used another tool to call Dify's model-registration API; during registration I did not select xinference but registered the model in the OpenAI format instead. Switching to the xinference provider fixed it.