Gradio web UI inference and Python inference produce inconsistent outputs #2705
Comments
The Gradio UI calls the regular API under the hood, so some inconsistency in LLM inference is to be expected, isn't it?
The Gradio UI goes through the Python inference path.
inference/xinference/core/chat_interface.py, lines 187 to 235 at ae7b3f6
This is the code path that Gradio calls.
System Info
Linux: Ubuntu 22.04 LTS
GPU: H100
NVIDIA-SMI: 560.35.05
Driver Version: 560.35.05
CUDA Version: 12.6
python: 3.10.15
gradio: 5.9.1
transformers: 4.47.0
Running Xinference with Docker?
Version info
xinference: 1.1.0
xinference-client: 0.13.3
The command used to start Xinference
xinference-local --host 0.0.0.0 --port 9997
Reproduction
1. Gradio web UI inference
xinference-local --host 0.0.0.0 --port 9997
2. Python inference
Adapted from the LLM inference example in the Xinference documentation (https://inference.readthedocs.io/zh-cn/latest/index.html), with "max_tokens": 512 and "temperature": 1.0 changed to match the web UI's default parameters. Running the same image and prompt through the web UI several times gives very stable output, but comparing the web UI output with the Python output for that same image and prompt shows large differences; sometimes the two answers reach completely opposite conclusions.
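For context, a minimal sketch of the kind of Python inference script described above, modeled on the documentation's LLM client example. The model UID, image file, and prompt are placeholders, and the exact chat() signature may differ between Xinference versions:

```python
# Minimal sketch only: model UID, image path, and prompt are placeholders,
# and the chat() signature may vary between Xinference versions.
import base64
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")
model = client.get_model("my-vl-model")  # placeholder model UID

# Encode the test image as a data URL so it can be sent inline.
with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = model.chat(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    # Matches the web UI defaults mentioned above.
    generate_config={"max_tokens": 512, "temperature": 1.0},
)
print(response["choices"][0]["message"]["content"])
```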
3. I then tried inference through gradio_client, and those results do not match the web UI output either. That makes no sense: isn't gradio_client supposed to reproduce what the web UI does?
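For reference, a rough sketch of the gradio_client call used in step 3. The app URL, api_name, and argument layout are assumptions that depend on how the model's Gradio app is exposed; client.view_api() can be used to discover the real endpoint and its parameters:

```python
# Rough sketch: the URL, api_name, and argument order are assumptions and
# depend on the Gradio app that Xinference serves for the model.
from gradio_client import Client, handle_file

client = Client("http://<xinference-host>/<gradio-app-path-for-the-model>")  # placeholder URL
print(client.view_api())  # inspect the actual endpoints and argument order

result = client.predict(
    handle_file("test.jpg"),   # same test image as the other runs
    "Describe this image.",    # same prompt
    api_name="/chat",          # placeholder endpoint name
)
print(result)
```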
Expected behavior
All three inference paths are expected to produce consistent output.