Is it deployed on the GPU?
Yes, it is running on the GPU. The visible symptom is stuttering. I looked into the API: the streaming endpoint's output arrives in discrete bursts. For example, five segments of text are emitted at 16:02:29, and the next output does not arrive until 16:02:34, a stall of roughly 5 seconds in between.
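To make the stall pattern above measurable rather than anecdotal, one can timestamp each chunk of the streaming response as it arrives and look at the gaps between consecutive chunks. This is a minimal sketch with a hypothetical list of arrival times (not real measurements from the server); any streaming client can feed real timestamps into the same functions.

```python
def chunk_gaps(timestamps):
    """Given per-chunk arrival times in seconds, return the gaps
    between consecutive chunks."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]

def stalls(timestamps, threshold=1.0):
    """Indices of gaps longer than `threshold` seconds (likely stalls)."""
    return [i for i, g in enumerate(chunk_gaps(timestamps)) if g > threshold]

# Hypothetical example matching the reported pattern: five chunks arrive
# close together, then a ~5 s stall before the next one.
arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 5.4]
print(chunk_gaps(arrivals))  # [0.1, 0.1, 0.1, 0.1, 5.0]
print(stalls(arrivals))      # [4]
```

If the stalls line up with fixed wall-clock intervals, that points at batching or scheduling on the server side rather than at slow token generation itself.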
This issue is stale because it has been open for 7 days with no activity.
System Info / 系統信息
Ubuntu 24.04, single RTX 4090
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
Version info / 版本信息
Latest version
The command used to start Xinference / 用以启动 xinference 的命令
None provided
Reproduction / 复现过程
I wonder whether a comparison against ollama has been done. In my own test, the same model runs noticeably faster under ollama; I'm not sure whether this is a problem with my deployment. Is there any documentation on performance?
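One way to make the ollama comparison concrete is to run the same prompt through each backend's streaming API and compare decode throughput. The sketch below keeps the measurement logic separate from any network code so it can be checked offline; the fake stream at the bottom is a stand-in for a real sequence of response chunks, and chunk count is only a rough proxy for token count.

```python
import time

def throughput(n_tokens, elapsed_s):
    """Tokens (or chunks) generated per second; 0.0 if no time elapsed."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

def benchmark(stream, clock=time.monotonic):
    """Consume an iterable of response chunks and return
    (chunk_count, chunks_per_second)."""
    start = clock()
    n = 0
    for _ in stream:
        n += 1
    return n, throughput(n, clock() - start)

# Offline example with a fake stream (no server needed):
n, rate = benchmark(iter(["Hello", " world", "!"]))
print(n)  # 3
```

Feeding `benchmark` the chunk iterator from each server's streaming response, with an identical prompt and generation length, gives a like-for-like number to compare instead of a subjective impression of speed.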
Expected behavior / 期待表现
Is there any documentation on performance, such as benchmark results?