Request: add a priority parameter so high-priority requests are scheduled first #2669
vLLM does support priorities: vllm-project/vllm#8850
You could try supporting priority in generate_config; see inference/xinference/model/llm/vllm/core.py, lines 334 to 404 at b0b2fa6. PRs are welcome; a rough sketch of the idea follows.
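A minimal sketch of that approach, assuming a vLLM `AsyncLLMEngine` started with `--scheduling-policy priority` and the `priority` keyword that vllm-project/vllm#8850 added to `generate()` (lower values are scheduled earlier). The function name and the `generate_config` handling here are illustrative, not xinference's actual code:

```python
from vllm import SamplingParams
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def generate_with_priority(
    engine: AsyncLLMEngine,
    prompt: str,
    generate_config: dict,
    request_id: str,
):
    # Pop the (hypothetical) xinference-side option before building
    # SamplingParams, since vLLM's SamplingParams does not know about it.
    priority = int(generate_config.pop("priority", 0))

    sampling_params = SamplingParams(
        temperature=generate_config.get("temperature", 1.0),
        max_tokens=generate_config.get("max_tokens", 256),
    )

    # Forward the priority to vLLM; with the priority scheduling policy,
    # the waiting queue is ordered by (priority, arrival time).
    async for output in engine.generate(
        prompt,
        sampling_params,
        request_id,
        priority=priority,
    ):
        yield output
```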
@qinxuye Boss, a lot of people probably need this feature. With a private deployment, resources are limited; deploying two clusters, one for batch jobs and one for real-time traffic, just isn't an option. Come on, wave your magic finger and knock it out in a few minutes! I'd love to contribute a PR too, but I can't figure it out!
This is an enterprise-grade feature; we will only support this kind of capability in the enterprise edition. That said, contributions to the open-source version from the community are welcome.
This issue is stale because it has been open for 7 days with no activity.
Feature request
As titled. In our production environment we need to run long file-processing jobs (for example, summarization, where each large file is further split into many small requests). At the same time, users may want to chat with the model. Without priorities, chat requests have to queue behind the pending file-processing requests and only get scheduled after those finish, which is clearly unacceptable. As things stand, once files are being processed (say a few hundred requests), chatting becomes essentially impossible.
Motivation
I hope we can extend the OpenAI parameters with a priority parameter rather than adding a new endpoint, so that my application can switch at any time to any model server that supports the OpenAI API. A client-side sketch of what this could look like follows.
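For illustration only, a hedged sketch assuming the server honored a `priority` field in the request body (it does not today; that is exactly what this issue requests). The OpenAI Python SDK's `extra_body` lets a client send such a field without leaving the standard chat completions endpoint, and servers that ignore unknown fields remain compatible. The model name and priority values are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

# Interactive chat: high priority (lower value = scheduled earlier,
# following vLLM's convention).
chat = client.chat.completions.create(
    model="my-model",                       # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"priority": 0},             # hypothetical parameter
)

# Background summarization batch: low priority.
summary = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Summarize this chunk ..."}],
    extra_body={"priority": 10},            # hypothetical parameter
)
```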
Your contribution
None