ENH: Support concurrent embedding, update LangChain QA demo with multithreaded embedding creation #348
Embedding is a CPU-intensive call. Even for a stateless actor, concurrent embedding calls do not run simultaneously, because the first call blocks the actor's event loop until it finishes. Therefore, the embedding operation needs to be called with 'to_thread' in the model actor. However, I have tried this, and embedding is not thread-safe for llamacpp: the process crashes with a core dump if it is called concurrently.
We can first try supporting concurrent embedding creation for PyTorch models.
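For reference, a minimal sketch of the 'to_thread' approach discussed above, assuming Python 3.9+. `FakeEmbeddingModel`, `create_embedding`, and `embed_concurrently` are hypothetical names for illustration, not the project's actual API, and `time.sleep` stands in for the real CPU-bound call. A backend that is not thread-safe is guarded by a lock, so its calls stay off the event loop but still run one at a time.

```python
import asyncio
import threading
import time
from typing import List


class FakeEmbeddingModel:
    """Hypothetical stand-in for a blocking embedding backend."""

    def __init__(self, thread_safe: bool) -> None:
        self.thread_safe = thread_safe
        # Serializes calls for backends that crash under concurrent use.
        self._lock = threading.Lock()

    def _embed(self, text: str) -> List[float]:
        time.sleep(0.05)  # placeholder for the real embedding computation
        return [float(len(text))]

    def create_embedding(self, text: str) -> List[float]:
        if self.thread_safe:
            return self._embed(text)
        with self._lock:
            return self._embed(text)


async def embed_concurrently(
    model: FakeEmbeddingModel, texts: List[str]
) -> List[List[float]]:
    # Each blocking call runs in a worker thread, so the event loop stays
    # free to accept further requests while embeddings are computed.
    tasks = [asyncio.to_thread(model.create_embedding, t) for t in texts]
    # gather() returns results in the same order as the input texts.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    model = FakeEmbeddingModel(thread_safe=True)
    print(asyncio.run(embed_concurrently(model, ["a", "bb", "ccc"])))
```

Whether threads give a real speedup depends on the backend releasing the GIL during computation; this sketch only shows the call pattern.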
No description provided.