Optimizing ComfyUI for Parallel Workflow Processing on a Single GPU #5941
menahem121 asked this question in Q&A
Hi,
My goal is to serve a large number of users by running multiple workflows in parallel on a single GPU. Currently, ComfyUI processes workflows sequentially within a single server, which works well for one user but becomes restrictive when handling multiple simultaneous requests.
The Challenge
In my tests, generating a single image takes about 1 second. With the current sequential queue, however, if 30 requests arrive at once, the 30th request only completes after roughly 30 seconds, which does not scale.
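Roughly how I reproduce this (a minimal sketch; the port, the file name, and the workflow exported via "Save (API Format)" are assumptions of the sketch, not fixed values):

```python
import json
import time
import urllib.request

COMFY = "http://127.0.0.1:8188"

def queue_prompt(workflow: dict) -> str:
    """POST a workflow to ComfyUI's /prompt endpoint and return its prompt id."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY}/prompt", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

with open("workflow_api.json") as f:  # hypothetical exported workflow
    workflow = json.load(f)

t0 = time.time()
prompt_ids = [queue_prompt(workflow) for _ in range(30)]
print(f"queued 30 prompts in {time.time() - t0:.2f}s")
# Queuing itself is fast, but the server executes the queue one prompt
# at a time, so the 30th image still finishes roughly 30 seconds later.
```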
What I’ve Tried
To address this, I set up three ComfyUI servers on a single AWS instance with a 48GB NVIDIA GPU. While this reduced generation times to 1–2 seconds per image, I encountered GPU overloading.
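For reference, I start the servers along these lines (a sketch; the install path is made up, and the --listen/--port flags should be checked against your ComfyUI version):

```python
import subprocess

COMFYUI_DIR = "/home/ubuntu/ComfyUI"  # hypothetical install path
PORTS = [8188, 8189, 8190]

# One ComfyUI server process per port, all sharing the single GPU.
procs = [
    subprocess.Popen(
        ["python", "main.py", "--listen", "127.0.0.1", "--port", str(port)],
        cwd=COMFYUI_DIR,
    )
    for port in PORTS
]
```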
The issue seems to stem from how each server loads models independently. For example:
- When Server 1 runs Workflow 1, it loads the required models into memory.
- When Server 2 starts the same workflow, it reloads the same models, duplicating them in VRAM.
- The same happens with Server 3.
This duplication wastes GPU memory and limits efficiency. I understand that this might be intentional if each server requires its own latent space to process workflows.
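The duplication is easy to confirm by polling each server's /system_stats endpoint (a sketch; the devices/vram_total/vram_free fields match recent ComfyUI builds, but verify locally):

```python
import json
import urllib.request

for port in (8188, 8189, 8190):
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/system_stats") as resp:
        stats = json.loads(resp.read())
    for dev in stats["devices"]:
        used_gib = (dev["vram_total"] - dev["vram_free"]) / 2**30
        print(f"port {port}: {dev['name']}: {used_gib:.1f} GiB in use")
# All three servers report the same physical GPU, and its free VRAM
# drops by roughly one full copy of the models per extra server.
```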
Question
Is there a way to optimize this setup so that multiple ComfyUI servers can share the same models in VRAM? Or is there another recommended approach for serving parallel workflows more efficiently?
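For context, requests are currently spread across the three servers with naive client-side round-robin glue; a hypothetical sketch:

```python
import itertools
import json
import urllib.request

ports = itertools.cycle([8188, 8189, 8190])

def dispatch(workflow: dict) -> str:
    """Send the workflow to the next server in rotation; return its prompt id."""
    port = next(ports)
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]
```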
Thank you for your insights!
Replies: 1 comment, 1 reply

Unfortunately, utilizing multiple GPUs simultaneously within a single instance is not currently supported. Please refer to the discussions regarding this improvement.