-
Hi! I'm interested in using ComfyUI with multiple GPUs for both training and inference.
Any guidance or insights on this matter would be extremely helpful.
-
I've been having the same problem and I'm willing to pay!!
-
SwarmUI (https://github.com/mcmonkeyprojects/SwarmUI) provides a UI that can manage multiple ComfyUI instances as backends at once. Currently, ComfyUI does not provide a way to execute workflows in parallel. If you are a developer and want to implement multi-GPU inference, I think modifying the KSampler would be the most effective place to start. If I had a multi-GPU environment I would experiment with this myself, but I'm not sure how well PyTorch handles this scenario in practice. Also note that several custom nodes hijack the sampling function, so your modifications might break those nodes.
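To make the idea concrete, here is a minimal data-parallel sampling sketch in plain PyTorch: the latent batch is split across the visible GPUs and each shard runs its own denoising loop. This does not use ComfyUI's actual KSampler API; `ToyDenoiser`, the step rule, and the shapes are made up purely for illustration.

```python
import torch


class ToyDenoiser(torch.nn.Module):
    """Hypothetical stand-in for a denoising model; not ComfyUI's real UNet wrapper."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, latents, timestep):
        return self.net(latents)


@torch.no_grad()
def sample_data_parallel(latents, steps=20):
    """Split a latent batch across the visible GPUs and run the loop on each shard."""
    if torch.cuda.device_count() >= 2:
        devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
    else:
        devices = [torch.device("cpu")]  # fallback so the sketch still runs without GPUs

    # One model replica per device; a real integration would load the weights once and copy them.
    replicas = [ToyDenoiser().to(d) for d in devices]
    shards = [s.to(d) for s, d in zip(latents.chunk(len(devices)), devices)]

    for t in range(steps):
        # CUDA launches are asynchronous, so shards on different devices overlap in time.
        shards = [lat - 0.1 * model(lat, t) for model, lat in zip(replicas, shards)]

    return torch.cat([s.cpu() for s in shards])


if __name__ == "__main__":
    print(sample_data_parallel(torch.randn(8, 4, 64, 64)).shape)
```

The caveat from the comment applies: custom nodes that patch the sampling path would not see a change like this, so results could diverge between nodes.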
-
PyTorch does provide multi-GPU handling; at the very least you can choose which device each tensor and model lives on, so this should be doable.
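For what it's worth, the device-selection primitives fit in a few lines (a trivial sketch, not ComfyUI code; the shapes are arbitrary):

```python
import torch

print("visible GPUs:", torch.cuda.device_count())

# Pick devices explicitly; fall back to CPU if fewer than two GPUs are present.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() > 0 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

a = torch.randn(1024, 1024, device=dev0)
b = torch.randn(1024, 1024, device=dev1)

# Work placed on different devices can run concurrently; results have to be moved
# onto a common device before they are combined.
result = (a @ a) + (b @ b).to(dev0)
print(result.device)
```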
-
To be honest I have no idea, but I have the same problem because I need more VRAM and haven't bought an extra GPU yet. I did find a custom node package in the ComfyUI Node Manager, though: ComfyUI_NetDist. Its description says: "Run ComfyUI workflows on multiple local GPUs/networked machines. Nodes: Remote images, Local Remote control". Let me know if it works, because then I'll buy an extra GPU! Best of luck :)
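Whichever node pack you end up with, the usual prerequisite is several ComfyUI server instances pinned to different GPUs. Here is a rough sketch of launching two of them from Python, assuming a checkout at ~/ComfyUI and ComfyUI's --cuda-device/--port/--listen command-line flags; the paths, GPU indices, and ports are placeholders to adjust for your setup.

```python
import subprocess
from pathlib import Path

COMFY_DIR = Path.home() / "ComfyUI"  # assumed install location, adjust as needed


def launch_instance(gpu_index: int, port: int) -> subprocess.Popen:
    """Start one ComfyUI server pinned to a single GPU on its own port."""
    return subprocess.Popen(
        [
            "python", "main.py",
            "--cuda-device", str(gpu_index),  # restrict this process to one GPU
            "--port", str(port),
            "--listen", "127.0.0.1",
        ],
        cwd=COMFY_DIR,
    )


if __name__ == "__main__":
    servers = [launch_instance(0, 8188), launch_instance(1, 8189)]
    for proc in servers:
        proc.wait()
```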
-
Hmm... splitting model execution across GPUs is the more complicated part. A good start would be updating the queue so the next available job is sent to the next available local GPU and/or remote server URL, to accelerate cranking through queued jobs. Probably add some UI to select which GPUs to queue on, in case you have GPUs with mixed memory sizes and some are too small for a given model. Model splitting is useful too, but I think batching across more cards is the lower-hanging fruit that would be useful in the near term. You'd also want an asynchronous completion node to put in front of the image save node, so the job knows how to route results back instead of writing the file out locally on the remote server. EasyDiffusion is/was fantastic at this kind of thing, but it's in maintenance mode and was recently broken by Hugging Face changing their URLs around.

As AI accelerates there will be a glut of older cards on the market that are considered too slow for miners but are still useful. Hooking up a ton of cards in a mining rack with PCIe 1x extenders limits you to a single PCIe lane per card, but for a lot of workloads that mostly affects model load times: when you're rendering batches of images on the same model, the model is already loaded on the card, so after the first image the link speed doesn't matter much. You can attach on the order of 28 GPUs to a single system with those extenders, a couple of mining racks, and server power supplies. For example, I picked up a used mining rig with six 6GB NVIDIA 1060s for super cheap, a pile of 24GB Tesla M40s for under $100 each, and some Tesla K80s for $30 each. The performance is on the slow side, and eventually they'll drop off the CUDA support list for the latest compute capabilities, but those old 24GB and dual-12GB cards have a lot of life left in them (assuming you either DIY the coolers or have a proper server chassis). They have enough memory to run the bigger models easily, and I don't really care if I have to wait overnight to generate a few thousand images; running eight of them in parallel isn't that far off the speed of a newer card if batching jobs across cards is easy. You can also use full-speed 16x riser cables: a bunch of motherboards that physically don't let you populate every PCIe slot suddenly let you use all the ports when you use risers, and ATX power supplies are a joke compared to server breakout-board setups. That's if you don't mind your rig looking like Lain's bedroom from a 90s anime; it's not like you wouldn't stick the thing in your garage or basement anyway.

I'm trying to play with NetDist and StableSwarmUI, and it does fire up across cards, but it's also pretty janky. StableSwarmUI handles launching a bunch of server instances easily enough, but ComfyUI needs a way to submit the jobs easily, and I think the best place to start is the queueing.
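Until something like that lands in core, a crude external dispatcher isn't hard to sketch: each ComfyUI instance exposes a /prompt HTTP endpoint that accepts a workflow exported in API format, so queued jobs can be round-robined across backends. A minimal sketch, assuming the servers from the earlier example are running and that workflow.json was saved via "Save (API Format)"; the URLs and file name are placeholders.

```python
import itertools
import json
import urllib.request

# Backend ComfyUI servers, e.g. one per local GPU or remote machine (placeholders).
BACKENDS = ["http://127.0.0.1:8188", "http://127.0.0.1:8189"]


def submit(workflow: dict, backend: str) -> dict:
    """POST one API-format workflow to a ComfyUI server's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{backend}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    with open("workflow.json") as f:  # exported with "Save (API Format)"
        workflow = json.load(f)

    # Round-robin a batch of identical jobs across the available backends.
    for job, backend in zip([workflow] * 8, itertools.cycle(BACKENDS)):
        print(backend, submit(job, backend))
```

Each server still writes its outputs to its own local output folder, which is exactly the result-routing gap described above.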
-
Is it possible to use this? https://github.com/mit-han-lab/distrifuser
-
xDiT provides a ComfyUI wrapper for their multi-GPU solution.