-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple process per device #2916
Allow multiple process per device #2916
Conversation
Thanks for the PR. Do you have an example code snippet that produces the error you mentioned? |
@BenjaminBossan from accelerate import PartialState
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
distributed_state = PartialState()
pipe.to(distributed_state.device)
with distributed_state.split_between_processes(["a dog", "a cat", "a snail", "a hedgehog"]) as prompt:
print(f"Running {prompt} on process {distributed_state.process_index}, device {distributed_state.device}")
result = pipe(prompt).images
result[0].save(f"{prompt[0]}.png") Command: CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num-processes 4 distributed_inference.py Error:
With this patch, it outputs:
|
Thanks for providing the example. I wonder if it could be adapted as a unit test on a multi GPU runner, assuming we want to proceed with the PR as is. I'll let @muellerzr decide this.
I think this line can safely be removed, as it's covered by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement!
If you can add a test perhaps in src/accelerate/test_utils/scripts/test_ops.py
most likely, or in test_script.py
?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Co-authored-by: Zach Mueller <[email protected]>
@muellerzr Would you have any more guidance on where and how to add the test? I did try adding a test to @require_multi_device
def test_multiple_processes_per_device_ops(self):
num_processes = 2 * device_count
print(f"Found {device_count} devices, testing {num_processes} processes.")
cmd = get_launch_command(num_processes=num_processes) + [self.operation_file_path]
with patch_environment(omp_num_threads=1):
execute_subprocess_async(cmd) However, this fails on
|
That makes me a bit hesitant to push this in, since generally users will likely want to do |
I just save the files to a directory as in the example in the docs. I agree it would be better to have support for Related discussions I found: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. As long as our main tests don't break, and your use-case works, I don't see an issue to why we can't merge this in since that's what we really care about.
Do you have two GPUs to run RUN_SLOW=1 pytest -sv tests/
with?
I get the same set of failed tests for
|
Great sounds good! Thanks for the improvement! |
Running multiple replicas of a model on each device can speed up inference. However,
accelerate
currently does not seem to support this. Settingnum_processes
to a value larger than the number of available devices results in an error (for CUDA: "invalid device ordinal").What does this PR do?
Makes sure that the device index is always valid by taking the remainder upon division by
device_count()
, hence allowing multiple processes to run on each device by settingnum_processes
to a multiple ofdevice_count()
.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@muellerzr @BenjaminBossan @SunMarc