Allow multiple process per device #2916

cifkao · 2024-07-04T20:20:25Z

Running multiple replicas of a model on each device can speed up inference. However, accelerate currently does not seem to support this. Setting num_processes to a value larger than the number of available devices results in an error (for CUDA: "invalid device ordinal").

What does this PR do?

Makes sure that the device index is always valid by taking the remainder upon division by device_count(), hence allowing multiple processes to run on each device by setting num_processes to a multiple of device_count().

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@muellerzr @BenjaminBossan @SunMarc

BenjaminBossan · 2024-07-05T10:12:37Z

Thanks for the PR. Do you have an example code snippet that produces the error you mentioned?

cifkao · 2024-07-06T18:24:18Z

@BenjaminBossan
I adapted the stable diffusion example from the docs:

from accelerate import PartialState
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
distributed_state = PartialState()
pipe.to(distributed_state.device)

with distributed_state.split_between_processes(["a dog", "a cat", "a snail", "a hedgehog"]) as prompt:
    print(f"Running {prompt} on process {distributed_state.process_index}, device {distributed_state.device}")
    result = pipe(prompt).images
    result[0].save(f"{prompt[0]}.png")

Command:

CUDA_VISIBLE_DEVICES=0,1 accelerate launch --num-processes 4 distributed_inference.py

Error:

[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/ondrej/proj/accelerate/distributed_inference.py", line 6, in <module>
[rank3]:     distributed_state = PartialState()
[rank3]:                         ^^^^^^^^^^^^^^
[rank3]:   File "/home/ondrej/proj/accelerate/src/accelerate/state.py", line 280, in __init__
[rank3]:     self.set_device()
[rank3]:   File "/home/ondrej/proj/accelerate/src/accelerate/state.py", line 790, in set_device
[rank3]:     torch.cuda.set_device(self.device)
[rank3]:   File "/home/ondrej/mambaforge/envs/accelerate/lib/python3.11/site-packages/torch/cuda/__init__.py", line 399, in set_device
[rank3]:     torch._C._cuda_setDevice(device)
[rank3]: RuntimeError: CUDA error: invalid device ordinal
[rank3]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank3]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank3]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

[rank2]: Traceback (most recent call last):
[rank2]:   File "/home/ondrej/proj/accelerate/distributed_inference.py", line 6, in <module>
[rank2]:     distributed_state = PartialState()
[rank2]:                         ^^^^^^^^^^^^^^
[rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/state.py", line 280, in __init__
[rank2]:     self.set_device()
[rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/state.py", line 790, in set_device
[rank2]:     torch.cuda.set_device(self.device)
[rank2]:   File "/home/ondrej/mambaforge/envs/accelerate/lib/python3.11/site-packages/torch/cuda/__init__.py", line 399, in set_device
[rank2]:     torch._C._cuda_setDevice(device)
[rank2]: RuntimeError: CUDA error: invalid device ordinal
[rank2]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank2]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank2]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

W0706 11:17:44.554000 139756795762496 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 4070667 closing signal SIGTERM
W0706 11:17:44.555000 139756795762496 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 4070670 closing signal SIGTERM
W0706 11:17:44.555000 139756795762496 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 4070671 closing signal SIGTERM
E0706 11:17:45.083000 139756795762496 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 3 (pid: 4070672) of binary: /home/ondrej/mambaforge/env
s/accelerate/bin/python3.11

With this patch, it outputs:

Running ['a dog'] on process 0, device cuda:0
Running ['a snail'] on process 2, device cuda:0
Running ['a hedgehog'] on process 3, device cuda:1
Running ['a cat'] on process 1, device cuda:1

BenjaminBossan · 2024-07-08T09:45:08Z

Thanks for providing the example. I wonder if it could be adapted as a unit test on a multi GPU runner, assuming we want to proceed with the PR as is. I'll let @muellerzr decide this.

assert device in ("cpu", "cuda", "mlu", "npu", "xpu")

I think this line can safely be removed, as it's covered by the ValueError a few lines above.

muellerzr

Thanks for the improvement!

If you can add a test perhaps in src/accelerate/test_utils/scripts/test_ops.py most likely, or in test_script.py?

src/accelerate/state.py

HuggingFaceDocBuilderDev · 2024-07-08T16:03:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: Zach Mueller <[email protected]>

cifkao · 2024-07-09T20:53:47Z

@muellerzr Would you have any more guidance on where and how to add the test?

I did try adding a test to tests/test_multigpu.py, something like this:

    @require_multi_device
    def test_multiple_processes_per_device_ops(self):
        num_processes = 2 * device_count
        print(f"Found {device_count} devices, testing {num_processes} processes.")
        cmd = get_launch_command(num_processes=num_processes) + [self.operation_file_path]
        with patch_environment(omp_num_threads=1):
            execute_subprocess_async(cmd)

However, this fails on gather because it uses NCCL, which doesn't seem to like having multiple processes use a single device:

stderr: [rank2]: Traceback (most recent call last):
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/test_utils/scripts/test_ops.py", line 179, in <module>
stderr: [rank2]:     main()
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/test_utils/scripts/test_ops.py", line 159, in main
stderr: [rank2]:     test_gather(state)
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/test_utils/scripts/test_ops.py", line 39, in test_gather
stderr: [rank2]:     gathered_tensor = gather(tensor)
stderr: [rank2]:                       ^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/utils/operations.py", line 375, in wrapper
stderr: [rank2]:     return function(*args, **kwargs)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/utils/operations.py", line 436, in gather
stderr: [rank2]:     return _gpu_gather(tensor)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/utils/operations.py", line 355, in _gpu_gather
stderr: [rank2]:     return recursively_apply(_gpu_gather_one, tensor, error_on_other_type=True)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/utils/operations.py", line 126, in recursively_apply
stderr: [rank2]:     return func(data, *args, **kwargs)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/proj/accelerate/src/accelerate/utils/operations.py", line 345, in _gpu_gather_one
stderr: [rank2]:     gather_op(output_tensors, tensor)
stderr: [rank2]:   File "/home/ondrej/mambaforge/envs/accelerate/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
stderr: [rank2]:     return func(*args, **kwargs)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^^^
stderr: [rank2]:   File "/home/ondrej/mambaforge/envs/accelerate/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2948, in all_gather_into_tensor
stderr: [rank2]:     work = group._allgather_base(output_tensor, input_tensor, opts)
stderr: [rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank2]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1970, invalid usage (run with NCCL_DEBUG=WARN for details), NCCL version 2.20.5
stderr: [rank2]: ncclInvalidUsage: This usually reflects invalid usage of NCCL library.
stderr: [rank2]: Last error:
stderr: [rank2]: Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 1000

muellerzr · 2024-07-09T20:57:56Z

That makes me a bit hesitant to push this in, since generally users will likely want to do .gather() on all the outputs after inference. How are you collecting the outputs from the model to a centralized location?

cifkao · 2024-07-10T09:20:40Z

I just save the files to a directory as in the example in the docs. I agree it would be better to have support for gather...

Related discussions I found:

muellerzr

That makes sense. As long as our main tests don't break, and your use-case works, I don't see an issue to why we can't merge this in since that's what we really care about.

Do you have two GPUs to run RUN_SLOW=1 pytest -sv tests/ with?

cifkao · 2024-07-11T13:23:52Z

I get the same set of failed tests for main and this branch:

FAILED tests/deepspeed/test_deepspeed.py::DeepSpeedConfigIntegration::test_autofill_comm_buffers_dsconfig_config_with_hidden_size - deepspeed.ops.op_builder.builder.CUDAMismatc
hException: >- DeepSpeed Op Builder: Installed CUDA version 10.1 does not match the version torch was compiled with 12.1, unabl...
FAILED tests/deepspeed/test_deepspeed.py::DeepSpeedConfigIntegration::test_autofill_comm_buffers_dsconfig_config_with_hidden_sizes - deepspeed.ops.op_builder.builder.CUDAMismat
chException: >- DeepSpeed Op Builder: Installed CUDA version 10.1 does not match the version torch was compiled with 12.1, unabl...
FAILED tests/deepspeed/test_deepspeed.py::DeepSpeedConfigIntegration::test_prepare_deepspeed_custom_optimizer_custom_scheduler - deepspeed.ops.op_builder.builder.CUDAMismatchEx
ception: >- DeepSpeed Op Builder: Installed CUDA version 10.1 does not match the version torch was compiled with 12.1, unabl...
FAILED tests/deepspeed/test_deepspeed.py::DeepSpeedConfigIntegration::test_prepare_deepspeed_custom_optimizer_deepspeed_scheduler - deepspeed.ops.op_builder.builder.CUDAMismatc
hException: >- DeepSpeed Op Builder: Installed CUDA version 10.1 does not match the version torch was compiled with 12.1, unabl...
FAILED tests/test_accelerator.py::AcceleratorTester::test_get_state_dict_from_offload_use_pytorch - ValueError: cannot mmap an empty file
FAILED tests/test_accelerator.py::AcceleratorTester::test_get_state_dict_from_offload_use_safetensors - ValueError: cannot mmap an empty file
FAILED tests/test_accelerator.py::AcceleratorTester::test_save_model_offload_use_pytorch - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_accelerator.py::AcceleratorTester::test_save_model_offload_use_safetensors - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_accelerator.py::AcceleratorTester::test_save_sharded_model_use_pytorch - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_accelerator.py::AcceleratorTester::test_save_sharded_model_use_safetensors - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_cpu_offload - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_cpu_offload_with_unused_submodules - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_disk_offload - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_disk_offload_with_unused_submodules - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_and_remove_hook - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_copy - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_force_hooks - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_multi_devices - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_with_non_persistent_buffers - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_with_unused_submodules - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_dispatch_model_with_unused_submodules_multi_device - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_init_empty_weights - NotImplementedError: Cannot copy out of meta tensor; no data!
FAILED tests/test_big_modeling.py::BigModelingTester::test_load_checkpoint_and_dispatch - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_load_checkpoint_and_dispatch_multi_device - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_load_checkpoint_and_dispatch_multi_device_with_unused_submodules - RuntimeError: weight should contain 5 elements not
 0
FAILED tests/test_big_modeling.py::BigModelingTester::test_load_checkpoint_and_dispatch_with_unused_submodules - RuntimeError: weight should contain 5 elements not 0
FAILED tests/test_cli.py::AccelerateLauncherTester::test_accelerate_test - RuntimeError: 'accelerate test' failed with returncode 1                                 [8592/45802]
FAILED tests/test_cli.py::ModelEstimatorTester::test_gated - AssertionError: GatedRepoError not raised : Repo for model `meta-llama/Llama-2-7b-hf` is gated
FAILED tests/test_examples.py::FeatureExamplesTests::test_checkpointing_by_epoch - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_fi
le=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/checkpointing.py --...
FAILED tests/test_examples.py::FeatureExamplesTests::test_checkpointing_by_steps - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_fi
le=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/checkpointing.py --...
FAILED tests/test_examples.py::FeatureExamplesTests::test_cross_validation - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/tm
p/tmpyq6p0qhv/default_config.yml examples/by_feature/cross_validation.py...
FAILED tests/test_examples.py::FeatureExamplesTests::test_ddp_comm_hook - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/tmp/t
mpyq6p0qhv/default_config.yml examples/by_feature/ddp_comm_hook.py --...
FAILED tests/test_examples.py::FeatureExamplesTests::test_distributed_inference_examples_phi2 - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launc
h --config_file=/tmp/tmpyq6p0qhv/default_config.yml examples/inference/distributed/phi2.py`...
FAILED tests/test_examples.py::FeatureExamplesTests::test_distributed_inference_examples_stable_diffusion - accelerate.test_utils.testing.SubprocessCallException: Command `acce
lerate launch --config_file=/tmp/tmpyq6p0qhv/default_config.yml examples/inference/distributed/stable_d...
FAILED tests/test_examples.py::FeatureExamplesTests::test_early_stopping - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/tmp/
tmpyq6p0qhv/default_config.yml examples/by_feature/early_stopping.py` ...
FAILED tests/test_examples.py::FeatureExamplesTests::test_gradient_accumulation - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_fil
e=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/gradient_accumulati...
FAILED tests/test_examples.py::FeatureExamplesTests::test_load_states_by_epoch - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file
=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/checkpointing.py --...
FAILED tests/test_examples.py::FeatureExamplesTests::test_load_states_by_steps - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file
=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/checkpointing.py --...
FAILED tests/test_examples.py::FeatureExamplesTests::test_local_sgd - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/tmp/tmpyq
6p0qhv/default_config.yml examples/by_feature/local_sgd.py` faile...
FAILED tests/test_examples.py::FeatureExamplesTests::test_multi_process_metrics - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_fil
e=/tmp/tmpyq6p0qhv/default_config.yml examples/by_feature/multi_process_metri...
FAILED tests/test_examples.py::FeatureExamplesTests::test_pippy_examples_bert - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=
/tmp/tmpyq6p0qhv/default_config.yml examples/inference/pippy/bert.py` faile...
FAILED tests/test_examples.py::FeatureExamplesTests::test_pippy_examples_gpt2 - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=
/tmp/tmpyq6p0qhv/default_config.yml examples/inference/pippy/gpt2.py` faile...
FAILED tests/test_examples.py::FeatureExamplesTests::test_pippy_examples_t5 - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/t
mp/tmpyq6p0qhv/default_config.yml examples/inference/pippy/t5.py` failed ...
FAILED tests/test_examples.py::FeatureExamplesTests::test_profiler - accelerate.test_utils.testing.SubprocessCallException: Command `accelerate launch --config_file=/tmp/tmpyq6
p0qhv/default_config.yml examples/by_feature/profiler.py` failed...
FAILED tests/test_grad_sync.py::SyncScheduler::test_gradient_sync_gpu_multi - RuntimeError: 'accelerate launch --num_processes=2 --monitor_interval=0.1 /home/ondrej/proj/accele
rate/src/accelerate/test_utils/scripts/test_sync.py' failed with returnco...
FAILED tests/test_hooks.py::HooksModelTester::test_add_remove_hook_fx_graph_module - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_hooks.py::HooksModelTester::test_align_devices_as_cpu_offload - AssertionError: assert device(type='cuda', index=0) == device(type='cpu')
FAILED tests/test_hooks.py::HooksModelTester::test_align_devices_as_model_parallelism - AssertionError: assert device(type='cuda', index=0) == device(type='cpu')
FAILED tests/test_hooks.py::HooksModelTester::test_attach_align_device_hook_as_cpu_offload - AssertionError: assert device(type='cuda', index=0) == device(type='cpu')
FAILED tests/test_hooks.py::HooksModelTester::test_attach_align_device_hook_as_cpu_offload_with_weight_map - AssertionError: assert device(type='cuda', index=0) == device(type=
'cpu')
FAILED tests/test_hooks.py::HooksModelTester::test_no_grad_in_hook - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_hooks.py::HooksModelTester::test_post_forward_hook_is_executed - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_hooks.py::HooksModelTester::test_pre_forward_hook_is_executed - RuntimeError: weight should contain 4 elements not 0
FAILED tests/test_kwargs_handlers.py::KwargsHandlerTester::test_ddp_comm_hook - RuntimeError: 'accelerate launch --num_processes=2 --monitor_interval=0.1 /home/ondrej/proj/acce
lerate/src/accelerate/test_utils/scripts/test_ddp_comm_hook.py' failed with...
FAILED tests/test_kwargs_handlers.py::KwargsHandlerTester::test_ddp_kwargs - RuntimeError: 'accelerate launch --num_processes=2 --monitor_interval=0.1 /home/ondrej/proj/acceler
ate/tests/test_kwargs_handlers.py' failed with returncode 1
FAILED tests/test_modeling_utils.py::ModelingUtilsTester::test_compute_module_sizes - AssertionError: assert defaultdict(<..._tracked': 8}) == {'': 236, 'ba...cked': 8, ...}
FAILED tests/test_modeling_utils.py::ModelingUtilsTester::test_get_balanced_memory - AssertionError: assert {0: 200, 1: 200} == {0: 187, 1: 200}
FAILED tests/test_modeling_utils.py::ModelingUtilsTester::test_infer_auto_device_map_with_buffer_check_and_multi_devices - assert 1 == 0
FAILED tests/test_modeling_utils.py::ModelingUtilsTester::test_infer_auto_device_map_with_tied_weights - AssertionError: assert OrderedDict([...tchnorm', 1)]) == {'layer1': 0,.
..ear1': 1, ...}
FAILED tests/test_modeling_utils.py::ModelingUtilsTester::test_load_checkpoint_in_model_disk_offload - ValueError: cannot mmap an empty file
FAILED tests/test_offload.py::OffloadTester::test_offload_state_dict - ValueError: cannot mmap an empty file
FAILED tests/test_offload.py::OffloadTester::test_offload_weights_loader - ValueError: cannot mmap an empty file
==================================================== 63 failed, 235 passed, 40 skipped, 519 warnings in 1153.61s (0:19:13) =====================================================

muellerzr · 2024-07-15T13:28:39Z

Great sounds good! Thanks for the improvement!

Allow more processes than devices

c9bb0e5

muellerzr reviewed Jul 8, 2024

View reviewed changes

src/accelerate/state.py Outdated Show resolved Hide resolved

Accept suggestion

f3b2886

Co-authored-by: Zach Mueller <[email protected]>

muellerzr approved these changes Jul 10, 2024

View reviewed changes

Merge branch 'main' into more-processes-than-devices

e280fe1

muellerzr merged commit c6da9f8 into huggingface:main Jul 15, 2024
24 of 25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow multiple process per device #2916

Allow multiple process per device #2916

cifkao commented Jul 4, 2024 •

edited

Loading

BenjaminBossan commented Jul 5, 2024

cifkao commented Jul 6, 2024 •

edited

Loading

BenjaminBossan commented Jul 8, 2024

muellerzr left a comment •

edited

Loading

HuggingFaceDocBuilderDev commented Jul 8, 2024

cifkao commented Jul 9, 2024

muellerzr commented Jul 9, 2024

cifkao commented Jul 10, 2024

muellerzr left a comment

cifkao commented Jul 11, 2024

muellerzr commented Jul 15, 2024

Allow multiple process per device #2916

Allow multiple process per device #2916

Conversation

cifkao commented Jul 4, 2024 • edited Loading

What does this PR do?

Before submitting

Who can review?

BenjaminBossan commented Jul 5, 2024

cifkao commented Jul 6, 2024 • edited Loading

BenjaminBossan commented Jul 8, 2024

muellerzr left a comment • edited Loading

Choose a reason for hiding this comment

HuggingFaceDocBuilderDev commented Jul 8, 2024

cifkao commented Jul 9, 2024

muellerzr commented Jul 9, 2024

cifkao commented Jul 10, 2024

muellerzr left a comment

Choose a reason for hiding this comment

cifkao commented Jul 11, 2024

muellerzr commented Jul 15, 2024

cifkao commented Jul 4, 2024 •

edited

Loading

cifkao commented Jul 6, 2024 •

edited

Loading

muellerzr left a comment •

edited

Loading