[core][compiled graph] Support all-to-one collective ops (e.g. gather) #49324

Open
jeffreyjeffreywang opened this issue Dec 18, 2024 · 2 comments
Labels
compiled-graphs core Issues that should be addressed in Ray Core enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

@jeffreyjeffreywang
Contributor

Description

As part of the effort (meta-issue: #47983) to support collective communication ops, we need to support all-to-one (gather) patterns. As discussed in the RFC, we will pass the reader worker handle to the collective call.

workers = [Worker.options(num_gpus=1).remote() for _ in range(3)]
nccl_group_handle = ray.collective.NcclGroup(workers)
with InputNode() as inp:
  results = [worker.fwd.bind(inp) for worker in workers]
  # Pass the worker handle to the collective call.
  dag = ray.collective.gather.bind(
    results, workers[0],
    transport=nccl_group_handle)
  dag = workers[0].sync.bind(dag)

# Compilation errors if the `gather` reader (here workers[0]) is not part of the group.
dag = dag.experimental_compile()
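
To make the intended semantics concrete, here is a minimal pure-Python sketch of an all-to-one gather together with the membership check described in the comment above. It does not use Ray or NCCL. The function names (`validate_gather_reader`, `simulate_gather`) and the string worker IDs are hypothetical, chosen only for illustration, and the real check would presumably run at compile time rather than at call time.

```python
from typing import Dict, List, Sequence


def validate_gather_reader(group: Sequence[str], reader: str) -> int:
    """Return the reader's rank in the group; raise if it is not a member.

    Hypothetical helper mirroring the "errors if the reader is not part of
    the group" rule from the proposal.
    """
    members = list(group)
    if reader not in members:
        raise ValueError(
            f"gather reader {reader!r} is not part of the group {members}"
        )
    return members.index(reader)


def simulate_gather(
    group: Sequence[str], outputs: Dict[str, object], reader: str
) -> List[object]:
    """Simulate all-to-one: return what the reader receives, ordered by rank."""
    validate_gather_reader(group, reader)
    missing = [worker for worker in group if worker not in outputs]
    if missing:
        raise ValueError(f"no output bound for group members: {missing}")
    # Every member contributes one value; only the reader collects them.
    return [outputs[worker] for worker in group]


# Stand-ins for the three workers and their `fwd` results.
group = ["worker0", "worker1", "worker2"]
outputs = {"worker0": 10, "worker1": 11, "worker2": 12}

received = simulate_gather(group, outputs, reader="worker0")
print(received)
```

Passing a reader that is outside `group` (for example `"worker9"`) raises `ValueError`, which is the behavior the proposal expects from `experimental_compile()`.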

Use case

No response

@jeffreyjeffreywang jeffreyjeffreywang added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Dec 18, 2024
@jeffreyjeffreywang
Contributor Author

I'm getting a head start on this issue.

@jcotant1 jcotant1 added core Issues that should be addressed in Ray Core compiled-graphs labels Dec 18, 2024
@ruisearch42 ruisearch42 added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Dec 18, 2024
@Moonquakes

Maybe we can apply RDMA directly to all Ray Object transmissions? #30094

4 participants