[c10d] Make barrier as a custom op #79777
alanwaketan wants to merge 3 commits into gh/alanwaketan/41/base
Summary: This patch makes barrier a custom op so that it can pass through the dispatcher. It is one part of the effort to route comm ops through the dispatcher so that tracing mechanisms that rely on the dispatcher, e.g., LazyTensor and AOTAutograd, can trace them. Test Plan: python test/distributed/test_c10d_nccl.py -k test_nccl_barrier ...and other existing distributed tests. [ghstack-poisoned]
Dr. CI: ✅ No failures (0 pending) as of commit f21f40e.
ghstack-source-id: aad4231 Pull Request resolved: #79777
      "alltoall_",
      dispatch(c10::DispatchKey::CompositeExplicitAutograd, alltoall_));
  m.def(
      "barrier",
missing a trailing underscore?
      AllToAllOptions{std::chrono::milliseconds(timeout)});
}

c10::intrusive_ptr<ProcessGroup::Work> barrier(
missing a trailing underscore?
barrier is not an in-place op. Therefore, it's not necessary to add a trailing _.
Oh I see, the trailing underscore is for in-place ops, as in the Python APIs. I thought it was a dispatcher convention.
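To make the naming point in this thread concrete, here is a minimal, self-contained sketch. Every name in it (`follows_in_place_convention`, `fill_`, and the toy `barrier`) is hypothetical and written only for illustration; none of them is the c10d signature from this patch. It assumes nothing beyond what the thread says: a trailing underscore marks an op that mutates its arguments in place.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of PyTorch): true when an op name follows
// the in-place naming convention, i.e. it ends with an underscore.
bool follows_in_place_convention(const std::string& op_name) {
  return !op_name.empty() && op_name.back() == '_';
}

// Toy in-place op: it overwrites the caller's buffer, hence the trailing
// underscore in its name.
void fill_(std::vector<int>& buffer, int value) {
  for (int& element : buffer) {
    element = value;
  }
}

// Toy non-mutating op: it has no buffer to modify, so its name carries no
// trailing underscore. It only reports how many devices were passed in.
int barrier(const std::vector<int>& device_ids) {
  return static_cast<int>(device_ids.size());
}
```

Under this sketch, `follows_in_place_convention("alltoall_")` is true and `follows_in_place_convention("barrier")` is false, which matches the names registered in the diff.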
@pytorchbot merge --green
Thanks for approving this pull request, Shen and Wanchao.
@pytorchbot successfully started a merge job.
Summary: This patch makes barrier a custom op so that it can pass through the dispatcher. It is one part of the effort to route comm ops through the dispatcher so that tracing mechanisms that rely on the dispatcher, e.g., LazyTensor and AOTAutograd, can trace them.
Pull Request resolved: #79777
Approved by: https://github.com/mrshenli, https://github.com/wanchaol
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/238eaf20949cefa4cb61a9d61893ab136fbdb3a2
Test plan from GitHub: python test/distributed/test_c10d_nccl.py -k test_nccl_barrier ...and other existing distributed tests.
Reviewed By: atalman
Differential Revision: D37455834
Pulled By: alanwaketan
fbshipit-source-id: 753702a7bfb50bdca701b209ba3e8becd5ef2921
Stack from ghstack (oldest at bottom):
Summary:
This patch makes barrier a custom op so that it can pass through the
dispatcher. It is one part of the effort to route comm ops through the
dispatcher so that tracing mechanisms that rely on the dispatcher,
e.g., LazyTensor and AOTAutograd, can trace them.
Test Plan:
python test/distributed/test_c10d_nccl.py -k test_nccl_barrier
...and other existing distributed tests.
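
As a rough illustration of why routing an op through a dispatcher makes it visible to tracers, here is a toy sketch. `ToyDispatcher` and `toy_barrier` are hypothetical names invented for this example and are not the c10 dispatcher API. The only assumption is the one stated in the summary: every call passes through a single dispatch point, so a tracing layer sitting at that point can record it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy dispatcher: a hypothetical stand-in for illustration only.
class ToyDispatcher {
 public:
  using Kernel = std::function<int(int)>;

  // Register a kernel under an op name.
  void def(const std::string& name, Kernel kernel) {
    kernels_[name] = std::move(kernel);
  }

  // Every call goes through this single entry point, so the op name can be
  // recorded before the kernel runs. Throws std::out_of_range if the op
  // was never registered.
  int call(const std::string& name, int arg) {
    trace_.push_back(name);
    return kernels_.at(name)(arg);
  }

  // The recorded sequence of op names, in call order.
  const std::vector<std::string>& trace() const { return trace_; }

 private:
  std::map<std::string, Kernel> kernels_;
  std::vector<std::string> trace_;
};

// Hypothetical kernel: pretends to synchronize `world_size` ranks and
// simply returns that count.
int toy_barrier(int world_size) { return world_size; }
```

After `def("barrier", toy_barrier)` and one `call("barrier", 4)`, the trace holds the single entry "barrier". A function called directly, bypassing `call`, would leave no such record, which is the gap this effort aims to close for comm ops.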