[c10d] Make reduce_scatter as a custom op#79683
[c10d] Make reduce_scatter as a custom op#79683alanwaketan wants to merge 3 commits intogh/alanwaketan/36/basefrom
Conversation
Summary: This patch makes reduce_scatter as a custom op such that it's dispatcher passable. It's one part of the effort to route comm ops to the dispatcher such that tracing mechanisms that relies on the dispatcher can trace them, e.g., LazyTensor and AOTAutograd. Test Plan: python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops [ghstack-poisoned]
🔗 Helpful links
❌ 1 New FailuresAs of commit e73037d (more details on the Dr. CI page): Expand to see more
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages
|
Summary: This patch makes reduce_scatter as a custom op such that it's dispatcher passable. It's one part of the effort to route comm ops to the dispatcher such that tracing mechanisms that relies on the dispatcher can trace them, e.g., LazyTensor and AOTAutograd. Test Plan: python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops [ghstack-poisoned]
Summary: This patch makes reduce_scatter as a custom op such that it's dispatcher passable. It's one part of the effort to route comm ops to the dispatcher such that tracing mechanisms that relies on the dispatcher can trace them, e.g., LazyTensor and AOTAutograd. Test Plan: python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops ghstack-source-id: b9afad5 Pull Request resolved: #79683
Summary: This patch makes reduce_scatter as a custom op such that it's dispatcher passable. It's one part of the effort to route comm ops to the dispatcher such that tracing mechanisms that relies on the dispatcher can trace them, e.g., LazyTensor and AOTAutograd. Test Plan: python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops [ghstack-poisoned]
mrshenli
left a comment
There was a problem hiding this comment.
LGTM! Same suggestions on tests as the PR below this one.
| .findSchemaOrThrow("c10d::reduce_scatter_", "") | ||
| .typed<c10::intrusive_ptr<::c10d::ProcessGroup::Work>( | ||
| const std::vector<at::Tensor>&, | ||
| const std::vector<std::vector<at::Tensor>>&, | ||
| const c10::intrusive_ptr<::c10d::ProcessGroup>&, | ||
| int64_t, | ||
| int64_t)>(); |
There was a problem hiding this comment.
Would I be correct that the overhead of these two ops are negligible?
There was a problem hiding this comment.
It should be negligible. And then we cache it for the consecutive calls to minimize the overhead too.
| .def( | ||
| "reduce_scatter", | ||
| [](::c10d::ProcessGroup& pg, | ||
| [](const c10::intrusive_ptr<::c10d::ProcessGroup>& pg, |
There was a problem hiding this comment.
let's stay consistent on the naming? This one is using pg. The one above is using self.
There was a problem hiding this comment.
I believe this pg is user defined, and the above is really self. That's why there is this inconsistency.
|
Thanks Shen for approving this pull request. The CI failure is unrelated. |
|
@pytorchbot merge -f |
|
@pytorchbot successfully started a merge job. Check the current status here |
|
Hey @alanwaketan. |
Summary: This patch makes reduce_scatter as a custom op such that it's dispatcher passable. It's one part of the effort to route comm ops to the dispatcher such that tracing mechanisms that relies on the dispatcher can trace them, e.g., LazyTensor and AOTAutograd. Pull Request resolved: #79683 Approved by: https://github.com/mrshenli Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/80b50dfa3ab16de3e90dab8eeed003a98a0da1fe Test plan from GitHub: python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops Reviewed By: atalman Differential Revision: D37455687 Pulled By: alanwaketan fbshipit-source-id: bb78183e5cf5798b6b558488eed07af7cd4d4eff
Stack from ghstack (oldest at bottom):
Summary:
This patch makes reduce_scatter as a custom op such that it's dispatcher
passable. It's one part of the effort to route comm ops to the dispatcher
such that tracing mechanisms that relies on the dispatcher can trace them,
e.g., LazyTensor and AOTAutograd.
Test Plan:
python test/distributed/test_c10d_nccl.py -k test_reduce_scatter_ops