-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Add c10d::broadcast_coalesced and tests #20234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Differential Revision: D15228099 Differential Version: 81264978
| // Returns tensor at specified index in input tensor list. | ||
| const auto lookup = [&tensors](size_t index) { return tensors[index]; }; | ||
|
|
||
| // We maintain a maximum of 2 in flight broadcast operations to avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be configurable? Will it be possible that two broadcast cannot fully utilize resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rather not make it configurable, as it's yet another parameter to configure and provide backwards compatibility for. If you don't utilize resources with 2 in flight, you can increase the buffer size as well.
| py::call_guard<py::gil_scoped_release>()); | ||
|
|
||
| module.def( | ||
| "_broadcast_coalesced", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to eventually replace the existing _dist_broadcast_coalesced with this one? IIUC, this is still distributed broadcast. Any reason for not including a dist in the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove _dist_broadcast_coalesced once this lands. I don't like having dist in the name, since it is implied by the namespace. When we do work on the C++ API a bit more, we'll have the CUDA version as torch::cuda::broadcast_coalesced and the distributed version as torch::distributed::broadcast_coalesced. No need to repeat dist in the name IMO.
| device = torch.device('cuda:%d' % self.rank) | ||
| self._test_broadcast_coalesced(process_group, device) | ||
|
|
||
| def test_broadcast_coalesced_gloo_cpu(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need MPI tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I'll add a separate issue for it. Doing MPI tests through test_c10d.py is not yet possible, because of the intricacies of launching an MPI test (done through run_test.py for test_distributed.py).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #20245.
Differential Revision: D15228099 Differential Version: 81326668
mrshenli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang tidy hits the following error:
[{"name": "torch/csrc/distributed/c10d/comm.cpp", "lines": [[1, 83]]}, {"name": "torch/csrc/distributed/c10d/init.cpp", "lines": [[21, 22], [538, 553]]}] torch/csrc/distributed/c10d/comm.cpp torch/csrc/distributed/c10d/init.cpp: /home/travis/build/pytorch/pytorch/torch/csrc/distributed/c10d/comm.cpp:15:43: error: the parameter 'process_group' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
std::shared_ptr<c10d::ProcessGroup> process_group,
~~~ ^
const &
Differential Revision: D15228099 Differential Version: 81445331
|
This pull request has been merged in caa0d0c. |
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103638483 Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103647579 Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103716041 Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103871258 Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
[ghstack-poisoned]
Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103885383 Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
Summary: Pull Request resolved: #37990 The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs. #20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced` Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine. ghstack-source-id: 103885383 Test Plan: CI Differential Revision: D21443879 fbshipit-source-id: 764d8681ca629056bfe2c260ffab47fa5bdf07ff
Stack:
:white_circle: #20236 Make DistributedDataParallel usable with CPU models 💚
:white_circle: #20235 Refactor core DistributedDataParallel tests 💚
:black_circle: #20234 Add c10d::broadcast_coalesced and tests 💚
The differences with the existing function _dist_broadcast_coalesced
is that this one works for both CPU and CUDA tensors and that it has a
maximum number of in flight operations.
This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.
See #17757 for more information.
Differential Revision: D15228099