Skip to content

Conversation

@pietern
Copy link
Contributor

@pietern pietern commented May 7, 2019

Stack:
    :white_circle:  #20236 Make DistributedDataParallel usable with CPU models  💚
    :white_circle:  #20235 Refactor core DistributedDataParallel tests  💚
    :black_circle:  #20234 Add c10d::broadcast_coalesced and tests  💚

The differences with the existing function _dist_broadcast_coalesced
is that this one works for both CPU and CUDA tensors and that it has a
maximum number of in flight operations.

This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.

See #17757 for more information.

Differential Revision: D15228099

Differential Revision: D15228099
Differential Version: 81264978
@pietern pietern requested review from apaszke and mrshenli as code owners May 7, 2019 19:25
@pytorchbot pytorchbot added module: build Build system issues oncall: distributed Add this issue/PR to distributed oncall triage queue labels May 7, 2019
// Returns tensor at specified index in input tensor list.
const auto lookup = [&tensors](size_t index) { return tensors[index]; };

// We maintain a maximum of 2 in flight broadcast operations to avoid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configurable? Will it be possible that two broadcast cannot fully utilize resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rather not make it configurable, as it's yet another parameter to configure and provide backwards compatibility for. If you don't utilize resources with 2 in flight, you can increase the buffer size as well.

py::call_guard<py::gil_scoped_release>());

module.def(
"_broadcast_coalesced",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to eventually replace the existing _dist_broadcast_coalesced with this one? IIUC, this is still distributed broadcast. Any reason for not including a dist in the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove _dist_broadcast_coalesced once this lands. I don't like having dist in the name, since it is implied by the namespace. When we do work on the C++ API a bit more, we'll have the CUDA version as torch::cuda::broadcast_coalesced and the distributed version as torch::distributed::broadcast_coalesced. No need to repeat dist in the name IMO.

device = torch.device('cuda:%d' % self.rank)
self._test_broadcast_coalesced(process_group, device)

def test_broadcast_coalesced_gloo_cpu(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need MPI tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I'll add a separate issue for it. Doing MPI tests through test_c10d.py is not yet possible, because of the intricacies of launching an MPI test (done through run_test.py for test_distributed.py).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #20245.

Differential Revision: D15228099
Differential Version: 81326668
Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang tidy hits the following error:

[{"name": "torch/csrc/distributed/c10d/comm.cpp", "lines": [[1, 83]]}, {"name": "torch/csrc/distributed/c10d/init.cpp", "lines": [[21, 22], [538, 553]]}] torch/csrc/distributed/c10d/comm.cpp torch/csrc/distributed/c10d/init.cpp: /home/travis/build/pytorch/pytorch/torch/csrc/distributed/c10d/comm.cpp:15:43: error: the parameter 'process_group' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param,-warnings-as-errors]
      std::shared_ptr<c10d::ProcessGroup> process_group,
      ~~~                                 ^
      const                              &

Differential Revision: D15228099
Differential Version: 81445331
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in caa0d0c.

@pietern pietern deleted the export-D15228099 branch May 9, 2019 22:37
rohan-varma added a commit that referenced this pull request May 7, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103638483

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
rohan-varma added a commit that referenced this pull request May 7, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103647579

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
rohan-varma added a commit that referenced this pull request May 8, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 8, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103716041

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
rohan-varma added a commit that referenced this pull request May 11, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103871258

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
rohan-varma added a commit that referenced this pull request May 11, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103885383

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
facebook-github-bot pushed a commit that referenced this pull request May 12, 2020
Summary:
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103885383

Test Plan: CI

Differential Revision: D21443879

fbshipit-source-id: 764d8681ca629056bfe2c260ffab47fa5bdf07ff
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: build Build system issues oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants