Move allgather_coalesced implementation from Python to C++ #29059
Conversation
This pull request was exported from Phabricator. Differential Revision: D18277097

#29059 caused a broken build due to an unimplemented function in the MPI backend. Fixed here.
Force-pushed from 36b882e to efd623b
This pull request was exported from Phabricator. Differential Revision: D18277097
Force-pushed from efd623b to 0542a6c
This pull request was exported from Phabricator. Differential Revision: D18277097
Force-pushed from 0542a6c to b21acc4
pietern left a comment:
I should have checked CI before approving #28857.
It's all green now, so it should be good to go.
Summary:
Pull Request resolved: pytorch#29059
This is a resubmit of reverted diff D18209289 (PR pytorch#28857).

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: pietern

Differential Revision: D18277097

fbshipit-source-id: 3e16c4c5f71e5c051ffef280e021bd253caf127c
Force-pushed from b21acc4 to 557c40b
This pull request was exported from Phabricator. Differential Revision: D18277097
This pull request has been merged in 23695ab.
store = c10d.FileStore(self.file_name, self.world_size)
pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
dummy_input = [torch.Tensor([1])]
dummy_input = [torch.zeros([1], dtype=torch.float32)]
sorry -- why are these not translated exactly?
torch.Tensor([1]) is torch.ones([1]), not zeros, right?
also same with the line below, why did that change from -1 to 0?
This only tests error handling, so the underlying values here should not be important (all_gather_coalesced never copies anything in this function). I am happy to change it back if you prefer.
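For context, a minimal standalone sketch of the constructor semantics under discussion (not part of the test file itself): torch.Tensor([1]) treats the list as data, so its value matches torch.ones(1) rather than torch.zeros.

import torch

a = torch.Tensor([1])                      # 1-element float32 tensor holding 1.0
b = torch.ones(1)                          # also [1.]
c = torch.zeros([1], dtype=torch.float32)  # [0.]

print(torch.equal(a, b))  # True
print(torch.equal(a, c))  # False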
inline void assertSameDevice(
    std::function<void(const std::string&)> fn,
    const at::ArrayRef<at::Tensor>& tensors) {
don't we have a TensorList for this? (Also I wouldn't expect const reference to it, it's trivial to copy).
Ah, good point. I was trying to be consistent with the other functions in this module, which mostly use a const reference to ArrayRef rather than TensorList (TensorList = ArrayRef<Tensor>).
Actually, I just need to verify tensors held in a vector, so I might just accept a const ref to a vector.
Would you prefer that?
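As a reference point, here is a hedged sketch (assumed names, not the helper in this PR) of how such a check could take at::TensorList by value. at::TensorList is just an alias for at::ArrayRef<at::Tensor>, a small non-owning view, and a std::vector<at::Tensor> converts to it implicitly, so either calling convention would work.

#include <stdexcept>
#include <ATen/ATen.h>

// Sketch only: verify that all tensors live on the same device.
// at::TensorList (= at::ArrayRef<at::Tensor>) is cheap to copy,
// so it is conventionally passed by value rather than by const reference.
inline void assertSameDeviceSketch(at::TensorList tensors) {
  if (tensors.empty()) {
    return;
  }
  const auto device = tensors[0].device();
  for (const auto& t : tensors) {
    if (t.device() != device) {
      throw std::invalid_argument("tensors are expected to be on the same device");
    }
  }
}

// A std::vector<at::Tensor> converts implicitly:
//   std::vector<at::Tensor> v = ...;
//   assertSameDeviceSketch(v);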
Summary:
Pull Request resolved: #29059
Resubmit of reverted PR #28857.
Differential Revision: D18277097