
Enable distributed package on windows, Gloo backend supported only #42897

Closed
gunandrose4u wants to merge 67 commits into pytorch:master from gunandrose4u:enable_win_dist


@gunandrose4u (Contributor)

Fixes #42095

The test cases will be committed to this PR later.

@mrshenli, please help review.
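Since this PR supports only the Gloo backend on Windows and its review touches torch/lib/c10d/FileStore.cpp, one likely way to rendezvous is a shared file passed as a file:// init_method. The sketch below is a hypothetical helper (file_init_method is not a PyTorch API) that turns an absolute local path into such a URL, assuming the file:///C:/... form for Windows drive paths. It does not handle UNC shares or percent-encoding of special characters.

```python
from pathlib import PurePosixPath, PureWindowsPath


def file_init_method(path: str, windows: bool) -> str:
    """Hypothetical helper: build a file:// URL for a rendezvous file."""
    if windows:
        p = PureWindowsPath(path)
        # Only absolute paths on a lettered drive are handled, e.g. C:\tmp\store.
        if not (len(p.drive) == 2 and p.drive[1] == ":" and p.is_absolute()):
            raise ValueError("expected an absolute path on a drive, e.g. C:\\tmp\\store")
        # 'C:\\tmp\\store' -> 'C:/tmp/store' -> 'file:///C:/tmp/store'
        return "file:///" + p.as_posix()
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError("expected an absolute POSIX path")
    # '/tmp/store' -> 'file:///tmp/store'
    return "file://" + p.as_posix()


if __name__ == "__main__":
    print(file_init_method("C:\\tmp\\store", windows=True))
    print(file_init_method("/tmp/store", windows=False))
```

The resulting string would then be passed as init_method to torch.distributed.init_process_group(backend="gloo", init_method=..., rank=..., world_size=...), with every process pointing at the same file.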

@dr-ci bot commented Aug 12, 2020

💊 CI failures summary and remediations

As of commit 8404d1e (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

1 failure confirmed as flaky and can be ignored:

  • pytorch_macos_10_13_py3_build

Extra GitHub checks: 1 failed

  • codecov.io: 1 failed


This comment has been revised 325 times.

gunandrose4u marked this pull request as draft on August 12, 2020 04:14
Comment thread CMakeLists.txt Outdated
Comment thread cmake/Dependencies.cmake Outdated
Comment thread torch/CMakeLists.txt Outdated
Comment thread torch/CMakeLists.txt Outdated
Contributor

We might need to add NOT MSVC to the USE_NCCL condition as well. It looks like NCCL does not officially support Windows yet: https://forums.developer.nvidia.com/t/is-there-a-nccl-2-x-for-windows/55659/3

Contributor Author

Yes, we should add one.

Contributor

Curious: I don't see any special handling for NCCL, yet there are no test failures for the NCCL backend. Is that because NCCL now works on Windows, or because we didn't actually trigger those tests?
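The rule discussed above (never use NCCL on Windows, fall back to Gloo) can also be expressed at the Python level when choosing a backend. The sketch below is a hypothetical helper (pick_backend is not a PyTorch API); in real code the two flags would presumably come from torch.cuda.is_available() and torch.distributed.is_nccl_available(). It does not reflect what the build files in this PR actually do.

```python
import sys


def pick_backend(platform: str = sys.platform,
                 cuda_available: bool = False,
                 nccl_built: bool = False) -> str:
    """Hypothetical helper: choose a torch.distributed backend name."""
    # NCCL has no official Windows support, which is why the review proposes
    # gating USE_NCCL with NOT MSVC; mirror that by never picking NCCL on win32.
    if platform == "win32":
        return "gloo"
    # Elsewhere, prefer NCCL for CUDA workloads when it was compiled in.
    if cuda_available and nccl_built:
        return "nccl"
    # Gloo is the fallback for CPU-only runs or builds without NCCL.
    return "gloo"


if __name__ == "__main__":
    print(pick_backend())
```

Checking the platform first keeps the Windows path independent of whatever the CUDA and NCCL probes report, which sidesteps the question raised above about whether NCCL tests were actually exercised.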

Comment thread torch/csrc/distributed/c10d/comm.h Outdated
Comment thread torch/csrc/distributed/c10d/init.cpp Outdated
Comment thread torch/csrc/distributed/c10d/reducer.cpp Outdated
Comment thread torch/csrc/distributed/rpc/rref_context.h Outdated
Comment thread torch/csrc/distributed/rpc/tensorpipe_agent.cpp Outdated
Comment thread torch/csrc/distributed/rpc/tensorpipe_agent.h Outdated
Comment thread torch/csrc/distributed/rpc/tensorpipe_agent.cpp Outdated
Comment thread torch/csrc/distributed/rpc/unpickled_python_call.h Outdated
Comment thread torch/lib/c10d/FileStore.cpp Outdated
zhangguanheng66 added the labels module: distributions (Related to torch.distributions) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Aug 12, 2020
Comment thread torch/csrc/distributed/c10d/comm.h Outdated
Comment thread torch/csrc/distributed/rpc/init.cpp Outdated
Comment thread torch/csrc/distributed/rpc/rref_context.h Outdated
Comment thread torch/distributed/rpc/backend_registry.py Outdated
Comment thread torch/distributed/rpc/constants.py Outdated
Comment thread torch/lib/c10d/FileStore.cpp Outdated
Comment thread torch/lib/c10d/GlooDeviceFactory.cpp Outdated
Update my fork from Pytorch repo
mrshenli requested a review from malfet on August 20, 2020 14:18
mrshenli added the module: build (Build system issues) label on Aug 20, 2020
Rebase from pytorch/pytorch master
facebook-github-bot pushed a commit that referenced this pull request Sep 25, 2020
Summary:
Fixes #{issue number}
This is a resubmission of PR #42897, together with a fix for the Windows build issue introduced by PR #44344.

Pull Request resolved: #45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
@gunandrose4u (Contributor Author)

Merged again by commit f07ac6a.


Labels

Merged, module: build (Build system issues), module: distributions (Related to torch.distributions), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

[RFC] Add Windows support to torch.distributed package