Skip to content

Conversation

@atalman
Copy link
Contributor

@atalman atalman commented Oct 24, 2022

This fixes regression in distributed headers installation.
Caused by following PR: #85953
which removed the inclusions

Fixes #87173

Test plan from wheel build by this CI: https://github.com/pytorch/pytorch/actions/runs/3314742519

[ec2-user@ip-10-0-9-132 c10d]$ pwd
/home/ec2-user/actions-runner/_work/_temp/artifacts/torch/include/torch/csrc/distributed/c10d
[ec2-user@ip-10-0-9-132 c10d]$ ls -las
total 300
 4 drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 24 19:12 .
 0 drwxr-xr-x 4 ec2-user ec2-user    29 Oct 24 19:12 ..
12 -rw-r--r-- 1 ec2-user ec2-user  9051 Oct 24 17:28 Backend.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   216 Oct 24 17:28 c10d.h
 4 -rw-r--r-- 1 ec2-user ec2-user  3880 Oct 24 17:28 comm.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   604 Oct 24 17:28 debug.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1717 Oct 24 17:28 default_comm_hooks.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1316 Oct 24 17:28 error.h
 4 -rw-r--r-- 1 ec2-user ec2-user   962 Oct 24 17:28 exception.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1461 Oct 24 17:28 FileStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   771 Oct 24 17:28 GlooDeviceFactory.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1154 Oct 24 17:28 HashStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  4058 Oct 24 17:28 logger.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2059 Oct 24 17:28 logging.h
 8 -rw-r--r-- 1 ec2-user ec2-user  7979 Oct 24 17:28 NCCLUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2756 Oct 24 17:28 Ops.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1814 Oct 24 17:28 ParamCommsUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1478 Oct 24 17:28 PrefixStore.hpp
16 -rw-r--r-- 1 ec2-user ec2-user 13235 Oct 24 17:28 ProcessGroupGloo.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 11298 Oct 24 17:28 ProcessGroup.hpp
12 -rw-r--r-- 1 ec2-user ec2-user  8645 Oct 24 17:28 ProcessGroupMPI.hpp
28 -rw-r--r-- 1 ec2-user ec2-user 26526 Oct 24 17:28 ProcessGroupNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3805 Oct 24 17:28 ProcessGroupRoundRobin.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 10361 Oct 24 17:28 ProcessGroupUCC.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  5062 Oct 24 17:28 ProcessGroupWrapper.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4201 Oct 24 17:28 PyProcessGroup.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1072 Oct 24 17:28 python_comm_hook.h
24 -rw-r--r-- 1 ec2-user ec2-user 23859 Oct 24 17:28 reducer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2330 Oct 24 17:28 reducer_timer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1683 Oct 24 17:28 sequence_num.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2108 Oct 24 17:28 socket.h
 4 -rw-r--r-- 1 ec2-user ec2-user  2589 Oct 24 17:28 Store.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3264 Oct 24 17:28 TCPStore.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  6944 Oct 24 17:28 TraceUtils.h
 8 -rw-r--r-- 1 ec2-user ec2-user  4539 Oct 24 17:28 Types.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   580 Oct 24 17:28 UCCForNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2301 Oct 24 17:28 UCCTracing.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4933 Oct 24 17:28 UCCUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   584 Oct 24 17:28 UnixSockUtils.hpp
24 -rw-r--r-- 1 ec2-user ec2-user 20796 Oct 24 17:28 Utils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   575 Oct 24 17:28 WinSockUtils.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4259 Oct 24 17:28 Work.hpp

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 24, 2022
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 24, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87615

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 Failures, 1 Pending

As of commit 4f8bd6c:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 24, 2022
@malfet
Copy link
Contributor

malfet commented Oct 24, 2022

s/This fixes regression in distributed wheels/This fixes regression in distributed headers packaging/, but otherwise LGTM

@atalman atalman added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Oct 24, 2022
@atalman
Copy link
Contributor Author

atalman commented Oct 24, 2022

@pytorchmergebot merge -f "needed for the release"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@malfet
Copy link
Contributor

malfet commented Oct 24, 2022

Tested by modifying test_cpp_extension_jit.py as follows:

diff --git a/test/test_cpp_extensions_jit.py b/test/test_cpp_extensions_jit.py
index e4b1e9e550..90b7c27ec8 100644
--- a/test/test_cpp_extensions_jit.py
+++ b/test/test_cpp_extensions_jit.py
@@ -847,6 +847,7 @@ class TestCppExtensionJIT(common.TestCase):
 
         source = """
         #include <torch/library.h>
+        #include <torch/csrc/distributed/c10d/ProcessGroup.hpp>
         torch::Tensor my_add(torch::Tensor x, torch::Tensor y) {
           return x + y;
         }

and run the test

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
This fixes regression in distributed headers installation.
Caused by following PR: pytorch#85953
which removed the inclusions

Fixes pytorch#87173

Test plan from wheel build by this CI: https://github.com/pytorch/pytorch/actions/runs/3314742519

```
[ec2-user@ip-10-0-9-132 c10d]$ pwd
/home/ec2-user/actions-runner/_work/_temp/artifacts/torch/include/torch/csrc/distributed/c10d
[ec2-user@ip-10-0-9-132 c10d]$ ls -las
total 300
 4 drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 24 19:12 .
 0 drwxr-xr-x 4 ec2-user ec2-user    29 Oct 24 19:12 ..
12 -rw-r--r-- 1 ec2-user ec2-user  9051 Oct 24 17:28 Backend.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   216 Oct 24 17:28 c10d.h
 4 -rw-r--r-- 1 ec2-user ec2-user  3880 Oct 24 17:28 comm.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   604 Oct 24 17:28 debug.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1717 Oct 24 17:28 default_comm_hooks.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1316 Oct 24 17:28 error.h
 4 -rw-r--r-- 1 ec2-user ec2-user   962 Oct 24 17:28 exception.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1461 Oct 24 17:28 FileStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   771 Oct 24 17:28 GlooDeviceFactory.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1154 Oct 24 17:28 HashStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  4058 Oct 24 17:28 logger.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2059 Oct 24 17:28 logging.h
 8 -rw-r--r-- 1 ec2-user ec2-user  7979 Oct 24 17:28 NCCLUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2756 Oct 24 17:28 Ops.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1814 Oct 24 17:28 ParamCommsUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1478 Oct 24 17:28 PrefixStore.hpp
16 -rw-r--r-- 1 ec2-user ec2-user 13235 Oct 24 17:28 ProcessGroupGloo.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 11298 Oct 24 17:28 ProcessGroup.hpp
12 -rw-r--r-- 1 ec2-user ec2-user  8645 Oct 24 17:28 ProcessGroupMPI.hpp
28 -rw-r--r-- 1 ec2-user ec2-user 26526 Oct 24 17:28 ProcessGroupNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3805 Oct 24 17:28 ProcessGroupRoundRobin.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 10361 Oct 24 17:28 ProcessGroupUCC.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  5062 Oct 24 17:28 ProcessGroupWrapper.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4201 Oct 24 17:28 PyProcessGroup.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1072 Oct 24 17:28 python_comm_hook.h
24 -rw-r--r-- 1 ec2-user ec2-user 23859 Oct 24 17:28 reducer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2330 Oct 24 17:28 reducer_timer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1683 Oct 24 17:28 sequence_num.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2108 Oct 24 17:28 socket.h
 4 -rw-r--r-- 1 ec2-user ec2-user  2589 Oct 24 17:28 Store.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3264 Oct 24 17:28 TCPStore.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  6944 Oct 24 17:28 TraceUtils.h
 8 -rw-r--r-- 1 ec2-user ec2-user  4539 Oct 24 17:28 Types.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   580 Oct 24 17:28 UCCForNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2301 Oct 24 17:28 UCCTracing.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4933 Oct 24 17:28 UCCUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   584 Oct 24 17:28 UnixSockUtils.hpp
24 -rw-r--r-- 1 ec2-user ec2-user 20796 Oct 24 17:28 Utils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   575 Oct 24 17:28 WinSockUtils.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4259 Oct 24 17:28 Work.hpp
```

Pull Request resolved: pytorch#87615
Approved by: https://github.com/malfet
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
This fixes regression in distributed headers installation.
Caused by following PR: pytorch#85953
which removed the inclusions

Fixes pytorch#87173

Test plan from wheel build by this CI: https://github.com/pytorch/pytorch/actions/runs/3314742519

```
[ec2-user@ip-10-0-9-132 c10d]$ pwd
/home/ec2-user/actions-runner/_work/_temp/artifacts/torch/include/torch/csrc/distributed/c10d
[ec2-user@ip-10-0-9-132 c10d]$ ls -las
total 300
 4 drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 24 19:12 .
 0 drwxr-xr-x 4 ec2-user ec2-user    29 Oct 24 19:12 ..
12 -rw-r--r-- 1 ec2-user ec2-user  9051 Oct 24 17:28 Backend.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   216 Oct 24 17:28 c10d.h
 4 -rw-r--r-- 1 ec2-user ec2-user  3880 Oct 24 17:28 comm.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   604 Oct 24 17:28 debug.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1717 Oct 24 17:28 default_comm_hooks.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1316 Oct 24 17:28 error.h
 4 -rw-r--r-- 1 ec2-user ec2-user   962 Oct 24 17:28 exception.h
 4 -rw-r--r-- 1 ec2-user ec2-user  1461 Oct 24 17:28 FileStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   771 Oct 24 17:28 GlooDeviceFactory.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1154 Oct 24 17:28 HashStore.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  4058 Oct 24 17:28 logger.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2059 Oct 24 17:28 logging.h
 8 -rw-r--r-- 1 ec2-user ec2-user  7979 Oct 24 17:28 NCCLUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2756 Oct 24 17:28 Ops.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1814 Oct 24 17:28 ParamCommsUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1478 Oct 24 17:28 PrefixStore.hpp
16 -rw-r--r-- 1 ec2-user ec2-user 13235 Oct 24 17:28 ProcessGroupGloo.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 11298 Oct 24 17:28 ProcessGroup.hpp
12 -rw-r--r-- 1 ec2-user ec2-user  8645 Oct 24 17:28 ProcessGroupMPI.hpp
28 -rw-r--r-- 1 ec2-user ec2-user 26526 Oct 24 17:28 ProcessGroupNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3805 Oct 24 17:28 ProcessGroupRoundRobin.hpp
12 -rw-r--r-- 1 ec2-user ec2-user 10361 Oct 24 17:28 ProcessGroupUCC.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  5062 Oct 24 17:28 ProcessGroupWrapper.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4201 Oct 24 17:28 PyProcessGroup.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1072 Oct 24 17:28 python_comm_hook.h
24 -rw-r--r-- 1 ec2-user ec2-user 23859 Oct 24 17:28 reducer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2330 Oct 24 17:28 reducer_timer.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  1683 Oct 24 17:28 sequence_num.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2108 Oct 24 17:28 socket.h
 4 -rw-r--r-- 1 ec2-user ec2-user  2589 Oct 24 17:28 Store.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  3264 Oct 24 17:28 TCPStore.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  6944 Oct 24 17:28 TraceUtils.h
 8 -rw-r--r-- 1 ec2-user ec2-user  4539 Oct 24 17:28 Types.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   580 Oct 24 17:28 UCCForNCCL.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user  2301 Oct 24 17:28 UCCTracing.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4933 Oct 24 17:28 UCCUtils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   584 Oct 24 17:28 UnixSockUtils.hpp
24 -rw-r--r-- 1 ec2-user ec2-user 20796 Oct 24 17:28 Utils.hpp
 4 -rw-r--r-- 1 ec2-user ec2-user   575 Oct 24 17:28 WinSockUtils.hpp
 8 -rw-r--r-- 1 ec2-user ec2-user  4259 Oct 24 17:28 Work.hpp
```

Pull Request resolved: pytorch#87615
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch/csrc/distributed/c10d/Types.hpp: No such file or directory

3 participants