Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py#56598
Split test_c10d.py to test_c10d_common.py, test_c10d_gloo.py, test_c10d_nccl.py#56598pbelevich wants to merge 1 commit intopytorch:masterfrom
Conversation
💊 CI failures summary and remediationsAs of commit 751bf5e (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Run tests | 🔁 rerun | |
| Run tests | 🔁 rerun | |
| Run tests | 🔁 rerun |
🚧 1 ongoing upstream failure:
These were probably caused by upstream breakages that are not fixed yet.
- pytorch_linux_xenial_py3_clang7_onnx_ort_test1 since Apr 21 (24ff92f)
🚧 1 fixed upstream failure:
These were probably caused by upstream breakages that were already fixed.
Please rebase on the viable/strict branch (expand for instructions)
If your commit is older than viable/strict, run these commands:
git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD
- pytorch_linux_xenial_py3_clang7_onnx_ort_test2 on Apr 21 from 5:59pm to 7:44pm (24ff92f - 6032ea0)
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
b3333f0 to
57f0a92
Compare
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
57f0a92 to
e8712cd
Compare
|
Thanks for the refactoring! This will definitely make the test codebase much more manageable. On the other hand, I think this PR does a bit more than plain splitting. Therefore, it will be better to add some descriptions in the PR to highlight some changes, e.g., creating some abstract classes like Another suggestion is separating out many fundamental store-related unit tests into a separate file, as these tests and other distributed training tests target different layers. This can be done in a fllow-up PR if you'd like. |
test/distributed/test_c10d_gloo.py
Outdated
There was a problem hiding this comment.
Why is this test in gloo rather than common?
There was a problem hiding this comment.
line 181:
c10d.init_process_group(backend="gloo", init_method="env://")
There was a problem hiding this comment.
Better to add @requires_gloo() here.
test/distributed/test_c10d_gloo.py
Outdated
There was a problem hiding this comment.
The indentation here and a few "ValueError" lines and "RuntimeError" lines are changed. Previously it's aligned with the end of "with". Is this intentional?
test/distributed/test_c10d_gloo.py
Outdated
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
e8712cd to
dca061c
Compare
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
dca061c to
7b9787c
Compare
…0d_nccl.py (pytorch#56598) Summary: Pull Request resolved: pytorch#56598 Test Plan: NA Reviewed By: SciPioneer Differential Revision: D27913170 fbshipit-source-id: 4f2c09037b94b37b87545fd2b48ed6502c47433b
|
This pull request was exported from Phabricator. Differential Revision: D27913170 |
7b9787c to
751bf5e
Compare
|
This pull request has been merged in 5cc75e4. |
The process group is created on a gloo backend. Context: #56598 (comment) Differential Revision: [D27936805](https://our.internmc.facebook.com/intern/diff/D27936805/) ghstack-source-id: 127150910 Pull Request resolved: #56682
The process group is created on a gloo backend. Context: #56598 (comment) Differential Revision: [D27936805](https://our.internmc.facebook.com/intern/diff/D27936805/) [ghstack-poisoned]
Summary: Pull Request resolved: #56682 The process group is created on a gloo backend. Context: #56598 (comment) ghstack-source-id: 127150910 Test Plan: buck test caffe2/test/distributed:c10d -- test_logging_init Reviewed By: pbelevich Differential Revision: D27936805 fbshipit-source-id: 932efc638f94bdf78ddbae291e3720a20e43f2af
…0d_nccl.py (pytorch#56598) Summary: Pull Request resolved: pytorch#56598 Test Plan: NA Reviewed By: SciPioneer Differential Revision: D27913170 fbshipit-source-id: 3439d18141131b02d55f2ca399a4c795cba2b04b
Summary: Pull Request resolved: pytorch#56682 The process group is created on a gloo backend. Context: pytorch#56598 (comment) ghstack-source-id: 127150910 Test Plan: buck test caffe2/test/distributed:c10d -- test_logging_init Reviewed By: pbelevich Differential Revision: D27936805 fbshipit-source-id: 932efc638f94bdf78ddbae291e3720a20e43f2af
Summary: Fixes test failure after pytorch#56598 Introduced by pytorch#45335. Pull Request resolved: pytorch#57000 Reviewed By: zou3519 Differential Revision: D28030360 Pulled By: seemethere fbshipit-source-id: 4871d51e6b80dceef8bf95c6c658441287575f63
Test Plan: NA
Differential Revision: D27913170