Skip to content

Conversation

@rohan-varma
Copy link
Contributor

@rohan-varma rohan-varma commented May 7, 2020

Stack from ghstack:

The code in ddp.{h, cpp} and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete _dist_broadcast_coalesced

Verified that all tests pass with cuda by running test_c10d on a gpu-enabled machine.

Differential Revision: D21443879

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103638483

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
@dr-ci
Copy link

dr-ci bot commented May 7, 2020

💊 CI failures summary and remediations

As of commit fb06ad8 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 20 times.

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103647579

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see getting rid of dead code! The failed tests doesn't seem to be broken by this PR, but can you double check if they are flaky instead of consistently failing by running those locally

I created #38015 and #38016 to track.

@rohan-varma
Copy link
Contributor Author

@mrshenli Actually it looks like some of the test issues are related. In test_distributed.py, we define an exit code of 77 for some specific skip condition, but in common_distributed that code is for something else. This is a duplication issue now that test_distributed uses the utils in common_distributed. Now in common_distributed I removed the 77 skip code, messing things up. I'll send a PR deduping these exit codes below this.

rohan-varma added a commit that referenced this pull request May 7, 2020
`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
…buted"

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #38078

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`
ghstack-source-id: 103713465

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 8, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103716041

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
rohan-varma added a commit that referenced this pull request May 8, 2020
…buted"

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 8, 2020
…buted"

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 8, 2020
Pull Request resolved: #38078

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`
ghstack-source-id: 103782583

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)
facebook-github-bot pushed a commit that referenced this pull request May 9, 2020
Summary:
Pull Request resolved: #38078

`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".

This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.

It is also the source of test failures in #37990.

This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`
ghstack-source-id: 103782583

Test Plan: CI

Differential Revision: D21466768

fbshipit-source-id: 53b5af36672ebd8b51ba8b42709d87e96cadef20
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103871258

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request May 11, 2020
Pull Request resolved: #37990

The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.

#20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`

Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103885383

Differential Revision: [D21443879](https://our.internmc.facebook.com/intern/diff/D21443879/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21443879/)!
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 906c50e.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/120/head branch May 16, 2020 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants