-
Notifications
You must be signed in to change notification settings - Fork 26.3k
dedupe test skipping in common_distributed and test_distributed #38078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/) [ghstack-poisoned]
…buted" `common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/) [ghstack-poisoned]
Pull Request resolved: #38078 `common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` ghstack-source-id: 103713465 Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)
💊 CI failures summary and remediationsAs of commit dc6608a (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 5 times. |
mrshenli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is great!
| } | ||
|
|
||
| def skip_if_no_gpu(func): | ||
| """ Nccl multigpu tests requires at least 2 GPUS. Skip if this is not met""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is prior to this PR)
tests requires -> tests require
| return wrapper | ||
|
|
||
|
|
||
| def skip_if_no_cuda_distributed(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is always used together with skip_if_no_gpu, so we can get rid of this as well?
…buted" `common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/) [ghstack-poisoned]
…buted" `common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/) [ghstack-poisoned]
Pull Request resolved: #38078 `common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices". This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests. It is also the source of test failures in #37990. This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed` ghstack-source-id: 103782583 Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)
|
This pull request has been merged in 4501083. |
Stack from ghstack:
common_distributedandtest_distributedhave some error codes that overlap but are for different reasons, for example, code 75 intest_distributedis "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".This is an issue because the tests in
test_distributednow use the utils incommon_distributed, so we could get the wrong reason for skipping tests.It is also the source of test failures in #37990.
This diff makes it so that the test skipping logic is deduped and put into
common_distributed.py, where it can be reused and then imported intotest_distributedDifferential Revision: D21466768