@rohan-varma rohan-varma commented May 7, 2020

Stack from ghstack:

`common_distributed` and `test_distributed` define some overlapping error codes that mean different things. For example, code 75 in `test_distributed` means "no cuda available", but in `common_distributed` it means "need at least 2 CUDA devices".

This is a problem because the tests in `test_distributed` now use the utils in `common_distributed`, so a skipped test can report the wrong reason.

It is also the source of test failures in #37990.

This diff dedupes the test-skipping logic and moves it into `common_distributed.py`, where it can be reused and imported into `test_distributed`.

Differential Revision: D21466768
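
To illustrate the idea of keeping skip reasons in one place, here is a minimal sketch, not the code in this diff. The names `SkipReason`, `SKIP_REASONS`, `assert_unique_exit_codes`, `reason_for_exit_code`, and `skip_unless` are hypothetical, and exit code 74 is made up for the example; only code 75 and its "need at least 2 CUDA devices" meaning come from the description above.

```python
import functools
import sys
from collections import namedtuple

# Hypothetical record type: one exit code and one message per skip reason.
SkipReason = namedtuple("SkipReason", ["exit_code", "message"])

# Single shared table. Code 75 matches the description above;
# code 74 is an assumption made up for this sketch.
SKIP_REASONS = {
    "no_cuda": SkipReason(74, "CUDA is not available."),
    "multi-gpu": SkipReason(75, "Need at least 2 CUDA devices."),
}


def assert_unique_exit_codes(table):
    """Raise ValueError if two skip reasons share an exit code."""
    codes = [skip.exit_code for skip in table.values()]
    if len(codes) != len(set(codes)):
        raise ValueError("duplicate exit codes in skip table")


def reason_for_exit_code(code, table=SKIP_REASONS):
    """Map a subprocess exit code back to its skip message, or None."""
    for skip in table.values():
        if skip.exit_code == code:
            return skip.message
    return None


def skip_unless(condition, skip):
    """Decorator factory: exit with skip.exit_code when condition() is false."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not condition():
                sys.exit(skip.exit_code)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Fail fast at import time if the table ever regains an overlap.
assert_unique_exit_codes(SKIP_REASONS)
```

With a single table, the process that launches the test and the process that exits agree on what each code means, which is the property the overlapping definitions broke.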

rohan-varma added a commit that referenced this pull request May 7, 2020
Pull Request resolved: #38078

ghstack-source-id: 103713465

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)

dr-ci bot commented May 7, 2020

💊 CI failures summary and remediations

As of commit dc6608a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚




@mrshenli mrshenli left a comment


Thanks! This is great!

}

def skip_if_no_gpu(func):
    """ Nccl multigpu tests requires at least 2 GPUS. Skip if this is not met"""

(this is prior to this PR)

tests requires -> tests require

return wrapper


def skip_if_no_cuda_distributed(func):

Looks like this is always used together with skip_if_no_gpu, so we can get rid of this as well?
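
A rough sketch of why a stacked pair of checks can be redundant, assuming (this is not confirmed by the thread) that one decorator skips when no CUDA device exists and the other skips when fewer than 2 exist. The helpers `skip_if`, `make_checks`, and `exit_code_of` are hypothetical, exit code 74 is made up, and `device_count` is a plain callable standing in for a real device query.

```python
import functools
import sys


def skip_if(predicate, exit_code):
    """Hypothetical helper: exit with exit_code when predicate() is true."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if predicate():
                sys.exit(exit_code)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def make_checks(device_count):
    """Build both checks from a callable that returns the device count."""
    no_cuda = skip_if(lambda: device_count() == 0, 74)      # 74 is an assumption
    too_few_gpus = skip_if(lambda: device_count() < 2, 75)  # 75 as in the description
    return no_cuda, too_few_gpus


def exit_code_of(func):
    """Run func and return the code it exited with, or None if it returned."""
    try:
        func()
    except SystemExit as exc:
        return exc.code
    return None


def compare(n):
    """Return (stacked, single) exit codes for a machine with n devices."""
    no_cuda, too_few_gpus = make_checks(lambda: n)
    stacked = too_few_gpus(no_cuda(lambda: "ran"))
    single = too_few_gpus(lambda: "ran")
    return exit_code_of(stacked), exit_code_of(single)
```

Under these assumptions, `compare(n)` returns the same pair for any `n` when the stricter check is outermost, because zero devices is also fewer than two. If the no-CUDA check ran first instead, the zero-device case would report 74 rather than 75, so dropping it would change the reported reason in that case only.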

rohan-varma added a commit that referenced this pull request May 8, 2020
Pull Request resolved: #38078

ghstack-source-id: 103782583

Differential Revision: [D21466768](https://our.internmc.facebook.com/intern/diff/D21466768/)
@facebook-github-bot

This pull request has been merged in 4501083.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/123/head branch May 12, 2020 14:16