Remove rpc fork and dist autograd fork tests from PyTorch repo #29827
There are known issues with "fork tests + OMP" in PyTorch. The rpc and dist autograd tests use OMP thread pools, which made the rpc fork and dist autograd fork tests flaky, so this PR removes these fork tests from the PyTorch repo. The rpc spawn and dist autograd spawn tests are still running. Differential Revision: [D18507384](https://our.internmc.facebook.com/intern/diff/D18507384/) [ghstack-poisoned]
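For readers unfamiliar with the two start methods, the following is a minimal sketch using Python's standard `multiprocessing` module, not the PyTorch test harness. The helper names (`make_context`, `square`, `run_in_children`) are hypothetical and chosen only for illustration. A forked child inherits a copy of the parent's memory but only the calling thread, which is the general reason thread pools created before a fork can misbehave in the child; a spawned child starts from a fresh interpreter instead.

```python
import multiprocessing as mp


def make_context(start_method="spawn"):
    """Return a multiprocessing context for the requested start method.

    'spawn' launches a fresh interpreter for each child, so no thread-pool
    state from the parent is carried over. 'fork' (where available) copies
    the parent's memory but only the thread that called fork().
    """
    if start_method not in mp.get_all_start_methods():
        raise ValueError(f"start method not available here: {start_method!r}")
    return mp.get_context(start_method)


def square(x):
    """Toy worker; it must live at module level so 'spawn' can pickle it."""
    return x * x


def run_in_children(func, args, start_method="spawn"):
    """Map func over args in two child processes created with start_method."""
    ctx = make_context(start_method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(func, args)
```

When trying this out, call `run_in_children(square, [1, 2, 3])` from under an `if __name__ == "__main__":` guard, because the spawn start method re-imports the main module in each child.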
Pull Request resolved: #29827. ghstack-source-id: 93919578
TESTS.extend([
    'rpc_fork',
    'rpc_spawn',
    'dist_autograd_fork',
Can you do the same for test_dist_optimizer_fork.py? It depends on RpcAgent as well.
will do
if PY33:
    TESTS.extend([
        'rpc_fork',
        'rpc_spawn',
Since we only have spawn mode on the OSS side, do we still need to explicitly have "spawn" in the file name?
Can we keep it for now? Once we completely get rid of fork, we can merge rpcSpawnTest with RpcTest.
Pull Request resolved: #29827. ghstack-source-id: 94040608
mrshenli
left a comment
The lint and test failures are unrelated to this change.
Curious: why is dist_optimizer_spawn not in the Windows and ROCm blacklists?
Pull Request resolved: #29827. ghstack-source-id: 94139812
Checked that "rpc_fork", "dist_autograd_fork" and "dist_optimizer_fork" did not run, while "rpc_spawn", "dist_autograd_spawn" and "dist_optimizer_spawn" did run, in https://circleci.com/api/v1.1/project/github/pytorch/pytorch/3645637/output/105/0?file=true&allocation-id=5dd30d9f0d6a0b62d1d857f9-0-build%2F10D9680D
The ROCm test failure is not relevant: FAILED (skipped=12, unexpected successes=1)
This pull request has been merged in 861ef05.
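The manual check on the CI output described above could, in principle, be automated. The sketch below is an assumption-laden illustration and not part of this PR: the helper names (`split_by_start_method`, `check_only_spawn_ran`) are hypothetical, and it assumes the executed test names have already been collected into a list and follow the `_fork` / `_spawn` suffix convention seen in the test list.

```python
def split_by_start_method(test_names):
    """Partition test names into fork and spawn variants by their suffix."""
    fork = [name for name in test_names if name.endswith("_fork")]
    spawn = [name for name in test_names if name.endswith("_spawn")]
    return fork, spawn


def check_only_spawn_ran(executed):
    """Return the spawn tests that ran; raise if any fork test ran."""
    fork, spawn = split_by_start_method(executed)
    if fork:
        raise AssertionError(f"fork tests unexpectedly ran: {fork}")
    return spawn
```

For example, passing the three spawn test names from the comment above returns them unchanged, while a list that still contains `rpc_fork` raises an `AssertionError`.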
Since #29827, we only test RPC using spawn, so the multi-thread/fork error should disappear. [ghstack-poisoned]
With #29827, the flakiness should disappear for test_call_method_on_rref [ghstack-poisoned]
Stack from ghstack:
There are known issues with "fork tests + OMP" in PyTorch. The rpc and dist autograd tests use OMP thread pools, which made the rpc fork and dist autograd fork tests flaky, so this PR removes these fork tests from the PyTorch repo. The rpc spawn and dist autograd spawn tests are still running.
Differential Revision: D18507384