-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[distributed][rpc] separate out rpc to sync and async apis #26570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
…sync apis" Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
…sync apis" Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90533401 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90575024 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
|
||
|
|
||
| @_require_initialized | ||
| def rpc(to, func, args=None, kwargs=None, async_call=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rohan-varma, thanks for adding this. IIUC, the original goal was actually removing rpc and splitting it into rpc_sync and rpc_async. The reason for this is that, if rpc does not have a fixed return type, we won't be able to TorchScript it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrshenli Are we concerned about backwards compatibility here? I'm not completely familiar with the backwards compatibility protocols we follow, but won't completely removing rpc immediately break the code of whoever is using it when they update to the new version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)
mrshenli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
|
||
|
|
||
| @_require_initialized | ||
| def rpc(to, func, args=None, kwargs=None, async_call=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)
| _agent, _to_worker_id(to), serialize(PythonUDF(func, args, kwargs))) | ||
|
|
||
| warnings.warn( | ||
| """dist.rpc is deprecated. Use dist.rpc_async for asynchronous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shall we modify test_rpc.py and test_dist_autograd in this PR as well? Otherwise, this warning message will show up in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sounds good
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90651940 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90684557 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90689755 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90726918 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90761640 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90886603 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90976894 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90997440 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
This pull request has been merged in fb8fd6d. |
| import torch.distributed as dist | ||
| from common_distributed import MultiProcessTestCase | ||
| from common_utils import load_tests, run_tests | ||
| from torch.distributed.rpc import RpcBackend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section (or at least this line) should be put after the if check, otherwise the import will fail. This is currently blocking the Windows builds.
Error:
04:57:13 Traceback (most recent call last):
04:57:13 File "test_rpc.py", line 12, in <module>
04:57:13 from torch.distributed.rpc import RpcBackend
04:57:13 File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\distributed\rpc.py", line 3, in <module>
04:57:13 from . import invoke_rpc_builtin, invoke_rpc_python_udf, invoke_remote_builtin
04:57:13 ImportError: cannot import name 'invoke_rpc_builtin'There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peterjc123 thanks for pointing this out, I am putting up a diff now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #27126.
…27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in #26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: #27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
|
Windows build issue fixed in #27126. |
Summary: Pull Request resolved: pytorch#26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Test Plan: Tests in test/test_rpc.py should all pass. Differential Revision: D17509975 fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
…ytorch#27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: pytorch#27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Summary: Pull Request resolved: pytorch#26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Test Plan: Tests in test/test_rpc.py should all pass. Differential Revision: D17509975 fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
…ytorch#27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: pytorch#27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Stack from ghstack:
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call
dist.rpcand pass in anasyncflag if they wanted to run it asynchronously. This change introduces two new methods:rpc_syncandrpc_asyncthat return the result or a future containing the result, respectively. The common code is moved to_invoke_rpc.This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.
Differential Revision: D17509975