Skip to content

Conversation

@rohan-varma
Copy link
Contributor

@rohan-varma rohan-varma commented Sep 20, 2019

Stack from ghstack:

Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call dist.rpc and pass in an async flag if they wanted to run it asynchronously. This change introduces two new methods: rpc_sync and rpc_async that return the result or a future containing the result, respectively. The common code is moved to _invoke_rpc.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: D17509975

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
@pytorchbot pytorchbot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 20, 2019
…sync apis"


Per #24247, this splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
…sync apis"


Per #24247, this splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
@rohan-varma rohan-varma changed the title [pytorch][distributed][rpc] separate out rpc to sync and async apis [distributed][rpc] separate out rpc to sync and async apis Sep 20, 2019
Per #24247, this splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 20, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90533401

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 23, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90575024

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)


@_require_initialized
def rpc(to, func, args=None, kwargs=None, async_call=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rohan-varma, thanks for adding this. IIUC, the original goal was actually removing rpc and splitting it into rpc_sync and rpc_async. The reason for this is that, if rpc does not have a fixed return type, we won't be able to TorchScript it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli Are we concerned about backwards compatibility here? I'm not completely familiar with the backwards compatibility protocols we follow, but won't completely removing rpc immediately break the code of whoever is using it when they update to the new version?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)

Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!



@_require_initialized
def rpc(to, func, args=None, kwargs=None, async_call=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)

_agent, _to_worker_id(to), serialize(PythonUDF(func, args, kwargs)))

warnings.warn(
"""dist.rpc is deprecated. Use dist.rpc_async for asynchronous
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shall we modify test_rpc.py and test_dist_autograd in this PR as well? Otherwise, this warning message will show up in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds good

Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 24, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90651940

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
@rohan-varma
Copy link
Contributor Author

@pytorchbot retest this please

Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 24, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90684557

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 24, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90689755

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
@rohan-varma
Copy link
Contributor Author

@pytorchbot retest this please

rohan-varma added a commit that referenced this pull request Sep 25, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90726918

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 25, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90761640

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
@rohan-varma
Copy link
Contributor Author

@pytorchbot retest this please

Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 26, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90886603

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
@rohan-varma
Copy link
Contributor Author

@pytorchbot retest this please

Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 28, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90976894

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 29, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 90997440

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`.

This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 30, 2019
Pull Request resolved: #26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 91023006

Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in fb8fd6d.

import torch.distributed as dist
from common_distributed import MultiProcessTestCase
from common_utils import load_tests, run_tests
from torch.distributed.rpc import RpcBackend
Copy link
Collaborator

@peterjc123 peterjc123 Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section (or at least this line) should be put after the if check, otherwise the import will fail. This is currently blocking the Windows builds.
Error:

04:57:13 Traceback (most recent call last):
04:57:13   File "test_rpc.py", line 12, in <module>
04:57:13     from torch.distributed.rpc import RpcBackend
04:57:13   File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\distributed\rpc.py", line 3, in <module>
04:57:13     from . import invoke_rpc_builtin, invoke_rpc_python_udf, invoke_remote_builtin
04:57:13 ImportError: cannot import name 'invoke_rpc_builtin'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterjc123 thanks for pointing this out, I am putting up a diff now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@rohan-varma rohan-varma Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #27126.

facebook-github-bot pushed a commit that referenced this pull request Oct 1, 2019
…27126)

Summary:
These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in #26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this.
Pull Request resolved: #27126

Differential Revision: D17682761

Pulled By: rohan-varma

fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
@rohan-varma
Copy link
Contributor Author

Windows build issue fixed in #27126.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/9/head branch October 28, 2019 22:19
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
Summary:
Pull Request resolved: pytorch#26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 91023006

Test Plan: Tests in test/test_rpc.py should all pass.

Differential Revision: D17509975

fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
…ytorch#27126)

Summary:
These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this.
Pull Request resolved: pytorch#27126

Differential Revision: D17682761

Pulled By: rohan-varma

fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#26570

This diff splits out the async and sync rpc implementations to their
own functions. This way, we can have stronger typehinting and make the type of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the functiohn.
ghstack-source-id: 91023006

Test Plan: Tests in test/test_rpc.py should all pass.

Differential Revision: D17509975

fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
…ytorch#27126)

Summary:
These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this.
Pull Request resolved: pytorch#27126

Differential Revision: D17682761

Pulled By: rohan-varma

fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants