Conversation

@mrshenli mrshenli commented Jul 23, 2019

Stack from ghstack:

Features:

  • sync and async RPC for builtin operators
  • RpcAgent API
  • ProcessGroupAgent implementation

Goal:

This is the first PR for #23110, and there will be many follow-up ones, so let's focus on the overall API and code structure. Details like efficiency and error handling can be improved in future PRs.

  • have a minimum working and testable RPC implementation.
  • make sure the RpcAgent API is sufficient for the future ThriftAgent and TensorPipeAgent implementations (see the C++ sketch below)
    • For the TensorPipe implementation, it might allocate multiple underlying communication channels of different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message, which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
    • For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is pass the response Message object to the Future returned by send(...).
  • support blocking and non-blocking RequestCallback
    • blocking means the callback won't return before sending out the response
    • non-blocking can be achieved by enqueuing the (from, request, RpcAgent&) tuple and using a different thread to process it. That is why there is an RpcAgent& arg in the param list.
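Below is a minimal C++ sketch of the shapes described above. All names and signatures here are illustrative assumptions for this description, not the exact classes in this PR.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

#include <torch/torch.h>

// Illustrative sketch only; names and signatures are assumptions based on the
// description above, not the classes that land in this PR.
enum class MessageType { BUILTIN_OP, BUILTIN_RET };

class Message {
 public:
  Message(
      std::vector<char> payload,
      std::vector<torch::Tensor> tensors,
      MessageType type,
      int64_t id)
      : payload_(std::move(payload)),
        tensors_(std::move(tensors)),
        type_(type),
        id_(id) {}

  bool isRequest() const { return type_ == MessageType::BUILTIN_OP; }
  MessageType type() const { return type_; }
  int64_t id() const { return id_; }

 private:
  std::vector<char> payload_;           // serialized op name and non-tensor args
  std::vector<torch::Tensor> tensors_;  // tensor table, kept outside the byte vector
  MessageType type_;
  int64_t id_;                          // request/response matching id
};

class RpcAgent;  // forward declaration for the callback signature

// Blocking or non-blocking request handler: (from, request, RpcAgent&).
using RequestCallback =
    std::function<void(std::string from, Message request, RpcAgent& agent)>;

class RpcAgent {
 public:
  explicit RpcAgent(RequestCallback cb) : cb_(std::move(cb)) {}
  virtual ~RpcAgent() = default;

  // How a Message is serialized on the wire is left entirely to the agent
  // (ProcessGroupAgent, ThriftAgent, TensorPipeAgent, ...). The returned
  // future is fulfilled with the response Message.
  virtual std::future<Message> send(const std::string& to, Message request) = 0;

 protected:
  RequestCallback cb_;
};
```

A ProcessGroupAgent would implement send(...) on top of a ProcessGroup, while a ThriftAgent could drop the id during serialization, since Thrift matches requests and responses itself.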

Differential Revision: D15194693

We are not exporting this diff until we finalize distributed autograd design and publish the API review publicly.

https://fb.quip.com/FabTAZKVgQpf
@mrshenli mrshenli requested review from apaszke and pietern as code owners July 23, 2019 14:16
@pytorchbot pytorchbot added caffe2 module: build Build system issues oncall: distributed Add this issue/PR to distributed oncall triage queue module: internals Related to internal abstractions in c10 and ATen module: pybind Related to our Python bindings / interactions with other Python libraries labels Jul 23, 2019
mrshenli added a commit that referenced this pull request Jul 23, 2019
ghstack-source-id: 86988305
Pull Request resolved: #23228
@mrshenli mrshenli changed the title sync and async torch.distributed.rpc for builtin operators [WIP] sync and async torch.distributed.rpc for builtin operators Jul 23, 2019
mrshenli added a commit that referenced this pull request Jul 25, 2019
Pull Request resolved: #23228
ghstack-source-id: 87140239
@mrshenli
Contributor Author

@pytorchbot retest this please

@pytorchbot pytorchbot added the module: tests Issues related to tests (not the torch.testing module) label Jul 25, 2019
mrshenli added a commit that referenced this pull request Jul 25, 2019
Pull Request resolved: #23228
ghstack-source-id: 87205606
mrshenli added a commit that referenced this pull request Jul 26, 2019
Pull Request resolved: #23228
ghstack-source-id: 87224632
@mrshenli mrshenli changed the title [WIP] sync and async torch.distributed.rpc for builtin operators sync and async torch.distributed.rpc for builtin operators Jul 26, 2019
@mrshenli mrshenli requested review from ezyang, gchanan and soumith July 26, 2019 02:48
@xush6528
Contributor

xush6528 commented Aug 6, 2019

I think, for a disallowed operation, there should also be test cases that assert it raises as expected, rather than getting stuck in some unexpected state.

Also, document these constraints in docstrings.

Yeah, both sync_rpc and join_rpc might need more tests.

@mrshenli
Contributor Author

mrshenli commented Aug 6, 2019

rebase messed up files, will clean up

@mrshenli
Contributor Author

mrshenli commented Aug 6, 2019

@xush6528 I made changes to allow calling join_rpc multiple times (to be consistent with Thread.join())

@mrshenli
Contributor Author

mrshenli commented Aug 6, 2019

Hi @driazati, what's the plan for landing this PR and #23241? I saw pickler/unpickler APIs are changed in #23241. Do you mind if I merge this one first and then we address the conflicts together?

Message message = deserialize(ss);

if (message.isRequest()) {
  cb_(names_[srcRank], std::move(message), *this);
@xush6528 xush6528 (Contributor) Aug 6, 2019

@mrshenli

Here is a call sequence that blocks.

Let me turn it into a unit test.

[diagram: call sequence that blocks]

@mrshenli mrshenli (Contributor Author)

Yep, cb has to be non-blocking for UDFs. It will come in the next PR. For this one, as we only target builtin ops, blocking cb should suffice.
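For illustration, here is a minimal sketch of that non-blocking direction — enqueue the (from, request, RpcAgent&) work and process it on a separate thread, so the receive loop is never blocked on user work. The Request struct and class below are hypothetical, not code from this PR.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch of a non-blocking RequestCallback: the receive loop only
// enqueues work; a dedicated worker thread runs the handler, so a handler that
// itself issues (and waits on) a nested RPC does not stall the recv loop.
struct Request {
  std::string from;
  std::string payload;  // stands in for the real Message
};

class NonBlockingCallback {
 public:
  explicit NonBlockingCallback(std::function<void(const Request&)> process)
      : process_(std::move(process)), worker_([this] { loop(); }) {}

  ~NonBlockingCallback() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Called from the receive loop; returns immediately.
  void operator()(Request req) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(req));
    }
    cv_.notify_one();
  }

 private:
  void loop() {
    for (;;) {
      Request req;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;  // done_ was set and nothing is left to process
        }
        req = std::move(queue_.front());
        queue_.pop_front();
      }
      process_(req);  // may block (e.g. on a nested RPC) without stalling recv
    }
  }

  std::function<void(const Request&)> process_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<Request> queue_;
  bool done_ = false;
  std::thread worker_;
};
```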

@mrshenli
Contributor Author

mrshenli commented Aug 6, 2019

The errors are the same for the three failed tests and are irrelevant (they also appear in other PRs, e.g., #23897). I am landing this PR.

19:57:38 CMake Error at third_party/fbgemm/third_party/asmjit/CMakeLists.txt:1 (cmake_minimum_required):
19:57:38   CMake 3.8 or higher is required.  You are running version 3.6.3

@zou3519 zou3519 deleted the gh/mrshenli/7/head branch August 6, 2019 23:06
@jianyuh
Member

jianyuh commented Aug 6, 2019

> Errors are the same for three failed tests, and are irrelevant (also appears in other PRs, e.g., #23897). I am landing this PR.
>
> 19:57:38 CMake Error at third_party/fbgemm/third_party/asmjit/CMakeLists.txt:1 (cmake_minimum_required):
> 19:57:38   CMake 3.8 or higher is required.  You are running version 3.6.3

Please ignore the error message for asmjit CMake version: this is due to the upgrade of FBGEMM. We have reverted the PR in FBGEMM.

@pritamdamania87 pritamdamania87 (Contributor) left a comment

Made a pass through the PR a little late; hopefully we can address some of these comments in a separate PR?

n = i + self.rank + 1
ret = dist.rpc('worker%d' % dstRank, torch.add,
               args=(torch.ones(n, n), torch.ones(n, n)))
self.assertEqual(ret, torch.ones(n, n) * 2)
Contributor

Wondering if we should test some errors and verify we handle them correctly, some examples:

  1. Invalid argument types.
  2. Invalid number of arguments.
  3. Invalid operator.

Contributor Author

Yep, @zhaojuanmao also mentioned that. This will come in followup PRs.

sys.exit(0)


def _wrap_with_rpc(func):
Contributor

Do we need to use a decorator like this to set up and tear down each test? Usually Python unit test frameworks use setUp and tearDown methods for this.

Contributor

Good point, we could write:

class RpcTest(MultiProcessTestCase):
    @classmethod
    def setUpClass(cls):
        super(RpcTest, cls).setUpClass()
        cls._run = _wrap_with_rpc(cls._run)

Contributor Author

Good point!

{message.id(), (int64_t) message.type()}, {torch::kInt64}
));

torch::save(tensors, os);
Contributor

Based on some profiling, @aazzolini mentioned torch::save and torch::load are pretty slow.

Contributor Author

Yep, I also learnt that from @aazzolini. I will try the new pickle API when #23241 is merged.

torch::load(tensors, is);

TORCH_CHECK(tensors.size() >= 2, "Failed to deserialize a message.");
auto miscTensor = std::move(tensors.back());
Contributor

nit: maybe call this metadataTensor.
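Putting the two quoted hunks together, here is a standalone sketch of the save/load round trip. The Message fields are assumptions for illustration; only the torch::save/torch::load calls and the trailing metadata tensor mirror the code above.

```cpp
#include <cstdint>
#include <sstream>
#include <utility>
#include <vector>

#include <torch/torch.h>

int main() {
  // Payload tensors for a hypothetical message, plus one trailing metadata
  // tensor holding {id, type} as int64 values (mirroring the quoted hunks).
  int64_t id = 7;
  int64_t type = 0;
  std::vector<torch::Tensor> tensors = {torch::ones({2, 2}), torch::zeros({3})};
  tensors.push_back(torch::tensor({id, type}, torch::kInt64));

  std::stringstream ss;
  torch::save(tensors, ss);  // serialize everything into one stream

  std::vector<torch::Tensor> loaded;
  torch::load(loaded, ss);   // deserialize on the receiving side

  TORCH_CHECK(loaded.size() >= 2, "Failed to deserialize a message.");
  auto metadataTensor = std::move(loaded.back());
  loaded.pop_back();
  int64_t loadedId = metadataTensor[0].item<int64_t>();
  int64_t loadedType = metadataTensor[1].item<int64_t>();
  TORCH_CHECK(loadedId == id && loadedType == type);
  return 0;
}
```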

TORCH_CHECK(dstRank != pg_->getRank(), "ProcessGroupAgent does not support "
"making RPC calls to self.")

auto requestId = nextId();
Contributor

nextId() is not thread safe.

Contributor Author

nextId_ is atomic now, is that sufficient?
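For reference, a minimal sketch of the atomic id counter being discussed (an assumed shape; the actual member in the PR may differ):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: an atomic counter makes nextId() safe to call from
// multiple threads without an explicit lock; fetch_add returns a unique,
// monotonically increasing id per call.
class IdGenerator {
 public:
  int64_t nextId() {
    return nextId_.fetch_add(1, std::memory_order_relaxed);
  }

 private:
  std::atomic<int64_t> nextId_{0};
};
```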


} // namespace

ScriptRet::ScriptRet(at::IValue&& value) : value_(value) {}
Contributor

Does an OP always return a single IValue?

Contributor Author

Yes, according to @zdevito's comment:

A return should just be a single value. This is how python and TorchScript already work.

// Return value of a builtin operator or a TorchScript function.
class TORCH_API ScriptRet final {
public:
explicit ScriptRet(at::IValue&& values);
Contributor

nit: value

namespace distributed {
namespace rpc {

void processRequestBlocking(
Contributor

Add some documentation to this function, and also add some documentation to this file explaining its purpose and which methods belong here.
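For example, a doc-comment sketch along these lines could be added. The signature below is an assumption inferred from the (from, request, RpcAgent&) convention in the PR description, not the actual declaration in this file.

```cpp
#include <string>

class Message;   // assumed, from the rpc message header
class RpcAgent;  // assumed, from the rpc agent header

// Processes one incoming RPC request synchronously: deserialize the builtin
// op call from `request`, run the operator, and send the response Message
// back to `from` via `agent`. "Blocking" means this function does not return
// until the response has been handed to the agent.
void processRequestBlocking(const std::string& from, Message&& request, RpcAgent& agent);
```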


return agent.send(
dstName, ScriptCall(op, std::move(stack)).toMessage());
} catch (std::runtime_error) {}
Contributor

add a LOG(DEBUG) when catching errors?

Contributor Author

@pritamdamania87 do you know which LOG I should use if I would like to make it work both internally and externally?


return names


def join_rpc():
Contributor

As I mentioned above, this makes sense for ProcessGroup-based RPCs, but not really for something like a Thrift implementation. We shouldn't have join and sync in the public APIs.

Contributor Author

I would assume Thrift also has a join or shutdown to make sure the local RpcAgent does not exit too early?
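A rough sketch of what such join semantics could look like on the agent side (a hedged illustration, not the ProcessGroupAgent code in this PR): track outstanding work and let join() wait for it to drain, so it is safe to call multiple times, like Thread.join().

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of join() semantics for an RpcAgent-like class:
// callers must not exit before all outstanding RPC work has completed.
class PendingWorkTracker {
 public:
  void startWork() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++pending_;
  }

  void finishWork() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      --pending_;
    }
    cv_.notify_all();
  }

  // Safe to call multiple times (like Thread.join()): it simply waits until
  // there is no outstanding work.
  void join() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int64_t pending_ = 0;
};
```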

@mrshenli
Contributor Author

mrshenli commented Aug 7, 2019

@pritamdamania87 thanks for the comments, will definitely address them in a followup PR.

@ezyang
Contributor

ezyang commented Aug 7, 2019

@mrshenli I think this diff breaks OS X binary builds:

Aug 07 00:00:03 + python -c 'import torch'
Aug 07 00:00:04 dyld: lazy symbol binding failed: Symbol not found: __ZN5torch11distributed3rpc16python_functionsEv
Aug 07 00:00:04   Referenced from: /Users/distiller/project/miniconda/envs/wheel_py36/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib
Aug 07 00:00:04   Expected in: flat namespace
Aug 07 00:00:04 
Aug 07 00:00:04 dyld: Symbol not found: __ZN5torch11distributed3rpc16python_functionsEv
Aug 07 00:00:04   Referenced from: /Users/distiller/project/miniconda/envs/wheel_py36/lib/python3.6/site-packages/torch/lib/libtorch_python.dylib
Aug 07 00:00:04   Expected in: flat namespace

https://circleci.com/gh/pytorch/pytorch/2385105?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console

facebook-github-bot pushed a commit that referenced this pull request Aug 8, 2019
Summary:
#23228 caused a build failure on OS X, because rpc.h is included as long as USE_DISTRIBUTED=1, but rpc/init.cpp (and others) is only included when NOT APPLE. So, python_functions defined in init.cpp cannot be found on macOS. This PR attempts to fix it by wrapping rpc.h with USE_C10D, which is only set when NOT APPLE.

I tried this fix locally and it works.
Pull Request resolved: #23998

Differential Revision: D16706087

Pulled By: mrshenli

fbshipit-source-id: d04fe6717a181a3198289cdef51439708c2e291d
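A minimal sketch of the kind of guard the fix describes. The USE_C10D macro and the existence of python_functions come from the summary and the linker error above; the file name and header path here are assumptions.

```cpp
// python_bindings.cpp (hypothetical file): only pull in the RPC bindings when
// the c10d backend is built, which is the case on non-Apple platforms.
#ifdef USE_C10D
#include <torch/csrc/distributed/rpc/rpc.h>  // assumed path for rpc.h
#endif

void initDistributedRpcBindings() {
#ifdef USE_C10D
  // python_functions() is defined in rpc/init.cpp, which is only compiled
  // when NOT APPLE; guarding the call avoids the unresolved-symbol error
  // seen in the OS X binary builds.
  torch::distributed::rpc::python_functions();
#endif
}
```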
xush6528 added a commit to xush6528/pytorch that referenced this pull request Aug 8, 2019
Summary:
# Issue

Implementation in pytorch#23228 has a recvLoop thread that blocks on running a requested RPC function.

This can't meet the requirement in pytorch#23110: the proposal implies a worker could send out nested RPCs.

"Nested RPC" means an RPC callee could send out another RPC before the first RPC returns; since the worker's recv thread would be busy waiting for the first RPC function to return, there is no thread left to receive the return value of the second RPC function.

A diagram showing this issue: pytorch#23228 (comment)

# Solution

- Add a test case to capture this requirement.

- Add debugging utilities that could be quite useful for debugging tricky RPC cases.

# Misc

- Add debugging utility in common_distributed for tracing RPC calls.

Differential Revision: D16682122

fbshipit-source-id: 062e77faa5e8467cb3cfe5b0a16333c2762768a9
xush6528 added a commit to xush6528/pytorch that referenced this pull request Aug 9, 2019
Summary:
Pull Request resolved: pytorch#24036

# Issue

Implementation in pytorch#23228 has a recvLoop thread that blocks on running a requested RPC function.

This can't meet the requirement in pytorch#23110, where the proposal implies a worker could send out nested RPCs.

"Nested RPC" means an RPC callee could send out another RPC before the first RPC returns; since the worker's recv thread would be busy waiting for the first RPC function to return, there is no available idle thread to receive the return value of the second RPC function.

A diagram showing this issue: pytorch#23228 (comment).

More extensive thinking from mrshenli is in pytorch#23569 (comment).

# Solution

- Add a test case to capture this requirement.

# Misc

- Add debugging utilities that could be quite useful for tracing RPC behaviors and debugging tricky RPC cases similar to this.

Differential Revision: D16682122

fbshipit-source-id: ffa78eb20af4e2cf9476998fa544ab940035cae9