Conversation

@rohan-varma (Contributor) commented Sep 16, 2019

Per #25525, we want to clean up the distributed autograd context on all nodes, in addition to the local one. To do this, we want to send async RPCs to the other nodes telling them to clean up the context.

The first step is for a node's context to know about the other workers. This PR does two things (a rough sketch follows below):

  1. Adds the necessary data structures and getter functions to `DistAutogradContext`
  2. Refactors calls to `addSendRpcBackward` to take in the `worker_id` as an additional argument
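
To make the above concrete, here is a rough Python sketch of what the context tracks after this change. The real implementation is C++; the names `_known_worker_ids`, `add_known_worker_id`, and `add_send_rpc_backward` below are illustrative assumptions, not the actual API.

```
# Conceptual sketch only; the real DistAutogradContext is a C++ class.
# All names below are illustrative assumptions.
class DistAutogradContextSketch:
    def __init__(self, context_id):
        self.context_id = context_id
        self._known_worker_ids = set()  # workers this context has sent RPCs to

    def add_known_worker_id(self, worker_id):
        self._known_worker_ids.add(worker_id)

    def get_known_worker_ids(self):
        # Getter used later to fan out "clean up this context" RPCs.
        return set(self._known_worker_ids)

def add_send_rpc_backward(context, tensors, dst_worker_id):
    # After this PR, the call also records which worker the RPC was sent to.
    context.add_known_worker_id(dst_worker_id)
    # ... attach the send autograd function to `tensors` as before ...
```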

pritam and others added 23 commits August 19, 2019 16:32
As per pytorch#23110, each autograd pass
would be assigned a unique autograd_context_id. In this change we introduce a
DistAutogradContainer per worker which holds information for each autograd pass
currently running.

DistAutogradContainer has a map from the autograd_context_id to
DistAutogradContext (which holds all the relevant information for the autograd
pass). DistAutogradContext currently only stores the autograd_context_id;
more information will be added to it later as we build out the rest of the
framework.

The autograd_context_id is a globally unique 64-bit integer where the first 16
bits are the worker_id and the next 48 bits auto-increment for uniqueness.

Sample python code on how this would be used for distributed autograd:

```
import torch.distributed.autograd as dist_autograd
worker_id = 0
dist_autograd.init(worker_id)
with dist_autograd.context() as context_id:
     # forward pass...
     # backward pass...
     # optimizer step...
```

Differential Revision: [D16356694](https://our.internmc.facebook.com/intern/diff/D16356694/)
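
As a worked example of the id layout described in the commit message above (assuming "first 16 bits" means the most significant bits of the 64-bit value), the composition can be sketched as follows; this is illustration only, not the C++ code:

```
# Sketch of the autograd_context_id layout: high 16 bits = worker_id,
# low 48 bits = an auto-incrementing counter.
def make_context_id(worker_id, counter):
    assert 0 <= worker_id < (1 << 16)
    assert 0 <= counter < (1 << 48)
    return (worker_id << 48) | counter

def split_context_id(context_id):
    return context_id >> 48, context_id & ((1 << 48) - 1)

cid = make_context_id(worker_id=3, counter=7)
assert split_context_id(cid) == (3, 7)
```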
This contains very basic functionality for adding a 'send' autograd
function to our autograd graph. The purpose of this change is to validate that the
basic structure proposed here makes sense; once it does, we can build
upon it to address more complicated scenarios. At a high level we've added
the following functionality (sketched below):

1) Define a very simple 'SendRpcBackward' autograd function.
2) Attach this function to the appropriate tensors when we call an RPC.
3) Store the send function in our distributed autograd context.

GitHub Issue: pytorch#23110
Differential Revision: [D16903255](https://our.internmc.facebook.com/intern/diff/D16903255/)
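
A minimal Python model of steps 1-3 above; the real implementation is C++ (there, SendRpcBackward is an autograd Node), and every name in this sketch is a stand-in:

```
# Toy model of the send-side bookkeeping; not the real API.
class SendRpcBackward:
    """Stands in for the autograd function placed at the RPC send boundary."""
    def __init__(self, tensors):
        self.tensors = list(tensors)  # step 2: attached to the outgoing tensors
        self.grads = None             # later filled with gradients received over RPC

class ContextSketch:
    def __init__(self):
        self.send_functions = []

    def add_send_function(self, fn):
        self.send_functions.append(fn)  # step 3: remembered by the context

def on_outgoing_rpc(context, tensors):
    send_fn = SendRpcBackward(tensors)  # step 1: create the send function
    context.add_send_function(send_fn)
    return send_fn
```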
Master GH issue: pytorch#23110.

This change builds upon pytorch#24876 and provides all the autograd hooks
needed for a forward pass with distributed RPC for builtin operators. It does
not address distributed RPC for Python UDFs; that will be addressed in
follow-up PRs.

Summary of changes (sketched below):
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.

Differential Revision: [D17148077](https://our.internmc.facebook.com/intern/diff/D17148077/)
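
The pairing described in point 3 can be pictured with a toy model: each send function and the matching recv function on the peer share one autograd_message_id, so the pair can be identified across workers (for example, to route gradients during the backward pass). The dictionaries, string placeholders, and id scheme below are assumptions for illustration, not the actual implementation.

```
# Toy model of send/recv pairing by autograd_message_id (illustrative only).
client_send_functions = {}  # autograd_message_id -> send function (client side)
server_recv_functions = {}  # autograd_message_id -> recv function (server side)

def next_message_id(worker_id, counter):
    # One plausible way to make the id globally unique; the real scheme may differ.
    return (worker_id << 48) | counter

def forward_request(message_id):
    client_send_functions[message_id] = "SendRpcBackward"  # attached when the request is sent
    server_recv_functions[message_id] = "RecvRpcBackward"  # attached when the request is received

msg_id = next_message_id(worker_id=0, counter=1)
forward_request(msg_id)
assert set(client_send_functions) == set(server_recv_functions) == {msg_id}
```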
@pytorchbot added the oncall: distributed label on Sep 16, 2019
@facebook-github-bot

@rohan-varma merged this pull request in b5e0fd4.

rohan-varma added a commit that referenced this pull request Oct 15, 2019
… when it is released on one node"


Per #25525, we want to clean up the distributed autograd context across the other nodes when a single node is done (here done means exited the context manager `with dist_autograd.context() as context_id: ...`).

This PR does a few things to implement the above (sketched below):
1) Add classes to encapsulate messages for requesting this context release and the response
2) Handling of this request in `request_callback_impl.cpp`. When we receive this request, we get the context from a given context_id and release it.
3) RPC call in `DistAutogradContainer::releaseContext` to send this command. This currently does not wait for an ack or implement any sort of retrying. We send the RPC to all the workerIds we have come into contact with (implemented in #26324)
4) Relevant unit tests

Note: the current version is very simple and does not attempt to do cycle detection of RPCs or any retries. If `releaseContext` is called directly on one node, then that node will send RPCs to the other nodes it knows about to release their context. However, if a node receives an RPC that tells it to release its context, it will do so, but will not forward this request to other nodes that it knows about. This is to avoid cycles. The limitation of this approach is that the entire graph of nodes may not have their contexts released. In follow-up PRs, we can implement better cycle detection to solve this problem.

Differential Revision: [D17920137](https://our.internmc.facebook.com/intern/diff/D17920137/)

[ghstack-poisoned]
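
A conceptual Python sketch of the release flow described in the commit above; the real code is C++ (`DistAutogradContainer::releaseContext`), the RPC transport here is faked, and every name is illustrative. As in the PR, there is no ack, retry, or forwarding.

```
class FakeContext:
    def __init__(self, context_id, known_worker_ids=()):
        self.context_id = context_id
        self.known_worker_ids = set(known_worker_ids)

class FakeContainer:
    def __init__(self):
        self.contexts = {}  # context_id -> FakeContext

def release_context(container, context_id, cluster):
    # Called on the node that exits the context manager: release locally,
    # then fire-and-forget a cleanup request to every known worker.
    ctx = container.contexts.pop(context_id)
    for worker_id in ctx.known_worker_ids:
        on_cleanup_request(cluster[worker_id], context_id)  # stands in for an async RPC

def on_cleanup_request(container, context_id):
    # Receivers release their own copy but do NOT forward the request,
    # which avoids RPC cycles (at the cost of possibly missing some nodes).
    container.contexts.pop(context_id, None)

# Tiny usage example: worker 0 knows about workers 1 and 2.
cluster = {i: FakeContainer() for i in range(3)}
for i, c in cluster.items():
    c.contexts[42] = FakeContext(42, known_worker_ids={1, 2} if i == 0 else ())
release_context(cluster[0], 42, cluster)
assert all(42 not in c.contexts for c in cluster.values())
```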
facebook-github-bot pushed a commit that referenced this pull request Oct 21, 2019
…ne node (#27951)

Summary:
Pull Request resolved: #27951

we want to clean up the distributed autograd context across the other nodes when a single node is done (here done means exited the context manager `with dist_autograd.context() as context_id: ...`).

This PR does a few things to implement the above:
1) Add classes to encapsulate messages for requesting this context release and the response
2) Handling of this request in `request_callback_impl.cpp`. When we receive this request, we get the context from a given context_id and release it.
3) RPC call in `DistAutogradContainer::releaseContext` to send this command. This currently does not wait for an ack or implement any sort of retrying. We send the RPC to all the workerIds we have come into contact with (implemented in #26324)
4) Relevant unit tests

In follow up PRs, we will add error checking + retries for this call.

ghstack-source-id: 92269279

Test Plan: Added/modified unit tests in `test/dist_autograd_test.py`

Differential Revision: D17920137

fbshipit-source-id: 7403512ab5fcbc28d21c548b2e45319dd472e26a
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Per pytorch#25525 we want to clean up distributed autograd context on all nodes, in addition to the local one. To do this, we want to send async RPCs to the other nodes telling them to clean up the context.

The first step for this is for a node's context to know about the other workers. This PR does two things:

1) Adds the necessary data structures and getter functions to `DistAutogradContext`
2) Refactors calls to `addSendRpcBackward` to take in the `worker_id` as an additional argument
Pull Request resolved: pytorch#26324

Differential Revision: D17769411

Pulled By: rohan-varma

fbshipit-source-id: b7327d1209a574e2e88cb197edff3103024d51ad
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
…ne node (pytorch#27951)

Summary:
Pull Request resolved: pytorch#27951

we want to clean up the distributed autograd context across the other nodes when a single node is done (here done means exited the context manager `with dist_autograd.context() as context_id: ...`).

This PR does a few things to implement the above:
1) Add classes to encapsulate messages for requesting this context release and the response
2) Handling of this request in `request_callback_impl.cpp`. When we receive this request, we get the context from a given context_id and release it.
3) RPC call in `DistAutogradContainer::releaseContext` to send this command. This currently does not wait for an ack or implement any sort of retrying. We send the RPC to all the workerIds we have come into contact with (implemented in pytorch#26324)
4) Relevant unit tests

In follow up PRs, we will add error checking + retries for this call.

ghstack-source-id: 92269279

Test Plan: Added/modified unit tests in `test/dist_autograd_test.py`

Differential Revision: D17920137

fbshipit-source-id: 7403512ab5fcbc28d21c548b2e45319dd472e26a
Labels

Merged, module: cpp, module: pybind, module: rpc, oncall: distributed
