Add docs for RPC, dist autograd, and RRef modules #29276
Conversation
docs/source/model_parallel.rst
Outdated
.. automodule:: torch.distributed.rpc
.. currentmodule:: torch.distributed.rpc

.. autofunction:: rpc_sync
Is there a way to automatically pull in all public methods of a module instead of specifying each one like this?
@pritamdamania87 it looks like other doc-related commits list out each public method, see for example 8915e27#diff-7aa5f4b7f9dedcb7a6ce129b9d3c5eb8. The Sphinx documentation (http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html) mentions we can recursively include members of a class/module, which I've updated this PR to use.
docs/source/model_parallel.rst
Outdated
@@ -0,0 +1,11 @@
.. role:: hidden
Is this hidden on purpose? I think it should be fine to not hide this for now, since it'll only be in master and won't show up in the stable docs.
Should we keep them hidden if the docs aren't polished completely?
(Also in response to similar question from @mrshenli)
As @pritamdamania mentioned, it's fine to have unfinished docs out in master; however, tagging them as hidden can be a helpful way to keep track of what still needs work prior to the final release.
docs/source/model_parallel.rst
Outdated
.. role:: hidden
   :class: hidden-section

Distributed RPC Framework - torch.distributed.autograd and torch.distributed.rpc
Just "Distributed RPC Framework" should be sufficient; otherwise the title is too long.
mrshenli
left a comment
Thanks for adding this!
Hey @jlin27, should we keep this hidden for now as we haven't finished polishing the docstrings yet?
docs/source/model_parallel.rst
Outdated
RPC and RRef Framework
====================================================

Bbasics
Basics
Returns the result of running ``func`` on ``args`` and ``kwargs``.

Example::
Curious: why do we need the new line here?
Adding this makes the code render nicely and with Python formatting on the documentation page.
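For readers unfamiliar with the RST convention: the blank line before `Example::` starts a literal block, so the indented lines that follow are rendered as code. Below is a minimal, hypothetical sketch of what such a docstring looks like; the signature resembles `rpc_sync` but is a stub for illustration, not the actual PyTorch source.

```python
def rpc_sync(to, func, args=None, kwargs=None):
    r"""
    Returns the result of running ``func`` on ``args`` and ``kwargs``.

    Example::

        >>> import torch
        >>> import torch.distributed.rpc as rpc
        >>> ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3))
    """
    ...  # docstring-only stub for illustration; not the real implementation
```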
docs/source/model_parallel.rst
Outdated
Before using RPC and distributed autograd primitives, initialization must take place. First, a backend over which RPCs can be sent must be initialized. The default (and currently, only available) implementation is the `ProcessGroup` backend, which must be initialized with `torch.distributed.init_process_group` before using other functions. See the `documentation for torch.distributed <https://pytorch.org/docs/stable/distributed.html>`_ for additional details. Next, the local RPC agent can be initialized, after which the process will be able to send and receive RPCs from all other connected processes.

.. automodule:: torch.distributed.rpc
   :members:
@pritamdamania87 this will add all members of this module, but we'd want to be careful in using this since non-polished/private functions could show up.
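For illustration, the initialization sequence described in the quoted passage corresponds roughly to the sketch below. It is written against the current `torch.distributed.rpc` API (`init_rpc`, `rpc_sync`, `shutdown`), which wraps the agent setup; the exact entry points at the time of this PR may have differed, and the worker names, ranks, and rendezvous settings are placeholders.

```python
import os

import torch
import torch.distributed.rpc as rpc

# Rendezvous settings for the underlying process group (placeholders).
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

# Initialize the local RPC agent; after this the process can send and
# receive RPCs to/from every other connected worker.
rpc.init_rpc("worker0", rank=0, world_size=2)

# Synchronously run torch.add on "worker1" and fetch the result.
ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3))

rpc.shutdown()
```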
mrshenli
left a comment
The content LGTM! Please also get a stamp from @jlin27 on whether we should hide this for now and whether the structure is OK. Thanks!
docs/source/rpc.rst
Outdated
Distributed RPC Framework
=====================================================

The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines.
Why do we have everything on a single long line? Can we split the lines in this file? Looks like most doc files split lines after 80 chars.
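As a rough sketch of the "higher-level API to automatically differentiate models split across several machines" mentioned in the quoted line, distributed autograd wraps the forward and backward passes in a context. This is written against the current `torch.distributed.autograd` API (`context`, `backward`, `get_gradients`), assumes RPC has already been initialized on every worker, and uses placeholder worker names.

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has already been called on every worker.
with dist_autograd.context() as context_id:
    # The forward pass can span machines via RPC.
    t1 = torch.rand((3, 3), requires_grad=True)
    loss = rpc.rpc_sync("worker1", torch.add, args=(t1, t1)).sum()

    # Distributed backward pass: gradients are accumulated per context
    # on each participating worker instead of in .grad fields.
    dist_autograd.backward(context_id, [loss])

    # Gradients computed for this context can be inspected locally.
    grads = dist_autograd.get_gradients(context_id)
```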
@pritamdamania87 All the comments are addressed, could you take another look? Thanks!
pritamdamania87
left a comment
Looks good to me, although please check why the RRef class doesn't show up and see if we can fix it before landing.
@jlin27 Could you please take a look at the structure and whether it should be hidden before I land this? Thanks!
facebook-github-bot
left a comment
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
.. role:: hidden
   :class: hidden-section

Distributed RPC Framework
Let's mark this as experimental for now. Check out this page as an example.
Will do this in a follow-up PR.
All of the failures are pre-existing and not related to this PR. Proceeding with landing.
@rohan-varma merged this pull request in 06ef4a7.
Summary: Closes pytorch#28983. Documentation for `torch.distributed.rpc` and `torch.distributed.autograd` modules. Also fixes/tidies up some of the docstrings in rpc/autograd, and moves some functions to be private so they don't show up in the documentation.

Note: Much of the text describing/explaining the RPC/RRef layers is taken from the following RFCs: pytorch#23110, pytorch#26759

Pull Request resolved: pytorch#29276
Differential Revision: D18478754
Pulled By: rohan-varma
fbshipit-source-id: e9a7089baf5275304e5408d319eb9bf98e53fff8