Add distributed optimizer section to distributed autograd design doc. #30068
Conversation
Differential Revision: [D18556536](https://our.internmc.facebook.com/intern/diff/D18556536/) [ghstack-poisoned]
Differential Revision: [D18556536](https://our.internmc.facebook.com/intern/diff/D18556536/) ghstack-source-id: 94176301 Pull Request resolved: #30068
@@ -1,3 +1,6 @@
.. warning::
    The :ref:`distributed-rpc-framework` is experimental and subject to change.
Since this is the distributed_autograd doc, should it actually say that distributed autograd is experimental?
This actually links to the Distributed RPC Framework doc page, and distributed autograd is a subsection there.
   ``RRef``.
2. Takes a :class:`~torch.optim.Optimizer` class as the local
   optimizer to run on all nodes.
3. The distributed optimizer creates an instance of the local ``Optimizer`` on
Is 3 somewhat duplicated with 2?
2 describes the inputs to the distributed optimizer, while 3 just mentions that we use those inputs to create the optimizer RRefs remotely.
   each of the worker nodes and holds an ``RRef`` to them.
4. When :meth:`torch.distributed.optim.DistributedOptimizer.step` is invoked,
   the distributed optimizer uses RPC to remotely execute all the local
   optimizers on the appropriate remote workers.
Do we need to mention the optimizer param/grad sync/lock behavior here?
…design doc." Differential Revision: [D18556536](https://our.internmc.facebook.com/intern/diff/D18556536/) [ghstack-poisoned]
Pull Request resolved: #30068 ghstack-source-id: 94228719 Differential Revision: [D18556536](https://our.internmc.facebook.com/intern/diff/D18556536/)
mrshenli left a comment:
LGTM
5. If multiple concurrent distributed optimizers are updating the same
   parameters on a worker, these updates are serialized via a lock.

Simple end to end example
Other section names capitalize the first letter of each word.
This is true for top level sections but not necessarily other sections. I mostly capitalized based on whether it looked appropriate :)
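For context on how the pieces described in the list above fit together, here is a minimal sketch in the style of the examples in the RPC docs. The worker name `"worker1"` is a placeholder, and the exact signatures (for example, whether `dist_autograd.backward` and `DistributedOptimizer.step` take a `context_id`) have changed across PyTorch releases, so treat this as illustrative rather than the exact example added in this PR:

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

# Assumes rpc.init_rpc() has already been called on every worker and that a
# peer named "worker1" (placeholder) is part of the same RPC group.
with dist_autograd.context() as context_id:
    # Forward pass: rpc.remote() returns RRefs to tensors owned by worker1.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()

    # Backward pass through the distributed autograd engine; gradients are
    # accumulated per autograd context on the workers that own the tensors.
    dist_autograd.backward(context_id, [loss.sum()])

    # The distributed optimizer takes the local optimizer class (optim.SGD)
    # and the parameter RRefs, and creates an SGD instance on each owner.
    dist_optim = DistributedOptimizer(
        optim.SGD,
        [rref1, rref2],
        lr=0.05,
    )
    # step() uses RPC to run each local optimizer on the worker that owns
    # the corresponding parameters; concurrent optimizers touching the same
    # parameters are serialized by a per-worker lock.
    dist_optim.step(context_id)
```

Here `optim.SGD` is the local optimizer class from step 2, the RRefs are the remote parameters from step 1, and `step()` triggers the remote execution described in steps 4 and 5.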
rohan-varma left a comment:
Thanks, looks good!
This pull request has been merged in 88ef402.
Stack from ghstack:
Differential Revision: D18556536