Conversation

pritamdamania87 (Contributor) commented Nov 19, 2019

Stack from ghstack:

Differential Revision: D18556536

@@ -1,3 +1,6 @@
.. warning::
The :ref:`distributed-rpc-framework` is experimental and subject to change.
Contributor:

Since this is the distributed_autograd doc, should it actually say that distributed autograd is experimental?

Contributor Author (pritamdamania87):

This actually links to the Distributed RPC Framework doc page, and distributed autograd is a subsection there.

``RRef``.
2. Takes a :class:`~torch.optim.Optimizer` class as the local
optimizer to run on all nodes.
3. The distributed optimizer creates an instance of the local ``Optimizer`` on
Contributor:

Is 3 kind of a duplicate of 2?

Contributor Author (pritamdamania87):

2 mentions the inputs to the distributed optimizer; 3 just mentions that we use those to create the RRefs remotely.
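For illustration, the "create remotely and hold an ``RRef``" pattern looks roughly like the sketch below. It assumes an RPC group has already been initialized via ``rpc.init_rpc`` and that ``"worker1"`` is a hypothetical peer name; the distributed optimizer applies the same idea to the local optimizer instances it creates.

```python
import torch
import torch.distributed.rpc as rpc

# rpc.remote() constructs the object on the named worker and returns an
# RRef to it; only the reference is held locally. "worker1" is a
# hypothetical peer in an already-initialized RPC group.
linear_rref = rpc.remote("worker1", torch.nn.Linear, args=(4, 4))

# The remote object can be fetched (or used via further RPCs) on demand.
print(linear_rref.to_here().weight.shape)
```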

each of the worker nodes and holds an ``RRef`` to them.
4. When :meth:`torch.distributed.optim.DistributedOptimizer.step` is invoked,
the distributed optimizer uses RPC to remotely execute all the local
optimizers on the appropriate remote workers.
Contributor:

Do we need to mention the optimizer param/grad sync/lock behavior here?

mrshenli (Contributor) left a comment:

LGTM

5. If multiple concurrent distributed optimizers are updating the same
parameters on a worker, these updates are serialized via a lock.
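
Putting the steps above together, a minimal sketch of the end-to-end flow with ``DistributedOptimizer`` might look like the following. It assumes an RPC group initialized via ``rpc.init_rpc`` with a hypothetical peer named ``"worker1"``; exact signatures (for example, whether ``backward``/``step`` take a ``context_id``) have varied across releases.

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

# Assumes rpc.init_rpc() has been called and "worker1" is a peer.
with dist_autograd.context() as context_id:
    # Forward pass: build RRefs to tensors that live on the remote worker.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()

    # Distributed backward pass records gradients in the autograd context.
    dist_autograd.backward(context_id, [loss.sum()])

    # The distributed optimizer takes the local optimizer class plus the
    # parameter RRefs, creates an SGD instance on each owning worker, and
    # runs those instances remotely when step() is invoked.
    dist_optim = DistributedOptimizer(optim.SGD, [rref1, rref2], lr=0.05)
    dist_optim.step(context_id)
```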

Simple end to end example
Contributor:

Other section names capitalize the first letter.

Contributor Author (pritamdamania87):

This is true for top-level sections but not necessarily other sections. I mostly capitalized based on whether it looked appropriate :)

rohan-varma (Contributor) left a comment:

Thanks, looks good!

facebook-github-bot (Contributor):

This pull request has been merged in 88ef402.

facebook-github-bot deleted the gh/pritamdamania87/32/head branch November 23, 2019 15:16