
Conversation

Contributor

@aazzolini aazzolini commented Nov 6, 2019

Stack from ghstack:

Implements a simple Python distributed optimizer that takes RRefs to the parameters to be optimized. It keeps optimizer instances on the remote workers, and calling step() on the distributed optimizer calls step() on each of the remote optimizers in parallel.

Differential Revision: [D18354586](https://our.internmc.facebook.com/intern/diff/D18354586/)

[ghstack-poisoned]
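For readers following along, here is a minimal usage sketch of the API being added. The worker names and the parameter-creating helper are made up for illustration, it assumes an RPC agent has already been initialized on every worker via rpc.init_rpc, and the import path shown is the one the module eventually shipped under; the exact step() signature at this point in the stack may differ.

```python
import torch
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

# Hypothetical helper: returns a parameter owned by the worker it runs on.
def create_param():
    return torch.nn.Parameter(torch.randn(4, 4))

# RRefs to parameters that live on two remote workers (names are made up).
params_rref = [
    rpc.remote("worker1", create_param),
    rpc.remote("worker2", create_param),
]

# One optim.SGD instance is created remotely on each worker that owns
# parameters; constructor arguments (here lr) are forwarded to it.
dist_optim = DistributedOptimizer(optim.SGD, params_rref, lr=0.05)

# Once gradients for the remote parameters exist, step() calls step() on
# every remote optimizer in parallel. Note: in the released API this call
# takes a distributed autograd context id.
dist_optim.step()
```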
aazzolini added a commit that referenced this pull request Nov 6, 2019
Implements a simple Python distributed optimizer that takes RRefs to the parameters to be optimized.

Differential Revision: [D18354586](https://our.internmc.facebook.com/intern/diff/D18354586/)

ghstack-source-id: 93381365
Pull Request resolved: #29304
@aazzolini
Contributor Author

This PR replaces #28910, which was closed by mistake and couldn't be reopened.

For this version, I got rid of FunctionalOptimizer and made torch.optim.Optimizer work directly with DistributedOptimizer, simplifying the implementation and reducing the new API footprint.

kwargs: arguments to pass to the optimizer constructor on each worker.
"""
def __init__(self, optimizer_class, params_rref, *args, **kwargs):
    per_worker_params_rref = defaultdict(lambda: [])
Contributor

nit: defaultdict(list)
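For reference, a quick plain-Python illustration of the suggestion: defaultdict(list) uses the list constructor itself as the default factory and behaves the same as the lambda form, just more idiomatically.

```python
from collections import defaultdict

a = defaultdict(list)          # idiomatic
b = defaultdict(lambda: [])    # equivalent, but noisier

a["worker1"].append("param0")
b["worker1"].append("param0")
assert a["worker1"] == b["worker1"] == ["param0"]
```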

Contributor

@pietern pietern left a comment

Nice! This is very simple indeed :)



if __name__ == '__main__':
    unittest.main()
Contributor

If I'm not mistaken this test cannot run by itself, so this should go.

[rref.local_value().wait() for rref in local_params_rref],
*args,
**kwargs)
self.lock = Lock()
Contributor

@vincentqb vincentqb Nov 7, 2019

This is the lock mentioned here, right? How would we modify this to get Hogwild?

Contributor Author

Yes. There's no simple way to modify the current implementation to get Hogwild. We'd have to either 1) change the interface (FunctionalOptimizer, sketched below), or 2) use some alternative such as keeping an object pool or thread-local instances of optim.Optimizer to avoid sharing gradients across threads. I can write a comment on why the lock is there.
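To make the first alternative concrete, here is a rough sketch of what a FunctionalOptimizer-style interface could look like. This is hypothetical and not part of this PR: because gradients are passed in explicitly instead of being written into the shared param.grad attribute, concurrent step() calls from different RPC threads would not need the lock, giving Hogwild-style unsynchronized updates.

```python
from typing import Dict
import torch

class FunctionalSGD:
    """Hypothetical sketch: an SGD variant that takes gradients explicitly."""

    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def step(self, grads: Dict[torch.Tensor, torch.Tensor]):
        # Apply the given gradients directly; param.grad is never touched,
        # so nothing is shared between concurrently running threads.
        with torch.no_grad():
            for param in self.params:
                grad = grads.get(param)
                if grad is not None:
                    param.add_(grad, alpha=-self.lr)
```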

Contributor

@vincentqb vincentqb Nov 7, 2019

Cool, just wanted to verify. A quick comment would be a good idea indeed. Thanks!

Contributor

Wouldn't removing this lock give us some form of Hogwild? I'm not sure how accurate this example is: https://pytorch.org/docs/stable/notes/multiprocessing.html#hogwild, although it does something similar, where accumulating gradients into the tensors and running the optimizer can interleave across processes (a condensed version of that pattern is sketched below).
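For comparison, the shared-memory Hogwild pattern from that note looks roughly like the following condensed sketch (not code from this PR): every process builds its own optimizer over the same shared parameters and steps without any lock, so gradient accumulation and optimizer steps can interleave freely.

```python
import torch
import torch.multiprocessing as mp
from torch import nn, optim

def train(model):
    # Each process has its own optimizer instance over the *shared* params
    # and steps without locking; updates from different processes interleave.
    opt = optim.SGD(model.parameters(), lr=0.01)
    for _ in range(10):
        opt.zero_grad()
        loss = model(torch.randn(8, 4)).sum()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    model = nn.Linear(4, 2)
    model.share_memory()  # parameters move to shared memory
    procs = [mp.Process(target=train, args=(model,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```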

Collaborator

albanD commented Nov 7, 2019

What is the plan for adding a tutorial / example?
Are you planning on adding it here, or waiting until all the features are in so you can do a single tutorial with a full training run?

Contributor

mrshenli commented Nov 7, 2019

> What is the plan for adding a tutorial / example?

We need thorough docstrings for this one; then I think we can have a single tutorial covering full training.

with self.lock:
    for param, grad in all_local_grads.items():
        param.grad = grad
    self.optim.step()
Contributor

this is nice!

params_rref (list[RRef]): list of RRefs to local or remote parameters
to optimize.
args: arguments to pass to the optimizer constructor on each worker.
kwargs: arguments to pass to the optimizer constructor on each worker.
Contributor

Shall we add an example here?

Contributor Author

I think a convincing example will need a way to 1) create the remote module; 2) call it; and 3) get a list of RRef params for it. We can add one once we have settled on how to do these.

specific parameters.

Args:
optimizer_class (FunctionalOptimizer): the class of optimizer to
Contributor

There is no FunctionalOptimizer anymore; do you mean optim.Optimizer?

Contributor

@mrshenli mrshenli left a comment

This LGTM! My comments are mostly on docs.

to optimize.
args: arguments to pass to the optimizer constructor on each worker.
kwargs: arguments to pass to the optimizer constructor on each worker.
"""
Contributor

Do we want to add a warning that we only guarantee concurrent step() calls will not modify the same param at the same time, but that they could still modify params on different owners in an interleaved way? This means that by the time a dist optimizer applies a gradient x to a param, the param might already differ from when x was computed. This behavior is by design. If the application needs a globally exclusive dist optimizer step(), it will have to synchronize that on its own (for instance, with an application-level lock, as sketched below).
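A rough sketch of what that application-side synchronization could look like, assuming the competing trainers are threads within one process (the lock and the train_step helper are hypothetical, not part of this PR; across processes a distributed coordination mechanism, e.g. an RPC to a coordinator, would be needed instead):

```python
import threading

# Hypothetical application-level lock: only one trainer thread at a time may
# run the backward + optimizer-step sequence, so updates to params on
# different owners cannot interleave between trainers in this process.
_global_step_lock = threading.Lock()

def train_step(dist_optim, compute_loss_and_backward):
    # compute_loss_and_backward() is assumed to populate the gradients for
    # the remote parameters (e.g. via distributed autograd) before stepping.
    with _global_step_lock:
        compute_loss_and_backward()
        dist_optim.step()
```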

aazzolini added a commit that referenced this pull request Nov 7, 2019
Pull Request resolved: #29304

ghstack-source-id: 93487483

Differential Revision: [D18354586](https://our.internmc.facebook.com/intern/diff/D18354586/)
Contributor

@mrshenli mrshenli left a comment

Internal test failure is real

aazzolini added a commit that referenced this pull request Nov 8, 2019
Pull Request resolved: #29304

ghstack-source-id: 93564364

Differential Revision: [D18354586](https://our.internmc.facebook.com/intern/diff/D18354586/)
Contributor

@mrshenli mrshenli left a comment

Tests all passed, let's land this!

@facebook-github-bot
Contributor

This pull request has been merged in b0cf43b.

@facebook-github-bot facebook-github-bot deleted the gh/aazzolini/5/head branch November 15, 2019 15:16