
Conversation

@pietern pietern commented Apr 27, 2019

Stack:
    #19821 Allow for iterations where no module parameter is used (this PR)

It is possible that not a single parameter is used during an
iteration. If this is the case, the prepare_for_backward function
marks all parameters as unused, kicks off reduction of all buckets,
and finalizes the reduction.

This is different from the prior implementation where we assumed that
autograd would produce a gradient for at least a single parameter.
We then used the autograd callback mechanism to queue a finalizer
callback. Now, this finalizer may be executed inline.
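
To make the new control flow concrete, here is a minimal, self-contained sketch. It is not the actual reducer code: `Reducer`, `any_parameter_used`, `mark_all_unused_and_reduce`, and the callback vector below are simplified stand-ins for the real implementation, which has a different interface.

```cpp
// Simplified sketch of the control flow described above. The callback vector
// merely stands in for the autograd engine's post-backward callback queue.
#include <functional>
#include <iostream>
#include <vector>

struct Reducer {
  // Stand-in for callbacks queued on the autograd engine.
  std::vector<std::function<void()>> autograd_callbacks;

  void finalize_backward() {
    std::cout << "reduction finalized\n";
  }

  void mark_all_unused_and_reduce() {
    std::cout << "all parameters marked unused; all buckets reduced\n";
  }

  void prepare_for_backward(bool any_parameter_used) {
    if (!any_parameter_used) {
      // No gradient will ever arrive, so no autograd hook would fire.
      // Mark everything unused, reduce every bucket, and finalize inline.
      mark_all_unused_and_reduce();
      finalize_backward();
      return;
    }
    // At least one gradient is expected; defer finalization to a callback
    // that runs once autograd has produced it (the prior behavior).
    autograd_callbacks.push_back([this] { this->finalize_backward(); });
  }
};
```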

Differential Revision: D15113272

@pytorchbot pytorchbot added the oncall: distributed label Apr 27, 2019
@pietern pietern changed the title Stop relying on autograd finalizer hook in reducer Allow for iterations where no module parameter is used Apr 27, 2019
      // Queued as an autograd engine callback; runs after a gradient is produced.
      this->finalize_backward();
    });
  } else {
    // No gradient will fire the callback, so finalize inline.
    finalize_backward();
  }

I am debating with myself whether we should support this or error out with an explicit message. What are the use cases for DDP without a backward pass?


We should definitely support it. It's possible that for a few iterations there is no backward pass and for a few iterations there is one (for example, when doing MCTS / rollouts).
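
To picture this usage pattern, here is a hypothetical driver loop reusing the simplified `Reducer` sketched after the PR description above; it is not the real DDP API, only an illustration of iterations that alternate between forward-only rollouts and training steps.

```cpp
// Hypothetical driver, continuing the simplified Reducer sketch above:
// even-numbered steps are forward-only rollouts (no backward pass), while
// odd-numbered steps train and therefore run the queued callbacks.
int main() {
  Reducer reducer;
  for (int step = 0; step < 4; ++step) {
    const bool is_rollout = (step % 2 == 0);
    reducer.prepare_for_backward(/*any_parameter_used=*/!is_rollout);
    if (!is_rollout) {
      // In real training, loss.backward() would run these via autograd.
      for (auto& callback : reducer.autograd_callbacks) {
        callback();
      }
      reducer.autograd_callbacks.clear();
    }
  }
  return 0;
}
```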

@pietern pietern deleted the export-D15113272 branch April 28, 2019 06:10
@facebook-github-bot

This pull request has been merged in 9b69da2.

zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
Summary:
Pull Request resolved: pytorch#19821

Reviewed By: mrshenli

Differential Revision: D15113272

fbshipit-source-id: dc91458b569cd8c106ddaeea558464b515683550