
Conversation

@pietern (Contributor) commented Apr 29, 2019

Stack:
    ⚪  #19901 Finer grained consistency check in reducer  💛
    ⚫  #19897 Only call into reducer if torch.is_grad_enabled()  💚

The existing code used `expect_autograd_hooks_` as a proxy for the
situation where finalization of the previous iteration is needed. This
is not correct, however, since you may decide to completely ignore the
output of a DDP-wrapped module. In that case, if no gradients have
been passed to the reducer, it is fine to keep going. This commit adds
a new variable, `require_finalize_`, that tracks whether finalization
is really needed.
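
As a concrete illustration of the case this fixes, here is a minimal sketch of a training loop that sometimes discards the output of a DDP-wrapped module and never calls `backward()` for that iteration. The single-process Gloo setup, the toy `Linear` model, and the skip-every-other-step rule are illustrative assumptions, not part of this PR:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup purely for illustration; a real job launches one
# process per rank (e.g. via torch.distributed.launch).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 10)
ddp_model = DDP(model)
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

batches = [torch.randn(4, 10) for _ in range(4)]

for step, batch in enumerate(batches):
    output = ddp_model(batch)
    if step % 2 == 1:
        # Ignore the output entirely: backward() is never called, so no
        # gradients are passed to the reducer for this iteration.
        continue
    loss = output.sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```

With the old check, the armed autograd hooks alone were taken to mean that the previous iteration still needed finalization; with `require_finalize_`, finalization is only demanded once a gradient has actually been passed to the reducer, so the skipped iterations above are fine.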

Differential Revision: D15118871

@pytorchbot pytorchbot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Apr 29, 2019
@facebook-github-bot (Contributor) commented

This pull request has been merged in 8413600.

@pietern pietern deleted the export-D15118871 branch April 29, 2019 16:02
soumith pushed a commit that referenced this pull request Apr 29, 2019
Summary:
Pull Request resolved: #19901

The existing code used `expect_autograd_hooks_` as a proxy for the
situation where finalization of the previous iteration is needed. This
is not correct, however, since you may decide to completely ignore the
output of a DDP-wrapped module. In that case, if no gradients have
been passed to the reducer, it is fine to keep going. This commit adds
a new variable, `require_finalize_`, that tracks whether finalization
is really needed.

Reviewed By: mrshenli

Differential Revision: D15118871

fbshipit-source-id: 25938eaf1fe13e2940feae1312892b9d3da8a67d
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019