Conversation

@pietern (Contributor) commented Apr 18, 2019

Stack:
    :black_circle:  #19443 Support sparse gradients in DistributedDataParallel  💛
    :white_circle:  #19146 Add sparse tensor allreduce  💛

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding)
is expected to receive a sparse gradient. This information is passed
to the bucket assignment computation routine and the reducer as a
vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, since multiple unrelated sparse
tensors cannot easily be grouped together.
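The bucketing behavior described above can be sketched in plain Python. This is a hypothetical illustration, not the actual C++ routine in the reducer; the function name `assign_buckets` and the `bucket_cap_bytes` parameter are assumptions for the sketch. Parameters flagged in the vector of booleans each get a singleton bucket, while dense parameters are grouped up to a size cap:

```python
def assign_buckets(param_sizes, expect_sparse_gradient, bucket_cap_bytes=1024):
    """Sketch of bucket assignment: sparse-gradient parameters are
    isolated in singleton buckets; dense parameters are packed into
    shared buckets up to bucket_cap_bytes per bucket."""
    buckets = []
    current, current_bytes = [], 0
    for i, (size, sparse) in enumerate(zip(param_sizes, expect_sparse_gradient)):
        if sparse:
            # Sparse tensors cannot be flattened together with other
            # tensors, so this parameter gets a bucket of its own.
            buckets.append([i])
            continue
        if current and current_bytes + size > bucket_cap_bytes:
            # Current dense bucket is full; start a new one.
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets
```

For example, with parameter sizes `[100, 200, 300]` and sparse flags `[False, True, False]`, parameter 1 (the sparse one) gets its own bucket while parameters 0 and 2 share one: `[[1], [0, 2]]`.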

Differential Revision: D15007365
Differential Version: 80026012
@pietern pietern added the oncall: distributed and triaged labels Apr 19, 2019
@mrshenli (Contributor) left a comment

Shall we add some tests?

Differential Revision: D15007365
Differential Version: 85199090
@pytorchbot pytorchbot added the module: nn label Jun 19, 2019
pietern added 2 commits June 19, 2019 06:26
Differential Revision: D15007365
Differential Version: 85203697
Differential Revision: D15007365
Differential Version: 85204462
@pietern (Contributor, Author) commented Jun 19, 2019

@mrshenli Added a test case that confirms numerical equivalence between the unwrapped and wrapped modules.

@pietern (Contributor, Author) commented Jun 19, 2019

@pytorchbot retest this please

@facebook-github-bot (Contributor)

This pull request has been merged in 365de7b.

@ezyang (Contributor) commented Jun 20, 2019

@ezyang ezyang deleted the export-D15007365 branch July 19, 2019 15:48
@sheenG commented Aug 17, 2021

Hello, I wonder whether this change is included in these .whl files? torch_download

@sheenG commented Aug 17, 2021

I installed torch via pip install torch==1.9.0, and I cannot set sparse=True in nn.Embedding under DDP.

@YeDeming commented
I cannot set sparse=True in nn.Embedding under DDP on 1.9.1+cu102 either.

Labels

module: nn · oncall: distributed · triaged

8 participants