Conversation

@pietern (Contributor) commented Apr 18, 2019

Stack:
    :black_circle:  #19443 Support sparse gradients in DistributedDataParallel  💛
    :white_circle:  #19146 Add sparse tensor allreduce  💛

This adds support for sparse gradients to the reducer as well as to
the DistributedDataParallel wrapper. Note that an out-of-band signal
is needed to indicate whether a dense parameter (e.g. an embedding)
is expected to receive a sparse gradient. This information is passed
to the bucket assignment computation routine and the reducer as a
vector of booleans. Every parameter for which we expect a sparse
gradient is assigned its own bucket, since multiple unrelated sparse
tensors cannot easily be grouped together.
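The bucketing behavior described above can be sketched in plain Python. This is a hypothetical illustration, not the actual C++ routine in the reducer; the function name `assign_buckets` and the `bucket_cap_bytes` parameter are assumptions for the sketch. Parameters flagged in the vector of booleans each get a singleton bucket, while dense parameters are grouped up to a size cap:

```python
def assign_buckets(param_sizes, expect_sparse_gradient, bucket_cap_bytes=1024):
    """Sketch of bucket assignment: sparse-gradient parameters are
    isolated in singleton buckets; dense parameters are packed into
    shared buckets up to bucket_cap_bytes per bucket."""
    buckets = []
    current, current_bytes = [], 0
    for i, (size, sparse) in enumerate(zip(param_sizes, expect_sparse_gradient)):
        if sparse:
            # Sparse tensors cannot be flattened together with other
            # tensors, so this parameter gets a bucket of its own.
            buckets.append([i])
            continue
        if current and current_bytes + size > bucket_cap_bytes:
            # Current dense bucket is full; start a new one.
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets
```

For example, with parameter sizes `[100, 200, 300]` and sparse flags `[False, True, False]`, parameter 1 (the sparse one) gets its own bucket while parameters 0 and 2 share one: `[[1], [0, 2]]`.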

Differential Revision: D15007365
Differential Version: 80026012
@pietern pietern added the oncall: distributed and triaged labels Apr 19, 2019
@mrshenli (Contributor) left a comment

Shall we add some tests?

Differential Revision: D15007365
Differential Version: 85199090
@pytorchbot pytorchbot added the module: nn label Jun 19, 2019
pietern added 2 commits June 19, 2019 06:26
Differential Revision: D15007365
Differential Version: 85203697
Differential Revision: D15007365
Differential Version: 85204462
@pietern (Contributor, Author) commented Jun 19, 2019

@mrshenli Added a test case that confirms numerical equivalence between the unwrapped and wrapped modules.

@pietern (Contributor, Author) commented Jun 19, 2019

@pytorchbot retest this please

@facebook-github-bot (Contributor)

This pull request has been merged in 365de7b.

@ezyang (Contributor) commented Jun 20, 2019

@ezyang ezyang deleted the export-D15007365 branch July 19, 2019 15:48
@sheenG commented Aug 17, 2021

Hello, I wonder whether this change is included in these .whl files? torch_download

@sheenG commented Aug 17, 2021

I installed torch via pip install torch==1.9.0, and I cannot set sparse=True in nn.Embedding under DDP.

@YeDeming commented
I cannot set sparse=True in nn.Embedding under DDP on 1.9.1+cu102 either.

Labels

module: nn · oncall: distributed · triaged

8 participants