Data Parallel Implementation Improvements #21144

@mrshenli

Description

After #20910, the data parallel implementation still needs the following improvements:

  • Fix C++ data parallel for BN
  • Make sure C++ data parallel works for double backward (see the gradgradcheck sketch after this list)
  • Move reduce_add and reduce_add_coalesced from torch/cuda/comm.py to C++ (their current behavior is sketched after this list).
  • Move Broadcast and ReduceAddCoalesced from torch/nn/parallel/_functions.py to C++.
  • Make C++ data parallel use ReduceAddCoalesced.
  • Consolidate the C++ and Python implementations of module replicate.
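
The double-backward item can be exercised from Python with torch.autograd.gradgradcheck, which numerically verifies second-order gradients; the C++ data parallel path would need an equivalent check. A minimal sketch, assuming two visible CUDA devices and using the existing Python data_parallel only for illustration:

```python
import torch
import torch.nn as nn
from torch.autograd import gradgradcheck

# gradcheck-style numerical verification requires double precision.
model = nn.Linear(4, 4).double().cuda()
inp = torch.randn(8, 4, dtype=torch.double, device="cuda", requires_grad=True)

def run(x):
    # Scatter x across devices 0 and 1, replicate the model, gather outputs.
    return nn.parallel.data_parallel(model, x, device_ids=[0, 1])

# Verifies gradients of gradients, i.e. the double backward path.
gradgradcheck(run, (inp,))
```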
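
The reduce_add and reduce_add_coalesced functions named in the third item sum per-device replicas of tensors, with the coalesced variant batching many small tensors into fewer cross-device copies. A minimal sketch of their current Python-side behavior, assuming at least two visible CUDA devices:

```python
import torch
import torch.cuda.comm as comm

# One list of gradient tensors per device, as data parallel produces
# during backward.
grads_dev0 = [torch.ones(4, device="cuda:0"), torch.ones(2, device="cuda:0")]
grads_dev1 = [torch.ones(4, device="cuda:1"), torch.ones(2, device="cuda:1")]

# reduce_add sums replicas of a single tensor onto the destination device.
total = comm.reduce_add([grads_dev0[0], grads_dev1[0]], destination=0)
print(total[0].item())  # 2.0: 1 + 1 across the two devices

# reduce_add_coalesced does the same for whole lists of tensors, coalescing
# them so many small tensors are reduced in a few batched copies.
summed = comm.reduce_add_coalesced([grads_dev0, grads_dev1], destination=0)
print([t[0].item() for t in summed])  # [2.0, 2.0]
```

The fifth item would then have C++ data parallel call this coalesced reduction, presumably so replica gradients are reduced in batches rather than tensor by tensor.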

Labels

oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
