Data Parallel Implementation Improvements #21144

@mrshenli

Description

After #20910, the data parallel implementation still needs the following improvements:

  • Fix C++ data parallel for BN
  • Make sure C++ data parallel works for double backward (see the gradgradcheck sketch after this list)
  • Move reduce_add and reduce_add_coalesced from torch/cuda/comm.py to C++ (their current behavior is sketched after this list).
  • Move Broadcast and ReduceAddCoalesced from torch/nn/parallel/_functions.py to C++.
  • Make C++ data parallel use ReduceAddCoalesced.
  • Consolidate the C++ and Python implementations of module replicate.
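
The double-backward item can be exercised from Python with torch.autograd.gradgradcheck, which numerically verifies second-order gradients; the C++ data parallel path would need an equivalent check. A minimal sketch, assuming two visible CUDA devices and using the existing Python data_parallel only for illustration:

```python
import torch
import torch.nn as nn
from torch.autograd import gradgradcheck

# gradcheck-style numerical verification requires double precision.
model = nn.Linear(4, 4).double().cuda()
inp = torch.randn(8, 4, dtype=torch.double, device="cuda", requires_grad=True)

def run(x):
    # Scatter x across devices 0 and 1, replicate the model, gather outputs.
    return nn.parallel.data_parallel(model, x, device_ids=[0, 1])

# Verifies gradients of gradients, i.e. the double backward path.
gradgradcheck(run, (inp,))
```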
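
The reduce_add and reduce_add_coalesced functions named in the third item sum per-device replicas of tensors, with the coalesced variant batching many small tensors into fewer cross-device copies. A minimal sketch of their current Python-side behavior, assuming at least two visible CUDA devices:

```python
import torch
import torch.cuda.comm as comm

# One list of gradient tensors per device, as data parallel produces
# during backward.
grads_dev0 = [torch.ones(4, device="cuda:0"), torch.ones(2, device="cuda:0")]
grads_dev1 = [torch.ones(4, device="cuda:1"), torch.ones(2, device="cuda:1")]

# reduce_add sums replicas of a single tensor onto the destination device.
total = comm.reduce_add([grads_dev0[0], grads_dev1[0]], destination=0)
print(total[0].item())  # 2.0: 1 + 1 across the two devices

# reduce_add_coalesced does the same for whole lists of tensors, coalescing
# them so many small tensors are reduced in a few batched copies.
summed = comm.reduce_add_coalesced([grads_dev0, grads_dev1], destination=0)
print([t[0].item() for t in summed])  # [2.0, 2.0]
```

The fifth item would then have C++ data parallel call this coalesced reduction, presumably so replica gradients are reduced in batches rather than tensor by tensor.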

Labels

oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
