Labels: module: bootcamp, module: c10d, module: complex, oncall: distributed, pt_distributed_rampup, triaged
Description
🚀 Feature
As per title, complex numbers should be supported in torch.distributed.
Motivation
Distributed computing support for complex numbers came up in conversations with people at Argonne National Laboratory and the Flatiron Institute. Currently, some of them use Uber's Horovod library for distributed computing. The operations they commonly use are all_reduce and broadcast.
Pitch
- NCCL only defines floating-point and integer data types, so one way to get complex working with NCCL is to view complex tensors as real using torch.view_as_real(complex_tensor), which returns an equivalent floating-point tensor with a trailing dimension of size 2. Since the view shares the same storage as the original complex tensor, we probably don't need to convert it back to complex. If needed, torch.view_as_complex can be used to convert the real tensor back to a complex tensor.
- Both torch.view_as_real and torch.view_as_complex are view operations and O(1).
>>> z = torch.randn(4, dtype=torch.cfloat)
>>> torch.view_as_real(z)
tensor([[-0.4226,  0.5459],
        [ 0.9385,  1.1723],
        [-0.9454, -0.3572],
        [ 0.0624,  1.0193]])
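The storage-sharing claim above is what makes the approach work: because the real view aliases the complex tensor's memory, an in-place collective on the view updates the complex tensor directly. A small sketch of that property:

```python
import torch

# view_as_real returns a float view with a trailing dimension of size 2
# (real, imag). It shares storage with the complex tensor, so an in-place
# write through the view is visible in the original complex tensor.
z = torch.randn(4, dtype=torch.cfloat)
r = torch.view_as_real(z)   # shape (4, 2), dtype float32, same storage
r[:, 1].zero_()             # zero the imaginary parts through the view
# z.imag is now all zeros; no copy back to complex was needed.
```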
- Ops that should work with complex:
- Testing
- https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d.py
- https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/distributed/distributed_test.py
cc @ezyang @anjali411 @dylanbespalko @mruberry @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski