Add reduce_scatter_tensor in place of _reduce_scatter_base#85867
Add reduce_scatter_tensor in place of _reduce_scatter_base#85867
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85867
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 FailuresAs of commit 945f466: The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
crcrpar
left a comment
There was a problem hiding this comment.
looks good to me, appreciate your swift action
| ) | ||
| return tensor_out | ||
|
|
||
| @sandcastle_skip_if(BACKEND != "nccl", "Only Nccl supports CUDA reduce_scatter_tensor") |
There was a problem hiding this comment.
nit: I think this decorator @requires_nccl() does a similar check
| @sandcastle_skip_if(BACKEND != "nccl", "Only Nccl supports CUDA reduce_scatter_tensor") | |
| @requires_nccl() |
| tensor([0, 2], device='cuda:0') # Rank 0 | ||
| tensor([4, 6], device='cuda:1') # Rank 1 | ||
|
|
||
| .. warning:: |
There was a problem hiding this comment.
Another nit, it looks like warning messages are usually placed before examples: https://docs-preview.pytorch.org/85867/distributed.html?highlight=reduce_scatter_tensor#torch.distributed.reduce_scatter_tensor
|
@pytorchbot merge |
|
@pytorchbot successfully started a merge job. Check the current status here. |
Merge failedReason: 2 additional jobs have failed, first few of them are: trunk ,trunk / linux-focal-rocm5.2-py3.7 / test (default, 1, 2, linux.rocm.gpu) If you believe this is an error, you can use the old behavior with Please reach out to the PyTorch DevX Team with feedback or questions! Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot rebase -m |
|
❌ 🤖 pytorchbot command failed: Try |
|
@pytorchbot rebase -b master |
|
@pytorchbot successfully started a rebase job. Check the current status here |
|
Successfully rebased |
a04af10 to
945f466
Compare
|
Here is the failed check: It does not seem related to my change. And the failure reason is: Also does not seem to be related to my change. @huydhn FYI. @pytorchbot merge -f "The one failure does not seem related to my change" |
|
@pytorchbot successfully started a merge job. Check the current status here. |
|
Hey @kwen2501. |
This is a twin PR similar to the one for `all_gather_into_tensor` (#85686). The philosophy for renaming `_reduce_scatter_base` instead of merging it is described in #85686. Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli Pull Request resolved: #85867 Approved by: https://github.com/crcrpar, https://github.com/H-Huang
Summary: remove deprecated _all_gather_base and _reduce_scatter_base APIs and use the new replacements - `_all_gather_base` -> `all_gather_into_tensor` - _`reduce_scatter_base` -> `reduce_scatter_tensor` - correct begin size for reduce_scatter_base see pytorch/pytorch#85686 and pytorch/pytorch#85867 for the changes in PyTorch Reviewed By: wesbland Differential Revision: D40703655 fbshipit-source-id: c871c05a8de687a34a124b60857879c983b8cc3d
This is a twin PR similar to the one for
all_gather_into_tensor(#85686).The philosophy for renaming
_reduce_scatter_baseinstead of merging it is described in #85686.Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli