
Add all_gather_into_tensor in place of _all_gather_base#85686

Closed
kwen2501 wants to merge 2 commits into master from all_gather_into_tensor

Conversation

Collaborator

@kwen2501 kwen2501 commented Sep 27, 2022

Description

  • This PR renames _all_gather_base to all_gather_into_tensor so that it is clearer in meaning.
  • The all_gather_into_tensor API differs from the all_gather API in the output it accepts -- a single, large tensor instead of a list of tensors.
  • This PR also adds a deprecation warning to _all_gather_base.
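The difference in output convention can be illustrated with a small pure-Python emulation (no process group required; these helpers are illustrative stand-ins, not the real c10d calls):

```python
# Pure-Python emulation of the two output conventions. `world` stands in
# for the per-rank input tensors; the helper names are hypothetical.
world = [[1, 2], [3, 4]]  # rank 0 holds [1, 2]; rank 1 holds [3, 4]

def emulate_all_gather(world):
    # all_gather: the output is a LIST with one entry per rank.
    return [list(t) for t in world]

def emulate_all_gather_into_tensor(world):
    # all_gather_into_tensor: the output is ONE flat tensor (cat form).
    return [x for t in world for x in t]

print(emulate_all_gather(world))              # [[1, 2], [3, 4]]
print(emulate_all_gather_into_tensor(world))  # [1, 2, 3, 4]
```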

Issue

_all_gather_base was implemented in #33924 to avoid unnecessary flattening. There was previous effort (#82639) to merge _all_gather_base with the existing all_gather API by detecting the parameter type passed in for the output.

There are, however, two "blockers" that make the merge difficult:
(i) The merge breaks backward compatibility: we would need to change the parameter name tensor_list in all_gather to a more general name, output, that can cover both a tensor and a tensor list.
(ii) The all_gather API recently gained uneven-tensor support, which relies on the tensor boundaries implied by the list. We are not sure whether to add such support to _all_gather_base, because that would require users to pass in additional tensor-boundary information.

In view of the above, we decided to productize _all_gather_base as a separate function with a clearer name.

Testing

Added tests:

  • test_all_gather_into_cat_tensor_cuda -- output form as with torch.cat. For example:
        >>> tensor_in
        tensor([1, 2], device='cuda:0') # Rank 0
        tensor([3, 4], device='cuda:1') # Rank 1
        >>> tensor_out
        tensor([1, 2, 3, 4], device='cuda:0') # Rank 0
        tensor([1, 2, 3, 4], device='cuda:1') # Rank 1
  • test_all_gather_into_stack_tensor_cuda -- output form as with torch.stack. For example:
        >>> tensor_out2
        tensor([[1, 2],
                [3, 4]], device='cuda:0') # Rank 0
        tensor([[1, 2],
                [3, 4]], device='cuda:1') # Rank 1

The output form is determined by the shape of the output tensor the user passes in; no flag is needed.
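The shape rule behind that dispatch can be sketched as a tiny checker (`output_form` is a hypothetical helper, not part of this PR; the real API simply writes into whatever compatible tensor the caller provides):

```python
def output_form(in_shape, out_shape, world_size):
    """Classify an output tensor shape as 'cat' or 'stack' form.

    Hypothetical helper encoding the shape rule described above.
    """
    if out_shape == (world_size,) + in_shape:
        return "stack"  # torch.stack-like: new leading dim of size world_size
    if out_shape == (world_size * in_shape[0],) + in_shape[1:]:
        return "cat"    # torch.cat-like: first dim multiplied by world_size
    raise ValueError("output shape incompatible with input shape and world size")

print(output_form((2,), (4,), world_size=2))    # cat
print(output_form((2,), (2, 2), world_size=2))  # stack
```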

Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang

@pytorch-bot

pytorch-bot bot commented Sep 27, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85686

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit 5829bab:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Sep 27, 2022
@facebook-github-bot facebook-github-bot added cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue labels Sep 27, 2022
Contributor

@rohan-varma rohan-varma left a comment


LGTM, thank you!

Collaborator

@crcrpar crcrpar left a comment


thank you for the update



```diff
-def _all_gather_base(output_tensor, input_tensor, group=None, async_op=False):
+def all_gather_into_tensor(output_tensor, input_tensor, group=None, async_op=False):
```
Collaborator


let's update the PyTorch web documentation by adding this to https://github.com/pytorch/pytorch/blob/master/docs/source/distributed.rst#collective-functions

Collaborator Author


Great suggestion. Added now.

```python
def all_gather_into_tensor(output_tensor, input_tensor, group=None, async_op=False):
    """
    Single tensor all gather. Gathers a single tensor from all ranks, and puts them in a single output tensor.
    All ranks gather tensors from all other ranks and put them into a single
```
Collaborator Author

Updated now to "Gather tensors from all ranks and put them into ..."

Collaborator

@crcrpar crcrpar left a comment

@kwen2501
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here and land check progress here.
The merge job was triggered with the land checks (-l) flag. If you did not specify this flag yourself, you are likely enrolled in the land checks rollout. This means that your change will be merged once all checks on your PR and the land checks have passed (ETA 4 Hours). If you need to coordinate lands between different changes and cannot risk a land race, please add the ciflow/trunk label to your PR and wait for signal to complete, and then land your changes in proper order. Having trunk, pull, and Lint pre-run on a PR will bypass land checks and the ETA should be immediate. If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

Hey @kwen2501.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@kwen2501 kwen2501 added the topic: new features topic category label Sep 28, 2022
@ZainRizvi ZainRizvi added ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR and removed ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Sep 28, 2022
drisspg pushed a commit to drisspg/pytorch that referenced this pull request Sep 29, 2022
Pull Request resolved: pytorch#85686
Approved by: https://github.com/rohan-varma, https://github.com/crcrpar
pytorchmergebot pushed a commit that referenced this pull request Sep 30, 2022
This is a twin PR similar to the one for `all_gather_into_tensor` (#85686).
The philosophy for renaming `_reduce_scatter_base` instead of merging it is described in #85686.

Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli

Pull Request resolved: #85867
Approved by: https://github.com/crcrpar, https://github.com/H-Huang
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
facebook-github-bot pushed a commit to facebookresearch/param that referenced this pull request Oct 27, 2022
Summary:
Remove the deprecated `_all_gather_base` and `_reduce_scatter_base` APIs and use the new replacements:
- `_all_gather_base` -> `all_gather_into_tensor`
- `_reduce_scatter_base` -> `reduce_scatter_tensor`
- correct the begin size for `reduce_scatter_base`

see pytorch/pytorch#85686 and pytorch/pytorch#85867 for the changes in PyTorch

Reviewed By: wesbland

Differential Revision: D40703655

fbshipit-source-id: c871c05a8de687a34a124b60857879c983b8cc3d
@github-actions github-actions bot deleted the all_gather_into_tensor branch March 29, 2024 01:50

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category topic: new features topic category
