Add all_gather_into_tensor in place of _all_gather_base#85686
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85686
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures, 1 Pending as of commit 5829bab. This comment was automatically generated by Dr. CI and updates every 15 minutes.
crcrpar
left a comment
thank you for the update
```diff
- def _all_gather_base(output_tensor, input_tensor, group=None, async_op=False):
+ def all_gather_into_tensor(output_tensor, input_tensor, group=None, async_op=False):
```
let's update the PyTorch web documentation by adding this to https://github.com/pytorch/pytorch/blob/master/docs/source/distributed.rst#collective-functions
Great suggestion. Added now.
```diff
  def all_gather_into_tensor(output_tensor, input_tensor, group=None, async_op=False):
      """
-     Single tensor all gather. Gathers a single tensor from all ranks, and puts them in a single output tensor.
+     All ranks gather tensors from all other ranks and put them into a single
```
I'm not certain, but is the leading "All ranks" needed?
I referenced what https://pytorch.org/docs/stable/distributed.html?highlight=all_gather#torch.distributed.all_gather says.
Updated now to "Gather tensors from all ranks and put them into ..."
crcrpar
left a comment
Both updates look nice; the documentation now has all_gather_into_tensor: https://docs-preview.pytorch.org/85686/distributed.html#torch.distributed.all_gather_into_tensor :)
@pytorchbot merge
@pytorchbot successfully started a merge job. Check the current status here and land check progress here.
Hey @kwen2501. |
### Description
- This PR renames `_all_gather_base` to `all_gather_into_tensor` so that its meaning is clearer.
- The `all_gather_into_tensor` API differs from the `all_gather` API in the output it accepts -- a single, large tensor instead of a list of tensors.
- This PR also adds a deprecation warning to `_all_gather_base`.

### Issue
`_all_gather_base` was implemented in pytorch#33924 to avoid unnecessary flattening. There was a previous effort (pytorch#82639) to merge `_all_gather_base` with the existing `all_gather` API by detecting the parameter type passed in for the output. There are, however, two "blockers" that make the merge difficult:

(i) The merge breaks backward compatibility: we would need to change the parameter name `tensor_list` in `all_gather` to a general name, `output`, that can cover both a tensor and a tensor list.

(ii) The `all_gather` API recently added uneven tensor support, utilizing the tensor boundaries implied by the list. We are not sure whether to add such support to `_all_gather_base`, because that would require users to pass in additional tensor boundary information.

In view of the above, we decided to productize `_all_gather_base` as a separate function, but with a clearer name.

### Testing
Added tests:
- `test_all_gather_into_cat_tensor_cuda` -- output form as with `torch.cat`. For example:
```
>>> tensor_in
tensor([1, 2], device='cuda:0') # Rank 0
tensor([3, 4], device='cuda:1') # Rank 1
>>> tensor_out
tensor([1, 2, 3, 4], device='cuda:0') # Rank 0
tensor([1, 2, 3, 4], device='cuda:1') # Rank 1
```
- `test_all_gather_into_stack_tensor_cuda` -- output form as with `torch.stack`. For example:
```
>>> tensor_out2
tensor([[1, 2],
        [3, 4]], device='cuda:0') # Rank 0
tensor([[1, 2],
        [3, 4]], device='cuda:1') # Rank 1
```
The output form is determined by the shape of the output tensor passed by the user; no flag is used.
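The cat-vs-stack behavior can be illustrated with a single-process, plain-Python sketch (no process group involved; the helper `gather_into` and its arguments are hypothetical, not part of the PyTorch API). The number of dimensions of the requested output selects the form, just as the shape of the output tensor does in `all_gather_into_tensor`:

```python
# Hypothetical single-process sketch of how all_gather_into_tensor's output
# form follows from the output shape. No real collective is performed.

def gather_into(output_ndim, rank_inputs):
    """rank_inputs: one equal-length list per rank.

    output_ndim == 1 -> concatenated form (like torch.cat)
    output_ndim == 2 -> stacked form (like torch.stack)
    """
    if output_ndim == 1:
        # flatten rank chunks end-to-end, as a 1-D output tensor would be filled
        return [x for chunk in rank_inputs for x in chunk]
    if output_ndim == 2:
        # keep one row per rank, as a 2-D output tensor would be filled
        return [list(chunk) for chunk in rank_inputs]
    raise ValueError("unsupported output rank")

inputs = [[1, 2], [3, 4]]        # contributions from rank 0 and rank 1
print(gather_into(1, inputs))    # [1, 2, 3, 4]     -- cat form
print(gather_into(2, inputs))    # [[1, 2], [3, 4]] -- stack form
```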
Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang

Pull Request resolved: pytorch#85686
Approved by: https://github.com/rohan-varma, https://github.com/crcrpar
This is a twin PR to the one for `all_gather_into_tensor` (#85686). The rationale for renaming `_reduce_scatter_base` instead of merging it is described in #85686.

Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli

Pull Request resolved: #85867
Approved by: https://github.com/crcrpar, https://github.com/H-Huang
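For intuition about the twin operation, here is a hedged single-process sketch of `reduce_scatter_tensor` semantics (the helper name `reduce_scatter_sim` is hypothetical; no real process group is used): every rank contributes a full-size input, the inputs are summed elementwise, and each rank keeps its own contiguous chunk of the result:

```python
# Hypothetical single-process sketch of reduce_scatter_tensor semantics:
# elementwise-sum all ranks' inputs (SUM reduce op), then split the result
# so each rank keeps one contiguous chunk.

def reduce_scatter_sim(rank_inputs):
    world = len(rank_inputs)
    # the "reduce" step: elementwise sum across ranks
    summed = [sum(col) for col in zip(*rank_inputs)]
    chunk = len(summed) // world
    # the "scatter" step: chunk r is what rank r would receive
    return [summed[r * chunk:(r + 1) * chunk] for r in range(world)]

inputs = [[1, 2, 3, 4],      # rank 0's full-size input
          [10, 20, 30, 40]]  # rank 1's full-size input
print(reduce_scatter_sim(inputs))  # [[11, 22], [33, 44]]
```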
Summary: remove the deprecated `_all_gather_base` and `_reduce_scatter_base` APIs and use the new replacements:
- `_all_gather_base` -> `all_gather_into_tensor`
- `_reduce_scatter_base` -> `reduce_scatter_tensor`
- correct begin size for `reduce_scatter_base`

See pytorch/pytorch#85686 and pytorch/pytorch#85867 for the changes in PyTorch.

Reviewed By: wesbland
Differential Revision: D40703655
fbshipit-source-id: c871c05a8de687a34a124b60857879c983b8cc3d