[SymmetricMemory] support specifying group_name at rendezvous time #139529
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139529
Note: Links to docs will display an error until the doc builds have completed.
✅ No failures as of commit 1911f8b with merge base b86b534.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ous time"

Before this PR, users need to call `empty_strided_p2p` with a `group_name`:

```python
tensor = _SymmetricMemory.empty_strided_p2p((1024,), (1,), device="cuda", group_name="0")
symm_mem = _SymmetricMemory.rendezvous(tensor)
```

Users can now omit `group_name` at allocation time and specify it later at rendezvous time:

```python
tensor = _SymmetricMemory.empty_strided_p2p((1024,), (1,), device="cuda")
symm_mem = _SymmetricMemory.rendezvous(tensor, group_name="0")
```

Rationale for this change:
- This allows the same allocation to participate in multiple symmetric memory formations under different groups
- More natural UX

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
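The flexibility the first rationale describes can be sketched with a plain-Python mock (the registry class, `Allocation` type, and handle strings below are illustrative assumptions, not the real torch internals): one allocation rendezvouses under several group names, yielding a distinct handle per group.

```python
class Allocation:
    """Stands in for a buffer returned by empty_strided_p2p()."""
    def __init__(self, size):
        self.size = size

class SymmMemRegistry:
    """Maps (allocation, group_name) -> a per-group handle."""
    def __init__(self):
        self._handles = {}

    def rendezvous(self, alloc, group_name):
        # One allocation may hold a distinct handle per group.
        key = (id(alloc), group_name)
        if key not in self._handles:
            self._handles[key] = f"handle:{group_name}:{alloc.size}"
        return self._handles[key]

registry = SymmMemRegistry()
buf = Allocation(1024)
h0 = registry.rendezvous(buf, "0")    # first group
h1 = registry.rendezvous(buf, "tp")   # same buffer, different group
assert h0 != h1
assert registry.rendezvous(buf, "0") == h0  # idempotent per group
```

With allocation-time binding, the second rendezvous would have required a second allocation; binding at rendezvous time lets one buffer serve both groups.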
@pytorchbot merge

This PR needs to be approved by an authorized maintainer before merge.
…Is: empty() and rendezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs: empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can be either an integer sequence or a variadic pack of integers
  - stride can no longer be specified (and no longer needs to be)
  - device can be either a `torch.device` or a string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationale. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still supports specifying `group_name` at allocation time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
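The first bullet's `torch.empty()`-style calling convention can be sketched with a small normalization helper (the function name is illustrative, not the actual implementation):

```python
from collections.abc import Sequence

def normalize_shape(*size):
    # symm_mem.empty(64) and symm_mem.empty((64,)) describe the same
    # shape, mirroring torch.empty()'s calling convention: accept either
    # a single integer sequence or a variadic pack of ints.
    if len(size) == 1 and isinstance(size[0], Sequence):
        return tuple(size[0])
    return tuple(size)

assert normalize_shape(64) == (64,)
assert normalize_shape(2, 3) == (2, 3)
assert normalize_shape((2, 3)) == (2, 3)
```

Both call styles collapse to the same shape tuple, which is why the wrapper can drop the explicit `size=` keyword the pybind API required.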
lw
left a comment
I am a tad lost with the indirections between (Cuda)SymmetricMemory, (Cuda)SymmetricMemoryAllocator, AllocationRef and Block. I assume some of it is due to the need to be somewhat device-agnostic in some parts of the code? And, now, to the need for multiple process groups to map to the same tensor?
Apart from that, however, the code seems to make sense to me!
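One plausible reading of that layering, sketched in Python purely for illustration (the structure is an assumption mirroring the class names above, not the actual C++ implementation): a Block owns the raw memory, an AllocationRef keeps it alive via refcounting, and each per-group SymmetricMemory handle borrows the same AllocationRef.

```python
class Block:
    """Owns the raw memory (illustrative stand-in)."""
    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.freed = False

class AllocationRef:
    """Refcounted owner: frees the Block when the last ref drops."""
    def __init__(self, block):
        self.block = block
        self.refcount = 1
    def incref(self):
        self.refcount += 1
    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.block.freed = True

class SymmetricMemoryHandle:
    """Per-group view over a shared allocation."""
    def __init__(self, alloc_ref, group_name):
        alloc_ref.incref()
        self.alloc_ref = alloc_ref
        self.group_name = group_name
    def release(self):
        self.alloc_ref.decref()

ref = AllocationRef(Block(1024))
h_a = SymmetricMemoryHandle(ref, "group_a")
h_b = SymmetricMemoryHandle(ref, "group_b")  # two groups, one block
ref.decref()        # original owner drops out
h_a.release()
assert not ref.block.freed  # group_b's handle still keeps it alive
h_b.release()
assert ref.block.freed      # last handle frees the block
```

Under this reading, the indirection exists exactly so that multiple process groups can map to the same tensor without double-freeing it.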
  dtype: torch.dtype,
  device: torch.device,
- group_name: str,
+ group_name: str | None = None,
Do we really need to keep backwards compatibility? SymmetricMemory is a prototype feature, right?
Then again, I'm probably chill with this idea only because I haven't yet landed my code that makes use of SymmetricMemory :P
Do we really need to keep backwards compatibility?
No
SymmetricMemory is a prototype feature, right?
You are right. And this function is a private pybind of a prototype feature. I just wanted to make the change less disruptive to early adopters :)
#if !defined(USE_ROCM) && defined(PYTORCH_C10_DRIVER_API_SUPPORTED)
auto driver_api = c10::cuda::DriverAPI::get();
c10::cuda::CUDAGuard guard(device_idx);
device_idx = static_cast<int>(guard.current_device().index());
What's the point of this?
Just wanted to move an existing guard up so that future changes relying on cudaSetDevice become more foolproof.
Yes.
I'll add some proper documentation to these.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…#139677)

Pull Request resolved: #139677
Approved by: https://github.com/lw
ghstack dependencies: #139529
Stack from ghstack (oldest at bottom):

Before this PR, users needed to call `empty_strided_p2p()` with a `group_name`. Users can now omit `group_name` at allocation time and specify it later at rendezvous time.

Rationales for this change:
- This allows the same allocation to establish symmetric memory under different groups
- Specifying `group_name` at rendezvous time instead of allocation time is a more natural UX

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o