[SymmMem] Use global pe for put and get #162394

kwen2501 · 2025-09-08T16:58:55Z

Stack from ghstack (oldest at bottom):

NVSHMEM put/get APIs take a global PE rather than its team-local counterpart, so we need to translate the local PE to a global PE within the kernel.
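To illustrate the translation, here is a minimal Python model. It is not the kernel code from this PR: it assumes a team can be described by hypothetical `start`, `stride`, and `size` fields, and `StridedTeam` and `to_global_pe` are names invented for this sketch. NVSHMEM itself exposes `nvshmem_team_translate_pe` for this kind of lookup; whether the PR uses that call is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedTeam:
    """Hypothetical model of a team: `size` PEs picked from the world at a fixed stride."""
    start: int   # global PE of the team's local PE 0 (assumed layout)
    stride: int  # distance between consecutive team members in the world
    size: int    # number of PEs in the team


def to_global_pe(team: StridedTeam, local_pe: int) -> int:
    """Translate a team-local PE index into the global (world) PE index."""
    if not 0 <= local_pe < team.size:
        raise ValueError(f"local PE {local_pe} is outside a team of size {team.size}")
    return team.start + local_pe * team.stride


# A sub-group covering global PEs 4..7 of an 8-PE world.
team = StridedTeam(start=4, stride=1, size=4)
peers = [to_global_pe(team, pe) for pe in range(team.size)]
print(peers)
```

In this model, passing local index 0 straight to a put or get would address global PE 0, which is not a member of the team. Translating first yields global PE 4, the intended peer.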

Also added a sub-group test for dispatch and combine, mimicking the Expert Parallel cases.
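As a rough picture of such a sub-group setup, the Python sketch below splits a hypothetical world of 8 ranks into contiguous groups of 4, then models dispatch and combine as all-to-all exchanges on plain lists. This is not the test added in this PR: the group size, the helper names `make_subgroups` and `all_to_all`, and the list-based exchange are all assumptions for illustration.

```python
def make_subgroups(world_size: int, group_size: int) -> list[list[int]]:
    """Split global ranks 0..world_size-1 into contiguous groups (assumed layout)."""
    return [list(range(s, s + group_size)) for s in range(0, world_size, group_size)]


def all_to_all(send: list[list[str]]) -> list[list[str]]:
    """send[i][j] is what member i sends to member j; recv[j][i] is what j got from i."""
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]


groups = make_subgroups(world_size=8, group_size=4)
group = groups[1]  # global ranks [4, 5, 6, 7]; local indices are 0..3

# Dispatch: every member sends one labelled token to every member of its group.
send = [[f"tok{src}->{dst}" for dst in group] for src in group]
dispatched = all_to_all(send)

# Combine: the reverse exchange returns each token to the member that sent it.
combined = all_to_all(dispatched)
assert combined == send
```

Local index 0 in the second group corresponds to global rank 4, so a test of this shape exercises a path in which local and global PEs differ.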

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim

[ghstack-poisoned]

pytorch-bot · 2025-09-08T16:58:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162394

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bebd4cf with merge base 7a83cf4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 23b3c11 Pull-Request-resolved: #162394

torch/csrc/distributed/c10d/symm_mem/nvshmem_extension.cu

[ghstack-poisoned]

ghstack-source-id: e25d386 Pull-Request-resolved: #162394

[ghstack-poisoned]

ghstack-source-id: 857d227 Pull-Request-resolved: #162394

kwen2501 · 2025-09-09T01:23:02Z

@pytorchbot merge

pytorchmergebot · 2025-09-09T01:25:30Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

NVSHMEM put/get APIs take a global PE rather than its team-local counterpart, so we need to translate the local PE to a global PE within the kernel.

Also added a sub-group test for dispatch and combine, mimicking the Expert Parallel cases.

Pull Request resolved: pytorch#162394
Approved by: https://github.com/ngimel, https://github.com/fegin
ghstack dependencies: pytorch#162320

Update

9edd327

[ghstack-poisoned]

pytorch-bot bot added the ciflow/h100-symm-mem, oncall: distributed, and release notes: distributed (c10d) labels Sep 8, 2025

kwen2501 added a commit that referenced this pull request Sep 8, 2025

[SymmMem] Use global pe for put and get

f6d8250

ghstack-source-id: 23b3c11 Pull-Request-resolved: #162394

kwen2501 mentioned this pull request Sep 7, 2025

[SymmMem] Add team pool to hold duplicated teams for the same rank group #162320

Closed

kwen2501 requested review from fduwjj, fegin and ngimel September 8, 2025 17:06

Skylion007 reviewed Sep 8, 2025

torch/csrc/distributed/c10d/symm_mem/nvshmem_extension.cu (outdated, resolved)

Update

0e878f1

[ghstack-poisoned]

kwen2501 added a commit that referenced this pull request Sep 8, 2025

[SymmMem] Use global pe for put and get

4e7303b

ghstack-source-id: e25d386 Pull-Request-resolved: #162394

Update

bebd4cf

[ghstack-poisoned]

kwen2501 added a commit that referenced this pull request Sep 8, 2025

[SymmMem] Use global pe for put and get

ce64c99

ghstack-source-id: 857d227 Pull-Request-resolved: #162394

kwen2501 added the topic: bug fixes label Sep 8, 2025

ngimel approved these changes Sep 8, 2025


fegin approved these changes Sep 8, 2025


pytorch-bot bot added the ciflow/trunk label Sep 9, 2025

pytorchmergebot added the merging label Sep 9, 2025

pytorchmergebot added the Merged label Sep 9, 2025

pytorchmergebot closed this in 065c446 Sep 9, 2025

pytorchmergebot removed the merging label Sep 9, 2025

github-actions bot deleted the gh/kwen2501/233/head branch October 10, 2025 02:09
