
fegin (Contributor) commented Sep 4, 2025

Stack from ghstack (oldest at bottom):

Summary:
Because there are now multiple backends, _SymmetricMemory should not be imported together with the NVSHMEM-related modules.
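A minimal sketch of why the coupling matters, using stand-ins rather than PyTorch's actual modules: when two imports share one try block, a missing NVSHMEM extension also hides the symbol that every build provides. Here `math.pi` stands in for _SymmetricMemory, and `hypothetical_nvshmem_ext`, `import_optional`, `coupled_imports` and `decoupled_imports` are hypothetical names used only for illustration.

```python
import importlib
from typing import Any


def import_optional(module_name: str, attr: str, fallback: Any = None) -> Any:
    """Resolve module_name.attr on its own, returning fallback if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return fallback
    return getattr(module, attr, fallback)


def coupled_imports():
    """Both symbols share one try block, so one missing backend hides both."""
    try:
        from math import pi  # stands in for _SymmetricMemory (always built)
        from hypothetical_nvshmem_ext import init  # stands in for NVSHMEM-only code
    except ImportError:
        return None, None
    return pi, init


def decoupled_imports():
    """Each symbol is resolved independently, so pi survives a missing backend."""
    pi = import_optional("math", "pi")
    init = import_optional("hypothetical_nvshmem_ext", "init")
    return pi, init
```

With the coupled version, `coupled_imports()` returns `(None, None)` even though `math` is importable; the decoupled version still returns `pi` and only the NVSHMEM stand-in is `None`.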

cc @H-Huang @awgu @wanchaol @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim

[ghstack-poisoned]
pytorch-bot bot commented Sep 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162142

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 591b87d with merge base ed77e23:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "oncall: distributed" and "release notes: distributed (c10d)" labels Sep 4, 2025
fegin requested review from kwen2501 and ngimel September 4, 2025 06:26
[ghstack-poisoned]
try:
    from torch._C._distributed_c10d import _SymmetricMemory
except ImportError:
    from torch.distributed._C_stubs import _SymmetricMemory
Collaborator

Should we always do this, to decouple the import from NVSHMEM even when it is available, and import NVSHMEM explicitly when needed?
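One way to read this suggestion is a lazy import: the NVSHMEM-specific module is loaded only when a caller actually needs it, so importing the package never depends on NVSHMEM. A minimal sketch using `importlib` and `functools.lru_cache`; the helper `load_backend` and the module name `hypothetical_nvshmem_ext` are hypothetical, and the standard-library `json` module stands in for an available backend.

```python
import functools
import importlib
from types import ModuleType


@functools.lru_cache(maxsize=None)
def load_backend(module_name: str) -> ModuleType:
    """Import an optional backend on first use and cache the module object.

    A missing backend surfaces as a clear RuntimeError at the call site
    instead of breaking the import of the whole package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Optional backend {module_name!r} is not available in this build"
        ) from exc


# Usage: only code paths that need the backend ever trigger the import.
json_backend = load_backend("json")
```

Note that `lru_cache` does not cache exceptions, so a failed lookup is retried on the next call; successful imports are returned from the cache.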

Collaborator

Let's not forget nvshmem4py that exists now... sigh...

fegin (Contributor, Author) commented Sep 5, 2025

I will land this PR first to fix the bug, since the current code breaks async TP for users who do not have NVSHMEM.

fegin (Contributor, Author) commented Sep 5, 2025

@pytorchbot merge

pytorch-bot bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) Sep 5, 2025
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot (Collaborator)

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x b074a82f64144c8ae552ede9fbe2223d94807d9c returned non-zero exit code 1

Auto-merging torch/distributed/_distributed_c10d.py
CONFLICT (content): Merge conflict in torch/distributed/_distributed_c10d.py
error: could not apply b074a82f641... [SymmMEM] Allow to import _SymmetricMemory when NVSHMEM is not available
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

[ghstack-poisoned]
fegin added a commit that referenced this pull request Sep 9, 2025
Summary:
As we have multiple backends, _SymmetricMemory should not be imported together with NVSHMEM related modules

ghstack-source-id: 2dfb6aa
Pull-Request-resolved: #162142
fegin (Contributor, Author) commented Sep 9, 2025

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

ezyang (Contributor) commented Sep 10, 2025

This PR was indirectly reverted by #162568; I need to remember to amend it in.

ezyang added a commit to ezyang/pytorch that referenced this pull request Sep 10, 2025
…ibuted modules importable even when backend not built (pytorch#159889)

Summary:
Original: D81957844 and D81957923

Also, pytorch#162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Differential Revision: D82113620
pytorch-bot bot pushed a commit that referenced this pull request Sep 11, 2025
…modules importable even when backend not built (#159889) (#162594)

Summary:
Pull Request resolved: #162594

Original: D81957844 and D81957923

Also, #162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: dcci, H-Huang

Differential Revision: D82113620
pytorchmergebot pushed a commit that referenced this pull request Sep 12, 2025
…modules importable even when backend not built (#159889) (#162594)

Summary:
Original: D81957844 and D81957923

Also, #162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: #162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…ibuted modules importable even when backend not built (pytorch#159889) (pytorch#162594)

Summary:
Original: D81957844 and D81957923

Also, pytorch#162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: pytorch#162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
…ibuted modules importable even when backend not built (pytorch#159889) (pytorch#162594)

Summary:
Original: D81957844 and D81957923

Also, pytorch#162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: pytorch#162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
…ibuted modules importable even when backend not built (pytorch#159889) (pytorch#162594)

Summary:
Original: D81957844 and D81957923

Also, pytorch#162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: pytorch#162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
pytorchmergebot pushed a commit that referenced this pull request Sep 22, 2025
…modules importable even when backend not built (#159889) (#162594)

Summary:
Original: D81957844 and D81957923

Also, #162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: #162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…ibuted modules importable even when backend not built (pytorch#159889) (pytorch#162594)

Summary:
Original: D81957844 and D81957923

Also, pytorch#162142 is patched in as well

#buildall

Test Plan:
sandcastle and oss ci

Rollback Plan:

Reviewed By: H-Huang

Pull Request resolved: pytorch#162594
Approved by: https://github.com/H-Huang, https://github.com/dcci
github-actions bot deleted the gh/fegin/313/head branch October 11, 2025 02:07

Labels

ciflow/h100-symm-mem
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
oncall: distributed - Add this issue/PR to distributed oncall triage queue
release notes: distributed (c10d) - release notes category
