Skip to content

Conversation

@kwen2501
Copy link
Collaborator

@kwen2501 kwen2501 commented Sep 4, 2025

Stack from ghstack (oldest at bottom):

#161321 bumped NVSHMEM version to 3.3.24 for CUDA 13, leaving CUDA 12 with 3.3.20.
This PR bumps the NVSHMEM version to 3.3.24 for CUDA 12 as well.

[ghstack-poisoned]
@kwen2501 kwen2501 requested a review from a team as a code owner September 4, 2025 20:31
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162206

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 194cd4e with merge base 1f0b01d (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kwen2501 added a commit that referenced this pull request Sep 4, 2025
ghstack-source-id: 134aa80
Pull-Request-resolved: #162206
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Sep 4, 2025
@kwen2501
Copy link
Collaborator Author

kwen2501 commented Sep 4, 2025

@tinglvv Does this PR make sense? Can you please review? Thanks!

@tinglvv
Copy link
Collaborator

tinglvv commented Sep 4, 2025

Thanks! Looks good.

@kwen2501
Copy link
Collaborator Author

kwen2501 commented Sep 4, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 4, 2025
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • OSS CI (alband, dagitses, pytorch/pytorch-dev-infra)
  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@Skylion007
Copy link
Collaborator

Perfect, I was planning on doing this anymore:

NVSHMEM_VERSION=3.3.24
Looks like it's already properly updated for the static builds.

@Skylion007
Copy link
Collaborator

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_12-cuda12_8-test / test

Details for Dev Infra team Raised by workflow job

@Skylion007
Copy link
Collaborator

@atalman We need some new S3 uploads for nvidia wheels

@kwen2501
Copy link
Collaborator Author

kwen2501 commented Sep 9, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x cf1eb07b4d98ccb3890d4205c8c131ed4a021d86 returned non-zero exit code 1

Auto-merging .github/scripts/generate_binary_build_matrix.py
CONFLICT (content): Merge conflict in .github/scripts/generate_binary_build_matrix.py
Auto-merging .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
error: could not apply cf1eb07b4d9... Use same NVSHMEM version across CUDA builds
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

[ghstack-poisoned]
@kwen2501
Copy link
Collaborator Author

kwen2501 commented Sep 9, 2025

There is a land conflict which leads to mismatched yml generation.
Fixed.

@kwen2501
Copy link
Collaborator Author

kwen2501 commented Sep 9, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

@Skylion007
Copy link
Collaborator

@kwen2501 New NVSHMEM just dropped on PYPI that can use IBGDA on more devices. Should we upgrade it across the board?

@Skylion007
Copy link
Collaborator

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@kwen2501
Copy link
Collaborator Author

@Skylion007 Thanks!
Yep, makes sense to upgrade to 3.4 in main.
For 2.9 release, can we keep it 3.3 just to be safe? :)

@Skylion007
Copy link
Collaborator

@Skylion007 Thanks!

Yep, makes sense to upgrade to 3.4 in main.

For 2.9 release, can we keep it 3.3 just to be safe? :)

Let's do it

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
@github-actions github-actions bot deleted the gh/kwen2501/230/head branch October 13, 2025 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/trunk Trigger trunk jobs on your pull request Merged Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants