@tinglvv
Collaborator

@tinglvv tinglvv commented Oct 14, 2025

CUDA 13.0 Update 2 (13.0.U2) has been posted; this PR adds it to the nightly builds.
Why we want to upgrade: CUDA 13.0.U2 includes a new cuBLAS release that:

  1. Enables opt-in fixed-point emulation for FP64 matmuls (D/ZGEMM), which improves performance and power efficiency.
  2. Improves performance on NVIDIA DGX Spark for FP16/BF16 and FP8 GEMMs.
  3. Adds BF16x9 FP32 emulation support for SYRK and HERK routines.

Reference: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-release-13-0-update-2

cc @atalman @malfet @ptrblck @nWEIdia

@tinglvv tinglvv requested review from a team and jeffdaily as code owners October 14, 2025 21:12
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Oct 14, 2025
@pytorch-bot

pytorch-bot bot commented Oct 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165470

Note: Links to docs will display an error until the docs builds have been completed.

❌ 24 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit 269473d with merge base 23c55c5:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@tinglvv tinglvv changed the title Upgrade to CUDA 13.0.2 for nightly binaries [CD] Upgrade to CUDA 13.0.2 for nightly binaries Oct 14, 2025
@tinglvv tinglvv mentioned this pull request Oct 14, 2025
15 tasks
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 17, 2025
@atalman
Contributor

atalman commented Oct 17, 2025

@tinglvv
Collaborator Author

tinglvv commented Oct 21, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased 13.0-u2-nightly onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout 13.0-u2-nightly && git pull --rebase)

@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Oct 21, 2025
Contributor

@atalman atalman left a comment

lgtm

@atalman
Contributor

atalman commented Oct 27, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 27, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@tinglvv
Collaborator Author

tinglvv commented Oct 28, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@atalman
Contributor

atalman commented Oct 28, 2025

@pytorchmergebot revert -m "Sorry reverting for now, to restore trunk health" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@tinglvv your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 28, 2025
@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Oct 28, 2025
@atalman
Contributor

atalman commented Oct 28, 2025

@pytorchbot merge -f "docker build issue is resolved, merging this"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@atalman
Contributor

atalman commented Nov 3, 2025

@pytorchmergebot revert -m "reverting broke Docker nightly builds" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Reverting PR 165470 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit 544b443ea1d1a9b19e65f981168a01cb87a2d333 returned non-zero exit code 1

Auto-merging .ci/docker/common/install_cuda.sh
Auto-merging .github/scripts/generate_binary_build_matrix.py
CONFLICT (content): Merge conflict in .github/scripts/generate_binary_build_matrix.py
Auto-merging .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
error: could not revert 544b443ea1d... [CD] Upgrade to CUDA 13.0.2 for nightly binaries (#165470)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job
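
For readers unfamiliar with the failure mode in the log above, here is a minimal, self-contained sketch of why an automated `git revert` can stop with a content conflict and how the hinted `git revert --continue` flow finishes it by hand. It runs in a throwaway repository: the file name `matrix.txt`, its contents, and the commit messages are hypothetical stand-ins, not the actual PyTorch files or the resolution the maintainers used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository so the demo never touches a real checkout.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Base state, then the commit we will later want to revert.
# (matrix.txt is a hypothetical stand-in for a generated build matrix.)
echo 'cuda = "13.0.0"' > matrix.txt
git add matrix.txt
git commit -q -m "base"

echo 'cuda = "13.0.2"' > matrix.txt
git commit -q -am "upgrade"
target="$(git rev-parse HEAD)"

# A later commit edits the same line, so the revert cannot apply cleanly.
echo 'cuda = "13.0.2"  # pinned' > matrix.txt
git commit -q -am "follow-up edit"

# The revert stops with a conflict and a non-zero exit code.
if git revert --no-edit "$target"; then
  echo "unexpected: revert applied cleanly" >&2
  exit 1
fi

# Resolve by hand: write the intended content, stage it, then continue.
echo 'cuda = "13.0.0"  # pinned' > matrix.txt
git add matrix.txt
GIT_EDITOR=true git revert --continue

git log --oneline -1
cat matrix.txt
```

The conflict arises because the follow-up commit and the revert both change the same line relative to the reverted commit. In a real repository the equivalent step is deciding, per conflicted file, which later changes must survive the revert.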

Labels

ci-no-td (Do not run TD on this PR)
ciflow/binaries (Trigger all binary build and upload jobs on the PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
Reverted
topic: not user facing (topic category)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
