Conversation

@Skylion007 (Collaborator)

Update to CUDNN 9.10.2.21

@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners June 10, 2025 17:22

pytorch-bot bot commented Jun 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155576

Note: Links to docs will display an error until the doc builds have completed.

❌ 8 New Failures, 3 Unrelated Failures

As of commit 8315757 with merge base 9328a7f:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `topic: not user facing` label Jun 10, 2025
@nWEIdia (Collaborator) commented Jun 10, 2025

cc @atalman for help uploading the cudnn packages.

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-10-2-21 branch from 6634853 to 2e411ca on June 10, 2025 18:57
@nWEIdia (Collaborator) commented Jun 10, 2025

I wouldn't bump the cuDNN version for cu126, as we haven't done enough testing of the cu126 + 9.10.2.21 combination.
I recommend keeping the cuDNN version unchanged for CUDA 12.6.
But I'm open to what others prefer.

@nWEIdia (Collaborator) commented Jun 10, 2025

The other aspect to this (CUDA 12.6 + 9.10.2.21) is that, as time goes by, CUDA 12.6 is getting tested less in CI due to the ongoing effort to move CI from CUDA 12.6 to 12.8.

@Skylion007 (Collaborator, Author)

There are important performance updates for SDPA on A100s/H100s here, though, and it's better to support fewer cuDNN versions.

@nWEIdia (Collaborator) commented Jun 10, 2025

Aligned with #154980. cc @tinglvv

@Skylion007 Skylion007 added the `ciflow/binaries_wheel` label (trigger binary build and upload jobs for wheel on the PR) Jun 11, 2025
@Skylion007 (Collaborator, Author)

Todo fix


@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-10-2-21 branch from 2e411ca to 8315757 on June 11, 2025 15:07
@Skylion007 (Collaborator, Author)

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Jun 11, 2025
@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@tinglvv (Collaborator) commented Jun 11, 2025

Hi @Skylion007, it seems 12.8 was missed in this PR's .ci/docker/common/install_cuda.sh. Please follow up with a fix to update it, thanks.
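
(For context, a hypothetical way such a miss could be caught: scan install_cuda.sh for four-part cuDNN version strings and flag disagreement between CUDA sections. The path comes from the comment above; the regex is only an assumption about how versions appear in that file, not the actual fix that landed.)

```python
# Hypothetical consistency check: flag cuDNN version strings in
# install_cuda.sh that disagree across CUDA sections. The pattern below
# is an assumption about the file's contents, not the real fix.
import re
from pathlib import Path

script = Path(".ci/docker/common/install_cuda.sh").read_text()
# Collect anything that looks like a four-part cuDNN version, e.g. 9.10.2.21.
versions = set(re.findall(r"cudnn[-_][\w.-]*?(\d+\.\d+\.\d+\.\d+)", script))
if len(versions) > 1:
    print(f"cuDNN versions disagree across sections: {sorted(versions)}")
```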

@malfet (Contributor) commented Jun 11, 2025

@pytorchbot revert -m "breaks the same test again (I remember there were a version that adjusted tolerances), see https://hud.pytorch.org/hud/pytorch/pytorch/bc3972b80a7abe85036f48b610532fce39ea5097/1?per_page=50&name_filter=gcc11-sm89&mergeEphemeralLF=true" -c nosignal

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator)

@Skylion007 your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the `Reverted` and `ci-no-td` (do not run TD on this PR) labels Jun 11, 2025
@atalman (Contributor) commented Jun 11, 2025

I tuned the test here, so landing this PR should fix it: #155234
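
(For readers unfamiliar with what "tuning the test" involves: below is a minimal sketch of loosening numeric tolerances in a PyTorch-style check, assuming an SDPA result compared against a higher-precision reference. The shapes and tolerance values are illustrative assumptions, not the actual change in #155234.)

```python
# Illustrative sketch only; not the real test changed in #155234.
import torch
import torch.nn.functional as F

def check_sdpa_tolerance():
    if not torch.cuda.is_available():
        return  # the real test runs on CUDA CI runners
    # Hypothetical shapes: (batch, heads, seq_len, head_dim).
    q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))
    out = F.scaled_dot_product_attention(q, k, v)
    # Reference computed in float64 to isolate the kernel's numerics.
    ref = F.scaled_dot_product_attention(q.double(), k.double(), v.double())
    # "Tuning the test" usually means widening atol/rtol so a numerically
    # valid kernel change (e.g. a new cuDNN version) no longer fails it.
    torch.testing.assert_close(out, ref.half(), atol=3e-3, rtol=2e-3)
```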

@atalman (Contributor) commented Jun 12, 2025

@pytorchmergebot merge -f "fix for the failure is deployed #155234"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@Skylion007 (Collaborator, Author)

Apologies here, I should have double-checked this before @atalman remerged the PR. Interestingly, this should have raised a warning from the cuDNN frontend logger, but I'm surprised nobody reported it. I guess that's because the nightlies are often statically linked?
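
(A quick way to check which cuDNN a given build actually reports at runtime, whether linked statically or dynamically. These are standard torch APIs; the integer encoding shown in the comment is my reading of cuDNN's versioning scheme.)

```python
# Runtime check of the cuDNN version a PyTorch build links against.
import torch

print(torch.version.cuda)                 # CUDA toolkit the wheel was built with
print(torch.backends.cudnn.version())    # e.g. 91002 should correspond to 9.10.2
print(torch.backends.cudnn.is_available())
```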
