[cuDNN][SDPA] Check-in test for #166211 #166570

eqy · 2025-10-29T20:46:29Z

Repros without the need for specific tensor data.
Should pass with cuDNN frontend 1.15.0, which current `main` has.

cc @csarofeen @ptrblck @xwang233

pytorch-bot · 2025-10-29T20:46:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166570


❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm failures during provisioning step due to network issues

✅ No Failures

As of commit a9a6d97 with merge base fc540ce:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eqy · 2025-10-29T20:46:39Z

CC @malfet @atalman

atalman

lgtm. Thank you

malfet · 2025-10-29T21:12:03Z

test/test_transformers.py

+    @skipIfRocm
+    @unittest.skipIf(not PLATFORM_SUPPORTS_CUDNN_ATTENTION, "cudnn Attention is not supported on this system")


Why do we want to skip the test on those platforms?

eqy · 2025-10-29

This is explicitly testing for a cuDNN bug; the test body runs under `with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.CUDNN_ATTENTION):`

Skylion007 · 2025-10-30

Wish we had a strict CUDAOnly for this then.

test/test_transformers.py

Co-authored-by: Nikita Shulga <[email protected]>

drisspg · 2025-10-30T02:23:42Z

test/test_transformers.py

+            k.requires_grad = True
+            v.requires_grad = True
+
+            grad_attn_output = torch.randn(*shape, device='cuda', dtype=torch.bfloat16) * scale


nit: use torch.autograd.grad to reuse the exact input tensors
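
For context on the nit above, here is a minimal sketch of how `torch.autograd.grad` might let one set of input tensors serve several runs. It is not the code from the PR: the helper name `sdpa_input_grads`, the small shape, and running on CPU in float32 are all assumptions made so the snippet is self-contained.

```python
import torch
import torch.nn.functional as F


def sdpa_input_grads(q, k, v, grad_out):
    """Hypothetical helper: return (dq, dk, dv) without touching .grad."""
    out = F.scaled_dot_product_attention(q, k, v)
    # torch.autograd.grad returns the gradients as a tuple instead of
    # accumulating them into the .grad attributes of the inputs.
    return torch.autograd.grad(out, (q, k, v), grad_outputs=grad_out)


torch.manual_seed(0)
shape = (1, 2, 8, 16)  # assumed small shape, not the one used in the PR
q, k, v = (torch.randn(shape, requires_grad=True) for _ in range(3))
grad_out = torch.randn(shape)

# The same q, k, v can feed two separate runs (for example a reference
# backend and the backend under test) because nothing is accumulated.
grads_first = sdpa_input_grads(q, k, v, grad_out)
grads_second = sdpa_input_grads(q, k, v, grad_out)
```

Because the gradients come back as return values, there is no need to clone the inputs into separate `*_ref` tensors just to keep their `.grad` fields apart.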

drisspg · 2025-10-30T02:24:05Z

test/test_transformers.py

+                attn_output.backward(grad_attn_output)
+
+            for x, x_ref in zip((q, k, v), (q_ref, k_ref, v_ref)):
+                self.assertEqual(x.grad, x_ref.grad, atol=10.0, rtol=0.05)


can you add a note on the tolerances?

eqy · 2025-10-30

"i made them up"
well, scaling things up made the tolerances hard to adjust here, and we're just checking for NaN; will add that in a comment

drisspg · 2025-10-30

maybe we should just check for NaN?
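
A NaN-only check along the lines suggested above could look roughly like the sketch below. The helper `assert_no_nan` and the example tensors are hypothetical and are not taken from the PR or from PyTorch's test utilities.

```python
import torch


def assert_no_nan(**named_tensors):
    """Hypothetical helper: fail if any of the given tensors contains NaN."""
    bad = [name for name, t in named_tensors.items() if torch.isnan(t).any()]
    if bad:
        raise AssertionError(f"NaN found in: {', '.join(bad)}")


clean = torch.ones(4)
poisoned = torch.tensor([1.0, float("nan"), 3.0])

assert_no_nan(dq=clean)  # passes silently; adding dk=poisoned would raise
```

A check like this sidesteps tolerance tuning entirely, at the cost of no longer catching gradients that are finite but wrong.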

test/test_transformers.py

Co-authored-by: Aaron Gokaslan <[email protected]>

eqy · 2025-11-03T17:04:12Z

@pytorchmergebot merge

pytorchmergebot · 2025-11-03T17:06:04Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Lucaskabela · 2025-11-03T23:28:17Z

Checking for 2.9.1 tracking - does this close #166211?

Lucaskabela · 2025-11-03T23:28:31Z

@eqy

eqy · 2025-11-03T23:37:34Z

No, this is just a test. We need a cuDNN frontend submodule bump, as otherwise the 2.9 branch will fail this test @Lucaskabela

Lucaskabela · 2025-11-03T23:50:06Z

Got it - @eqy @malfet do we have a PR bumping this we can cherrypick into the 2.9 branch already?

eqy · 2025-11-04T00:02:02Z

The upgrade to 1.15.0 frontend resolves this, but that might be too invasive. I'll open a PR proposing a frontend 1.12.2 bump.

eqy · 2025-11-04T00:12:59Z

Feel free to cherrypick this PR btw, as it adds a test that we would want in 2.9.1 after #166912 is merged

Repros without the need for specific tensor data. Should be passing with cuDNN frontend 1.15.0 which current `main` has. Pull Request resolved: #166570 Approved by: https://github.com/atalman Co-authored-by: Nikita Shulga <[email protected]> Co-authored-by: Aaron Gokaslan <[email protected]>

atalman · 2025-11-05T20:20:04Z

@pytorchbot cherry-pick --onto release/2.9 --fixes "cudnn-frontend test" -c regression

Repros without the need for specific tensor data. Should be passing with cuDNN frontend 1.15.0 which current `main` has. Pull Request resolved: #166570 Approved by: https://github.com/atalman Co-authored-by: Nikita Shulga <[email protected]> Co-authored-by: Aaron Gokaslan <[email protected]> (cherry picked from commit 71a2e93)

pytorchbot · 2025-11-05T20:26:54Z

Cherry picking #166570

The cherry pick PR is at #167121 and it is linked with issue cudnn-frontend test. The following tracker issues are updated:

[v2.9.1] Release Tracker #166758 (comment)


[cuDNN][SDPA] Check-in test for #166211 (#166570) Repros without the need for specific tensor data. Should be passing with cuDNN frontend 1.15.0 which current `main` has. Pull Request resolved: #166570 Approved by: https://github.com/atalman (cherry picked from commit 71a2e93) Co-authored-by: Eddie Yan <[email protected]> Co-authored-by: Nikita Shulga <[email protected]> Co-authored-by: Aaron Gokaslan <[email protected]>

check in

1fc0130

eqy added the labels module: cudnn, open source, topic: not user facing, and module: sdpa on Oct 29, 2025

atalman approved these changes Oct 29, 2025

malfet reviewed Oct 29, 2025

eqy and others added 2 commits October 29, 2025 18:07

Update test/test_transformers.py

5b0af25

Co-authored-by: Nikita Shulga <[email protected]>

Update test/test_transformers.py

6804f64

Co-authored-by: Nikita Shulga <[email protected]>

drisspg reviewed Oct 30, 2025

Skylion007 reviewed Oct 30, 2025

eqy and others added 2 commits October 30, 2025 10:42

Update test/test_transformers.py

6805717

Co-authored-by: Aaron Gokaslan <[email protected]>

address review comments

a9a6d97

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 3, 2025

pytorchmergebot added the merging label Nov 3, 2025

pytorchmergebot added the Merged label Nov 3, 2025

pytorchmergebot closed this in 71a2e93 Nov 3, 2025

pytorchmergebot removed the merging label Nov 3, 2025

pytorchbot mentioned this pull request Nov 5, 2025

[v2.9.1] Release Tracker #166758

Closed
