[BE]: Update CUDNN for Linux to 9.5.1.17 for 12.6 only #137978

Skylion007 · 2024-10-15T13:24:46Z

* Significantly faster, better CUDNN Attention especially on Hopper (FA3 implementation?)
* Lots of bugfixes
* Better performance
* More numerically stable / fixed heuristics
* More functionality for SDPA

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

pytorch-bot · 2024-10-15T13:24:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137978

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

❌ 5 New Failures

As of commit 47a29bd with merge base 44afaac:

NEW FAILURES - The following jobs have failed:

linux-binary-manywheel / manywheel-py3_10-xpu-build / build (gh)
/pytorch/.ci/manywheel/build_cpu.sh: line 24: /opt/intel/oneapi/pytorch-gpu-dev-0.5/oneapi-vars.sh: No such file or directory
linux-binary-manywheel / manywheel-py3_11-xpu-build / build (gh)
/pytorch/.ci/manywheel/build_cpu.sh: line 24: /opt/intel/oneapi/pytorch-gpu-dev-0.5/oneapi-vars.sh: No such file or directory
linux-binary-manywheel / manywheel-py3_12-xpu-build / build (gh)
/pytorch/.ci/manywheel/build_cpu.sh: line 24: /opt/intel/oneapi/pytorch-gpu-dev-0.5/oneapi-vars.sh: No such file or directory
linux-binary-manywheel / manywheel-py3_13-xpu-build / build (gh)
/pytorch/.ci/manywheel/build_cpu.sh: line 24: /opt/intel/oneapi/pytorch-gpu-dev-0.5/oneapi-vars.sh: No such file or directory
linux-binary-manywheel / manywheel-py3_9-xpu-build / build (gh)
/pytorch/.ci/manywheel/build_cpu.sh: line 24: /opt/intel/oneapi/pytorch-gpu-dev-0.5/oneapi-vars.sh: No such file or directory

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Skylion007 · 2024-10-15T13:31:45Z

@eqy Do we need to update any checks for CUDNN Attention here to take advantage of the new features, or do we just leave it as is for now and fix in a subsequent PR?

eqy · 2024-10-15T18:53:35Z

> @eqy Do we need to update any checks for CUDNN Attention here to take advantage of the new features, or do we just leave it as is for now and fix in a subsequent PR?

Let's do it step by step in case something breaks ;)

eqy · 2024-10-15T18:53:58Z

CC @nWEIdia

.ci/docker/common/install_cudnn.sh

drisspg

Looks good, any reason why the update to accuracy?

Skylion007 · 2024-10-16T15:32:37Z

> Looks good, any reason why the update to accuracy?

The new CUDNN has better heuristics and accuracy, so without the update some checks would have reported unexpected successes.

Skylion007 · 2024-10-16T15:32:56Z

@drisspg How should I land this? I think I need the CUDNN uploaded to the S3.

drisspg · 2024-10-16T16:18:51Z

Just to confirm, what else is needed:

* Land "Update to cudnn 9.5.0.50" (builder#2014) in parallel, which updates the Windows builds?
* Then make sure the wheels are uploaded to org/whl/nvidia-cudnn-cu12/ ?

@atalman
I do see this file: https://github.com/pytorch/test-infra/blob/main/s3_management/update_dependencies.py
We just need to upload the newest version from here, right? https://pypi.org/project/nvidia-cudnn-cu12/#history

Not sure if this section is still up to date: https://github.com/pytorch/builder/blob/bcd0972459afd130a1c44b7386ae10c69cc1d30b/CUDA_UPGRADE_GUIDE.MD#upgrade-cudnn-version-only

nWEIdia · 2024-10-16T16:32:01Z

I realize test-infra windows AMI changes may also be needed: example https://github.com/pytorch/test-infra/pull/1523/files

A more recent reference: pytorch/test-infra@2364201

malfet · 2024-11-19T16:52:59Z

.ci/docker/common/install_cuda.sh


 NCCL_VERSION=v2.21.5-1
-CUDNN_VERSION=9.1.0.70
+CUDNN_VERSION=9.5.1.17


Not that it matters, but the PR would have been 3 lines shorter if the global CUDNN_VERSION had been kept at 9.1 and only the 12.6 one updated to 9.5.

@atalman, @seemethere: we need to pay attention to metadata and make sure that poetry is still usable for cuda-12.6 if we keep different versions for different releases (I suspect right now it is not).

Skylion007 · 2024-11-19

@malfet Intentional: I want those lines deleted one by one in the future. Any new CUDA version should use the global default.

atalman · 2024-11-20

@malfet The metadata for the different systems should be taken from the CUDA 12.4 build, the one we publish to PyPI. It should not affect the poetry issue.
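
The approach discussed in this thread (a new global default, with older CUDA versions pinned by lines that can later be deleted one at a time) could be sketched roughly as follows. This is only an illustration, not the contents of install_cuda.sh: the function name `cudnn_version_for` is hypothetical, and the set of older CUDA versions that are pinned is an assumption. Only the two cuDNN version strings come from the diff above.

```shell
# Hypothetical sketch, not the actual install_cuda.sh.
# Only the version strings 9.1.0.70 and 9.5.1.17 come from the PR diff.

# Global default: any CUDA version without a pin below gets this one.
CUDNN_VERSION=9.5.1.17

cudnn_version_for() {
  local cuda_version="$1"
  case "${cuda_version}" in
    # Assumed pins for older CUDA versions. Deleting an entry moves that
    # CUDA version onto the global default.
    11.8|12.1|12.4) echo "9.1.0.70" ;;
    *)              echo "${CUDNN_VERSION}" ;;
  esac
}

for v in 11.8 12.4 12.6; do
  echo "CUDA ${v} -> cuDNN $(cudnn_version_for "${v}")"
done
```

With this layout a future CUDA version needs no new line at all, which matches the stated intent that new versions fall through to the global default.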

.ci/docker/common/install_cudnn.sh

nWEIdia

install_cudnn.sh seems to have an if/else syntax issue.

nWEIdia

Thanks for the update!

atalman · 2024-11-20T21:53:00Z

@pytorchbot merge

pytorchmergebot · 2024-11-20T21:54:57Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-20T21:55:27Z

Merge failed

Reason: 5 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-xpu-build / build, linux-binary-manywheel / manywheel-py3_10-xpu-build / build, linux-binary-manywheel / manywheel-py3_13-xpu-build / build, linux-binary-manywheel / manywheel-py3_12-xpu-build / build, linux-binary-manywheel / manywheel-py3_11-xpu-build / build

atalman · 2024-11-20T23:09:21Z

@pytorchmergebot merge -f "xpu failures are not related"

pytorchmergebot · 2024-11-20T23:11:25Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Thanks to #137978 from @Skylion007, which bumps cuDNN to 9.5.1, the broken assumption that dO strides == O strides is fixed.

Note that there is still the restriction that the innermost stride of the grad output is 1 (this is almost always guaranteed because this condition is required of the input tensors). The main exception would be in test code that does e.g. `.sum().backward()`, which yields grad output tensors with strides `[0, 0, 0, 0]`.

CC @drisspg

Pull Request resolved: #141147
Approved by: https://github.com/drisspg

Fixes pytorch#123649

Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather than ``centos:7``, which reached EOL on June 30, 2024.

Information: https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13

Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:

* Debian 10+
* Ubuntu 18.10+
* Fedora 29+
* CentOS/RHEL 8+

This migration should enable us to migrate to the latest CUDNN version and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
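
The compatibility statement above (glibc 2.28 or later) comes down to a minimum-version comparison. A minimal sketch, assuming a hypothetical helper `glibc_at_least` that compares `major.minor` version strings; it is not part of the PyTorch build scripts. On glibc-based systems the installed version can usually be read with `getconf GNU_LIBC_VERSION`, shown only as a commented usage line because that variable is not available on non-glibc systems.

```shell
# Hypothetical helper: succeed if version $1 (major.minor) is >= version $2.
glibc_at_least() {
  local have_major="${1%%.*}" have_minor="${1#*.}"
  local need_major="${2%%.*}" need_minor="${2#*.}"
  # Keep only the first minor component, e.g. "28" from "28.1".
  have_minor="${have_minor%%.*}"
  need_minor="${need_minor%%.*}"
  if [ "${have_major}" -gt "${need_major}" ]; then return 0; fi
  if [ "${have_major}" -lt "${need_major}" ]; then return 1; fi
  [ "${have_minor}" -ge "${need_minor}" ]
}

# Example inputs; 2.17 is the glibc version shipped with CentOS 7.
for v in 2.17 2.28 2.35; do
  if glibc_at_least "${v}" 2.28; then
    echo "glibc ${v}: manylinux_2_28 wheels expected to work"
  else
    echo "glibc ${v}: too old for manylinux_2_28 wheels"
  fi
done

# On a glibc system (assumption: getconf prints "glibc X.Y"):
#   glibc_at_least "$(getconf GNU_LIBC_VERSION | awk '{print $2}')" 2.28
```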

* Significantly faster, better CUDNN Attention especially on Hopper (FA3 implementation?)
* Lots of bugfixes
* Better performance
* More numerically stable / fixed heuristics
* More functionality for SDPA

Pull Request resolved: pytorch#137978
Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/malfet

Skylion007 requested review from a team and jeffdaily as code owners October 15, 2024 13:24

pytorch-bot bot added the topic: not user facing label Oct 15, 2024

Skylion007 requested review from albanD, atalman, eqy, ezyang and nWEIdia October 15, 2024 13:24

Skylion007 added the ciflow/inductor label Oct 15, 2024

pytorchbot added the open source label Oct 15, 2024

Skylion007 requested a review from drisspg October 15, 2024 13:50

albanD added the triaged label Oct 15, 2024

Skylion007 force-pushed the skylion007/update-cudnn-9-5-0-50 branch from 25fecc4 to 4b5c106 Compare October 15, 2024 15:47

pytorch-bot bot added the module: dynamo label Oct 15, 2024

Skylion007 requested a review from malfet October 15, 2024 16:47

eqy approved these changes Oct 15, 2024

nWEIdia reviewed Oct 15, 2024

Skylion007 mentioned this pull request Oct 16, 2024

Update to cudnn 9.5.0.50 pytorch/builder#2014

Merged

drisspg approved these changes Oct 16, 2024

nWEIdia approved these changes Oct 16, 2024

Skylion007 added the ciflow/trunk and better-engineering labels Oct 16, 2024

Skylion007 changed the title from "[BE]: Update CUDNN for Linux to 9.5.1.17" to "[BE]: Update CUDNN for Linux to 9.5.1.17 for 12.6 only" Nov 19, 2024

malfet reviewed Nov 19, 2024

nWEIdia reviewed Nov 19, 2024

nWEIdia requested changes Nov 19, 2024

Update cudnn only CUDA 12.6

47a29bd

Skylion007 force-pushed the skylion007/update-cudnn-9-5-0-50 branch from a737d3e to 47a29bd Compare November 19, 2024 19:01

Skylion007 requested a review from nWEIdia November 19, 2024 19:01

nWEIdia approved these changes Nov 19, 2024

eqy mentioned this pull request Nov 20, 2024

[cuDNN][SDPA] Update cuDNN grad output layout check #141147

Closed

pytorchmergebot added the merging label Nov 20, 2024

pytorchmergebot removed the merging label Nov 20, 2024

pytorchmergebot added the merging label Nov 20, 2024

pytorchmergebot closed this in 765a347 Nov 20, 2024

pytorchmergebot added Merged and removed merging labels Nov 20, 2024
