[BE]: Update CUDNN for Linux to 9.5.1.17 for 12.6 only #137978
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137978
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV.
❌ 5 New Failures as of commit 47a29bd with merge base 44afaac.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@eqy Do we need to update any checks for CUDNN Attention here to take advantage of the new features, or do we just leave it as is for now and fix it in a subsequent PR?
25fecc4 to 4b5c106
Let's do it step by step in case something breaks ;)

CC @nWEIdia
drisspg
left a comment
Looks good, any reason for the update to accuracy?
The new cuDNN brings better heuristics and accuracy, which would otherwise have caused unexpected successes.
@drisspg How should I land this? I think I need cuDNN uploaded to S3.

Just to confirm, what else is needed?
@atalman Not sure if this section is still up to date: https://github.com/pytorch/builder/blob/bcd0972459afd130a1c44b7386ae10c69cc1d30b/CUDA_UPGRADE_GUIDE.MD#upgrade-cudnn-version-only

I realize test-infra Windows AMI changes may also be needed. Example: https://github.com/pytorch/test-infra/pull/1523/files
A more recent reference: pytorch/test-infra@2364201
  NCCL_VERSION=v2.21.5-1
- CUDNN_VERSION=9.1.0.70
+ CUDNN_VERSION=9.5.1.17
Not that it matters, but the PR would have been 3 lines shorter if the global CUDNN_VERSION were kept at 9.1 and only the 12.6 one updated to 9.5.
@atalman, @seemethere: we need to pay attention to metadata and make sure that poetry is still usable for cuda-12.6 if we keep different versions for different releases (I suspect right now it is not).
@malfet Intentional, I want those lines deleted 1 by 1 in the future. Any new CUDA versions should default to the global default.
@malfet Metadata for different systems should be taken from CUDA 12.4, the one we publish to PyPI. It should not affect the poetry issue.
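The scheme discussed in this thread (a global default that new CUDA versions inherit, plus per-version pins that are meant to be deleted one by one) can be sketched as a lookup with a fallback. This is only an illustration in Python, not the PR's actual shell logic: the names `GLOBAL_CUDNN_VERSION`, `PINNED_CUDNN_VERSIONS`, and `cudnn_version_for` are hypothetical, and the pinned CUDA versions listed are assumed examples rather than the real set.

```python
# Hypothetical sketch of "global default + per-CUDA pins".
# Names and the pinned CUDA versions are illustrative assumptions,
# not taken from the PR's install scripts.

GLOBAL_CUDNN_VERSION = "9.5.1.17"  # global default after this PR

# Older CUDA releases stay pinned to the previous cuDNN. Each entry is
# meant to be deleted later, at which point that CUDA version falls
# back to the global default.
PINNED_CUDNN_VERSIONS = {
    "11.8": "9.1.0.70",  # assumed example entry
    "12.4": "9.1.0.70",  # assumed example entry
}


def cudnn_version_for(cuda_version: str) -> str:
    """Return the cuDNN version to install for a given CUDA version."""
    return PINNED_CUDNN_VERSIONS.get(cuda_version, GLOBAL_CUDNN_VERSION)


if __name__ == "__main__":
    for cuda in ("11.8", "12.4", "12.6"):
        print(cuda, "->", cudnn_version_for(cuda))
```

With this layout, supporting a new CUDA version needs no new entry, and retiring a pin is a one-line deletion, which matches the intent stated above.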
nWEIdia
left a comment
install_cudnn.sh seems to have an if/else syntax issue.
a737d3e to 47a29bd
nWEIdia
left a comment
Thanks for the update!
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).

Merge failed. Reason: 5 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-xpu-build / build, linux-binary-manywheel / manywheel-py3_10-xpu-build / build, linux-binary-manywheel / manywheel-py3_13-xpu-build / build, linux-binary-manywheel / manywheel-py3_12-xpu-build / build, linux-binary-manywheel / manywheel-py3_11-xpu-build / build

@pytorchmergebot merge -f "xpu failures are not related"

Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Thanks to #137978 from @Skylion007, which bumps to cuDNN 9.5.1, the broken assumption of dO strides == O strides is fixed. Note that there is still the restriction that the innermost stride of the grad output is 1 (this is almost always guaranteed because this condition is required of the input tensors). The main exception would be in test code that does e.g. `.sum().backward()`, which yields grad output tensors with strides `[0, 0, 0, 0]`.
CC @drisspg
Pull Request resolved: #141147
Approved by: https://github.com/drisspg
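To make the remaining restriction concrete, here is a small pure-Python sketch that compares the strides of a densely packed 4-D tensor with the all-zero strides described above. It is not PyTorch's or cuDNN's actual check: the helpers `contiguous_strides` and `innermost_stride_is_one` are hypothetical names, and the shape is an arbitrary example.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def innermost_stride_is_one(strides: Sequence[int]) -> bool:
    """Hypothetical stand-in for the 'innermost stride of dO is 1' condition."""
    return len(strides) > 0 and strides[-1] == 1


if __name__ == "__main__":
    shape = (2, 8, 128, 64)            # arbitrary example shape
    dense = contiguous_strides(shape)  # typical grad output layout
    expanded = (0, 0, 0, 0)            # strides reported for .sum().backward()

    print(dense, innermost_stride_is_one(dense))
    print(expanded, innermost_stride_is_one(expanded))
```

A densely packed grad output satisfies the condition, while the expanded all-zero-stride tensor does not, which is why the `.sum().backward()` pattern is called out as the main exception.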
Fixes pytorch#123649

Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather than ``centos:7``, which is EOL on June 30, 2024.

Information: https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13

Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
* Debian 10+
* Ubuntu 18.10+
* Fedora 29+
* CentOS/RHEL 8+

This migration should enable us to migrate to the latest CUDNN version and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
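The glibc 2.28 floor mentioned in the commit message above can be checked on a given machine with a few lines of Python. This is a simplified sketch, not the full manylinux compatibility logic (which also considers architecture and other details); `meets_manylinux_2_28` and `parse_version` are hypothetical helper names, while `platform.libc_ver()` is the standard-library call used to read the libc name and version.

```python
import platform
from typing import Tuple

MIN_GLIBC = (2, 28)  # floor implied by the manylinux_2_28 tag


def parse_version(version: str) -> Tuple[int, int]:
    """Parse 'major.minor[...]' into a (major, minor) tuple."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def meets_manylinux_2_28(libc_name: str, libc_version: str) -> bool:
    """Simplified check: glibc at version 2.28 or later."""
    if libc_name != "glibc" or not libc_version:
        return False
    return parse_version(libc_version) >= MIN_GLIBC


if __name__ == "__main__":
    # Returns ('', '') when the libc cannot be identified as glibc.
    name, version = platform.libc_ver()
    print(name, version, meets_manylinux_2_28(name, version))
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as treating "2.9" as newer than "2.28".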
* Significantly faster, better CUDNN Attention especially on Hopper (FA3 implementation?)
* Lots of bugfixes
* Better performance
* More numerically stable / fixed heuristics
* More functionality for SDPA

Pull Request resolved: pytorch#137978
Approved by: https://github.com/eqy, https://github.com/drisspg, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/malfet
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec