Skip to content

Conversation

@Skylion007
Copy link
Collaborator

@Skylion007 Skylion007 commented Oct 15, 2024

  • Significantly faster, better CUDNN Attention especially on Hopper (FA3 implementation?)
  • Lots of bugfixes
  • Better performance
  • More numerically stable / fixed heuristics
  • More functionality for SDPA

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec

@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners October 15, 2024 13:24
@pytorch-bot
Copy link

pytorch-bot bot commented Oct 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137978

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 5 New Failures

As of commit 47a29bd with merge base 44afaac (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007
Copy link
Collaborator Author

@eqy Do we need to update any checks for CUDNN Attention here to take advantage of the new features, or do we just leave it as is for now and fix in a subsequent PR?

@Skylion007 Skylion007 requested a review from drisspg October 15, 2024 13:50
@albanD albanD added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 15, 2024
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-5-0-50 branch from 25fecc4 to 4b5c106 Compare October 15, 2024 15:47
@Skylion007 Skylion007 requested a review from malfet October 15, 2024 16:47
@eqy
Copy link
Collaborator

eqy commented Oct 15, 2024

@eqy Do we need to update any checks for CUDNN Attention here to take advantage of the new features, or do we just leave it as is for now and fix in a subsequent PR?

Let's do it step by step in case something breaks ;)

@eqy
Copy link
Collaborator

eqy commented Oct 15, 2024

CC @nWEIdia

Copy link
Contributor

@drisspg drisspg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, any reason why the update to accuracy?

@Skylion007
Copy link
Collaborator Author

Looks good, any reason why the update to accuracy?

Better heuristics / accuracy as a result of CUDNN would have caused unexpected successes otherwise

@Skylion007
Copy link
Collaborator Author

@drisspg How should I land this? I think I need the CUDNN uploaded to the S3.

@Skylion007 Skylion007 added ciflow/trunk Trigger trunk jobs on your pull request better-engineering Relatively self-contained tasks for better engineering contributors labels Oct 16, 2024
@drisspg
Copy link
Contributor

drisspg commented Oct 16, 2024

Just to confirm, what else is needed

  1. Land: Update to cudnn 9.5.0.50 builder#2014 in parallel which updates windows builds?
  2. And then making sure the wheels are upload to org/whl/nvidia-cudnn-cu12/ ?

@atalman
I do see this file: https://github.com/pytorch/test-infra/blob/main/s3_management/update_dependencies.py
We just need to upload the newest version from here right: https://pypi.org/project/nvidia-cudnn-cu12/#history?

Not sure if this section is still up to date: https://github.com/pytorch/builder/blob/bcd0972459afd130a1c44b7386ae10c69cc1d30b/CUDA_UPGRADE_GUIDE.MD#upgrade-cudnn-version-only

@nWEIdia
Copy link
Collaborator

nWEIdia commented Oct 16, 2024

I realize test-infra windows AMI changes may also be needed: example https://github.com/pytorch/test-infra/pull/1523/files

A more recent reference: pytorch/test-infra@2364201

@Skylion007 Skylion007 changed the title [BE]: Update CUDNN for Linux to 9.5.1.17 [BE]: Update CUDNN for Linux to 9.5.1.17 for 12.6 only Nov 19, 2024

NCCL_VERSION=v2.21.5-1
CUDNN_VERSION=9.1.0.70
CUDNN_VERSION=9.5.1.17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that it matters, but PR would have been 3 lines shorter, if global CUDNN_VERSION to be kept at 9.1, but 12.6 one updated to 9.5

@atalman , @seemethere : we need to pay attention to metadata and make sure that poetry is still usable for cuda-12.6 if we keep different versions for different releases (I suspect right now it is not)

Copy link
Collaborator Author

@Skylion007 Skylion007 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet Intentional, I want those lines deleted 1 by 1 in the future. Any new CUDA versions should default to the global default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet Metadata for different system should be used from CUDA 12.4 - the one we publish to pypi. It should not affect poetry issue.

Copy link
Collaborator

@nWEIdia nWEIdia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

install_cudnn.sh seems to have a if else syntax issue.

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-5-0-50 branch from a737d3e to 47a29bd Compare November 19, 2024 19:01
@Skylion007 Skylion007 requested a review from nWEIdia November 19, 2024 19:01
Copy link
Collaborator

@nWEIdia nWEIdia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

@atalman
Copy link
Contributor

atalman commented Nov 20, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@atalman
Copy link
Contributor

atalman commented Nov 20, 2024

@pytorchmergebot merge -f "xpu failures are not related"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Nov 26, 2024
Thanks to #137978 from @Skylion007 which bumps to cuDNN 9.5.1 the broken assumption of dO strides == O strides is fixed

Note that there is still the restriction that the innermost stride of the grad output is 1 (this is almost always guaranteed because this condition is required of the input tensors). The main exception would be in test code that does e.g., `.sum().backward()` which yields grad output tensors with strides `[0, 0, 0, 0]`.

CC @drisspg

Pull Request resolved: #141147
Approved by: https://github.com/drisspg
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Fixes pytorch#123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses : ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather then ``centos:7`` which is EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to migrate to latest CUDNN version, and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Fixes pytorch#123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses : ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather then ``centos:7`` which is EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to migrate to latest CUDNN version, and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Thanks to pytorch#137978 from @Skylion007 which bumps to cuDNN 9.5.1 the broken assumption of dO strides == O strides is fixed

Note that there is still the restriction that the innermost stride of the grad output is 1 (this is almost always guaranteed because this condition is required of the input tensors). The main exception would be in test code that does e.g., `.sum().backward()` which yields grad output tensors with strides `[0, 0, 0, 0]`.

CC @drisspg

Pull Request resolved: pytorch#141147
Approved by: https://github.com/drisspg
fmo-mt pushed a commit to fmo-mt/pytorch that referenced this pull request Dec 11, 2024
Fixes pytorch#123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses : ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather then ``centos:7`` which is EOL on June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to migrate to latest CUDNN version, and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

better-engineering Relatively self-contained tasks for better engineering contributors ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: dynamo open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants