Skylion007 (Collaborator) commented Mar 11, 2025

Also, cu12.6 is on an old cuDNN version. We may want to upgrade it for the performance improvements, as I don't see a manywheel Linux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU126 to the latest and greatest.

Skylion007 requested review from eqy, jansel, malfet and nWEIdia March 11, 2025 14:44

pytorch-bot bot commented Mar 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148963

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 8b23833 with merge base f1787ee (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "topic: not user facing" (topic category) label Mar 11, 2025
Skylion007 requested a review from tinglvv March 11, 2025 14:46
Skylion007 (Collaborator, Author) commented

@tinglvv opened the most recent PR for updating cuDNN for 12.8. Any reason we didn't also update for 12.6? We previously had a version split for ABI compatibility reasons tied to the manylinux upgrade, but that shouldn't be an issue anymore.

Collaborator, Author

This should probably be merged with 12.8 too. There is no reason to keep 12.6 on an old cuDNN version when newer releases include a lot of performance fixes that also apply to Hopper.

Skylion007 added the "better-engineering" (Relatively self-contained tasks for better engineering contributors) label Mar 11, 2025
Skylion007 force-pushed the skylion007/update-cudnn-9-8-0-87 branch from 632751a to 8b23833 March 11, 2025 15:10
Skylion007 marked this pull request as ready for review March 11, 2025 15:45
Skylion007 requested review from a team and jeffdaily as code owners March 11, 2025 15:45
Skylion007 (Collaborator, Author) commented

@jansel Should we update CU126's libraries in this PR or another one?

eqy (Collaborator) commented Mar 11, 2025

I would consider a separate PR. The background is that 9.7+ is for Blackwell.
In the past we have not been super active in bumping cuDNN versions for older CUDA toolkit versions.

  "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
  "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
- "nvidia-cudnn-cu12==9.7.1.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+ "nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
Collaborator

Changing this may need a synchronization point: @atalman usually helps us by uploading nvidia-cudnn-cu12 9.8.0.87 first. Or has this already been done?
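
For readers unfamiliar with the pin format in the diff above, here is a minimal sketch of how such a pipe-separated requirement string could be split into package pins, so that the cuDNN version a build expects can be checked against what has been uploaded. The names `EXTRA_REQUIREMENTS`, `parse_pins`, and `version_tuple` are hypothetical helpers for illustration and are not taken from the PyTorch build scripts; the regex only handles simple `name==version; marker` entries like the ones quoted.

```python
import re

# Pipe-separated pins in the style of the diff quoted above (post-change values).
EXTRA_REQUIREMENTS = (
    "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    "nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64'"
)

# Matches "name==version" with an optional "; marker" suffix.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)==([0-9][0-9A-Za-z.]*)\s*(?:;\s*(.*?))?\s*$")


def parse_pins(spec):
    """Return {package name: (version, environment marker)} for each entry."""
    pins = {}
    for entry in spec.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        match = PIN_RE.match(entry)
        if match is None:
            raise ValueError(f"unrecognized requirement entry: {entry!r}")
        name, version, marker = match.groups()
        pins[name] = (version, marker or "")
    return pins


def version_tuple(version):
    """Turn a purely numeric dotted version such as '9.8.0.87' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


pins = parse_pins(EXTRA_REQUIREMENTS)
cudnn_version, cudnn_marker = pins["nvidia-cudnn-cu12"]
print(cudnn_version)  # 9.8.0.87
print(version_tuple(cudnn_version) > version_tuple("9.7.1.26"))  # True
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises such as "9.10" sorting before "9.8".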

Collaborator, Author

We need to upload it to our S3 bucket, unfortunately.

nWEIdia (Collaborator) left a comment

LGTM. Just had a question on uploading the PyPI cuDNN wheel to AWS S3.

tinglvv added the "ciflow/binaries" (Trigger all binary build and upload jobs on the PR) label Mar 11, 2025

tinglvv (Collaborator) commented Mar 11, 2025

LGTM. If the ciflow/binaries jobs pass, then we are good to merge.

Skylion007 requested a review from atalman March 11, 2025 17:20

jansel (Contributor) commented Mar 11, 2025

> @jansel Should we update CU126's libraries in this PR or another one?

Smaller PRs would be easier.

Skylion007 (Collaborator, Author) commented

Thanks for uploading the binaries, @atalman, but it seems like the S3 bucket is returning a 403 error on the wheels.
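
A 403 like the one reported above can be reproduced without downloading the whole wheel by sending a HEAD request. The following is a minimal sketch using only the Python standard library; `wheel_status` and `describe` are hypothetical helper names, and the URL in the usage comment is a placeholder, not the real bucket path. Note that S3 can answer 403 rather than 404 for a missing object when the caller is not allowed to list the bucket, so a 403 does not by itself distinguish "not uploaded" from "uploaded but not publicly readable".

```python
import urllib.error
import urllib.request


def wheel_status(url, opener=urllib.request.urlopen, timeout=10.0):
    """Return the HTTP status code of a HEAD request to `url`.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is carried on the exception.
        return err.code


def describe(status):
    """Map a status code to a short human-readable diagnosis."""
    if status == 200:
        return "available"
    if status == 403:
        return "forbidden (object missing or not publicly readable)"
    if status == 404:
        return "not found"
    return f"unexpected status {status}"


# Usage (placeholder URL, assumed for illustration only):
#   print(describe(wheel_status("https://example.invalid/path/to/wheel.whl")))
```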

Skylion007 (Collaborator, Author) commented

@pytorchbot merge -i

atalman (Contributor) left a comment

LGTM. Thank you, @Skylion007.

pytorch-bot bot added the "ciflow/trunk" (Trigger trunk jobs on your pull request) label Mar 13, 2025

pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged while ignoring the following 6 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), macos-arm64-binary-wheel / wheel-py3_10-cpu-build, macos-arm64-binary-wheel / wheel-py3_11-cpu-build, macos-arm64-binary-wheel / wheel-py3_13-cpu-build, macos-arm64-binary-wheel / wheel-py3_12-cpu-build, macos-arm64-binary-wheel / wheel-py3_9-cpu-build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Labels

better-engineering Relatively self-contained tasks for better engineering contributors ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged open source topic: not user facing topic category
