[CUDA] [CI]: Enable CUDA 12.4 CI #121956
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121956
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit a4a5d05 with merge base 5ea956a:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
malfet left a comment:
Hmm, I think this needs some discussion, because if we stop building/testing CUDA 11.8, it means we are losing Kepler support, aren't we?
@atalman, is there a doc on whether this is intended?
Why not maintain 11.8 as CUDA 11 and 12.4 as CUDA 12, and skip 12.1? I mean, always maintain two versions of CUDA.
We discussed that, in the short term, we would have 11.8, 12.1, and 12.4. I will need to refactor this PR to add back 11.8.
When will this be merged? 😋
12.4 workflows are failing. Still working on coming up with a fix.
@ptrblck @nWEIdia NVIDIA cuDNN 9 is now available: nvidia-cudnn-cu12 9.0.0.312
@johnnynunez Yes, it is! We will focus on 12.4 in this PR and follow up with the cuDNN update separately, to avoid confusing issue reports that could point to either the CUDA or the cuDNN update.
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.
Successfully rebased 3678581 to 53df6a6.
Warning: Unknown label. Please add the new label to .github/pytorch-probot.yml.
Successfully rebased bd6bbf3 to c8c7ddf.
Fixes issues encountered in pytorch#121956 Pull Request resolved: pytorch#125944 Approved by: https://github.com/atalman
Hi @nWEIdia, please disable the failing tests. We will follow up on this in the issue you opened.
PR. Requires manual testing. Planning to do it via a separate PR.
@pytorchmergebot merge -f "All required tests are passing"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
For context, after this change landed in trunk, the new CUDA 12.4 build started to fail on newly opened PyTorch PRs. Here is what happened:
This is not an ideal rollout, but the workaround for now is to ask folks to rebase onto main.
Discovered by @clee2000. The change was introduced in #121956 Pull Request resolved: #127121 Approved by: https://github.com/clee2000, https://github.com/Skylion007
Reference PR: pytorch#93406 Co-authored-by: Aidyn-A Pull Request resolved: pytorch#121956 Approved by: https://github.com/atalman
Reference PR: #93406
cc @atalman @malfet @ptrblck @eqy