Add CUDA 12.4 workflows #121684
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121684
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 456757f with merge base a03b9a2:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ptrblck @nWEIdia why not with cuDNN 9.0? It's a major improvement for flash attention, and I saw that the headers were updated to cuDNN 9.0.
@johnnynunez It's not available, as described in the review: #121684 (comment)
I see, thanks.
When will it be merged? :)
@ptrblck @nWEIdia NVIDIA cuDNN 9 is now available: nvidia-cudnn-cu12 9.0.0.312
Force-pushed from a02e1d9 to fb5fc09.
GHA results show this is needed to fix errors in pytorch/pytorch#121684 Reference: pytorch#1374
@nWEIdia while the Windows AMI is not yet in place, we would need to add only the Linux part of things.
GHA results show this is needed to fix errors in pytorch/pytorch#121684 Reference: #1374
Hi @malfet, for pip wheels the build seems successful, but the test job failed: installing the wheel required the presence of the cu124/ AWS directory. https://github.com/pytorch/pytorch/actions/runs/8701283769/job/23866760262
Reference: https://docs.nvidia.com/cuda/archive/12.4.0/cuda-toolkit-release-notes/index.html#id6
Minimum driver versions for CUDA 12.4 GA: Linux x86_64 >=550.54.14, Windows x86_64 >=551.61
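For anyone who wants to sanity-check a runner's driver against the minimums quoted above, here is a minimal Python sketch. The names `MIN_DRIVER_CUDA_12_4`, `parse_driver_version`, and `driver_supports_cuda_12_4` are hypothetical helpers made up for illustration; they are not part of PyTorch, test-infra, or any NVIDIA tool. It assumes driver versions are plain dot-separated integers.

```python
from typing import Tuple

# Minimum driver versions for CUDA 12.4 GA, copied from the release notes
# referenced in this PR. The dictionary itself is a hypothetical helper.
MIN_DRIVER_CUDA_12_4 = {
    "linux": (550, 54, 14),
    "windows": (551, 61),
}


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '550.54.15' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports_cuda_12_4(version: str, platform: str = "linux") -> bool:
    """Return True if the driver version meets the CUDA 12.4 GA minimum."""
    # Tuples compare element by element, so (550, 54, 15) >= (550, 54, 14).
    return parse_driver_version(version) >= MIN_DRIVER_CUDA_12_4[platform]


if __name__ == "__main__":
    print(driver_supports_cuda_12_4("550.54.15"))          # driver used in this PR
    print(driver_supports_cuda_12_4("551.61", "windows"))
```

Tuple comparison avoids the pitfalls of comparing version strings lexicographically, where "550.9" would sort after "550.54".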
This reverts commit aaa24b9.
driver-version: "550.54.15"
This reverts commit 0e23765.
550.54.15, since pytorch/test-infra@d5695df is in the main branch of pytorch/test-infra
We still need to keep 12.1 here, since this is the default wheel we will be uploading to PyPI. 12.4 should still be an experimental build for now.
Co-authored-by: Andrey Talman <[email protected]>
Force-pushed from 97c18d7 to 6d474e5.
@pytorchmergebot merge -f "All required jobs are passing"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Reference: #98492
Co-authored-by: Andrey Talman <[email protected]>
Pull Request resolved: #121684
Approved by: https://github.com/atalman
Trying to keep a record of the steps before I lose track of it.
- 1st commit: Similar to pytorch/builder#1720
- 2nd commit: Update CUDA 12.4 CI CUDA versions from 12.4.0 to 12.4.1, mapping to changes in https://github.com/pytorch/pytorch/pull/125944/files
- 3rd commit: update for aarch64 install_cuda_aarch64.sh docker step
- 4th commit: aaa456e. Related #121684
- Synchronization point: Meta helps uploading pypi cuda dependencies specified in .github/scripts/generate_binary_build_matrix.py
- The above pypi upload is done (thanks Andrey!), restarted jobs like https://github.com/pytorch/pytorch/actions/runs/10188203670/job/28369471321
- 5th commit 7753234: use temporary docker containers (generated from a previous successful container build). If merged, these containers would be rebuilt, therefore testing them now.
- 6th commit 5f93c62: revert the 5th commit.
Update: done, but have to debug seemingly irrelevant failures (rocm/xpu/mps)
Pull Request resolved: #132202
Approved by: https://github.com/Skylion007, https://github.com/eqy, https://github.com/atalman
Reference: #98492
cc @albanD @ptrblck @atalman @malfet