Collaborator

@tinglvv tinglvv commented Feb 4, 2025

#145570

Adds CUDA 12.8 and keeps 12.6 for the sbsa build. Supported CUDA_ARCH: 9.0, 10.0, 12.0.

Refactors the binaries matrix for the CUDA sbsa build. Previously, cuda-aarch64 was hardcoded to CUDA 12.6. The matrix now reads both 12.6 and 12.8, and build names include the CUDA version, for example manywheel-py3_9-cuda-aarch64-12_8-build.

TODO: once 12.8 is stable, remove 12.6 from the sbsa build.
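
For illustration, the naming scheme in the description can be sketched as follows. This is a minimal sketch and not the actual matrix-generation script: `build_name` and `CUDA_AARCH64_VERSIONS` are hypothetical names, and the pattern is inferred only from the single example name given in the description.

```python
# Hypothetical sketch: derive build names for the cuda-aarch64 (sbsa) matrix.
# The pattern is inferred from "manywheel-py3_9-cuda-aarch64-12_8-build".

# Assumed list of CUDA versions the sbsa matrix reads (from the PR description).
CUDA_AARCH64_VERSIONS = ["12.6", "12.8"]


def build_name(python_version: str, cuda_version: str,
               package_type: str = "manywheel") -> str:
    """Return a build name with dots in both versions replaced by underscores."""
    py = python_version.replace(".", "_")
    cuda = cuda_version.replace(".", "_")
    return f"{package_type}-py{py}-cuda-aarch64-{cuda}-build"


# One build per CUDA version instead of a single hardcoded 12.6 entry.
names = [build_name("3.9", cuda) for cuda in CUDA_AARCH64_VERSIONS]
print(names)
```

Once 12.8 is stable, dropping 12.6 would amount to removing one entry from the version list in a scheme like this.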

cc @atalman @malfet @ptrblck @nWEIdia

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Feb 4, 2025

pytorch-bot bot commented Feb 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146378

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 1 Cancelled Job, 1 Unrelated Failure

As of commit f8136e4 with merge base 0463cb6 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Feb 4, 2025
@tinglvv tinglvv marked this pull request as ready for review February 4, 2025 08:47
@tinglvv tinglvv requested a review from a team as a code owner February 4, 2025 08:47
@tinglvv tinglvv marked this pull request as draft February 4, 2025 08:47
@tinglvv tinglvv marked this pull request as ready for review February 4, 2025 10:40
@tinglvv tinglvv marked this pull request as draft February 4, 2025 10:41
@tinglvv tinglvv marked this pull request as ready for review February 4, 2025 18:11
Contributor

@atalman atalman left a comment

LGTM. Perhaps we should deprecate the CUDA 12.6 aarch64 builds once CUDA 12.8 is available.

Collaborator Author

tinglvv commented Feb 4, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 4, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm6.3-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

Collaborator Author

tinglvv commented Feb 4, 2025

The ROCm build failures are a known SEV. The Windows build failure should be unrelated? I restarted it in https://github.com/pytorch/pytorch/actions/runs/13132625013/job/36681731706, but I'm not sure we need to wait for the result, as this change doesn't concern Windows.

Collaborator Author

tinglvv commented Feb 5, 2025

@pytorchbot merge -i "rocm failures are known"


pytorch-bot bot commented Feb 5, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: rocm failures are known

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.
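
The error above is consistent with `-i` being a flag that takes no value, so the quoted reason is left over as an unrecognized argument. The following is a minimal argparse sketch of that behavior. The parser is hypothetical and is not pytorchbot's actual implementation; `make_parser` and the `ignore_current` destination name are assumptions made for illustration.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a tiny stand-in parser with a 'merge' subcommand (hypothetical)."""
    parser = argparse.ArgumentParser(prog="@pytorchbot")
    sub = parser.add_subparsers(dest="command", required=True)
    merge = sub.add_parser("merge")
    # Assumption: -i is a boolean switch, so it consumes no following value.
    merge.add_argument("-i", dest="ignore_current", action="store_true")
    return parser


parser = make_parser()

# parse_known_args returns leftovers instead of exiting with an error,
# which makes the unrecognized part visible.
args, extras = parser.parse_known_args(
    ["merge", "-i", "rocm failures are known"]
)
print(args.ignore_current, extras)

# Without the trailing message, nothing is left over.
args_ok, extras_ok = parser.parse_known_args(["merge", "-i"])
print(args_ok.ignore_current, extras_ok)
```

With the stricter `parse_args`, the leftover string in the first call would produce an "unrecognized arguments" error like the one shown in the bot's reply.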

Collaborator Author

tinglvv commented Feb 5, 2025

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 5 checks:
- windows-binary-wheel / wheel-py3_11-cuda11_8-build
- linux-binary-libtorch-cxx11-abi / libtorch-rocm6_2_4-shared-with-deps-cxx11-abi-test
- trunk / linux-focal-rocm6.3-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)
- trunk / linux-focal-rocm6.3-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)
- trunk / linux-focal-rocm6.3-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.4)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
open source
topic: not user facing (topic category)