[ROCm] upgrade nightly wheels to rocm6.4 #151355
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151355
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures as of commit 9452804 with merge base daf2ccf.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@huydhn Is this something you can help with? The only difference between the previous passing build and the current failing build seems to be that the former used non-LF CI runners while the latter uses LF CI runners. |
|
@pytorchbot rebase -b viable/strict |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
fd5b13c to a4395ee
|
A bit unrelated: why are we using new ECRs here instead of tags? We stopped creating new ECRs and just use tags for different builds. |
But the current failures are due to the fact that LF runners do not have push permissions to new ECRs. |
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
a4395ee to 4486527
@malfet Yes, we'd like to move away from using new ECRs to just tags. But I think that requires changing the naming convention for the docker repo to be ROCm-version agnostic, and having the ROCm version as part of the tag:
https://github.com/pytorch/pytorch/actions/runs/14502430577/job/40685004248#step:4:154
So we need to have:
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/manylinux2_28-builder-rocm:6.4-f8555c14c97c7831a7f9e6eb8220b15ecbc8cb40
OR
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/manylinux2_28-builder:rocm6.4-f8555c14c97c7831a7f9e6eb8220b15ecbc8cb40
instead of
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/manylinux2_28-builder-rocm6.4:f8555c14c97c7831a7f9e6eb8220b15ecbc8cb40
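To make the proposed renaming concrete, here is a minimal Python sketch that rewrites an image reference from the current per-version repository layout into the two tag-based layouts suggested in the comment above. The helper name `to_tag_based` and the regular expression are hypothetical illustrations, not part of the PyTorch CI scripts; the sketch only assumes the reference has the shape `<registry>/<path>/<base>-rocm<version>:<sha>`.

```python
import re

# Hypothetical pattern for the current layout, where the ROCm version is part
# of the repository name and the tag is only the commit hash, e.g.
#   <registry>/pytorch/manylinux2_28-builder-rocm6.4:<sha>
_CURRENT_LAYOUT = re.compile(
    r"^(?P<prefix>.+/)"            # registry host and namespace, up to the last "/"
    r"(?P<base>[^/:]+)"            # version-agnostic image name
    r"-rocm(?P<version>\d+(?:\.\d+)*)"  # ROCm version embedded in the repo name
    r":(?P<sha>[0-9a-f]+)$"        # tag holding the commit hash
)


def to_tag_based(image: str) -> tuple[str, str]:
    """Return both tag-based alternatives for a per-version image reference.

    The first result keeps "-rocm" in the repository name and moves only the
    version into the tag; the second moves "rocm<version>" into the tag.
    """
    match = _CURRENT_LAYOUT.match(image)
    if match is None:
        raise ValueError(f"not in the expected '<base>-rocm<version>:<sha>' form: {image}")
    prefix, base, version, sha = match.group("prefix", "base", "version", "sha")
    rocm_repo = f"{prefix}{base}-rocm:{version}-{sha}"
    generic_repo = f"{prefix}{base}:rocm{version}-{sha}"
    return rocm_repo, generic_repo


if __name__ == "__main__":
    current = (
        "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
        "manylinux2_28-builder-rocm6.4:f8555c14c97c7831a7f9e6eb8220b15ecbc8cb40"
    )
    for candidate in to_tag_based(current):
        print(candidate)
```

With either alternative, a new ROCm release would only add a tag to an existing repository rather than requiring a new one, which is the property the discussion above is after.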
|
|
@pytorchbot merge -f "only failures are due to rocm 6.4 builder images not refreshed w/ magma package and available for the 6.4 wheels; images built fine, should work out okay" |
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd