atalman (Contributor) commented Mar 3, 2025

Fix the nightly build failure in the arm64 Docker build (failing since 2025-02-21): https://github.com/pytorch/pytorch/actions/runs/13452177170/job/37588508155#step:12:851

Error:
```
#10 73.62 Segmentation fault (core dumped)
#10 73.67 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#10 73.85 Segmentation fault (core dumped)
#10 73.85 dpkg: error processing package libc-bin (--configure):
#10 73.85  installed libc-bin package post-installation script subprocess returned error exit status 139
```

This looks like moby/buildkit#5783.

Fix: update the setup-qemu and BuildKit setup actions to v3, and pin BuildKit itself to v0.19.0.
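A minimal sketch of what such a pin can look like in a GitHub Actions workflow. It assumes the Docker build job uses `docker/setup-qemu-action` and `docker/setup-buildx-action` with the default `docker-container` driver. The step names and layout are illustrative, not the exact diff from this PR.

```yaml
# Illustrative workflow steps (assumed layout, not the exact PR diff).
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    # Run the builder from a pinned BuildKit image instead of the latest
    # release, to avoid the qemu segfault tracked in moby/buildkit#5783.
    driver-opts: image=moby/buildkit:v0.19.0
```

Pinning through `driver-opts` keeps the action itself on v3 while fixing the BuildKit version. The pin can be removed once the upstream issue is resolved.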

Please note: the CUDA 12.8 failure is unrelated to this nightly CPU arm64 failure. It appears that the PR job installs the release build of torch, and because no CUDA 12.8 build has been released yet, that install fails. I will send a follow-up to make sure PR jobs use nightly torch.

atalman requested a review from a team as a code owner March 3, 2025 21:56
pytorch-bot bot commented Mar 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148372

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 46 Pending

As of commit f052b12 with merge base a414138:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "topic: not user facing" (topic category) label Mar 3, 2025
atalman changed the title from "Docker pin qemu - try pinning" to "Docker release - pin buildkit to v0.19.0" Mar 3, 2025
atalman (Contributor, Author) commented Mar 3, 2025

@pytorchmergebot merge -f "lint and docker builds are clear"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit to min-jean-cho/pytorch that referenced this pull request Mar 5, 2025
Pull Request resolved: pytorch#148372
Approved by: https://github.com/seemethere