Pushing a multi-platform image to ghcr.io results in an endless loop #834

@reconman

When you build an image for multiple CPU architectures at once and use --push, the image upload often gets stuck in an endless loop.

The following line is printed over and over again:

error: failed to copy: failed to do request: Put "https://ghcr.io/v2/reconman/example-buildx-push/blobs/upload/a5521203-2c8d-49d5-bcde-d9ba8500a5b0?digest=sha256%3A1e1235e447358303a2d2975f6078eb4f82db3b64fe1ef840976f6033eac9a19f": write tcp 172.17.0.2:40356->140.82.113.33:443: write: connection reset by peer

I'm able to easily reproduce the issue by building a python-based image with all architectures allowed by the base image: https://github.com/reconman/example-buildx-push

I increased the number of layers by adding some RUN commands because I suspect that more layers increase the chance of failure.

When I changed --push to type=oci,dest=/tmp/image.tar and ran the following containerd commands manually, I encountered containerd/containerd#2706, so this issue may be related to it.

# Import the OCI tarball into containerd for all platforms.
sudo ctr i import --base-name "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}" --digests --all-platforms /tmp/image.tar
# Push every tag produced by the metadata step, one per line.
while IFS= read -r line; do
  sudo ctr i push --user "${{ github.actor }}:${{ secrets.GITHUB_TOKEN }}" "$line"
done <<< "${{ steps.meta.outputs.tags }}"
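
For anyone trying the manual push above, it may help to cap the number of attempts so a connection reset fails the job instead of repeating indefinitely. The following is only a sketch under that assumption: `retry` is a hypothetical helper, not part of buildx or containerd, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by buildx or ctr): run a command up to
# max_attempts times, waiting delay seconds between tries, and return a
# non-zero status once the limit is reached instead of looping forever.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Run the wrapped command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the attempt limit is reached.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

Inside the loop above, the push line could then become something like `retry 3 10 sudo ctr i push --user "..." "$line"`, so that a persistent failure ends the workflow step with an error rather than hanging.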

Here are the GitHub workflow logs with the BuildKit debug flag enabled: logs_1.zip
