ppc64le - CRI Image Pull Timeout Tests Failing #7197

@Jenkins-J

Description

When the CRI integration tests are run on a ppc64le machine, the TestCRIImagePullTimeout/HoldingContentOpenWriter and TestCRIImagePullTimeout/NoDataTransferred tests fail. Both tests expect the call to PullImage to either time out or return a canceled context (in HoldingContentOpenWriter, because the manifest has been locked). Instead, PullImage returns the error "no match for platform in manifest: not found", which causes the tests to fail.

This issue appeared on a ppc64le machine with the Ubuntu 20.04.4 operating system. The same tests pass on an x86_64 machine with the Ubuntu 20.04.4 operating system.

Steps to reproduce the issue

  1. Clone the containerd repository to a ppc64le machine.
  2. Install and set up all prerequisites as outlined in the BUILDING.md file.
  3. Run the CRI integration tests (make cri-integration).

Describe the results you received and expected

Received:

Both tests fail. The failure messages say that PullImage should either not have returned because the manifest is locked, or should have returned a canceled-context error. Instead, it returned "no match for platform in manifest: not found".

=== CONT  TestCRIImagePullTimeout/NoDataTransferred
    image_pull_timeout_test.go:251: 
                Error Trace:    /home/containerd_test/containerd/image_pull_timeout_test.go:251
                Error:          Not equal: 
                                expected: *fmt.wrapError(&fmt.wrapError{msg:"no match for platform in manifest: not found", err:(*errors.errorString)(0xc000190650)})
                                actual  : *errors.errorString(&errors.errorString{s:"context canceled"})
                Test:           TestCRIImagePullTimeout/NoDataTransferred
                Messages:       [0] expected canceled error, but got (failed to pull and unpack image "127.0.0.1:37041/containerd/registry:2.7": no match for platform in manifest: not found)
time="2022-07-21T12:14:34-04:00" level=info msg="stop pulling image 127.0.0.1:37041/containerd/registry:2.7: active requests=0, bytes read=914"
    image_pull_timeout_test.go:252: 
                Error Trace:    /home/containerd_test/containerd/image_pull_timeout_test.go:252
                Error:          Not equal: 
                                expected: false
                                actual  : true
                Test:           TestCRIImagePullTimeout/NoDataTransferred
                Messages:       [0] expected to hit circuit breaker
time="2022-07-21T12:14:34-04:00" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred3773157104/001/root/io.containerd.snapshotter.v1.overlayfs\""
=== CONT  TestCRIImagePullTimeout/HoldingContentOpenWriter
    image_pull_timeout_test.go:164: PullImage should not return because the manifest has been locked, but got error=failed to pull and unpack image "ghcr.io/containerd/registry:2.7": no match for platform in manifest: not found
time="2022-07-21T12:14:34-04:00" level=info msg="stop pulling image ghcr.io/containerd/registry:2.7: active requests=0, bytes read=914"
time="2022-07-21T12:14:34-04:00" level=warning msg="content garbage collection failed" error="lstat /tmp/TestCRIImagePullTimeoutHoldingContentOpenWriter1377758082/001/root/io.containerd.content.v1.content/blobs: no such file or directory"
=== CONT  TestCRIImagePullTimeout/NoDataTransferred
time="2022-07-21T12:14:35-04:00" level=info msg="stop pulling image 127.0.0.1:37041/containerd/registry:2.7: active requests=0, bytes read=914"
    image_pull_timeout_test.go:251: 
                Error Trace:    /home/containerd_test/containerd/image_pull_timeout_test.go:251
                Error:          Not equal: 
                                expected: *fmt.wrapError(&fmt.wrapError{msg:"no match for platform in manifest: not found", err:(*errors.errorString)(0xc000190650)})
                                actual  : *errors.errorString(&errors.errorString{s:"context canceled"})
                Test:           TestCRIImagePullTimeout/NoDataTransferred
                Messages:       [1] expected canceled error, but got (failed to pull and unpack image "127.0.0.1:37041/containerd/registry:2.7": no match for platform in manifest: not found)
    image_pull_timeout_test.go:252: 
                Error Trace:    /home/containerd_test/containerd/image_pull_timeout_test.go:252
                Error:          Not equal: 
                                expected: false
                                actual  : true
                Test:           TestCRIImagePullTimeout/NoDataTransferred
                Messages:       [1] expected to hit circuit breaker
--- FAIL: TestCRIImagePullTimeout (0.00s)
    --- FAIL: TestCRIImagePullTimeout/HoldingContentOpenWriter (1.66s)
    --- FAIL: TestCRIImagePullTimeout/NoDataTransferred (2.23s)
FAIL
+ test_exit_code=1
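
The error string in the log above ("no match for platform in manifest: not found") suggests that the image index may not list a manifest for the host platform. This has not been confirmed in this report. As a rough way to check that hypothesis, the sketch below scans an image index in JSON form for a given OS and architecture. The types, the hasPlatform helper, and the sampleIndex data are all illustrative assumptions; they are not containerd APIs and do not reflect the actual contents of ghcr.io/containerd/registry:2.7.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// platform, descriptor, and index mirror only the image index fields
// needed here. They are local illustrative types, not containerd's.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

type descriptor struct {
	Digest   string    `json:"digest"`
	Platform *platform `json:"platform,omitempty"`
}

type index struct {
	Manifests []descriptor `json:"manifests"`
}

// hasPlatform reports whether indexJSON lists a manifest for osName/arch.
func hasPlatform(indexJSON []byte, osName, arch string) (bool, error) {
	var idx index
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return false, fmt.Errorf("decode index: %w", err)
	}
	for _, m := range idx.Manifests {
		if m.Platform != nil && m.Platform.OS == osName && m.Platform.Architecture == arch {
			return true, nil
		}
	}
	return false, nil
}

// sampleIndex is made-up data for illustration only.
const sampleIndex = `{"manifests":[
  {"digest":"sha256:aaa","platform":{"architecture":"amd64","os":"linux"}},
  {"digest":"sha256:bbb","platform":{"architecture":"arm64","os":"linux"}}
]}`

func main() {
	for _, arch := range []string{"amd64", "ppc64le"} {
		ok, err := hasPlatform([]byte(sampleIndex), "linux", arch)
		if err != nil {
			panic(err)
		}
		fmt.Printf("linux/%s present: %v\n", arch, ok)
	}
}
```

If the real index turned out to have no linux/ppc64le entry, the pull would fail during platform selection, before any manifest is locked or any layer data is transferred, which would be consistent with the "bytes read=914" lines in the log.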

Expected:

Both tests should pass.

=== CONT  TestCRIImagePullTimeout/NoDataTransferred
time="2022-07-21T19:36:34Z" level=info msg="metadata content store policy set" policy=shared
time="2022-07-21T19:36:34Z" level=info msg="metadata content store policy set" policy=shared
time="2022-07-21T19:36:34Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutHoldingContentOpenWriter1652422797/001/root/io.containerd.snapshotter.v1.overlayfs\""
time="2022-07-21T19:36:34Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.snapshotter.v1.overlayfs\""
=== CONT  TestCRIImagePullTimeout/HoldingContentOpenWriter
    image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:b0b8dd398630cbb819d9a9c2fbd50561370856874b5d5d935be2e0af07c0ff4c Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e900}
    image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:6de6b4d5063876c92220d0438ae6068c778d9a2d3845b3d5c57a04a307998df6 Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e960}
    image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:c11a277a91045f91866550314a988f937366bc2743859aa0f6ec8ef57b0458ce Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e9c0}
time="2022-07-21T19:37:00Z" level=warning msg="after 3149824 bytes transferred, enable breaker and retransfer after 1m40s"
time="2022-07-21T19:37:09Z" level=error msg="cancel pulling image 127.0.0.1:36251/containerd/registry:2.7 because of no progress in 5s"
time="2022-07-21T19:37:09Z" level=error msg="failed to forward response: context canceled"
time="2022-07-21T19:37:09Z" level=warning msg="content garbage collection failed" error="unlinkat /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057: directory not empty"
time="2022-07-21T19:37:09Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.snapshotter.v1.overlayfs\""
time="2022-07-21T19:37:10Z" level=info msg="stop pulling image ghcr.io/containerd/registry:2.7: active requests=0, bytes read=9946871"
time="2022-07-21T19:37:10Z" level=info msg="failed to resume the status from path /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057: failed reading status of resume write: stat /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057/data: no such file or directory: not found. will recreate them"
time="2022-07-21T19:37:10Z" level=warning msg="after 3149824 bytes transferred, enable breaker and retransfer after 1m40s"
time="2022-07-21T19:37:19Z" level=error msg="cancel pulling image 127.0.0.1:36251/containerd/registry:2.7 because of no progress in 5s"
time="2022-07-21T19:37:19Z" level=error msg="failed to forward response: context canceled"
--- PASS: TestCRIImagePullTimeout (0.00s)
    --- PASS: TestCRIImagePullTimeout/HoldingContentOpenWriter (35.80s)
    --- PASS: TestCRIImagePullTimeout/NoDataTransferred (45.12s)
PASS
+ test_exit_code=0

What version of containerd are you using?

git main branch

Any other relevant information

ppc64le runc --version:

runc version 1.1.3
commit: v1.1.3-0-g6724737f
spec: 1.0.2-dev
go: go1.18.3
libseccomp: 2.5.1

ppc64le uname:

Linux rdr-runtimes-containerd-dev 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:53:30 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux

x86_64 runc --version:

runc version 1.1.3
commit: v1.1.3-0-g6724737f
spec: 1.0.2-dev
go: go1.18.4
libseccomp: 2.5.1

x86_64 uname:

Linux rdr-containerd-dev 5.4.0-1023-ibm #25-Ubuntu SMP Tue May 24 16:50:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Show configuration if it is related to CRI plugin.

No response
