Description
When the CRI integration tests are run on a ppc64le machine, the TestCRIImagePullTimeout/HoldingContentOpenWriter and TestCRIImagePullTimeout/NoDataTransferred subtests fail. HoldingContentOpenWriter expects PullImage not to return while the manifest is locked, and NoDataTransferred expects PullImage to fail with a canceled context. Instead, on ppc64le both calls return the error "no match for platform in manifest: not found", so the tests fail.
The issue occurs on a ppc64le machine running Ubuntu 20.04.4. The same tests pass on an x86_64 machine running Ubuntu 20.04.4.
Steps to reproduce the issue
- Clone the containerd repository to a ppc64le machine
- Install and set up all prerequisites as outlined in the BUILDING.md file
- Run the CRI integration tests (make cri-integration)
Describe the results you received and expected
Received:
Both tests fail. Instead of blocking on the locked manifest or returning a canceled context, PullImage returns an error:
=== CONT TestCRIImagePullTimeout/NoDataTransferred
image_pull_timeout_test.go:251:
Error Trace: /home/containerd_test/containerd/image_pull_timeout_test.go:251
Error: Not equal:
expected: *fmt.wrapError(&fmt.wrapError{msg:"no match for platform in manifest: not found", err:(*errors.errorString)(0xc000190650)})
actual : *errors.errorString(&errors.errorString{s:"context canceled"})
Test: TestCRIImagePullTimeout/NoDataTransferred
Messages: [0] expected canceled error, but got (failed to pull and unpack image "127.0.0.1:37041/containerd/registry:2.7": no match for platform in manifest: not found)
time="2022-07-21T12:14:34-04:00" level=info msg="stop pulling image 127.0.0.1:37041/containerd/registry:2.7: active requests=0, bytes read=914"
image_pull_timeout_test.go:252:
Error Trace: /home/containerd_test/containerd/image_pull_timeout_test.go:252
Error: Not equal:
expected: false
actual : true
Test: TestCRIImagePullTimeout/NoDataTransferred
Messages: [0] expected to hit circuit breaker
time="2022-07-21T12:14:34-04:00" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred3773157104/001/root/io.containerd.snapshotter.v1.overlayfs\""
=== CONT TestCRIImagePullTimeout/HoldingContentOpenWriter
image_pull_timeout_test.go:164: PullImage should not return because the manifest has been locked, but got error=failed to pull and unpack image "ghcr.io/containerd/registry:2.7": no match for platform in manifest: not found
time="2022-07-21T12:14:34-04:00" level=info msg="stop pulling image ghcr.io/containerd/registry:2.7: active requests=0, bytes read=914"
time="2022-07-21T12:14:34-04:00" level=warning msg="content garbage collection failed" error="lstat /tmp/TestCRIImagePullTimeoutHoldingContentOpenWriter1377758082/001/root/io.containerd.content.v1.content/blobs: no such file or directory"
=== CONT TestCRIImagePullTimeout/NoDataTransferred
time="2022-07-21T12:14:35-04:00" level=info msg="stop pulling image 127.0.0.1:37041/containerd/registry:2.7: active requests=0, bytes read=914"
image_pull_timeout_test.go:251:
Error Trace: /home/containerd_test/containerd/image_pull_timeout_test.go:251
Error: Not equal:
expected: *fmt.wrapError(&fmt.wrapError{msg:"no match for platform in manifest: not found", err:(*errors.errorString)(0xc000190650)})
actual : *errors.errorString(&errors.errorString{s:"context canceled"})
Test: TestCRIImagePullTimeout/NoDataTransferred
Messages: [1] expected canceled error, but got (failed to pull and unpack image "127.0.0.1:37041/containerd/registry:2.7": no match for platform in manifest: not found)
image_pull_timeout_test.go:252:
Error Trace: /home/containerd_test/containerd/image_pull_timeout_test.go:252
Error: Not equal:
expected: false
actual : true
Test: TestCRIImagePullTimeout/NoDataTransferred
Messages: [1] expected to hit circuit breaker
--- FAIL: TestCRIImagePullTimeout (0.00s)
--- FAIL: TestCRIImagePullTimeout/HoldingContentOpenWriter (1.66s)
--- FAIL: TestCRIImagePullTimeout/NoDataTransferred (2.23s)
FAIL
+ test_exit_code=1
Expected:
Both tests should pass. Output from a passing run:
=== CONT TestCRIImagePullTimeout/NoDataTransferred
time="2022-07-21T19:36:34Z" level=info msg="metadata content store policy set" policy=shared
time="2022-07-21T19:36:34Z" level=info msg="metadata content store policy set" policy=shared
time="2022-07-21T19:36:34Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutHoldingContentOpenWriter1652422797/001/root/io.containerd.snapshotter.v1.overlayfs\""
time="2022-07-21T19:36:34Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.snapshotter.v1.overlayfs\""
=== CONT TestCRIImagePullTimeout/HoldingContentOpenWriter
image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:b0b8dd398630cbb819d9a9c2fbd50561370856874b5d5d935be2e0af07c0ff4c Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e900}
image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:6de6b4d5063876c92220d0438ae6068c778d9a2d3845b3d5c57a04a307998df6 Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e960}
image_pull_timeout_test.go:143: locked the manifest {MediaType:application/vnd.docker.distribution.manifest.v2+json Digest:sha256:c11a277a91045f91866550314a988f937366bc2743859aa0f6ec8ef57b0458ce Size:1363 URLs:[] Annotations:map[] Data:[] Platform:0xc00071e9c0}
time="2022-07-21T19:37:00Z" level=warning msg="after 3149824 bytes transferred, enable breaker and retransfer after 1m40s"
time="2022-07-21T19:37:09Z" level=error msg="cancel pulling image 127.0.0.1:36251/containerd/registry:2.7 because of no progress in 5s"
time="2022-07-21T19:37:09Z" level=error msg="failed to forward response: context canceled"
time="2022-07-21T19:37:09Z" level=warning msg="content garbage collection failed" error="unlinkat /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057: directory not empty"
time="2022-07-21T19:37:09Z" level=info msg="Get image filesystem path \"/tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.snapshotter.v1.overlayfs\""
time="2022-07-21T19:37:10Z" level=info msg="stop pulling image ghcr.io/containerd/registry:2.7: active requests=0, bytes read=9946871"
time="2022-07-21T19:37:10Z" level=info msg="failed to resume the status from path /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057: failed reading status of resume write: stat /tmp/TestCRIImagePullTimeoutNoDataTransferred862974065/001/root/io.containerd.content.v1.content/ingest/d9bf94c74add0c5a4271fd84895a041fe984451d95d2e62f512cfe1401bd2057/data: no such file or directory: not found. will recreate them"
time="2022-07-21T19:37:10Z" level=warning msg="after 3149824 bytes transferred, enable breaker and retransfer after 1m40s"
time="2022-07-21T19:37:19Z" level=error msg="cancel pulling image 127.0.0.1:36251/containerd/registry:2.7 because of no progress in 5s"
time="2022-07-21T19:37:19Z" level=error msg="failed to forward response: context canceled"
--- PASS: TestCRIImagePullTimeout (0.00s)
--- PASS: TestCRIImagePullTimeout/HoldingContentOpenWriter (35.80s)
--- PASS: TestCRIImagePullTimeout/NoDataTransferred (45.12s)
PASS
+ test_exit_code=0
What version of containerd are you using?
git main branch
Any other relevant information
ppc64le runc --version:
runc version 1.1.3
commit: v1.1.3-0-g6724737f
spec: 1.0.2-dev
go: go1.18.3
libseccomp: 2.5.1
ppc64le uname:
Linux rdr-runtimes-containerd-dev 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:53:30 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux
x86_64 runc --version:
runc version 1.1.3
commit: v1.1.3-0-g6724737f
spec: 1.0.2-dev
go: go1.18.4
libseccomp: 2.5.1
x86_64 uname:
Linux rdr-containerd-dev 5.4.0-1023-ibm #25-Ubuntu SMP Tue May 24 16:50:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Show configuration if it is related to CRI plugin.
No response