Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OCI/Image Volume Source support #10579

Merged
merged 1 commit into from
Feb 18, 2025

Conversation

wzshiming
Copy link
Contributor

Fixed #10496

@k8s-ci-robot
Copy link

Hi @wzshiming. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 2 times, most recently from 4446688 to fdc84b3 Compare August 12, 2024 06:38
@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 22 times, most recently from 8de3699 to 1589b41 Compare August 12, 2024 16:40
mountPath := "/image-mount"
EnsureImageExists(t, testMountImage)
EnsureImageExists(t, testImage)
testImageMountSELinux(t, testImage, testMountImage, mountPath, "s0:c4,c5", "system_u:object_r:container_file_t:s0:c4,c5 pause")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wzshiming

I think this case fails.

   default: --- PASS: TestImageMount (1.84s)
    default: === RUN   TestImageMountSELinux
    default:     main_test.go:786: Image "registry.k8s.io/pause:3.10" already exists, not pulling.
    default:     main_test.go:786: Image "registry.k8s.io/e2e-test-images/resource-consumer:1.10" already exists, not pulling.
    default:     container_log_test.go:145:
    default:         	Error Trace:	/root/go/src/github.com/containerd/containerd/integration/container_log_test.go:145
    default:         	            				/root/go/src/github.com/containerd/containerd/integration/image_mount_test.go:113
    default:         	            				/root/go/src/github.com/containerd/containerd/integration/image_mount_test.go:67
    default:         	Error:      	Should be true
    default:         	Test:       	TestImageMountSELinux
    default: --- FAIL: TestImageMountSELinux (2.47s)

Would you please show more log in your commit? It would be easier to fix. Thanks

Copy link
Contributor Author

@wzshiming wzshiming Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error Trace: /root/go/src/github.com/containerd/containerd/integration/container_log_test.go:145

func checkContainerLog(t *testing.T, log string, messages []string) {
lines := strings.Split(strings.TrimSpace(log), "\n")
require.Len(t, lines, len(messages), "log line number should match")
for i, line := range lines {
ts, msg, ok := strings.Cut(line, " ")
require.True(t, ok)
_, err := time.Parse(time.RFC3339Nano, ts)

Maybe we can just retest, it doesn't seem to be caused by this change.

I will tried to find out why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check what it will show to us after adding debug log

criruntime "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func TestImageMountOnWS(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails as well.

=== RUN   TestImageMountOnWS
    main_test.go:790: Pull test image "ghcr.io/containerd/windows/nanoserver:ltsc2022"
    container_log_test.go:148: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/container_log_test.go:148
        	            				D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/image_mount_test.go:160
        	            				D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/image_mount_windows_test.go:37
        	Error:      	Not equal: 
        	            	expected: "stdout F License.txt"
        	            	actual  : "stderr F The system cannot find the path specified."
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-stdout F License.txt
        	            	+stderr F The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
        	Messages:   	log content should match
E0213 19:33:03.238072    5016 remote_runtime.go:159] StopPodSandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a" from runtime service failed: rpc error: code = Unknown desc = failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
    main_test.go:271: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/main_test.go:271
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1176
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1354
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1684
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
E0213 19:33:03.280214    5016 remote_runtime.go:179] RemovePodSandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a" from runtime service failed: rpc error: code = Unknown desc = failed to forcibly stop sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
    main_test.go:272: 
        	Error Trace:	D:/a/containerd/containerd/src/github.com/containerd/containerd/integration/main_test.go:272
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1176
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1354
        	            				C:/hostedtoolcache/windows/go/1.23.5/x64/src/testing/testing.go:1684
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = failed to forcibly stop sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to cleanup image mounts for sandbox "88c5479a1a9360c8682c5f3f06b17890f87b4f5da17e0edf4871d3c02403d41a": failed to convert to full path: The system cannot find the path specified.
        	Test:       	TestImageMountOnWS
--- FAIL: TestImageMountOnWS (17.97s)

Copy link
Contributor Author

@wzshiming wzshiming Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that leaves Windows Server support, which may need someone more familiar with Windows to figure out exactly what is causing the problem.
I've tried everything I can find, but to no avail. I will remove it, just support Linux first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it, just support Linux first.

@wzshiming sure. we can have track item to add test in windows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works!

@wzshiming wzshiming force-pushed the feat/oci-image-mount branch from 2eb28ee to 4f429ff Compare February 14, 2025 02:29
@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 2 times, most recently from 238b526 to 7bf9343 Compare February 17, 2025 10:17
@carlory
Copy link

carlory commented Feb 17, 2025

FYI cri adds the subpath support for image volume type: kubernetes/kubernetes#130135

@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 3 times, most recently from 8bda7ec to bca0a3c Compare February 17, 2025 11:40
@mikebrow
Copy link
Member

FYI cri adds the subpath support for image volume type: kubernetes/kubernetes#130135

nod .. let's get this one merged and follow up with subpath, unless subpath has already merged and the test is required.

@mikebrow
Copy link
Member

=== RUN TestImageMount
main_test.go:786: Image "registry.k8s.io/pause:3.10" already exists, not pulling.
main_test.go:790: Pull test image "ghcr.io/containerd/alpine:3.14.0"
image_mount_test.go:165:
Error Trace: /home/runner/work/containerd/containerd/integration/image_mount_test.go:165
/home/runner/work/containerd/containerd/integration/image_mount_test.go:44
Error: Received unexpected error:
timeout exceeded
Test: TestImageMount
--- FAIL: TestImageMount (31.71s)

hmm.. a timeout .. maybe use BusyBox it should always be smaller..

@fuweid
Copy link
Member

fuweid commented Feb 18, 2025

   default: time="2025-02-17T12:02:03.557783499Z" level=debug msg="remove snapshot" key=/run/containerd-test/io.containerd.grpc.v1.cri/image-volumes/e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca/ee6521f290b2168b6e0935a181d4cff9be1ac3f505666ef0e3c98fae8199917a snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.558221517Z" level=info msg="Container df376860f476b3d601d8c82e6ab95adcd1b7e49688fe937fd688ed8b54feccab: CDI devices from CRI Config.CDIDevices: []"
    default: time="2025-02-17T12:02:03.559726745Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.560854785Z" level=debug msg="removed snapshot" key=k8s.io/95/e3c6fc9bf3c590cd333f266b2379748796bc7e9f1ea2ed840f06a018f19ab15e snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.561669736Z" level=debug msg="removed snapshot" key=k8s.io/99//run/containerd-test/io.containerd.grpc.v1.cri/image-volumes/e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca/ee6521f290b2168b6e0935a181d4cff9be1ac3f505666ef0e3c98fae8199917a snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.562498973Z" level=debug msg="removed snapshot" key=k8s.io/97/dbe9f8f1545805758f5aefc6b671fb8f320c8e1a2bcab1f9156e527d9552d7af snapshotter=overlayfs
    default: time="2025-02-17T12:02:03.563695813Z" level=info msg="CreateContainer within sandbox \"e282eb3b5a1a8337b5e9016253b41f3359460325ec91e4d0e0dd966ad5187bca\" for &ContainerMetadata{Name:test-image-mount-container,Attempt:0,} returns container id \"df376860f476b3d601d8c82e6ab95adcd1b7e49688fe937fd688ed8b54feccab\""

Checked the code and found that snapshot creation doesn't use lease so that containerd GC cleanups the volume after creation. That's why the log content is empty. Since the GC cleanups snapshots async, CI could be happy if ls -Z executes before GC.

please consider using the following code to fix.

diff --git a/internal/cri/server/container_image_mount.go b/internal/cri/server/container_image_mount.go
index 4e84da212..b8d9cf59c 100644
--- a/internal/cri/server/container_image_mount.go
+++ b/internal/cri/server/container_image_mount.go
@@ -23,6 +23,7 @@ import (
        "path/filepath"

        containerd "github.com/containerd/containerd/v2/client"
+       "github.com/containerd/containerd/v2/core/leases"
        "github.com/containerd/containerd/v2/core/mount"
        "github.com/containerd/errdefs"
        "github.com/containerd/log"
@@ -39,6 +40,11 @@ func (c *criService) mutateMounts(
        sandboxID string,
        platform imagespec.Platform,
 ) error {
+       if err := c.ensureLeaseExist(ctx, sandboxID); err != nil {
+               return fmt.Errorf("failed to ensure lease %v for sandbox: %w", sandboxID, err)
+       }
+
+       ctx = leases.WithLease(ctx, sandboxID)
        for _, m := range extraMounts {
                err := c.mutateImageMount(ctx, m, snapshotter, sandboxID, platform)
                if err != nil {
@@ -48,6 +54,17 @@ func (c *criService) mutateMounts(
        return nil
 }

+func (c *criService) ensureLeaseExist(ctx context.Context, sandboxID string) error {
+       leaseSvc := c.client.LeasesService()
+       _, err := leaseSvc.Create(ctx, leases.WithID(sandboxID))
+       if err != nil {
+               if errdefs.IsAlreadyExists(err) {
+                       err = nil
+               }
+       }
+       return err
+}
+
 func (c *criService) mutateImageMount(
        ctx context.Context,
        extraMount *runtime.Mount,
diff --git a/internal/cri/server/sandbox_remove.go b/internal/cri/server/sandbox_remove.go
index f7e1ed880..b2f728e6d 100644
--- a/internal/cri/server/sandbox_remove.go
+++ b/internal/cri/server/sandbox_remove.go
@@ -21,6 +21,7 @@ import (
        "fmt"
        "time"

+       "github.com/containerd/containerd/v2/core/leases"
        "github.com/containerd/containerd/v2/pkg/tracing"
        "github.com/containerd/errdefs"
        "github.com/containerd/log"
@@ -56,6 +57,12 @@ func (c *criService) RemovePodSandbox(ctx context.Context, r *runtime.RemovePodS
                return nil, fmt.Errorf("failed to forcibly stop sandbox %q: %w", id, err)
        }

+       if err := c.client.LeasesService().Delete(ctx, leases.Lease{ID: id}); err != nil {
+               if !errdefs.IsNotFound(err) {
+                       return nil, fmt.Errorf("failed to delete lease for sandbox %q: %w", id, err)
+               }
+       }
+
        // Return error if sandbox network namespace is not closed yet.
        if sandbox.NetNS != nil {
                nsPath := sandbox.NetNS.GetPath()
diff --git a/internal/cri/server/sandbox_run.go b/internal/cri/server/sandbox_run.go
index e895d66b3..5e25a007f 100644
--- a/internal/cri/server/sandbox_run.go
+++ b/internal/cri/server/sandbox_run.go
@@ -31,6 +31,7 @@ import (
        "github.com/containerd/typeurl/v2"
        runtime "k8s.io/cri-api/pkg/apis/runtime/v1"

+       "github.com/containerd/containerd/v2/core/leases"
        sb "github.com/containerd/containerd/v2/core/sandbox"
        "github.com/containerd/containerd/v2/internal/cri/annotations"
        "github.com/containerd/containerd/v2/internal/cri/bandwidth"
@@ -87,6 +88,22 @@ func (c *criService) RunPodSandbox(ctx context.Context, r *runtime.RunPodSandbox
                }
        }()

+       leaseSvc := c.client.LeasesService()
+       ls, lerr := leaseSvc.Create(ctx, leases.WithID(id))
+       if lerr != nil {
+               return nil, fmt.Errorf("failed to create lease for sandbox name %q: %w", name, lerr)
+       }
+       defer func() {
+               if retErr != nil {
+                       deferCtx, deferCancel := util.DeferContext()
+                       defer deferCancel()
+
+                       if derr := leaseSvc.Delete(deferCtx, ls); derr != nil {
+                               log.G(deferCtx).WithError(derr).Error("failed to delete lease during cleanup")
+                       }
+               }
+       }()
+
        var (
                err         error
                sandboxInfo = sb.Sandbox{ID: id}

@wzshiming wzshiming force-pushed the feat/oci-image-mount branch 2 times, most recently from b3e5b26 to c812018 Compare February 18, 2025 06:08
@wzshiming wzshiming force-pushed the feat/oci-image-mount branch from c812018 to 1ec10d9 Compare February 18, 2025 07:59
@wzshiming
Copy link
Contributor Author

@fuweid it works, thank you!!!

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

let's get this one merged first. Thanks @wzshiming!

@fuweid fuweid added this pull request to the merge queue Feb 18, 2025
Merged via the queue into containerd:main with commit e6ecbdd Feb 18, 2025
59 checks passed
saschagrunert added a commit to saschagrunert/test-infra that referenced this pull request Feb 24, 2025
…lpha-features`

We now explicitly enable the image volume tests in the job, because it
should be now supported after the merge of
containerd/containerd#10579.

We also still keep it in the alpha tests, because with the beta
graduation it will purposely disabled by default.

Signed-off-by: Sascha Grunert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add OCI Volume Source support