Unpack images with per-layer labels for specific runtime #12835

Merged
mikebrow merged 1 commit into containerd:main from
fidencio:topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter
Feb 24, 2026
Conversation

@fidencio
Contributor

@fidencio fidencio commented Jan 29, 2026

Remote/proxy snapshotters like nydus need per-layer annotations on each
snapshot (cri.image-ref, cri.layer-digest, cri.manifest-digest,
cri.image-layers) so they can lazily fetch content inside the guest VM.
During a normal PullImage, these annotations are set by
AppendInfoHandlerWrapper and flow through the core/unpack.Unpacker to
each layer's Prepare/Commit call.

However, when an image is already present for one snapshotter (e.g.,
overlayfs) and needs to be used with a different one (e.g., nydus for
Kata), no pull occurs. The image must be unpacked into the target
snapshotter with the correct per-layer labels.

Replace the image.Unpack() fallback in customopts.WithNewSnapshot with
unpackImage, which leverages the existing core/unpack.Unpacker and wraps
the image handler with AppendInfoHandlerWrapper when snapshot annotations
are enabled (!DisableSnapshotAnnotations). This reuses the same unpack
machinery as PullImage, including retry handling, parallel layer
support, and deduplication.

@github-project-automation github-project-automation Bot moved this to Needs Triage in Pull Request Review Jan 29, 2026
@dosubot dosubot Bot added area/cri Container Runtime Interface (CRI) kind/bug labels Jan 29, 2026
fidencio added a commit to fidencio/containerd that referenced this pull request Jan 29, 2026
When users configure a snapshotter in the runtime config (e.g.,
`plugins."io.containerd.cri.v1.runtime".containerd.runtimes.kata.snapshotter`),
the CRI image service was not aware of this configuration. This caused
images to be pulled with the default snapshotter instead of the
runtime-specific one, because the image service's runtimePlatforms map
was not populated with these runtime-to-snapshotter mappings.

Let's make sure that during the CRI plugin init, we iterate over all the
configured runtimes, and propagate any snapshotter configuration to the
image service.

This was found out while working on containerd#12835.

Signed-off-by: Fabiano Fidêncio <[email protected]>
@fidencio
Contributor Author

cc @mikebrow @dims @dmcgowan, please take a look, as this solves some of the most infamous issues we've been facing with Kata Containers + nydus as a remote snapshotter.

fidencio added a commit to fidencio/containerd that referenced this pull request Jan 29, 2026
@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch from 276ee69 to a615820 Compare January 30, 2026 06:51
fidencio added a commit to fidencio/containerd that referenced this pull request Jan 30, 2026
Member

@ChengyuZhu6 ChengyuZhu6 left a comment


I think this only works with the pause image?

@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch from a615820 to 9207487 Compare January 30, 2026 13:23
@fidencio
Contributor Author

I think this only works with the pause image?

Indeed, the first version of the patch fixed only that immediate case. I've expanded it now.

@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch 3 times, most recently from ab0b1a7 to 27ac2a7 Compare January 30, 2026 18:16
@fidencio fidencio changed the title from "cri: re-pull images not unpacked for target snapshotter" to "cri: ensure images are unpacked for target snapshotter" Jan 30, 2026
@fidencio
Contributor Author

In order to have this cleanly backported to the release/2.2 branch, we'd need 7ef50ac also backported (which was introduced by @mxpv on #12773).

@fidencio fidencio marked this pull request as draft January 31, 2026 18:29
@fidencio fidencio marked this pull request as ready for review February 1, 2026 10:19

@fidencio
Contributor Author

fidencio commented Feb 1, 2026

I have tested this throughout the Kata Containers NVIDIA CI and from the containerd side this was enough to get our tests passing.

Let me scratch that and do some more endurance tests.

@fidencio fidencio marked this pull request as draft February 1, 2026 12:28
@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch from 27ac2a7 to 7662c0f Compare February 1, 2026 17:07
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Feb 1, 2026
@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch 3 times, most recently from 92abf95 to 7fe0c56 Compare February 21, 2026 10:30
@fidencio
Contributor Author

And here we go again ... after the #12849 changes I had to re-work this. Before that change we had a single flow: ensureImageExists -> PullImage (which detects stale state and re-pulls) -> returns a fresh image -> Create uses that fresh image directly. Now Controller.Create calls c.client.GetImage() directly, and the pulling happens separately in criService.ensurePauseImageExists() at the RunPodSandbox level.

Okay, with that said, I've re-worked the PR and re-tested it, and the tests on my side are passing. However, at this point I DO NOT think this is material for a backport anymore.

@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch 2 times, most recently from 168c933 to 45f847d Compare February 21, 2026 11:13
@fidencio
Contributor Author

As a follow-up: looking at the repo, it doesn't seem like we have anything similar today, but I'd like to have a test with containerd + nydus-snapshotter (proxy mode) + kata-containers added to ensure we do NOT regress on this one.

What would be the best way to proceed?

@mikebrow
Member

mikebrow commented Feb 21, 2026

As a follow-up: looking at the repo, it doesn't seem like we have anything similar today, but I'd like to have a test with containerd + nydus-snapshotter (proxy mode) + kata-containers added to ensure we do NOT regress on this one.

What would be the best way to proceed?

Not sure there is a best way.. more a set of choices... 1) Maybe add another k8s e2e node bucket over at github.com/containerd/nydus-snapshotter? 2) Possibly configure two different runtimes with two different core snapshotters here in containerd/containerd/integration. 3) Possibly a two runtime/snapshotter config test over in kubernetes-sigs/cri-tools. 4) The obvious one .. test it in kata-containers, which aligns with testing the new kata v4 sandbox.

/cc @estesp @dmcgowan @samuelkarp thoughts on the best ways to test out the cross runtime/cross snapshotter variations to suss out issues with the experimental snapshotter/runtime feature?

@fidencio
Contributor Author

Not sure there is a best way.. more a set of choices... 1) Maybe add another k8s e2e node bucket over at github.com/containerd/nydus-snapshotter? 2) Possibly configure two different runtimes with two different core snapshotters here in containerd/containerd/integration. 3) Possibly a two runtime/snapshotter config test over in kubernetes-sigs/cri-tools. 4) The obvious one .. test it in kata-containers, which aligns with testing the new kata v4 sandbox.

What I'm going to do for now is:

  • setup a scheduled job that runs every day on a free-runner, from my fork
  • each day it'll deploy k8s with the latest containerd from main
  • it'll setup 2 runtime handlers, one using overlayfs, and kata using nydus-snapshotter
  • run 2 pods
  • use ctr to delete layers / snapshots so we can mimic stale content
  • re-run the same 2 pods again

Knowing whether it broke, on a daily basis, is good enough for me to bisect, and it also seems a reasonable way to test it "on my side" before we commit to having such a test on the containerd side.

Comment thread: internal/cri/server/sandbox_run.go (Outdated)
// This handles cases where image metadata exists but content was
// garbage collected, or where the image needs to be re-unpacked for
// a different snapshotter.
_, err := c.ImageService.PullImage(ctx, ref, nil, config, runtimeHandler)
Member


In an air-gapped environment, there is a case where the pause image is imported rather than pulled. If we force a pull of the image, it will be a breaking change.

Contributor Author


I'd like to double-check with you whether this is really a breaking change, as PullImage() calls imageForSnapshotter(), which short-circuits PullImage and returns early in case the image is already present; please see: https://github.com/fidencio/containerd/blob/45f847d4af4537a682bcb5b2598982f7d65c8d63/internal/cri/server/images/image_pull.go#L173-L184

The cases where it wouldn't happen are basically the cases where a re-import / re-pull is needed regardless.

Member


// If UseLocalImagePull is true, use client.Pull to pull the image, else use transfer service by default.

Check out the main branch: it's not. In the k8s pod image pull policy, we can set AlwaysPull, which allows users to re-pull images, e.g. when new content is pushed to the same tag. I don't think it should short-circuit.

Contributor Author


Ouch, you're completely right and it does break AlwaysPull scenarios. I'm working on a fix for this right now.

Contributor Author


Should be addressed now.

Member


good catch

Member

@mikebrow mikebrow Feb 23, 2026


wondering out-loud if we should add indicator labels to imports of the pause image, and other images...

Comment thread: internal/cri/server/images/service.go (Outdated)
//
// If snapshots already exist with incorrect labels (e.g., from a previous unpack
// with wrong configuration), they are updated with the correct labels.
func (c *CRIImageService) UnpackImage(ctx context.Context, ref string, snapshotter string) error {
Member


Just want to confirm this behavior: can we determine whether the image was unpacked with a specific snapshotter? If the image was unpacked without the label, would Nydus behave differently?

Quoting line 271:

// UnpackImage unpacks an existing image into the specified snapshotter.
// This is used when an image exists locally but needs to be unpacked for a different
// snapshotter

It looks like we can unpack with the label if it's not already unpacked for nydus.

// IsUnpacked returns whether an image is unpacked.
IsUnpacked(context.Context, string) (bool, error)

Member

@fuweid fuweid Feb 22, 2026


Based on the current design, containerd uses the chain ID as the snapshot key. So if we unpack an image for Nydus without a label, we can’t re-unpack it later with a label because the snapshot already exists.

That’s why I’m asking about the expected behavior. To me, it seems we should first check whether it has already been unpacked (IsUnpacked). If not, we should unpack it first. We don't need to update that label.
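For readers following along, the collision comes from how snapshot keys are derived: the key is the layer's chain ID, which the OCI image spec defines purely from the diff IDs. Labels play no part in it, so an unlabeled unpack and a labeled one target the exact same key. A self-contained sketch of the formula (mirroring what identity.ChainIDs computes):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainIDs implements the OCI image-spec ChainID formula:
//
//	ChainID(L0) = DiffID(L0)
//	ChainID(Ln) = Digest(ChainID(Ln-1) + " " + DiffID(Ln))
//
// containerd uses these chain IDs as snapshot keys, which is why the
// same image unpacked with or without labels collides on the same key.
func chainIDs(diffIDs []string) []string {
	out := make([]string, len(diffIDs))
	for i, d := range diffIDs {
		if i == 0 {
			out[i] = d
			continue
		}
		sum := sha256.Sum256([]byte(out[i-1] + " " + d))
		out[i] = fmt.Sprintf("sha256:%x", sum)
	}
	return out
}

func main() {
	diffIDs := []string{
		"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
		"sha256:2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
	}
	ids := chainIDs(diffIDs)
	fmt.Println(ids[0] == diffIDs[0]) // first chain ID equals the first diff ID: true
	fmt.Println(ids[1] != diffIDs[1]) // later chain IDs mix in the parent chain: true
}
```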

Contributor Author

@fidencio fidencio Feb 22, 2026


@fuweid, you're right on both points, I re-worked this series (already tested on my side), please, take a second look at it whenever you find the time.

While here, let me touch the questions.

can we determine whether the image was unpacked with a specific snapshotter?

Yes, image.IsUnpacked(ctx, snapshotterName) checks if snapshots exist in that specific snapshotter's store.

If the image was unpacked without the label, would Nydus behave differently?

Yes, nydus behaves very differently. :-/

During Prepare, nydus's chooseProcessor checks for CRILayerDigest (containerd.io/snapshot/cri.layer-digest) to decide whether to set NydusProxyMode and use skipHandler.

Without that label, nydus either errors ("missing CRI reference annotation") or falls through to mountNative, which fails because proxy mode directories don't contain actual content.
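A hypothetical distillation of that dispatch (this is NOT nydus's actual code; the behavior and the error string are paraphrased from the description above, and the function and parameter names are invented for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

// The real label key nydus checks, per the comment above.
const criLayerDigestLabel = "containerd.io/snapshot/cri.layer-digest"

// chooseMode sketches the decision described for nydus's chooseProcessor.
// In proxy mode the snapshot directories hold no real content, so a layer
// arriving without the CRI digest label cannot be served.
func chooseMode(labels map[string]string, proxyMode bool) (string, error) {
	if _, ok := labels[criLayerDigestLabel]; ok {
		// Label present: treat the layer as remotely backed and skip
		// applying it locally (proxy mode + skip handler).
		return "proxy-skip", nil
	}
	if proxyMode {
		// Paraphrased failure mode from the comment above.
		return "", errors.New("missing CRI reference annotation")
	}
	// Fall through to a native mount, which only works when the
	// snapshot directories actually contain content.
	return "native", nil
}

func main() {
	mode, err := chooseMode(map[string]string{criLayerDigestLabel: "sha256:abc"}, true)
	fmt.Println(mode, err) // proxy-skip <nil>

	_, err = chooseMode(map[string]string{}, true)
	fmt.Println(err) // missing CRI reference annotation
}
```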

@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch 6 times, most recently from 85a480d to 931bb37 Compare February 22, 2026 17:57
// We intentionally do NOT short-circuit here when the image is already
// properly unpacked: PullImage may be called with AlwaysPull policy,
// which should always contact the registry to check for updates.
if _, _, err := c.imageForSnapshotter(ctx, ref, snapshotter); errdefs.IsFailedPrecondition(err) {
Member

@fuweid fuweid Feb 23, 2026


Could you please revert the change in PullImage?

It makes the function harder to maintain, and it does not actually work in practice.

  1. For non-pause images, even if there is stale content as you mentioned, this approach still cannot guarantee success because it does not have credentials. For missing content, we can address that by setting discard_unpacked_layers = false, or by handling it on the kubelet side. If this change is only for the pause image, I don’t think it is worth doing.

  2. Deleting an image cannot guarantee snapshotter cleanup.

Assume image A is built on top of image B, and both are unpacked by nydus without labels. In that case, you cannot clean up the snapshotter data related to B until image A is also cleaned up.

  3. Based on the discussion here: Unpack images with per-layer labels for specific runtime #12835 (comment)

I’m not a fan of handling image deletion in PullImage. In my opinion, that is not a good design.
If the goal is to handle data migration in this patch, I personally cannot accept this approach.

The nydus snapshotter should return an error during unpacking if the required information is missing.


I think the workflow should be:

Check whether it is already unpacked. If not, unpack it with the label.

opts := []containerd.NewContainerOpts{
containerd.WithSnapshotter(c.RuntimeSnapshotter(r.ctx, ociRuntime)),
// Prepare container rootfs. This is always writeable even if
// the container wants a readonly rootfs since we want to give
// the runtime (runc) a chance to modify (e.g. to create mount
// points corresponding to spec.Mounts) before making the
// rootfs readonly (requested by spec.Root.Readonly).
customopts.WithNewSnapshot(r.containerID, *r.containerdImage, sOpts...),
}

Currently, containerd has the WithNewSnapshot option, which can unpack an image with a specific snapshotter. This patch should use the proper snapshotter based on the runtime config, and include the image reference in the label.
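A minimal sketch of the proposed check-then-unpack workflow, using a fake in place of the containerd client (the method names IsUnpacked/UnpackWithLabels are illustrative stand-ins, not containerd's API):

```go
package main

import "fmt"

// fakeImage is a stand-in for the image/snapshotter interaction; the
// method names below are illustrative, not containerd's client API.
type fakeImage struct {
	unpacked map[string]bool // snapshotter name -> already unpacked?
}

func (f *fakeImage) IsUnpacked(snapshotter string) bool { return f.unpacked[snapshotter] }

func (f *fakeImage) UnpackWithLabels(snapshotter string, labels map[string]string) {
	// Real code would walk the manifest and Prepare/Commit each layer
	// with per-layer labels; here we only record that the unpack ran.
	f.unpacked[snapshotter] = true
}

// ensureUnpacked sketches the workflow: check whether the image is
// already unpacked for the target snapshotter, and only unpack (with
// labels) when it is not. Returns true when an unpack was performed.
func ensureUnpacked(img *fakeImage, snapshotter string, labels map[string]string) bool {
	if img.IsUnpacked(snapshotter) {
		return false // nothing to do; do not touch existing snapshots
	}
	img.UnpackWithLabels(snapshotter, labels)
	return true
}

func main() {
	img := &fakeImage{unpacked: map[string]bool{"overlayfs": true}}
	fmt.Println(ensureUnpacked(img, "overlayfs", nil)) // false: already unpacked
	fmt.Println(ensureUnpacked(img, "nydus", nil))     // true: unpacks now
	fmt.Println(ensureUnpacked(img, "nydus", nil))     // false: second call is a no-op
}
```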

Contributor Author


@fuweid, your suggestion simplifies things by a LOT.

I've addressed the comments, and now we're back to a single functional commit, which can be cleanly backported to the release/2.2 branch.

Thanks a whole lot for taking a look at this, and for the suggestions made to simplify and focus this work on the correct parts.

Contributor Author


@fuweid, let me just add a little bit more info here.

While what I'm proposing here is slightly more complicated than your suggestion, your suggestion alone wouldn't work for nydus as:

  • image.Unpack() passes the same labels to every layer — no per-layer CRILayerDigest
  • Nydus's chooseProcessor checks CRILayerDigest to set NydusProxyMode
  • Without CRILayerDigest, nydus errors with "missing CRI reference annotation"

The per-layer manifest walk is the minimum needed to make nydus work. That's what AppendInfoHandlerWrapper already does during Pull, and that's what our unpackImageWithLabels replicates; I sincerely hope this is an acceptable trade-off.

@github-project-automation github-project-automation Bot moved this from Needs Triage to Needs Update in Pull Request Review Feb 23, 2026
@fidencio fidencio force-pushed the topic/cri-re-pull-images-not-unpacked-for-the-target-snapshotter branch from 931bb37 to b852644 Compare February 23, 2026 08:51
Comment thread: internal/cri/opts/container.go (Outdated)
}
}

// unpackImageWithLabels unpacks an image into the specified snapshotter with
Member


I will take a look on this tomorrow.
I think we can leverage the existing unpacker package to do that.

Member


commit fcb452efe037a16b3a97f5722bbc29ada1ae3139 (HEAD -> test-v1)
Author: Wei Fu <[email protected]>
Date:   Tue Feb 24 12:29:33 2026 -0500

    --wip-- [skip ci]

diff --git a/integration.test b/integration.test
new file mode 100755
index 000000000..a9e752ad1
Binary files /dev/null and b/integration.test differ
diff --git a/integration/runtime_handler_unpack_labels_linux_test.go b/integration/runtime_handler_unpack_labels_linux_test.go
new file mode 100644
index 000000000..bb3a40652
--- /dev/null
+++ b/integration/runtime_handler_unpack_labels_linux_test.go
@@ -0,0 +1,143 @@
+/*
+   Copyright The containerd Authors.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+*/
+
+package integration
+
+import (
+       "context"
+       "fmt"
+       "os"
+       "path/filepath"
+       "syscall"
+       "testing"
+       "time"
+
+       containerd "github.com/containerd/containerd/v2/client"
+       "github.com/containerd/containerd/v2/integration/images"
+       snpkg "github.com/containerd/containerd/v2/pkg/snapshotters"
+       "github.com/containerd/containerd/v2/plugins"
+       "github.com/containerd/errdefs"
+       "github.com/opencontainers/image-spec/identity"
+       "github.com/stretchr/testify/assert"
+       "github.com/stretchr/testify/require"
+       criruntime "k8s.io/cri-api/pkg/apis/runtime/v1"
+)
+
+func TestRuntimeHandlerUnpackWithSnapshotLabels(t *testing.T) {
+       workDir := t.TempDir()
+       cfgPath := filepath.Join(workDir, "config.toml")
+       cfg := `
+version = 3
+
+[plugins.'io.containerd.cri.v1.images']
+  snapshotter = "overlayfs"
+  disable_snapshot_annotations = false
+
+[plugins.'io.containerd.cri.v1.runtime'.containerd]
+  default_runtime_name = "runc"
+
+[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
+  runtime_type = "io.containerd.runc.v2"
+  snapshotter = "overlayfs"
+
+[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.erofs]
+  runtime_type = "io.containerd.runc.v2"
+  snapshotter = "erofs"
+`
+       require.NoError(t, os.WriteFile(cfgPath, []byte(cfg), 0o600))
+
+       ctrd := newCtrdProc(t, *containerdBin, workDir, nil)
+       require.NoError(t, ctrd.isReady())
+
+       rSvc := ctrd.criRuntimeService(t)
+       iSvc := ctrd.criImageService(t)
+
+       ctrdClient, err := containerd.New(ctrd.grpcAddress(), containerd.WithDefaultNamespace(k8sNamespace))
+       require.NoError(t, err)
+
+       t.Cleanup(func() {
+               if t.Failed() {
+                       t.Log("Dumping containerd config and logs due to test failure")
+                       dumpFileContent(t, ctrd.configPath())
+                       dumpFileContent(t, ctrd.logPath())
+               }
+               assert.NoError(t, ctrdClient.Close())
+               cleanupPods(t, rSvc)
+               assert.NoError(t, ctrd.kill(syscall.SIGTERM))
+               assert.NoError(t, ctrd.wait(5*time.Minute))
+       })
+
+       ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+       defer cancel()
+       resp, err := ctrdClient.IntrospectionService().Plugins(ctx, fmt.Sprintf("type==%s,id==%s", plugins.SnapshotPlugin, "erofs"))
+       require.NoError(t, err)
+       if len(resp.Plugins) == 0 {
+               t.Skip("erofs snapshotter plugin is not registered")
+       }
+       if initErr := resp.Plugins[0].InitErr; initErr != nil {
+               t.Skipf("erofs snapshotter plugin is not ready: %s", initErr.Message)
+       }
+
+       nginxImage := images.Get(images.Nginx)
+       pullImagesByCRI(t, iSvc, nginxImage)
+
+       img, err := ctrdClient.GetImage(context.Background(), nginxImage)
+       require.NoError(t, err)
+       diffIDs, err := img.RootFS(context.Background())
+       require.NoError(t, err)
+       chainIDs := identity.ChainIDs(diffIDs)
+
+       // First pod uses default runtime handler (overlayfs). No containers created.
+       sb1Cfg := PodSandboxConfig("overlay-pod", "runtime-handler-unpack")
+       _, err = rSvc.RunPodSandbox(sb1Cfg, "")
+       require.NoError(t, err)
+
+       // Image is pulled with overlayfs; nginx snapshots should not exist on erofs yet.
+       erofsSn := ctrdClient.SnapshotService("erofs")
+       for _, chainID := range chainIDs {
+               _, err := erofsSn.Stat(context.Background(), chainID.String())
+               assert.Truef(t, errdefs.IsNotFound(err), "expected no erofs snapshot for chainID %s before erofs container creation, got err=%v", chainID, err)
+       }
+
+       // Second pod uses erofs runtime handler. Creating nginx container should trigger
+       // automatic unpack for erofs with snapshot labels.
+       sb2Cfg := PodSandboxConfig("erofs-pod", "runtime-handler-unpack")
+       sb2ID, err := rSvc.RunPodSandbox(sb2Cfg, "erofs")
+       require.NoError(t, err)
+       cn2Cfg := ContainerConfig("erofs-container", nginxImage, WithCommand("sleep", "1d"))
+       cn2ID, err := rSvc.CreateContainer(sb2ID, cn2Cfg, sb2Cfg)
+       require.NoError(t, err)
+
+       for _, chainID := range chainIDs {
+               snInfo, err := erofsSn.Stat(context.Background(), chainID.String())
+               require.NoErrorf(t, err, "failed to stat erofs snapshot for chainID %s", chainID)
+               require.NotNil(t, snInfo.Labels)
+               assert.NotEmpty(t, snInfo.Labels[snpkg.TargetRefLabel], "missing %s on chainID %s", snpkg.TargetRefLabel, chainID)
+               assert.NotEmpty(t, snInfo.Labels[snpkg.TargetManifestDigestLabel], "missing %s on chainID %s", snpkg.TargetManifestDigestLabel, chainID)
+               assert.NotEmpty(t, snInfo.Labels[snpkg.TargetLayerDigestLabel], "missing %s on chainID %s", snpkg.TargetLayerDigestLabel, chainID)
+               assert.NotEmpty(t, snInfo.Labels[snpkg.TargetImageLayersLabel], "missing %s on chainID %s", snpkg.TargetImageLayersLabel, chainID)
+       }
+
+       // Make sure the second pod really used the erofs runtime handler path.
+       sb2Status, err := rSvc.PodSandboxStatus(sb2ID)
+       require.NoError(t, err)
+       assert.Equal(t, "erofs", sb2Status.RuntimeHandler)
+
+       // Keep container state check explicit. Container should be created.
+       cn2Status, err := rSvc.ContainerStatus(cn2ID)
+       require.NoError(t, err)
+       assert.Equal(t, criruntime.ContainerState_CONTAINER_CREATED, cn2Status.State)
+}
diff --git a/internal/cri/opts/container.go b/internal/cri/opts/container.go
index 3932e5fe6..e1a803b6e 100644
--- a/internal/cri/opts/container.go
+++ b/internal/cri/opts/container.go
@@ -24,19 +24,24 @@ import (
        "strings"

        "github.com/containerd/continuity/fs"
+       "github.com/containerd/platforms"
        imagespec "github.com/opencontainers/image-spec/specs-go/v1"

        containerd "github.com/containerd/containerd/v2/client"
        "github.com/containerd/containerd/v2/core/containers"
+       "github.com/containerd/containerd/v2/core/images"
        "github.com/containerd/containerd/v2/core/mount"
        "github.com/containerd/containerd/v2/core/snapshots"
+       "github.com/containerd/containerd/v2/core/unpack"
+       snpkg "github.com/containerd/containerd/v2/pkg/snapshotters"
        "github.com/containerd/errdefs"
        "github.com/containerd/log"
+       "golang.org/x/sync/semaphore"
 )

 // WithNewSnapshot wraps `containerd.WithNewSnapshot` so that if creating the
 // snapshot fails we make sure the image is actually unpacked and retry.
-func WithNewSnapshot(id string, i containerd.Image, opts ...snapshots.Opt) containerd.NewContainerOpts {
+func WithNewSnapshot(id string, i containerd.Image, appendSnapshotLabels bool, opts ...snapshots.Opt) containerd.NewContainerOpts {
        f := containerd.WithNewSnapshot(id, i, opts...)
        return func(ctx context.Context, client *containerd.Client, c *containers.Container) error {
                if err := f(ctx, client, c); err != nil {
@@ -44,7 +49,7 @@ func WithNewSnapshot(id string, i containerd.Image, opts ...snapshots.Opt) conta
                                return err
                        }

-                       if err := i.Unpack(ctx, c.Snapshotter); err != nil {
+                       if err := unpackImage(ctx, client, i, c.Snapshotter, appendSnapshotLabels); err != nil {
                                return fmt.Errorf("error unpacking image: %w", err)
                        }
                        return f(ctx, client, c)
@@ -53,6 +58,55 @@ func WithNewSnapshot(id string, i containerd.Image, opts ...snapshots.Opt) conta
        }
 }

+func unpackImage(ctx context.Context, client *containerd.Client, i containerd.Image, snapshotter string, appendSnapshotLabels bool) error {
+       ctx, done, err := client.WithLease(ctx)
+       if err != nil {
+               return err
+       }
+       defer done(ctx)
+
+       matcher := platforms.Default()
+
+       capabilities, err := client.GetSnapshotterCapabilities(ctx, snapshotter)
+       if err != nil {
+               return err
+       }
+
+       u, err := unpack.NewUnpacker(
+               ctx,
+               i.ContentStore(),
+               unpack.WithUnpackPlatform(unpack.Platform{
+                       Platform:                matcher,
+                       SnapshotterKey:          snapshotter,
+                       Snapshotter:             client.SnapshotService(snapshotter),
+                       Applier:                 client.DiffService(),
+                       SnapshotterCapabilities: capabilities,
+               }),
+               unpack.WithUnpackLimiter(semaphore.NewWeighted(3)),
+       )
+       if err != nil {
+               return fmt.Errorf("unable to initialize unpacker: %w", err)
+       }
+
+       childrenHandler := images.ChildrenHandler(i.ContentStore())
+       childrenHandler = images.FilterPlatforms(childrenHandler, matcher)
+       childrenHandler = images.LimitManifests(childrenHandler, matcher, 1)
+
+       var h images.Handler = childrenHandler
+       if appendSnapshotLabels {
+               h = snpkg.AppendInfoHandlerWrapper(i.Name())(h)
+       }
+
+       if err := images.Dispatch(ctx, u.Unpack(h), nil, i.Target()); err != nil {
+               _, _ = u.Wait()
+               return err
+       }
+       if _, err := u.Wait(); err != nil {
+               return err
+       }
+       return nil
+}
+
 // WithVolumes copies ownership of volume in rootfs to its corresponding host path.
 // It doesn't update runtime spec.
 // The passed in map is a host path to container path map for all volumes.
diff --git a/internal/cri/server/container_create.go b/internal/cri/server/container_create.go
index 303f60372..7ddb1b393 100644
--- a/internal/cri/server/container_create.go
+++ b/internal/cri/server/container_create.go
@@ -357,7 +357,7 @@ func (c *criService) createContainer(r *createContainerRequest) (_ string, retEr
                // the runtime (runc) a chance to modify (e.g. to create mount
                // points corresponding to spec.Mounts) before making the
                // rootfs readonly (requested by spec.Root.Readonly).
-               customopts.WithNewSnapshot(r.containerID, *r.containerdImage, sOpts...),
+               customopts.WithNewSnapshot(r.containerID, *r.containerdImage, !c.ImageService.Config().DisableSnapshotAnnotations, sOpts...),
        }
        if len(volumeMounts) > 0 {
                mountMap := make(map[string]string)
diff --git a/internal/cri/server/podsandbox/sandbox_run.go b/internal/cri/server/podsandbox/sandbox_run.go
index 9badf42dc..89915ee24 100644
--- a/internal/cri/server/podsandbox/sandbox_run.go
+++ b/internal/cri/server/podsandbox/sandbox_run.go
@@ -194,7 +194,7 @@ func (c *Controller) Start(ctx context.Context, id string) (cin sandbox.Controll

        opts := []containerd.NewContainerOpts{
                containerd.WithSnapshotter(sandboxSnapshotter),
-               customopts.WithNewSnapshot(id, pauseImage, snapshotterOpt...),
+               customopts.WithNewSnapshot(id, pauseImage, !c.imageConfig.DisableSnapshotAnnotations, snapshotterOpt...),
                containerd.WithSpec(spec, specOpts...),
                containerd.WithContainerLabels(sandboxLabels),
                containerd.WithContainerExtension(crilabels.SandboxMetadataExtension, &metadata),

Member


@fidencio I tried to use the unpacker package here to align the behaviour with PullImage.
Feel free to use it; you can run the test with GOMAXPROCS=4 FOCUS=TestRuntimeHandlerUnpackWithSnapshotLabels CGROUP_DRIVER=cgroupfs CONTAINERD_BIN=$PWD/bin/containerd make cri-integration

Member


I didn't polish that change; it's just a POC based on the requirement.

Contributor Author


@fuweid, it just works like a charm! Thanks a whole lot for the suggestions all throughout the process on this one!

…tters

Remote/proxy snapshotters like nydus need per-layer annotations on each
snapshot (cri.image-ref, cri.layer-digest, cri.manifest-digest,
cri.image-layers) so they can lazily fetch content inside the guest VM.
During a normal PullImage, these annotations are set by
AppendInfoHandlerWrapper and flow through the core/unpack.Unpacker to
each layer's Prepare/Commit call.

However, when an image is already present for one snapshotter (e.g.,
overlayfs) and needs to be used with a different one (e.g., nydus for
Kata), no pull occurs.  The image must be unpacked into the target
snapshotter with the correct per-layer labels.

Replace the image.Unpack() fallback in customopts.WithNewSnapshot with
unpackImage, which leverages the existing core/unpack.Unpacker and wraps
the image handler with AppendInfoHandlerWrapper when snapshot annotations
are enabled (!DisableSnapshotAnnotations).  This reuses the same unpack
machinery as PullImage, including retry handling, parallel layer
support, and deduplication.

Signed-off-by: Fabiano Fidêncio <[email protected]>
Member

@mikebrow mikebrow left a comment


LGTM
thx @fuweid and @fidencio


Labels

area/cri Container Runtime Interface (CRI) cherry-pick/2.2.x Change to be cherry picked to release/2.2 branch impact/changelog kind/bug size/L

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants