[CRI] Remove image store #7061
Conversation
I've no idea how to fix the Fuzzing stuff :(
I like this a lot...
I was thinking the other day about moving some of the image / container spec fields related to container status over to container status... basically decoupling (non-verbose) container status from the image store. #6517 (comment) ... WDYT?
@mikebrow this PR replaces the entire image store with a set of labels assigned to
Signed-off-by: Maksym Pavlenko <[email protected]>
Rebased against
What sort of performance numbers are we looking at here (moving from the in-memory list)? Kubelet has a 30-second loop requesting the full list of images in its namespace.
@mikebrow I wrote a quick benchmark to get an order of magnitude: 30,000 images, 3 references per image (so 90,000 metadata image records in boltdb),
@mxpv Do you think the image ID labels will be a long-term (post 2.0) solution? The existing code makes some assumptions about a single platform that this would mostly inherit. We could come up with platform-specific labels as well to handle that, but I'm not sure we have fully thought about the multi-platform use case for CRI images.
@dmcgowan I haven't put much thought into it; we can bring this up on the next community call. I'd like to learn more about potential use cases of this. Labels are not very convenient to work with, so maybe for 2.0 and beyond we can rethink the image store structure a bit and make the image ID a first-class citizen. Then we could, for instance, keep one metadata record per image together with all references to it, so there would be no need to gather references from the store, and the image ID would be guaranteed to be unique and platform independent.
I have concerns about the CPU and memory burst issue. Back in 2018, we used containerd's ListImages remote API in pouchcontainer, and we found that containerd's CPU usage spiked periodically. I remember the node had around 30 images. The profile showed us that containerd spends most of that CPU reading JSON from disk and unmarshalling it, so we had to add a cache in pouchcontainer... I used a local plugin to benchmark it. The patch is here. If we walk the disk to get the JSON, the CPU profile looks like this. Kubelet taking 30s to get the image list is not too bad, but the cache layer keeps containerd working at low overhead.
Instead, embed labels with user information in the metadata store. Signed-off-by: Maksym Pavlenko <[email protected]>
Pushed 1eb20bf to omit reading from disk / unmarshalling. We needed that just for
I've removed image spec reading in 1eb20bf; this should reduce CPU pressure. I'm a bit hesitant to embed big chunks of the runtime spec, as it's easy to hit the label size limit; it's much easier to add specific fields depending on requirements.
Do we want to keep this until we get the PLEG events work in? I have further CRI work which is blocked by this PR :(
@mxpv Sorry for the late reply. I will take a look in two days. Will update later~
TL;DR: can we move the cache down, or just rewrite it in the form you need it to be in for the work that is blocked?

The PLEG work for image events isn't in plan for v1.25 Kubernetes, just the container state events, with a possible stretch for pod events. The KEP actually states: "Addressing container image relisting via CRI events is out of scope for this enhancement at this point in time." The KEP feature change window for v1.25 closed a couple weeks back. We could do image events for v1.26, but that won't fix the need for caching right away due to feature gating and support streams.

The PLEG feature(s) would first need to go from alpha to beta. K8s features start with feature gates that are off by default in alpha, are then enabled by default in a subsequent release, and the feature gate goes away when the feature goes GA in a further release. Having big performance problems with the gate off would be seen as a regression.

Then, because of the decoupled nature of containerd/kubelet, older versions of Kubernetes will be using newer versions of containerd, so we would first need the k8s support stream that does not have image PLEG events GA to reach EOL (they use an n-2 support strategy), and only then could we deprecate our cache in a future version of containerd that is only supported on versions of k8s that don't need the 30s image list cache. And non-kubelet users of CRI would also need to adopt the events model or accept the performance change.

I think we need a cache of some sort for CRI image listing at least until this time next year, possibly 18-24 months.
Side note: this is another example where we need switches for optimizing for memory vs CPU vs disk resources.
@mxpv This needs a rebase.
This needs to be reworked. I'll get back to this PR once PLEG is adopted.

This PR refactors CRI and removes the in-memory image store in favor of containerd's metadata image store. The goal is to simplify the CRI code and rely more on containerd APIs instead of maintaining custom layers.
So instead of an in-memory cache, this PR relies on containerd's metadata store (and labels) to keep the additional image information needed by CRI. It preserves the existing logic of creating a separate image per reference, but now appends appropriate labels, so we can just use boltdb.
Labels can be appended on demand, on event, or at daemon start; some of these may be removed in 2.0. The labels we currently add include the config digest (which CRI uses as the image ID), image size, chain ID, etc.
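As a rough illustration, the label set appended to each image reference record might be built like this. The label key names and the helper below are hypothetical; the actual keys used by the PR may differ.

```go
package main

import "fmt"

// Hypothetical CRI label keys; the real key names in the PR may differ.
const (
	imageIDLabel = "io.cri-containerd.image.id"      // config digest, used as the CRI image ID
	sizeLabel    = "io.cri-containerd.image.size"    // image size in bytes
	chainIDLabel = "io.cri-containerd.image.chainid" // snapshot chain ID
)

// criLabels builds the label map to attach to an image reference record,
// so CRI can later answer status queries straight from the metadata store.
func criLabels(configDigest string, size int64, chainID string) map[string]string {
	return map[string]string{
		imageIDLabel: configDigest,
		sizeLabel:    fmt.Sprintf("%d", size),
		chainIDLabel: chainID,
	}
}

func main() {
	l := criLabels("sha256:cfg", 12345, "sha256:chain")
	fmt.Println(l[imageIDLabel]) // prints sha256:cfg
}
```

Since labels live on each per-reference image record, they can be (re)computed on demand, on an event, or at daemon start, as described above.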
In the new implementation, searching for image references narrows down to querying the metadata store for all records that carry the same image ID label. Queries are slower compared to the original implementation, but boltdb is still reasonably fast (and there is room for optimization if this becomes a bottleneck).
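The reference lookup by image ID label can be sketched as below, using a plain slice in place of the boltdb-backed metadata store. The label key and helper are hypothetical; the real code would instead pass a containerd list filter on the label to the image store.

```go
package main

import "fmt"

// image mirrors the relevant bits of a metadata image record.
type image struct {
	Name   string
	Labels map[string]string
}

// Hypothetical label key holding the config digest used as the CRI image ID.
const imageIDLabel = "io.cri-containerd.image.id"

// referencesByID gathers all references sharing the same image ID label,
// the way the new implementation queries the metadata store.
func referencesByID(store []image, id string) []string {
	var refs []string
	for _, img := range store {
		if img.Labels[imageIDLabel] == id {
			refs = append(refs, img.Name)
		}
	}
	return refs
}

func main() {
	store := []image{
		{Name: "docker.io/library/busybox:latest", Labels: map[string]string{imageIDLabel: "sha256:a"}},
		{Name: "docker.io/library/busybox:1.35", Labels: map[string]string{imageIDLabel: "sha256:a"}},
		{Name: "docker.io/library/alpine:latest", Labels: map[string]string{imageIDLabel: "sha256:b"}},
	}
	// Both busybox references resolve to the same image ID.
	fmt.Println(len(referencesByID(store, "sha256:a"))) // prints 2
}
```

In the actual store this is a label-filtered list query rather than a full scan in Go, which is why the cost is bounded by boltdb's read performance rather than by JSON unmarshalling.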