[CRI] Remove image store #7061
Conversation
I've no idea how to fix the Fuzzing stuff :(
I like this a lot...
I was thinking the other day about moving some of the image / container spec fields related to container status over to container status... basically decoupling (non-verbose) container status from the image store. #6517 (comment) ... WDYT?
@mikebrow this PR replaces the entire image store with a set of labels assigned to
Signed-off-by: Maksym Pavlenko <[email protected]>
Rebased against
What sort of performance numbers are we looking at here (moving from the in-memory list)? Kubelet has a 30-second loop requesting the full list of images in its namespace.
@mikebrow I wrote a quick benchmark to get an order of magnitude: 30,000 images, 3 references per image (so 90,000 metadata image records in boltdb),
@mxpv Do you think the image ID labels will be a long-term (post 2.0) solution? The existing code makes some assumptions about a single platform that this would mostly inherit. We could come up with platform-specific labels as well to handle that, but I'm not sure we have fully thought about the multi-platform use case for CRI images.
@dmcgowan I haven't put much thought into it; we can bring this up on the next community call. I'd like to learn more about potential use cases of this. Labels are not very convenient to work with, so maybe for 2.0 and beyond we can rethink the image store structure a bit and make the image ID a first-class citizen. Then we could, for instance, keep one metadata record per image together with all references to it, so there would be no need to gather references from the store, and the image ID would be guaranteed to be unique and platform independent.
I have concerns about the CPU and memory burst issue. Back in 2018, we used containerd's ListImages remote API in pouchcontainer, and we found that containerd's CPU usage spiked periodically. I remember the node had around 30 images. The profile showed us that containerd spends most of that CPU reading JSON from disk and unmarshalling it, so we had to add a cache in pouchcontainer... I used a local plugin to benchmark it. The patch is here. If we walk the disk to get the JSON, the CPU profile looks like this. Kubelet taking 30s to get the image list is not too bad, but the cache layer keeps containerd working at low overhead.
Instead, embed labels with user information in the metadata store. Signed-off-by: Maksym Pavlenko <[email protected]>
Pushed 1eb20bf to omit reading from disk / unmarshalling. We needed that just for
I've removed image spec reading in 1eb20bf; this should reduce CPU pressure. I'm a bit hesitant to embed big chunks of the runtime spec, as it's easy to hit the label size limit; it's much easier to add specific fields depending on requirements.
Do we want to keep this until we get the PLEG events work in? I have further CRI work which is blocked by this PR :(
@mxpv Sorry for the late reply. I will take a look in two days. Will update later~
TL;DR: can we move the cache down, or just rewrite it in the form you need it to be in for the work that is blocked?

The PLEG work for image events isn't in plan for v1.25 Kubernetes, just the container state events, with a possible stretch for pod events. The KEP actually states: "Addressing container image relisting via CRI events is out of scope for this enhancement at this point in time." The KEP feature change window for v1.25 closed a couple weeks back. We could do image events for v1.26, but that won't fix the need for caching right away due to feature gating and support streams.

The PLEG feature(s) would first need to go from alpha to beta. K8s features start with feature gates that are off by default in alpha, are then enabled by default in a subsequent release, and the feature gate goes away when the feature goes GA in a further release. Having big performance problems with the gate off would be seen as a regression.

Then, because of the decoupled nature of containerd/kubelet, older versions of Kubernetes will be using newer versions of containerd, so we would first need the k8s support stream that does not have image PLEG events GA to reach EOL (they use an n-2 support strategy), and only then could we deprecate our cache in a future version of containerd that is only supported on versions of k8s that don't need the 30s image list cache. And non-kubelet users of CRI would also need to adopt the events model or accept the performance change.

I think we need a cache of some sort for CRI image listing at least until this time next year, possibly 18-24 months.
Side note: this is another example where we need switches for optimizing for memory vs CPU vs disk resources.
@mxpv This needs a rebase.
This needs to be reworked. I'll get back to this PR once PLEG is adopted.

This PR refactors CRI and removes the in-memory image store in favor of containerd's metadata image store. The goal is to simplify the CRI code and rely more on containerd APIs instead of maintaining custom layers.
So instead of an in-memory cache, this PR relies on containerd's metadata store (and labels) to keep the additional image information needed by CRI. It preserves the existing logic of creating a separate image per reference, but now appends appropriate labels, so we can just use boltdb.
Labels can be appended on demand, on event, or at daemon start; some of these may be removed in 2.0. The labels we currently add include the config digest (which CRI uses as the image ID), image size, chain ID, etc.
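As a rough illustration, the label set appended to each image reference record might be built like this. The label key names and the helper below are hypothetical; the actual keys used by the PR may differ.

```go
package main

import "fmt"

// Hypothetical CRI label keys; the real key names in the PR may differ.
const (
	imageIDLabel = "io.cri-containerd.image.id"      // config digest, used as the CRI image ID
	sizeLabel    = "io.cri-containerd.image.size"    // image size in bytes
	chainIDLabel = "io.cri-containerd.image.chainid" // snapshot chain ID
)

// criLabels builds the label map to attach to an image reference record,
// so CRI can later answer status queries straight from the metadata store.
func criLabels(configDigest string, size int64, chainID string) map[string]string {
	return map[string]string{
		imageIDLabel: configDigest,
		sizeLabel:    fmt.Sprintf("%d", size),
		chainIDLabel: chainID,
	}
}

func main() {
	l := criLabels("sha256:cfg", 12345, "sha256:chain")
	fmt.Println(l[imageIDLabel]) // prints sha256:cfg
}
```

Since labels live on each per-reference image record, they can be (re)computed on demand, on an event, or at daemon start, as described above.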
In the new implementation, searching for image references narrows down to querying the metadata store for all records that carry the same image ID label. Queries are slower compared to the original implementation, but boltdb is still reasonably fast (and there is room for optimization if this becomes a bottleneck).
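The reference lookup by image ID label can be sketched as below, using a plain slice in place of the boltdb-backed metadata store. The label key and helper are hypothetical; the real code would instead pass a containerd list filter on the label to the image store.

```go
package main

import "fmt"

// image mirrors the relevant bits of a metadata image record.
type image struct {
	Name   string
	Labels map[string]string
}

// Hypothetical label key holding the config digest used as the CRI image ID.
const imageIDLabel = "io.cri-containerd.image.id"

// referencesByID gathers all references sharing the same image ID label,
// the way the new implementation queries the metadata store.
func referencesByID(store []image, id string) []string {
	var refs []string
	for _, img := range store {
		if img.Labels[imageIDLabel] == id {
			refs = append(refs, img.Name)
		}
	}
	return refs
}

func main() {
	store := []image{
		{Name: "docker.io/library/busybox:latest", Labels: map[string]string{imageIDLabel: "sha256:a"}},
		{Name: "docker.io/library/busybox:1.35", Labels: map[string]string{imageIDLabel: "sha256:a"}},
		{Name: "docker.io/library/alpine:latest", Labels: map[string]string{imageIDLabel: "sha256:b"}},
	}
	// Both busybox references resolve to the same image ID.
	fmt.Println(len(referencesByID(store, "sha256:a"))) // prints 2
}
```

In the actual store this is a label-filtered list query rather than a full scan in Go, which is why the cost is bounded by boltdb's read performance rather than by JSON unmarshalling.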