Description
The ListContainerStats and ListPodSandboxStats RPCs, as implemented in the containerd CRI plugin, currently fetch all of the stats each time the RPC is called and do not do any caching.
This can be quite expensive. For example, on every call ListContainerStats builds a task metrics request for the matching containers:

```go
request, containers, err := c.buildTaskMetricsRequest(in)
```
ListPodSandboxStats is similar: we need to scrape the stats of each sandbox:

```go
metrics, err := metricsForSandbox(sandbox)
```
Each sandbox then also lists the stats of its containers, in containerd/pkg/cri/server/sandbox_stats_linux.go (lines 113 to 114 at commit 1e6523f):

```go
listContainerStatsRequest := &runtime.ListContainerStatsRequest{Filter: &runtime.ContainerStatsFilter{PodSandboxId: meta.ID}}
resp, err := c.ListContainerStats(ctx, listContainerStatsRequest)
```
In other words, ListPodSandboxStats needs to get metrics for every sandbox and, for each sandbox, send a TTRPC request to the shim to get its container stats.
To solve this, we recommend fetching the stats for sandboxes and containers periodically in the background and caching them. When an RPC asking for them comes in, we can then serve the data from the local in-memory cache. This is similar to what cAdvisor already does today.

Additionally, if we collect stats in the background, we should probably avoid collecting the stats for all containers at once, and instead collect them on a periodic interval with some added jitter, e.g. see https://github.com/google/cadvisor/blob/86b11c65eae6682a4c0d1b0ffaaa091aec701e56/manager/container.go#L482-L506
Since these RPCs will be used more heavily as part of kubernetes/enhancements#2371, it is important that they are fast and low-overhead. For reference, cri-o already performs this collection in the background (https://github.com/cri-o/cri-o/blob/main/internal/lib/stats/stats_server.go).
Steps to reproduce the issue
n/a
Describe the results you received and expected
n/a
What version of containerd are you using?
n/a
Any other relevant information
No response
Show configuration if it is related to CRI plugin.
n/a