kubelet/cm: refactor, prepare for runc 1.1 bump#108597
kubelet/cm: refactor, prepare for runc 1.1 bump#108597k8s-ci-robot merged 5 commits intokubernetes:masterfrom
Conversation
There's no need to call m.Update (which will create another instance of libcontainer cgroup manager, convert all the resources and then set them). All this is already done here, except for Set(). Signed-off-by: Kir Kolyshkin <[email protected]>
Commit ecd6361 added setting PidsLimit to Create and Update. Commit bce9d5f added setting PidsLimit to m.toResources. Now, PidsLimit is assigned twice. Remove the duplicate. Fixes: bce9d5f Signed-off-by: Kir Kolyshkin <[email protected]>
Commit 79be8be made hugetlb settings optional if cgroup v2 is used and hugetlb is not available, fixing issue 92933. Note at that time this was only needed for v2, because for v1 the resources were set one-by-one, and only for supported resources. Commit d312ef7 switched the code to using Set from runc/libcontainer cgroups manager, and expanded the check to cgroup v1 as well. Move this check earlier, to inside m.toResources, so instead of converting all hugetlb resources from ResourceConfig to libcontainers's Resources.HugetlbLimit, and then setting it to nil, we can skip the conversion entirely if hugetlb is not supported, thus not doing the work that is not needed. Signed-off-by: Kir Kolyshkin <[email protected]>
Instead of doing (almost) the same thing from the three different methods (Create, Update, Destroy), move the functionality to libctCgroupConfig, replacing updateSystemdCgroupInfo. The needResources bool is needed because we do not need resources during Destroy, so we skip the unneeded resource conversion. Signed-off-by: Kir Kolyshkin <[email protected]>
Move the rl == nil check to before we dereference it. Signed-off-by: Kir Kolyshkin <[email protected]>
|
/priority important-soon |
|
/assign @dchen1107 |
odinuge
left a comment
There was a problem hiding this comment.
Thanks for working on this @kolyshkin!
Looks like all my initial comments from the other PR is resolved here, so it looks good from my pov. I am a big fan of all the performance improvements here, as well as the cleanups.
Code looks good, but will prefer if someone like @ehashman can also take a quick look to validate this. Thanks
/lgtm
To be sure;
/test pull-kubernetes-node-kubelet-serial-containerd
/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager
/test pull-kubernetes-node-memoryqos-cgrpv2
/test pull-kubernetes-node-swap-fedora
/test pull-kubernetes-node-swap-fedora-serial
/test pull-kubernetes-node-swap-ubuntu-serial
|
Looking at tests:
I know a lot of the presubmits aren't in great shape as they weren't a priority after the dockershim migration since they're only used occasionally. Trying to launch this many node tests simultaneously might have hit capacity issues. Let's see if we can focus on just a few critical ones for this PR. /skip |
|
Given that https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/108597/pull-kubernetes-node-kubelet-serial-containerd/1505844761926307840/ passed I'm not super concerned by most of these failures, I think the jobs are just a bit neglected and we tried to run jobs we wouldn't normally. I think the Ubuntu bug is a true failure. According to triage, we've only seen it on the runc 1.1 bump PRs. Is it deterministic? /test pull-kubernetes-node-swap-ubuntu-serial Failure cluster 5c6173278f01b3767cdfError text:Recent failures:3/21/2022, 2:52:29 AM pr:pull-kubernetes-node-swap-ubuntu-serial |
|
I'm a little confused on the Maybe this is something specific to setup of ubuntu swap job? It's weird to see the test only fail on swap job, I would expect it to either fail on both or neither :) |
|
The swap job is failing because it does not set The serial job in contrast correctly sets |
|
/test pull-kubernetes-node-swap-ubuntu-serial |
|
/lgtm LGTM on the code changes, thanks for all the refactoring, it's much cleaner now with less duplication of cgroup configuration. The swap job is now passing as well after re-running with kubernetes/test-infra#25726 merged |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kolyshkin, mrunalp, odinuge The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ehashman
left a comment
There was a problem hiding this comment.
/lgtm
thanks for fixing the job!
|
Some weird errors, looks like due to go 1.18 being used by default. Let me try to rebase this. |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
1 similar comment
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
@kolyshkin: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This does some refactoring, simplification, and occasional fixing pkg/kubelet/cm,
preparing the code for runc 1.1 bump. Originally part of #107149.
Special notes for your reviewer:
Please review commit by commit. See individual commit messages for details.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
none