
[release/1.6] update runc binary to 1.1.15#10795

Closed
k8s-infra-cherrypick-robot wants to merge 1 commit into containerd:release/1.6 from k8s-infra-cherrypick-robot:cherry-pick-10787-to-release/1.6

Conversation

@k8s-infra-cherrypick-robot

This is an automated cherry-pick of #10787

/assign samuelkarp

diff: opencontainers/runc@v1.1.14...v1.1.15

Release notes:

- The -ENOSYS seccomp stub is now always generated for the native
  architecture that runc is running on. This is needed to work around some
  arguably specification-incompliant behaviour from Docker on architectures
  such as ppc64le, where the allowed architecture list is set to null. This
  ensures that we always generate at least one -ENOSYS stub for the native
  architecture even with these weird configs. (containerd#4391)
- On systems with older kernels, reading /proc/self/mountinfo may skip some
  entries; as a consequence, runc may not properly set mount propagation,
  causing container mounts to leak into the host mount namespace. (containerd#2404, containerd#4425)
- In order to fix performance issues in the "lightweight" bindfd protection
  against [CVE-2019-5736], the temporary ro bind-mount of /proc/self/exe
  has been removed. runc now creates a binary copy in all cases. (containerd#4392, containerd#2532)
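The mount-propagation note above concerns the optional fields of /proc/self/mountinfo lines (e.g. shared:7), which runc reads to decide how to set propagation. A minimal sketch of how those flags are extracted, with the field layout per proc(5); the sample line and helper name are illustrative, not runc's actual parser:

```go
package main

import (
	"fmt"
	"strings"
)

// propagation extracts the mount-propagation flags (e.g. "shared:7") from a
// single /proc/self/mountinfo line. Per proc(5), the optional fields start at
// index 6 and run until a lone "-" separator.
func propagation(line string) []string {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return nil
	}
	var flags []string
	for _, f := range fields[6:] {
		if f == "-" { // separator: filesystem type follows
			break
		}
		flags = append(flags, f)
	}
	return flags
}

func main() {
	line := "24 30 0:22 / /sys rw,nosuid shared:7 - sysfs sysfs rw"
	fmt.Println(propagation(line)) // [shared:7]
}
```

An entry silently skipped while reading this file means its propagation flags are never seen, which is the bug the release note describes.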

Signed-off-by: Samuel Karp <[email protected]>
@k8s-ci-robot

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot added the area/runtime label Oct 8, 2024
@samuelkarp
Member

/ok-to-test

@akhilerm
Member

akhilerm commented Oct 8, 2024

/test pull-containerd-node-e2e-1-6

@akhilerm
Member

akhilerm commented Oct 8, 2024

/test pull-containerd-node-e2e-1-6

@dmcgowan
Member

dmcgowan commented Oct 8, 2024

/retest

@k8s-ci-robot

@k8s-infra-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-containerd-node-e2e-1-6
Commit: 4b70b3e
Required: true
Rerun command: /test pull-containerd-node-e2e-1-6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@akhilerm
Member

akhilerm commented Oct 8, 2024

The node-e2e failure does not seem to be flaky; after reverting to runc 1.1.14, the node-e2e presubmit passes.

@samuelkarp
Member

runc 1.1.15 contains opencontainers/runc#4392; I'm wondering if we're seeing failures in e2e because of that. There were notes in opencontainers/runc#3973 related to OOMs, and the 1.2.0 RCs use opencontainers/runc#3987 to mitigate this, but that change is not present in 1.1.15.

@akhilerm
Member

akhilerm commented Oct 8, 2024

Also curious why we are seeing this only on the 1.6 branch and not on the 1.7 releases.

@samuelkarp
Member

I can't tell from the log whether cgroup v1 or v2 is in use.

  I1008 13:36:56.448303 1238 dump.go:53] At 2024-10-08 13:36:53 +0000 UTC - event for initcontinar-oomkill-target-pod: {kubelet tmp-node-e2e-75e0da1b-cos-beta-117-18613-0-76} Failed: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
  [FAILED] pod: "initcontinar-oomkill-target-pod", container: "oomkill-target-init-container" has unexpected exitCode: '\u0080'
  Expected
      <int32>: 128
  to equal
      <int32>: 137

@samuelkarp
Member

It looks like the prow jobs for 1.7 and main are setting CONTAINERD_SYSTEMD_CGROUP: 'true', which ends up influencing config.toml to set:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  BinaryName = "${CONTAINERD_HOME}/usr/local/sbin/runc"
  SystemdCgroup = ${systemdCgroup}

But the job for 1.6 does not set the same value.

@samuelkarp
Member

I don't see CONTAINERD_CGROUPV2 defined anywhere, so that looks like it defaults to false. The same script in our release/1.7 and main branches uses CONTAINERD_COS_CGROUP_MODE but leaves the VM in its default configuration if it is unset. I believe COS M117 defaults to cgroup v2 (checking that now...).
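One way to check which hierarchy a host mounts is to statfs /sys/fs/cgroup and compare the filesystem magic. A sketch of the classification with the magic constants from statfs(2); the actual statfs call is omitted so the logic stays portable, and the helper name is illustrative:

```go
package main

import "fmt"

// Filesystem magic numbers statfs(2) reports for /sys/fs/cgroup.
const (
	tmpfsMagic   = 0x01021994 // v1 hosts mount a tmpfs of per-controller dirs
	cgroup2Magic = 0x63677270 // CGROUP2_SUPER_MAGIC: the unified v2 hierarchy
)

// cgroupVersion classifies a /sys/fs/cgroup filesystem magic as v1 or v2.
func cgroupVersion(magic int64) int {
	if magic == cgroup2Magic {
		return 2
	}
	return 1
}

func main() {
	fmt.Println(cgroupVersion(cgroup2Magic)) // 2, e.g. a cgroup v2 default host
	fmt.Println(cgroupVersion(tmpfsMagic))   // 1
}
```

Note this is independent of the cgroup driver: systemd vs. cgroupfs is a containerd/kubelet configuration choice, while v1 vs. v2 is a property of the booted kernel.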

@akhilerm
Member

akhilerm commented Oct 8, 2024

I can't tell from the log whether cgroup v1 or v2 is in use.

  I1008 13:36:56.448303 1238 dump.go:53] At 2024-10-08 13:36:53 +0000 UTC - event for initcontinar-oomkill-target-pod: {kubelet tmp-node-e2e-75e0da1b-cos-beta-117-18613-0-76} Failed: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
  [FAILED] pod: "initcontinar-oomkill-target-pod", container: "oomkill-target-init-container" has unexpected exitCode: '\u0080'
  Expected
      <int32>: 128
  to equal
      <int32>: 137

From the kubelet log, cgroupfs is being used.

@samuelkarp
Member

Confirmed COS M117 defaults to cgroup v2.

From the kubelet log, cgroupfs is being used.

Yep, this makes sense because of the absence of CONTAINERD_SYSTEMD_CGROUP: 'true'. But the driver (systemd vs. cgroupfs) is independent of cgroup v1 vs. cgroup v2.

The kubelet log also says "CgroupVersion":2, so I'm not sure. I'll open another test PR to try to debug some of these settings.

@samuelkarp
Member

/test pull-containerd-node-e2e-1-6-systemd-cgroup

@estesp
Member

estesp commented Oct 16, 2024

reverted in main; closing and will update when the runc issue is solved

@estesp closed this Oct 16, 2024