
[release/1.6] update runc binary to 1.1.15#10795

Closed
k8s-infra-cherrypick-robot wants to merge 1 commit into containerd:release/1.6 from k8s-infra-cherrypick-robot:cherry-pick-10787-to-release/1.6

Conversation

@k8s-infra-cherrypick-robot

This is an automated cherry-pick of #10787

/assign samuelkarp

diff: opencontainers/runc@v1.1.14...v1.1.15

Release notes:

- The -ENOSYS seccomp stub is now always generated for the native
  architecture that runc is running on. This is needed to work around some
  arguably specification-incompliant behaviour from Docker on architectures
  such as ppc64le, where the allowed architecture list is set to null. This
  ensures that we always generate at least one -ENOSYS stub for the native
  architecture even with these weird configs. (containerd#4391)
- On systems with older kernels, reading /proc/self/mountinfo may skip some
  entries; as a consequence, runc may not properly set mount propagation,
  causing container mounts to leak into the host mount namespace. (containerd#2404, containerd#4425)
- In order to fix performance issues in the "lightweight" bindfd protection
  against [CVE-2019-5736], the temporary ro bind-mount of /proc/self/exe
  has been removed. runc now creates a binary copy in all cases. (containerd#4392, containerd#2532)
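The mount-propagation note above concerns the optional fields of /proc/self/mountinfo lines (e.g. shared:7), which runc reads to decide how to set propagation. A minimal sketch of how those flags are extracted, with the field layout per proc(5); the sample line and helper name are illustrative, not runc's actual parser:

```go
package main

import (
	"fmt"
	"strings"
)

// propagation extracts the mount-propagation flags (e.g. "shared:7") from a
// single /proc/self/mountinfo line. Per proc(5), the optional fields start at
// index 6 and run until a lone "-" separator.
func propagation(line string) []string {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return nil
	}
	var flags []string
	for _, f := range fields[6:] {
		if f == "-" { // separator: filesystem type follows
			break
		}
		flags = append(flags, f)
	}
	return flags
}

func main() {
	line := "24 30 0:22 / /sys rw,nosuid shared:7 - sysfs sysfs rw"
	fmt.Println(propagation(line)) // [shared:7]
}
```

An entry silently skipped while reading this file means its propagation flags are never seen, which is the bug the release note describes.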

Signed-off-by: Samuel Karp <[email protected]>
@k8s-ci-robot

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dosubot added the area/runtime label Oct 8, 2024
@samuelkarp
Member

/ok-to-test

@akhilerm
Member

akhilerm commented Oct 8, 2024

/test pull-containerd-node-e2e-1-6

@akhilerm
Member

akhilerm commented Oct 8, 2024

/test pull-containerd-node-e2e-1-6

@dmcgowan
Member

dmcgowan commented Oct 8, 2024

/retest

@k8s-ci-robot

@k8s-infra-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-containerd-node-e2e-1-6
Commit: 4b70b3e
Required: true
Rerun command: /test pull-containerd-node-e2e-1-6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@akhilerm
Member

akhilerm commented Oct 8, 2024

The node-e2e failure does not seem to be flaky; after reverting to runc 1.1.14, the node-e2e presubmit passes.

@samuelkarp
Member

runc 1.1.15 contains opencontainers/runc#4392; I'm wondering if we're seeing failures in e2e because of that. There were notes in opencontainers/runc#3973 related to OOMs, and the 1.2.0 RCs use opencontainers/runc#3987 to mitigate this, but that change is not present in 1.1.15.

@akhilerm
Member

akhilerm commented Oct 8, 2024

Also curious why we are seeing this only on the 1.6 branch and not on the 1.7 releases.

@samuelkarp
Member

I can't tell from the log whether cgroup v1 or v2 is in use.

  I1008 13:36:56.448303 1238 dump.go:53] At 2024-10-08 13:36:53 +0000 UTC - event for initcontinar-oomkill-target-pod: {kubelet tmp-node-e2e-75e0da1b-cos-beta-117-18613-0-76} Failed: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
  [FAILED] pod: "initcontinar-oomkill-target-pod", container: "oomkill-target-init-container" has unexpected exitCode: '\u0080'
  Expected
      <int32>: 128
  to equal
      <int32>: 137

@samuelkarp
Member

It looks like the prow jobs for 1.7 and main are setting CONTAINERD_SYSTEMD_CGROUP: 'true', which ends up influencing config.toml to set:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  BinaryName = "${CONTAINERD_HOME}/usr/local/sbin/runc"
  SystemdCgroup = ${systemdCgroup}

But the job for 1.6 does not set the same value.

@samuelkarp
Member

I don't see CONTAINERD_CGROUPV2 defined anywhere, so that looks like it defaults to false. The same script in our release/1.7 and main branches uses CONTAINERD_COS_CGROUP_MODE but leaves the VM in its default configuration if it is unset. I believe COS M117 defaults to cgroup v2 (checking that now...).
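One way to check which hierarchy a host mounts is to statfs /sys/fs/cgroup and compare the filesystem magic. A sketch of the classification with the magic constants from statfs(2); the actual statfs call is omitted so the logic stays portable, and the helper name is illustrative:

```go
package main

import "fmt"

// Filesystem magic numbers statfs(2) reports for /sys/fs/cgroup.
const (
	tmpfsMagic   = 0x01021994 // v1 hosts mount a tmpfs of per-controller dirs
	cgroup2Magic = 0x63677270 // CGROUP2_SUPER_MAGIC: the unified v2 hierarchy
)

// cgroupVersion classifies a /sys/fs/cgroup filesystem magic as v1 or v2.
func cgroupVersion(magic int64) int {
	if magic == cgroup2Magic {
		return 2
	}
	return 1
}

func main() {
	fmt.Println(cgroupVersion(cgroup2Magic)) // 2, e.g. a cgroup v2 default host
	fmt.Println(cgroupVersion(tmpfsMagic))   // 1
}
```

Note this is independent of the cgroup driver: systemd vs. cgroupfs is a containerd/kubelet configuration choice, while v1 vs. v2 is a property of the booted kernel.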

@akhilerm
Member

akhilerm commented Oct 8, 2024

I can't tell from the log whether cgroup v1 or v2 is in use.

  I1008 13:36:56.448303 1238 dump.go:53] At 2024-10-08 13:36:53 +0000 UTC - event for initcontinar-oomkill-target-pod: {kubelet tmp-node-e2e-75e0da1b-cos-beta-117-18613-0-76} Failed: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
  [FAILED] pod: "initcontinar-oomkill-target-pod", container: "oomkill-target-init-container" has unexpected exitCode: '\u0080'
  Expected
      <int32>: 128
  to equal
      <int32>: 137

From the kubelet log, cgroupfs is being used.

@samuelkarp
Member

Confirmed COS M117 defaults to cgroup v2.

From the kubelet log, cgroupfs is being used.

Yep, this makes sense because of the absence of CONTAINERD_SYSTEMD_CGROUP: 'true'. But the driver (systemd vs. cgroupfs) is independent of cgroup v1 vs. cgroup v2.

The kubelet log also says "CgroupVersion":2, so I'm not sure. I'll open another test PR to try to debug some of these settings.

@samuelkarp
Member

/test pull-containerd-node-e2e-1-6-systemd-cgroup

@estesp
Member

estesp commented Oct 16, 2024

reverted in main; closing and will update when the runc issue is solved

@estesp closed this Oct 16, 2024