Update runc to 1.0.0#102508
Conversation
|
Hi @kolyshkin. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
|
/test pull-kubernetes-node-crio-cgrpv2-e2e |
|
Test failure in |
|
Is there a reason why we don't want to split the dependency bump and the device changes @kolyshkin? |
runc 1.0.0 includes the fix for SkipDevices to work properly with the systemd driver (opencontainers/runc#2958). This is why the update and the removal of "allow all" device rule is tied together. One thing this PR does is making sure this PR really fixes the issue. Surely we can do it in a sequential fashion -- update to 1.0 first, then remove the rules. I don't see the reason why though. |
I believe we do
Do you mean to split this PR into two? I would say it's not needed, as the other change is a cleanup coupled to runc-1.0.0 bump, but sure, I have separated the second commit into #103163 |
|
/assign @liggitt |
|
/approve needs a final |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dims, kolyshkin, liggitt, mrunalp The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@kolyshkin: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
should we be upgrading the runc binary in the CI jobs? AFAIK this has been pretty adhoc |
|
@BenTheElder we update the runc in CI when we update containerd (goes as a pair) |
Yes, but do we do that after this PR? containerd has already updated AFAIK. Also the project has blocking CI on GCE with:
Plus KIND with it's own version of containerd. So if we should be updating it, we have some work to coordinate. |
This is to check if runc 1.0.0 (to be released shortly) works with k8s. The commands used were (roughly): hack/pin-dependency.sh github.com/opencontainers/runc v1.0.0 hack/lint-dependencies.sh # Follow its recommendations. hack/pin-dependency.sh github.com/cilium/ebpf v0.6.1 hack/pin-dependency.sh github.com/opencontainers/selinux v1.8.2 hack/pin-dependency.sh github.com/sirupsen/logrus v1.8.1 # Recheck. hack/lint-dependencies.sh GO111MODULE=on go mod edit -dropreplace github.com/willf/bitset hack/update-vendor.sh # Recheck. hack/lint-dependencies.sh hack/update-internal-modules.sh # Recheck. hack/lint-dependencies.sh [v2: rebased, updated runc 3a0234e1fe2e82 -> 2f8e8e9d977500] [v3: testing master + runc pr 3019] [v4: updated to 93a01cd4d0b7a0f08a] [v5: updated to f093cca13d3cf8a484] [v6: rebased] [v7: updated to runc v1.0.0] [v8: rebased] Signed-off-by: Kir Kolyshkin <[email protected]>
Since runc 1.0.0 it is now sufficient to have SkipDevices: true. Signed-off-by: Kir Kolyshkin <[email protected]>
|
/lgtm |
|
1.21 backport: #103392 |
|
@kolyshkin The kOps project has some periodic e2e conformance tests that run against the The conformance tests results can be found here: https://testgrid.k8s.io/kops-misc#kops-aws-misc-amd64-conformance. After some testing using various kubernetes versions, I think I tracked the moment that the issue appeared to this change: Would you mind checking if this is in any way related with failures we are seeing in kOps tests? |
|
For some clarity: c206af0...4748bb0 (based on commits in links above) |
|
We managed to find more info about the issue we are seeing in kOps. By switching the cgroup driver from |
|
Thanks for the report @hakman! And thanks for all the details! I have done some digging, and it turns out opencontainers/runc#3019 (I am pretty sure, but have not bisected) cause some strange behavior where various cgroups are freezed without being thawed (more info here: https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt). We still lack a great deal of CI coverage for the systemd cgroup driver.... The result of this freezing is that everything just stops, and stuff just times out in places where that shouldn't be possible. I vote for reverting this PR, and awaits a proper fix, since this essentially breaks kubelet systemd cgroup driver support :/ The regression in v1.21 could also be fixed by #102250 as well, so we should look into that in order to fix that release (or revert the previous runc bump there as well). Output after kubeadm init (that just times out): |
|
Opened a revert here: #103483 |
|
@odinuge: GitHub didn't allow me to request PR reviews from the following users: kolyshkin. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
we are also seeing issues on ppc64le platform, hitting with the following error while starting the containers, |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes: #102676
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
none