
Pod stuck in ContainerCreating: Unit ...slice already exists #102676

@kolyshkin

Description


What happened:

Errors like the following one:

May 27 06:38:19.960408 ip-10-0-220-230 hyperkube[1448]: E0527 06:38:19.960361 1448 pod_workers.go:190] "Error syncing pod, skipping" err="failed to ensure that the pod: 5ac83c3f-0b16-4cf2-a3cb-f67c19cd0e16 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pod5ac83c3f-0b16-4cf2-a3cb-f67c19cd0e16] : Unit kubepods-burstable-pod5ac83c3f_0b16_4cf2_a3cb_f67c19cd0e16.slice already exists." pod="openshift-machine-config-operator/machine-config-daemon-mm7gt" podUID=5ac83c3f-0b16-4cf2-a3cb-f67c19cd0e16

These errors occur when the kubelet is configured with cgroupDriver: systemd.
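The log line shows how the pod's cgroup name, [kubepods burstable pod5ac83c3f-0b16-4cf2-a3cb-f67c19cd0e16], becomes the systemd unit name kubepods-burstable-pod5ac83c3f_0b16_4cf2_a3cb_f67c19cd0e16.slice. A minimal sketch of that mapping, assuming only what the log line shows: dashes inside each component become underscores (systemd uses "-" in slice names to express nesting), the components are joined with "-", and ".slice" is appended. The helper `sliceName` is hypothetical; the kubelet's real systemd cgroup manager may handle additional cases such as the root cgroup.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceName is a hypothetical helper that converts a kubelet-style cgroup
// name (a list of path components) into a systemd slice unit name.
// Dashes inside a component are escaped to underscores, because systemd
// treats "-" in a slice name as a nesting separator; the escaped components
// are then joined with "-" and suffixed with ".slice".
func sliceName(components []string) string {
	escaped := make([]string, len(components))
	for i, c := range components {
		escaped[i] = strings.ReplaceAll(c, "-", "_")
	}
	return strings.Join(escaped, "-") + ".slice"
}

func main() {
	// Cgroup name components taken from the error message above.
	name := []string{"kubepods", "burstable", "pod5ac83c3f-0b16-4cf2-a3cb-f67c19cd0e16"}
	fmt.Println(sliceName(name))
}
```

Running it prints the same unit name that appears in the "already exists" error, which is the slice the kubelet asks systemd to create for the pod.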

What you expected to happen:

No such errors

How to reproduce it (as minimally and precisely as possible):

I don't know for sure.

Anything else we need to know?:

This regression was introduced in Kubernetes by #102147 and backported to 1.21 in #102196, so it needs to be fixed in both master and release-1.21.

RH BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1965545

The cause is a regression in runc/libcontainer: opencontainers/runc#2996

The fix is in opencontainers/runc#2997, which should make its way into runc 1.0.0 GA.

Currently there is a DNM (do not merge) PR to bump runc to the version with the fix, #102508, but we have decided (#102250 (comment)) to wait for the runc 1.0.0 release.

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

Metadata

Assignees: none

Labels: kind/bug, kind/regression, priority/critical-urgent, release-blocker, sig/node, triage/accepted

Status: Done
