Skip to content

runc systemd cgroup driver logic is wrong #3780

@kolyshkin

Description

@kolyshkin

Description

While investigating issue #3760 I found the following.

Issue A: leftover failed scope unit.

  1. By default, systemd removes transient units (e.g. scopes) that have ended (are no longer active), except those that failed. The failed units are not being removed, they stay.
  2. RHEL9's systemd, for some reason (I guess a RHEL9-specific patch I can't find yet), works differently than in does in older CentOS, or Fedora 37 (or earlier Fedora versions, I guess). The difference is, if the container is OOM-killed (which happen in "event oom" test), its scope state becomes "failed".

This above is the first issue. The container ended, but its scope stays. Here is a repro (works on CentOS Stream 9 only):

Update: this repro needs RHEL/CentOS 9 systemd version < 252.14, i.e. before they've added redhat-plumbers/systemd-rhel9#149.

[root@localhost runc]# systemctl status runc-test_busybox.scope
Unit runc-test_busybox.scope could not be found.
[root@localhost runc]# bats tests/integration/events.bats 
1..5
ok 1 events --stats
ok 2 events --interval default
ok 3 events --interval 1s
ok 4 events --interval 100ms
ok 5 events oom
[root@localhost runc]# systemctl status runc-test_busybox.scope
Warning: The unit file, source configuration file or drop-ins of runc-test_busybox.scope changed on disk. Run 'systemctl daemon->
× runc-test_busybox.scope - libcontainer container test_busybox
     Loaded: loaded (/run/systemd/transient/runc-test_busybox.scope; transient)
  Transient: yes
    Drop-In: /run/systemd/transient/runc-test_busybox.scope.d
             └─50-DevicePolicy.conf, 50-DeviceAllow.conf, 50-MemoryMax.conf
     Active: failed (Result: oom-kill) since Thu 2023-03-23 01:58:22 UTC; 2s ago
   Duration: 5.729s
        CPU: 710ms

Mar 23 01:58:16 localhost systemd[1]: Started libcontainer container test_busybox.
Mar 23 01:58:22 localhost systemd[1]: runc-test_busybox.scope: A process of this unit has been killed by the OOM killer.
Mar 23 01:58:22 localhost systemd[1]: runc-test_busybox.scope: Killing process 248957 (sh) with signal SIGKILL.
Mar 23 01:58:22 localhost systemd[1]: runc-test_busybox.scope: Killing process 249001 (sh) with signal SIGKILL.
Mar 23 01:58:22 localhost systemd[1]: runc-test_busybox.scope: Killing process 249007 (dd) with signal SIGKILL.
Mar 23 01:58:22 localhost systemd[1]: runc-test_busybox.scope: Failed with result 'oom-kill'.
[root@localhost runc]# ./runc --version
runc version 1.1.0+dev
commit: efad7a3b
spec: 1.1.0-rc.1
go: go1.19
libseccomp: 2.5.2

Issue B: Ignoring "org.freedesktop.systemd1.UnitExists" from Apply / startUnit

  1. When runc systemd cgroup manager is used, it calls org.freedesktop.systemd1.Manager.StartTransientUnit over dbus, and waits for it to complete and return back the correct result ("done"). Except, if the call fails with "org.freedesktop.systemd1.UnitExists" error, this error is ignored and startUnit returns no error. The failure can happen because of Issue A above. In such case, runc creates a container which is not placed into a proper systemd scope (and is not placed into a proper cgroup either).

Issue C: failed integration tests on CentOS Stream 9

  1. Many runc integration tests do not use set_cgroups_path, meaning that cgroupsPath field in spec is not set. This results in default cgroup path, which, when systemd cgroup manager is used, is :runc:<container-name>.
  2. Many runc integration tests use the same container name, test_busybox. Together with the item 1 above, which means that the tests repeatedly create a systemd scope named runc-test_busybox.scope.
  3. All of the above, plus the Issue A and Issue B, result is some integration tests failing, as reported in centos-stream-9 CI is failing for the main branch #3760.

Steps to reproduce the issue

See above.

Describe the results you received and expected

What version of runc are you using?

[kir@kir-rhat runc]$ ./runc --version
runc version 1.1.0+dev
commit: v1.1.0-443-g7e630356
spec: 1.1.0-rc.1
go: go1.20.2
libseccomp: 2.5.3

Host OS information

No response

Host kernel information

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions