Skip to content

Remove sub cgroup when container exits#3226

Merged
kolyshkin merged 1 commit intoopencontainers:masterfrom
chenk008:fix_delete_cgroupv2
Oct 5, 2021
Merged

Remove sub cgroup when container exits#3226
kolyshkin merged 1 commit intoopencontainers:masterfrom
chenk008:fix_delete_cgroupv2

Conversation

@chenk008
Copy link
Copy Markdown
Contributor

@chenk008 chenk008 commented Sep 27, 2021

Delete all child cgroup in container cgroup, otherwise it will failed to delete the container cgroup.

Close #3225

Signed-off-by: Kang Chen [email protected]

Proposed changelog entry

(by @kolyshkin)

Bugfixes:
* cgroup v2 systemd manager: delete sub-cgroups on cgroup delete

1.0 backport: #3297

Comment thread libcontainer/cgroups/systemd/v2.go Outdated
@AkihiroSuda
Copy link
Copy Markdown
Member

Please squash commits and use real name for signing.
LGTM then.

@chenk008 chenk008 force-pushed the fix_delete_cgroupv2 branch 3 times, most recently from 76fe2b6 to 52ebeb4 Compare September 27, 2021 07:56
@chenk008
Copy link
Copy Markdown
Contributor Author

@AkihiroSuda Thanks! I have squashed commits and signed with real name.

AkihiroSuda
AkihiroSuda previously approved these changes Sep 27, 2021
Comment thread libcontainer/cgroups/systemd/v2.go Outdated
Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NACK.

  1. os.RemoveAll tries to remove all the files, too, which is definitely not what's needed in this case. For such specific case (need to remove all directories recursively but not files) we have cgroups.RemovePath() function, which fs2 driver uses. I'd suggest to merely call m.fsMgr.Destroy() here, it does exactly what's needed, and other methods similarly rely on fsMgr.

  2. We already have tests for cgroup v1 and v2 removal with sub-cgroups (see tests/integration/delete.bats). Need to explain why aren't they failing with the existing code.

@chenk008
Copy link
Copy Markdown
Contributor Author

@kolyshkin Thanks for your suggestion. I have modified it, PLTA.

Comment thread libcontainer/cgroups/systemd/v2.go Outdated
@kolyshkin
Copy link
Copy Markdown
Contributor

We already have tests for cgroup v1 and v2 removal with sub-cgroups (see tests/integration/delete.bats). Need to explain why aren't they failing with the existing code.

The question still remains -- why this test is not failing without this patch?

@test "runc delete --force in cgroupv2 with subcgroups" {
requires cgroups_v2 root
set_cgroups_path
set_cgroup_mount_writable
# run busybox detached
runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
[ "$status" -eq 0 ]
# check state
testcontainer test_busybox running
# create a sub process
__runc exec -d test_busybox sleep 1d
# find the pid of sleep
pid=$(__runc exec test_busybox ps -a | grep 1d | awk '{print $1}')
[[ ${pid} =~ [0-9]+ ]]
# create subcgroups
cat <<EOF >nest.sh
set -e -u -x
cd /sys/fs/cgroup
echo +pids > cgroup.subtree_control
mkdir foo
cd foo
echo threaded > cgroup.type
echo ${pid} > cgroup.threads
cat cgroup.threads
EOF
runc exec test_busybox sh <nest.sh
[ "$status" -eq 0 ]
[[ "$output" =~ [0-9]+ ]]
# check create subcgroups success
[ -d "$CGROUP_PATH"/foo ]
# force delete test_busybox
runc delete --force test_busybox
runc state test_busybox
[ "$status" -ne 0 ]
# check delete subcgroups success
[ ! -d "$CGROUP_PATH"/foo ]
}

Or, to rephrase the question -- can you provide us with a repro to see what issue this PR fixes?

@kolyshkin
Copy link
Copy Markdown
Contributor

We already have tests for cgroup v1 and v2 removal with sub-cgroups (see tests/integration/delete.bats). Need to explain why aren't they failing with the existing code.

The question still remains -- why this test is not failing without this patch?

In my case (Fedora 34, systemd 248 (v248.7-1.fc34)) systemd is removing the cgroup together with the children.

Maybe there's something that prevents system from removing the sub-cgroups. I wonder what is it.

@chenk008
Copy link
Copy Markdown
Contributor Author

chenk008 commented Sep 28, 2021

I think it is related to kernel code https://elixir.bootlin.com/linux/v4.18.20/source/kernel/cgroup/cgroup.c#L5125

I write some bpf script to check, and css_has_online_children return 1 (i.e. true) ,so result is EBUSY

#!/usr/bin/env bpftrace

#include <linux/cgroup-defs.h>

BEGIN
{
	printf("Tracing cgroup remove events... Hit Ctrl-C to end.\n");
	printf("%-8s %s\n", "TIME", "Action");
}

kprobe:cgroup_rmdir
{
	time("%H:%M:%S ");
    $node = (struct kernfs_node *)arg0;
	printf("delete: %s cg: %p\n", str($node->name),$node->priv);
}

kprobe:cgroup_destroy_locked
{
	time("%H:%M:%S ");
    $cg = (struct cgroup *)arg0;
	printf("delete: %p\n", $cg);
}

kretprobe:cgroup_destroy_locked
{
	time("%H:%M:%S ");
	printf("cgroup_destroy_locked returned: %d\n", retval);
}

kprobe:css_has_online_children
{
   time("%H:%M:%S ");
   $cg = (struct cgroup *)arg0;
   printf("check: %p\n", $cg);
}


kretprobe:css_has_online_children
{
	time("%H:%M:%S ");
	printf("css_has_online_children returned: %d\n", retval);
}

results:

Attaching 6 probes...
Tracing cgroup remove events... Hit Ctrl-C to end.
TIME     Action
11:49:36 delete: runc-foo.scope cg: 0xffff9b4ce34ca000
11:49:36 delete: 0xffff9b4ce34ca000
11:49:36 check: 0xffff9b4ce34ca000
11:49:36 css_has_online_children returned: 1
11:49:36 cgroup_destroy_locked returned: -16

@chenk008
Copy link
Copy Markdown
Contributor Author

chenk008 commented Sep 28, 2021

@kolyshkin I have mocked the test on my computer, it seems the error message still shown, but the cgroup will be deleted.

terminal A:

[vagrant@localhost rootless]$ ./runc.amd64 --systemd-cgroup  run foo
/ # mkdir /sys/fs/cgroup/test

terminal B:

runc delete --force foo

After deleted, terminal A shows:

/ # ERRO[0044] remove /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/user.slice/runc-foo.scope: device or resource busy
[vagrant@localhost rootless]$ ll /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/user.slice/runc-foo.scope
ls: cannot access '/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/user.slice/runc-foo.scope': No such file or directory

And the bpf shows:

Attaching 6 probes...
Tracing cgroup remove events... Hit Ctrl-C to end.
TIME     Action
14:36:41 delete: runc-foo.scope cg: 0xffff9b4cb859d000
14:36:41 delete: 0xffff9b4cb859d000
14:36:41 check: 0xffff9b4cb859d000
14:36:41 css_has_online_children returned: 1
14:36:41 cgroup_destroy_locked returned: -16
14:36:42 delete: runc-foo.scope cg: 0xffff9b4cb859d000
14:36:42 delete: 0xffff9b4cb859d000
14:36:42 check: 0xffff9b4cb859d000
14:36:42 css_has_online_children returned: 1
14:36:42 cgroup_destroy_locked returned: -16
14:36:42 delete: test cg: 0xffff9b4d05fd3000
14:36:42 delete: 0xffff9b4d05fd3000
14:36:42 check: 0xffff9b4d05fd3000
14:36:42 css_has_online_children returned: 0
14:36:42 cgroup_destroy_locked returned: 0
14:36:42 delete: runc-foo.scope cg: 0xffff9b4cb859d000
14:36:42 delete: 0xffff9b4cb859d000
14:36:42 check: 0xffff9b4cb859d000
14:36:42 css_has_online_children returned: 0
14:36:42 cgroup_destroy_locked returned: 0

And I hooked the fs2.(*manager).Destroy on runc 1.0.2 , got the backtrace. The delete action will use cgroupManager in libcontainer/cgroups/fs2/fs2.go, so it can remove all directories recursively and remove the container cgroup. So I think this pull request is required when container exit normally.

goroutine 1 [running]:
github.com/opencontainers/runc/libcontainer/cgroups/fs2.(*manager).Destroy(0xc0002b0720, 0x555b85e3b086, 0x6)
	github.com/opencontainers/runc/libcontainer/cgroups/fs2/fs2.go:141 +0x3b
github.com/opencontainers/runc/libcontainer.destroy(0xc000210400, 0xc000206700, 0x9)
	github.com/opencontainers/runc/libcontainer/state_linux.go:47 +0x83
github.com/opencontainers/runc/libcontainer.(*stoppedState).destroy(0xc000206780, 0x555b861ada40, 0xc000279c00)
	github.com/opencontainers/runc/libcontainer/state_linux.go:104 +0x30
github.com/opencontainers/runc/libcontainer.(*linuxContainer).Destroy(0xc000210400, 0x0, 0x0)
	github.com/opencontainers/runc/libcontainer/container_linux.go:639 +0x88
main.destroy(0x555b8626aec0, 0xc000210400)
	github.com/opencontainers/runc/utils_linux.go:143 +0x37
main.killContainer(0x555b8626aec0, 0xc000210400, 0x0, 0x0)
	github.com/opencontainers/runc/delete.go:23 +0xd0
main.glob..func3(0xc000238160, 0x0, 0xc00020ccf0)
	github.com/opencontainers/runc/delete.go:82 +0x3a6
github.com/urfave/cli.HandleAction(0x555b86179880, 0x555b86248f50, 0xc000238160, 0xc000238160, 0x0)
	github.com/urfave/[email protected]/app.go:523 +0x11c
github.com/urfave/cli.Command.Run(0x555b85e3b518, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x555b85e5bced, 0x4d, 0x0, ...)
	github.com/urfave/[email protected]/command.go:174 +0x570
github.com/urfave/cli.(*App).Run(0xc00027a000, 0xc000020080, 0x4, 0x4, 0x0, 0x0)
	github.com/urfave/[email protected]/app.go:276 +0x7b0
main.main()
	github.com/opencontainers/runc/main.go:163 +0xbda

@kolyshkin
Copy link
Copy Markdown
Contributor

Ah! So you're using rootless, and the test I referred to requires root (for some reason -- I think this req can be dropped, see #3229).

We still can't catch this bug as you are using centos 8 + cgroup v2, a combo we do not test.

The patch makes sense though. Can you please improve the commit subject and add the description?

@kolyshkin kolyshkin added the backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 label Sep 29, 2021
@kolyshkin
Copy link
Copy Markdown
Contributor

The commit message subject should be something like libct/cg/sd/v2: Destroy: remove cgroups recursively

The commit message itself should link to the bug report, and briefly describe what are you fixing.

Comment thread libcontainer/cgroups/systemd/v2.go Outdated
@chenk008
Copy link
Copy Markdown
Contributor Author

chenk008 commented Sep 29, 2021

@kolyshkin I have modified the commit message and comment, PTAL. Thanks!

kolyshkin
kolyshkin previously approved these changes Sep 29, 2021
Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and thanks!

(not sure why DCO check complains -- seems the commit is properly signed)

@kolyshkin
Copy link
Copy Markdown
Contributor

@AkihiroSuda @cyphar @thaJeztah PTAL

@AkihiroSuda
Copy link
Copy Markdown
Member

Commit sha: 9c5fe74, Author: wuhua.ck, Committer: wuhua.ck; Expected "wuhua.ck [email protected]", but got "Kang Chen [email protected]".

https://github.com/opencontainers/runc/pull/3226/checks?check_run_id=3739410743

You need git config user.name Kang Chen

Comment thread libcontainer/cgroups/systemd/v2.go Outdated
err := os.Remove(m.path)
if err != nil && !os.IsNotExist(err) {
// systemd 239 do not remove sub-cgroups.
logrus.Debugf("try to destroy cgroup: %s", m.path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw there was a comment about this earlier; mostly wondering how useful this debug log is, as it would be logging the "happy" path as well (successfully handled this); the "failure" path is already returned as an error (which may be more interesting in case things don't work as expected).

Is there a strong motivation to log this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion. When I met the error message, I want to make sure if this code is the root cause. I have deleted the debug log.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! If the error message itself is not providing enough information, we could add a (debug) log inside the if err != nil. I suspect the error will contain information about the path that failed to be removed though (so perhaps not needed to add additional logs for that)

Currently, we can create subcgroup in a rootless container with systemd cgroupv2 on centos8.
But after the container exited, the container cgroup and its subcgroup will not be removed.

Fix this by removing all directories recursively.

Fixes: opencontainers#3225

Signed-off-by: Kang Chen <[email protected]>
@chenk008 chenk008 force-pushed the fix_delete_cgroupv2 branch from e5f3a5e to 7758d3f Compare October 1, 2021 14:07
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin merged commit c670691 into opencontainers:master Oct 5, 2021
@kolyshkin kolyshkin added this to the 1.1.0 milestone Oct 5, 2021
@kolyshkin kolyshkin added backport/1.0-done A PR in main branch which has been backported to release-1.0 and removed backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 labels Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cgroupv2 area/systemd backport/1.0-done A PR in main branch which has been backported to release-1.0 impact/changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When rootless runc exits, failed to remove cgroupv2 with sub-cgroup

4 participants