Remove sub cgroup when container exits#3226
Conversation
ae99440 to
a942d5d
Compare
|
Please squash commits and use real name for signing. |
76fe2b6 to
52ebeb4
Compare
|
@AkihiroSuda Thanks! I have squashed commits and signed with real name. |
kolyshkin
left a comment
There was a problem hiding this comment.
NACK.
-
os.RemoveAlltries to remove all the files, too, which is definitely not what's needed in this case. For such specific case (need to remove all directories recursively but not files) we havecgroups.RemovePath()function, which fs2 driver uses. I'd suggest to merely callm.fsMgr.Destroy()here, it does exactly what's needed, and other methods similarly rely on fsMgr. -
We already have tests for cgroup v1 and v2 removal with sub-cgroups (see
tests/integration/delete.bats). Need to explain why aren't they failing with the existing code.
52ebeb4 to
0f5583e
Compare
|
@kolyshkin Thanks for your suggestion. I have modified it, PLTA. |
The question still remains -- why this test is not failing without this patch? runc/tests/integration/delete.bats Lines 113 to 158 in e999e29 Or, to rephrase the question -- can you provide us with a repro to see what issue this PR fixes? |
In my case (Fedora 34, systemd 248 (v248.7-1.fc34)) systemd is removing the cgroup together with the children. Maybe there's something that prevents system from removing the sub-cgroups. I wonder what is it. |
0f5583e to
7e76e6a
Compare
|
I think it is related to kernel code https://elixir.bootlin.com/linux/v4.18.20/source/kernel/cgroup/cgroup.c#L5125 I write some bpf script to check, and results: |
|
@kolyshkin I have mocked the test on my computer, it seems the error message still shown, but the cgroup will be deleted. terminal A: terminal B: After deleted, terminal A shows: And the bpf shows: And I hooked the |
|
Ah! So you're using rootless, and the test I referred to requires root (for some reason -- I think this req can be dropped, see #3229). We still can't catch this bug as you are using centos 8 + cgroup v2, a combo we do not test. The patch makes sense though. Can you please improve the commit subject and add the description? |
|
The commit message subject should be something like The commit message itself should link to the bug report, and briefly describe what are you fixing. |
7e76e6a to
9c5fe74
Compare
|
@kolyshkin I have modified the commit message and comment, PTAL. Thanks! |
kolyshkin
left a comment
There was a problem hiding this comment.
LGTM, and thanks!
(not sure why DCO check complains -- seems the commit is properly signed)
You need |
| err := os.Remove(m.path) | ||
| if err != nil && !os.IsNotExist(err) { | ||
| // systemd 239 do not remove sub-cgroups. | ||
| logrus.Debugf("try to destroy cgroup: %s", m.path) |
There was a problem hiding this comment.
I saw there was a comment about this earlier; mostly wondering how useful this debug log is, as it would be logging the "happy" path as well (successfully handled this); the "failure" path is already returned as an error (which may be more interesting in case things don't work as expected).
Is there a strong motivation to log this case?
There was a problem hiding this comment.
Thanks for your suggestion. When I met the error message, I want to make sure if this code is the root cause. I have deleted the debug log.
There was a problem hiding this comment.
Thanks! If the error message itself is not providing enough information, we could add a (debug) log inside the if err != nil. I suspect the error will contain information about the path that failed to be removed though (so perhaps not needed to add additional logs for that)
9c5fe74 to
e5f3a5e
Compare
Currently, we can create subcgroup in a rootless container with systemd cgroupv2 on centos8. But after the container exited, the container cgroup and its subcgroup will not be removed. Fix this by removing all directories recursively. Fixes: opencontainers#3225 Signed-off-by: Kang Chen <[email protected]>
e5f3a5e to
7758d3f
Compare
Delete all child cgroup in container cgroup, otherwise it will failed to delete the container cgroup.
Close #3225
Signed-off-by: Kang Chen [email protected]
Proposed changelog entry
(by @kolyshkin)
1.0 backport: #3297