nspawn: change the synced cgroup hierarchy permissions too#4223
nspawn: change the synced cgroup hierarchy permissions too#4223keszybz merged 1 commit intosystemd:masterfrom
Conversation
b496d8d to
29e890c
Compare
29e890c to
7281ab6
Compare
src/basic/stat-util.c
Outdated
| is_fs_type(s, RAMFS_MAGIC); | ||
| is_fs_type(s, RAMFS_MAGIC) || | ||
| is_fs_type(s, CGROUP_SUPER_MAGIC) || | ||
| is_fs_type(s, CGROUP2_SUPER_MAGIC); |
There was a problem hiding this comment.
rm_rf(path, REMOVE_ROOT|REMOVE_ONLY_DIRECTORIES) doesn't work if path on the cgroup[2]-fs. I think we should fix the rm_rf instead. But let's check that this PR really fixes #4181 first.
There was a problem hiding this comment.
i figure it would make sense to add a cg_is_cgroupfs() call to cgroup-util.h, and then call that from rm-rf.c and also allow it.
| undo_mount = true; | ||
|
|
||
| fn = strjoina(tree, cgroup); | ||
| (void) rm_rf(fn, REMOVE_ROOT|REMOVE_ONLY_DIRECTORIES); |
There was a problem hiding this comment.
This fixes #4181 (comment)
How to reproduce:
# create the new container in some-dir
systemd-nspawn -D some-dir -b
...
^]]]
systemd-nspawn -D some-dir -b -U
...
Harder way to reproduce:
# create the new container in some-dir
systemd-nspawn -D some-dir -b
...
login:
...
# at another teminal
ps -C 'systemd-nspawn'
# find the systemd-nspawn-pid
kill -9 <systemd-nspawn-pid>
systemd-nspawn -D some-dir -b -U
There was a problem hiding this comment.
I cannot reproduce this. With the first recipe, things work just fine (without the rm_rf part). At least with the host running unified. I'll reboot and test with the host running mixed later on.
With the second recipe, the container survives the death of nspawn. But in this case we have a mess, so it's probably better if things fail.
There was a problem hiding this comment.
Hm, this is strange.
Did you run ^]]]? . i.e.
Container master terminated by signal KILL.
What is the output of the
mkdir cgroup
mount -t cgroup cgroup -o none,name=systemd,xattr cgroup
ls -lR cgroup/
after the first run of the systemd-nspawn -D ... -b (without -U)
?
There was a problem hiding this comment.
With the second recipe
Sorry, I was wrong. Actually, I mean kill -9 pid1-of-the-container
There was a problem hiding this comment.
@evverx you are saying that if nspawn dies abruptly the cgroup hierarchy created below its unit isn't cleaned up? that's somethig PID 1 really should take care of, and we should fix it there.
There was a problem hiding this comment.
oh, i see now, this is because in this case PID 1 on the host of course won't take care of the other hierarchy, i see.
|
I'm getting a bunch of failures when running F26 inside
It looks like we need to tighten our Condition checks for the first two. The third looks "interesting". Also on shutdown: |
There is a bug report: #4280 (comment) |
|
@evverx I would merge your original patch, but I'm not convinced about the rm_rf part — it seems like the fix should be elsewhere. But even if it is the correct thing to do, I think it should be a separate commit with a good commit message; currently that change is hidden in that large diff. So can you resubmit the first version which only does the chown? Wrt. to #4280, it doesn't seem to improve things for me. |
Makes sense
But I wonder how does this patch work for you? diff --git a/src/nspawn/nspawn-cgroup.c b/src/nspawn/nspawn-cgroup.c
index 8036323..c22393b 100644
--- a/src/nspawn/nspawn-cgroup.c
+++ b/src/nspawn/nspawn-cgroup.c
@@ -111,9 +111,6 @@ int sync_cgroup(pid_t pid, CGroupUnified unified_requested, uid_t arg_uid_shift)
undo_mount = true;
- fn = strjoina(tree, cgroup);
- (void) rm_rf(fn, REMOVE_ROOT|REMOVE_ONLY_DIRECTORIES);
-Environment: root# cat /proc/version
Linux version 4.7.5-200.fc24.x86_64 ([email protected]) (gcc version 6.2.1 20160916 (Red Hat 6.2.1-2) (GCC) ) #1 SMP Mon Sep 26 21:25:47 UTC 2016
root# grep cgroup /proc/self/mountinfo
23 17 0:20 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:7 - cgroup2 cgroup rwLet's run our container without root# UNIFIED_CGROUP_HIERARCHY=no systemd-nspawn -D /var/lib/machines/master/ -b systemd.unit=multi-user.target
...
[ OK ] Started Update UTMP about System Runlevel Changes.
systemd-testsuite login:
^]]]
Container master terminated by signal KILL.Let's check the root# mkdir cgroup
root# mount -t cgroup -o none,name=systemd,xattr cgroup cgroup
root# ls -lR cgroup
...
cgroup/machine.slice/machine-master.scope/system.slice/systemd-user-sessions.service:
total 0
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.clone_children
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.procs
-rw-r--r-- 1 root root 0 Oct 8 22:58 notify_on_release
-rw-r--r-- 1 root root 0 Oct 8 22:58 tasks
cgroup/machine.slice/machine-master.scope/system.slice/tmp.mount:
total 0
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.clone_children
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.procs
-rw-r--r-- 1 root root 0 Oct 8 22:58 notify_on_release
-rw-r--r-- 1 root root 0 Oct 8 22:58 tasks
cgroup/machine.slice/machine-master.scope/user.slice:
total 0
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.clone_children
-rw-r--r-- 1 root root 0 Oct 8 22:58 cgroup.procs
-rw-r--r-- 1 root root 0 Oct 8 22:58 notify_on_release
-rw-r--r-- 1 root root 0 Oct 8 22:58 tasksLet's run our container with root# UNIFIED_CGROUP_HIERARCHY=no systemd-nspawn -D /var/lib/machines/master/ -U -b systemd.unit=multi-user.target
...
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Welcome to Fedora 24 (Cloud Edition)!
Set hostname to <systemd-testsuite>.
Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
Failed to install release agent, ignoring: No such file or directory
Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution. |
Well, #4280 improves one particular case: |
|
Hm, with your latest recipe and I guess that's because some |
|
Also note #4323. |
Hm on the host? Also, does |
|
|
poettering
left a comment
There was a problem hiding this comment.
OK, looks all good to me. I'd prefer if we'd move the cgroupfs check currentl in is_temporary_fs into a new function is_cgroupfs() or so, and place that in cgroup-util. rm_rf() would then also call that, or so...
Patch looks great otherwise!
src/basic/stat-util.c
Outdated
| is_fs_type(s, RAMFS_MAGIC); | ||
| is_fs_type(s, RAMFS_MAGIC) || | ||
| is_fs_type(s, CGROUP_SUPER_MAGIC) || | ||
| is_fs_type(s, CGROUP2_SUPER_MAGIC); |
There was a problem hiding this comment.
i figure it would make sense to add a cg_is_cgroupfs() call to cgroup-util.h, and then call that from rm-rf.c and also allow it.
| undo_mount = true; | ||
|
|
||
| fn = strjoina(tree, cgroup); | ||
| (void) rm_rf(fn, REMOVE_ROOT|REMOVE_ONLY_DIRECTORIES); |
There was a problem hiding this comment.
@evverx you are saying that if nspawn dies abruptly the cgroup hierarchy created below its unit isn't cleaned up? that's somethig PID 1 really should take care of, and we should fix it there.
| undo_mount = true; | ||
|
|
||
| fn = strjoina(tree, cgroup); | ||
| (void) rm_rf(fn, REMOVE_ROOT|REMOVE_ONLY_DIRECTORIES); |
There was a problem hiding this comment.
oh, i see now, this is because in this case PID 1 on the host of course won't take care of the other hierarchy, i see.
|
Also needs rebase. |
|
Thanks for working on this! |
Will do. Thanks!
@keszybz , I can't reproduce. I stuck. Am I correct:
? |
|
So... I now added #4351, which adds more debug statements to make it easier to see what is happening. The issue I'm seeing is this: Now But the process does not have write permissions for /sys/fs/cgroup: I added an assert to kill the process when the mount fails. The backtrace is: I guess that's a corner case that we can leave for later. It's not directly related (I think) to this PR. |
|
I forgot to add: the result is the same with this PR and without. |
|
@keszybz , thanks! I've reproduced. This is the result of the torvalds/linux@036d523#diff-003a2c1efe8cf5587b2796a966c15b1c
I'll take a look tomorrow (on Thursday). Thanks! |
7281ab6 to
b37ae90
Compare
|
@poettering , @keszybz , updated. |
| #include "string-util.h" | ||
|
|
||
| static bool is_physical_fs(const struct statfs *sfs) { | ||
| return !is_temporary_fs(sfs) && !is_cgroup_fs(sfs); |
There was a problem hiding this comment.
BTW: I think we should add more filesystems here.
Basically, this test runs:
```
systemd-nspawn --register=no -D "$_root" -b
systemd-nspawn --register=no -D "$_root" --private-network -b
systemd-nspawn --register=no -D "$_root" -U -b
systemd-nspawn --register=no -D "$_root" --private-network -U -b
```
and exports the `UNIFIED_CGROUP_HIERARCHY=[yes|no]`, `SYSTEMD_NSPAWN_USE_CGNS=[yes|no]`
Inspired by
* systemd#3589 (comment)
* systemd#4372 (comment)
* systemd#4223 (comment)
* systemd#1555
and so on :-)
Basically, this test runs:
```
systemd-nspawn --register=no -D "$_root" -b
systemd-nspawn --register=no -D "$_root" --private-network -b
systemd-nspawn --register=no -D "$_root" -U -b
systemd-nspawn --register=no -D "$_root" --private-network -U -b
```
and exports the `UNIFIED_CGROUP_HIERARCHY=[yes|no]`, `SYSTEMD_NSPAWN_USE_CGNS=[yes|no]`
Inspired by
* systemd#3589 (comment)
* systemd#4372 (comment)
* systemd#4223 (comment)
* systemd#1555
and so on :-)
torvalds/linux@036d523 > vfs: Don't create inodes with a uid or gid unknown to the vfs It is expected that filesystems can not represent uids and gids from outside of their user namespace. Keep things simple by not even trying to create filesystem nodes with non-sense uids and gids. So, we actually should `reset_uid_gid` early to prevent #4223 (comment) $ sudo UNIFIED_CGROUP_HIERARCHY=no LD_LIBRARY_PATH=.libs .libs/systemd-nspawn -D /var/lib/machines/fedora-rawhide -U -b systemd.unit=multi-user.target Spawning container fedora-rawhide on /var/lib/machines/fedora-rawhide. Press ^] three times within 1s to kill container. Child died too early. Selected user namespace base 1073283072 and range 65536. Failed to mount to /sys/fs/cgroup/systemd: No such file or directory Details: #4223 (comment) Fixes: #4352
Fixes: #4181