Description
Broadly suggested in an issue on Ubuntu 18.04: https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimesyncdFailure
I audited the source files execute.c and namespace.c at commit 7e8d494 (v239+), and checked the general DynamicUser= design against the systemd blog post.
Using DynamicUser=yes with e.g. StateDirectory=foobar requires a private mount namespace. setup_namespace() is used to make a sub-directory of /var/lib/private/ accessible. In the main namespace, /var/lib/private is inaccessible (chmod a-rwx) unless you have CAP_DAC_READ_SEARCH.
But apply_mount_namespace() silently ignores EPERM or EACCES errors from setup_namespace(). E.g. if unshare(CLONE_NEWNS) fails with EPERM or EACCES, the error will be ignored. Then I think the daemon is going to hit EACCES when it tries to access its subdirectory, and you're in very annoying debugging territory.
In cks' case on Ubuntu 18.04, the namespace failure actually seemed to be triggered by some inconvenient dead FUSE mount. I don't know whether that full scenario is also reproducible on 7e8d494.
Expected behaviour you didn't see
When using DynamicUser=yes with StateDirectory=foobar, a failure of unshare(CLONE_NEWNS) is treated as fatal and logged, showing the error code even when it is EPERM or EACCES (along with some clue that would let you work out that this was related to mount namespaces).
EDIT: And if this hard failure applies only to DynamicUser=yes, you'd really want the message to give some clue about that. I guess at minimum, mentioning /var/lib/private would give a clue.
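To make the difference between the current and the requested behaviour concrete, here is a minimal sketch of the two policies. It is a simplified model and not systemd code: `struct ns_request`, `namespace_is_required()`, `handle_ns_error_current()` and `handle_ns_error_expected()` are hypothetical names invented for illustration, and the real logic in execute.c takes many more settings into account.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the unit settings relevant to this report. */
struct ns_request {
    bool dynamic_user;        /* DynamicUser=yes */
    bool has_state_directory; /* StateDirectory= is set */
};

/* Assumption: the mount namespace is mandatory when the state directory
 * lives under /var/lib/private, i.e. DynamicUser=yes plus StateDirectory=. */
static bool namespace_is_required(const struct ns_request *req) {
    return req->dynamic_user && req->has_state_directory;
}

/* Behaviour described in this report: EPERM and EACCES from namespace setup
 * are dropped unconditionally. Returns 0 to continue, negative errno to fail. */
int handle_ns_error_current(int err) {
    if (err == EPERM || err == EACCES)
        return 0;
    return -err;
}

/* Behaviour requested in this report: only ignore EPERM/EACCES when the
 * unit can still work without the namespace. */
int handle_ns_error_expected(const struct ns_request *req, int err) {
    if (err == 0)
        return 0;
    if ((err == EPERM || err == EACCES) && !namespace_is_required(req))
        return 0;
    return -err; /* caller should log this, ideally mentioning /var/lib/private */
}
```

With this model, a unit using DynamicUser=yes and StateDirectory= fails early with -EPERM under the requested policy, while a unit that does not depend on the namespace keeps the lenient handling.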
Steps to reproduce the problem
The steps were tested with systemd v238 on Fedora Linux.
Start the following test.service inside a systemd-nspawn container which was started manually with the option --drop-capability=CAP_SYS_ADMIN. On systemd v238, test.service will try to run the touch command, and the touch command will fail with "Permission denied". (It will succeed when run inside a container which is allowed CAP_SYS_ADMIN.)
[Service]
Type=oneshot
DynamicUser=yes
User=test-service
StateDirectory=test-service
ExecStart=/bin/cat /proc/self/mountinfo
ExecStart=/bin/ls -l /proc/self/ns
ExecStart=/bin/ls -ld /var/lib/test-service
ExecStart=/bin/touch /var/lib/test-service/test
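When reproducing this, it may help to confirm that the container really cannot create mount namespaces. The following is a small sketch of such a check; `probe_mount_namespace()` is a hypothetical helper written for this report, not part of systemd. It calls unshare(CLONE_NEWNS) in a forked child so that the calling process is left untouched.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for unshare() and CLONE_NEWNS in <sched.h> */
#endif
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a new mount namespace could be created,
 * the errno value from unshare() if it could not (e.g. EPERM), or -1 if the
 * probe itself failed (fork error, or the child was killed by a signal). */
int probe_mount_namespace(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: try to unshare and report the result via the exit status. */
        int err = (unshare(CLONE_NEWNS) == 0) ? 0 : errno;
        _exit(err);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run as root inside the container, a non-zero result such as EPERM would suggest that the unit above cannot get its private mount namespace, which is the situation this report is about.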
Expected behaviour you didn't see (from reproducer)
The service should have failed before running the touch command, with systemd logging an error message as described above.