DynamicUser= is missing error-handling for setup_namespace() == EPERM or EACCES #9835

@sourcejedi

Prompted by a failure reported on Ubuntu 18.04: https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimesyncdFailure

I audited the source files execute.c and namespace.c at 7e8d494 (v239+), and checked them against the general DynamicUser= design from the systemd blog post.

Using DynamicUser=yes with e.g. StateDirectory=foobar requires a private mount namespace. setup_namespace() is used to make a sub-directory of /var/lib/private/ accessible. In the main namespace, /var/lib/private is inaccessible (chmod a-rwx) unless you have CAP_DAC_READ_SEARCH.

But apply_mount_namespace() silently ignores EPERM or EACCES errors from setup_namespace(). E.g. if unshare(CLONE_NEWNS) fails with EPERM or EACCES, the error will be ignored. Then I think the daemon is going to hit EACCES when it tries to access its subdirectory, and you're in very annoying debugging territory.
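
To make the suggested behaviour concrete, here is a minimal sketch of the decision I have in mind, written in Python rather than systemd's C. It is not the code in execute.c: the function names and the `namespace_required` flag are hypothetical, and the flag stands for "this unit cannot work without its private mount namespace" (e.g. DynamicUser=yes together with StateDirectory=).

```python
import errno
import os

# The two errors that, per the audit above, are currently swallowed.
IGNORABLE_ERRNOS = {errno.EPERM, errno.EACCES}


def namespace_error_is_fatal(err: int, namespace_required: bool) -> bool:
    """Decide whether a setup_namespace()-style failure should abort the service.

    err is a positive errno value (0 means success).
    namespace_required is a hypothetical flag, not a real systemd field.
    """
    if err == 0:
        return False
    # Only tolerate EPERM/EACCES when the unit can still work without
    # the private mount namespace.
    if err in IGNORABLE_ERRNOS and not namespace_required:
        return False
    return True


def describe_failure(err: int) -> str:
    """Build a log message that hints at mount namespaces and /var/lib/private."""
    return (
        "Failed to set up mount namespace: "
        f"{os.strerror(err)} (errno {err}); "
        "DynamicUser=yes needs it to reach its directory under /var/lib/private"
    )
```

With this policy, `namespace_error_is_fatal(errno.EPERM, True)` is true, so the service would fail with a message from `describe_failure()` instead of starting and hitting a confusing permission error later.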

In cks' case on Ubuntu 18.04, the namespace failure actually seemed to be triggered by some inconvenient dead FUSE mount. I don't know whether that full scenario is also reproducible on 7e8d494.

Expected behaviour you didn't see

When using DynamicUser=yes with StateDirectory=foobar, failure of unshare(CLONE_NEWNS) is treated as fatal and logged, showing the error code even when it is EPERM or EACCES (and some clue that would let you work out this was related to mount namespaces).

EDIT: And if this hard failure applies only to DynamicUser=yes, you'd really want the message to give some clue about that. I guess at minimum, mentioning /var/lib/private would give a clue.


Steps to reproduce the problem

The steps were tested with systemd v238 on Fedora Linux.

Start the following test.service inside a systemd-nspawn container which was started manually with the option --drop-capability=CAP_SYS_ADMIN. On systemd v238, test.service will try to run the touch command, and the touch command will fail with "Permission denied". (It will succeed when run inside a container which is allowed CAP_SYS_ADMIN.)

[Service]
Type=oneshot
DynamicUser=yes
User=test-service
StateDirectory=test-service
ExecStart=/bin/cat /proc/self/mountinfo
ExecStart=/bin/ls -l /proc/self/ns
ExecStart=/bin/ls -ld /var/lib/test-service
ExecStart=/bin/touch /var/lib/test-service/test
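
The `ls -l /proc/self/ns` step is there to show whether the service actually got its own mount namespace. As a rough sketch of how to read that output, the helper below parses the `mnt:[<inode>]` link target and compares two processes. The function names are made up for illustration, and reading /proc/1/ns/mnt needs sufficient privileges, so treat this as an assumption-laden aid rather than part of the reproducer.

```python
import os
import re
from typing import Tuple

# Namespace links under /proc/<pid>/ns/ look like "mnt:[4026531840]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target.strip())
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def mount_namespace_id(pid: str = "self") -> int:
    """Return the inode number identifying the mount namespace of a process."""
    _kind, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/mnt"))
    return inode


def has_private_mount_namespace(service_pid: str, reference_pid: str = "1") -> bool:
    """True if the service's mount namespace differs from the reference process.

    Needs permission to read /proc/<reference_pid>/ns/mnt.
    """
    return mount_namespace_id(service_pid) != mount_namespace_id(reference_pid)
```

If the inode printed for `mnt` inside the service matches the one for PID 1 in the container, unshare(CLONE_NEWNS) did not take effect, which is the situation this issue is about.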

Expected behaviour you didn't see (from reproducer)

The service failure should have happened before running the touch command. (systemd should have logged an error message etc, as above).
