-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Launch resolved & networkd unprivileged using RuntimeDirectory & Ambient capabilities #6393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
systemd-networkd may be started before systemd-tmpfilest thus it already implements the logic required to create those directories.
Start under an unprivileged user, remove uid/gid dropping logic and initial capability drop. Refuse to run as root. Use RuntimeDirectory, RuntimeDirectoryMode and RuntimeDirectoryPreserve to create runtime directory.
Start under an unprivileged user, remove uid/gid dropping logic and initial capability drop. Refuse to run as root. Use RuntimeDirectory, RuntimeDirectoryMode and RuntimeDirectoryPreserve to create runtime directories.
|
I am not sure how I feel about requiring ambient caps... We used to say that we focus on kernels from the last two years for systemd, but today actually support substantially more... 4.3 however is 22 months old now, which isn't even 2 years... I wonder if we can't find another way for this, one that permits us to maintain compat on this one. Here's one such idea: currently, if ExecStart='s first character is "+" we will execute the process with full privileges. Maybe we should introduce a similar, new magic char, maybe "?": if set, then the service is run with full privileges if ambient caps are not available, and otherwise with reduced caps. Then, inside of the daemon code we'd check if uid==0, and if so, drop privs as we used to, otherwise make that a NOP. This way we get great behaviour on older kernels as well as on new kernels. I think implementing such a scheme would be relatively straight-forward... Does that make sense? |
| d /run/systemd/netif 0755 systemd-network systemd-network - | ||
| d /run/systemd/netif/links 0755 systemd-network systemd-network - | ||
| d /run/systemd/netif/leases 0755 systemd-network systemd-network - | ||
| )m4_dnl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this should be left in, so that regular userspace clients can install inotify watches on these dirs unconditionally whether networkd actually got started or didn't. After all, inotify is the way how event notifications work on the networkd client side...
Yes, strictly speaking these lines now become redundant, but only in the usual case, not in all cases
| r = drop_privileges(uid, gid, | ||
| (UINT64_C(1) << CAP_NET_RAW)| /* needed for SO_BINDTODEVICE */ | ||
| (UINT64_C(1) << CAP_NET_BIND_SERVICE)| /* needed to bind on port 53 */ | ||
| (UINT64_C(1) << CAP_SETPCAP) /* needed in order to drop the caps later */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer if that the comments about these three caps aren't lost, they are pretty interesting to read
|
Having such fallback code for the ambient stuff in place would also be extremely useful for debugging, as one could then still invoke resolved from the shell prompt directly and get things set up the right way... |
|
Unfortunately, I also remove some capabilities from the bounding set in the unit when using ambient capabilities thus in order to keep a single unit for both cases I would have to drop capabilities in both cases. But I understand the concern. This only impacts systems running the networkd & resolved daemons and those are generally fairly recent distributions. |
Much like "+" the new "?" thingy would leave all bounding caps in place too. Strictly speaking this means that on old kernels we'd thus run with a wider set of caps than we do now, but that's acceptable I think. |
Well, people especially in the embedded world tend to use really old kernels with a much more modern userspace, and they complain already that we don't support really old stuff... cutting this off to 22 months now is just too much... |
|
After yesterday's networkd-test.py fix, this now seems to genuinely fail due to a regression here: |
|
I made a mistake in the units. I'll try to fix the other issue before pushing again. Thanks |
| RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6 | ||
| SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @swap | ||
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required? RuntimeDirectory= implies ReadWritePaths=.
| SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @swap | ||
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd | ||
| RuntimeDirectory=/run/systemd/resolve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be systemd/resolve. Full path is not allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed
| UMask=0022 | ||
| WatchdogSec=3min | ||
| CapabilityBoundingSet=CAP_SETUID CAP_SETGID CAP_SETPCAP CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_NET_RAW CAP_NET_BIND_SERVICE | ||
| CapabilityBoundingSet=CAP_SETPCAP CAP_NET_RAW CAP_NET_BIND_SERVICE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required if AmbientCapabilities= is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you still need to drop the capabilities from the bounding set in all cases.
| log_error("This program must be run under an unprivileged user " | ||
| "with the following capabilities: CAP_NET_ADMIN " | ||
| "CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW"); | ||
| r = -EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, still we can drop privileges by drop_privileges(). I think it is not necessary to quit the program.
| log_error("This program must be run under an unprivileged user" | ||
| " with the following capabilities: CAP_NET_RAW" | ||
| " CAP_NET_BIND_SERVICE CAP_SETPCAP"); | ||
| r = -EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as networkd.c.
| UMask=0022 | ||
| WatchdogSec=3min | ||
| CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW CAP_SETUID CAP_SETGID CAP_SETPCAP CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER | ||
| CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required?
| SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @swap | ||
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd | ||
| RuntimeDirectory=/run/systemd/netif /run/systemd/netif/links /run/systemd/netif/leases /run/systemd/netif/lldp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full path is not allowed. Why don't you create links, leases, and lldp subdirectories by the daemon?
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd | ||
| RuntimeDirectory=/run/systemd/netif /run/systemd/netif/links /run/systemd/netif/leases /run/systemd/netif/lldp | ||
| RuntimeDirectoryMode=0755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but redundant.
| RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6 AF_PACKET | ||
| SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @swap | ||
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required?
| SystemCallArchitectures=native | ||
| ReadWritePaths=/run/systemd | ||
| RuntimeDirectory=/run/systemd/resolve | ||
| RuntimeDirectoryMode=0755 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default value. It's OK, but redundant.
|
In PR #6328, once I've pushed similar commits, but they cause networkd-test.py failure, so I tentatively drop them. Now I pushed them in https://github.com/yuwata/systemd/tree/drop-priv . If you are interested to them, please take a look. Thank you. |
|
Closing this one in favor of #6564. |
This is a follow up to #6241 using the newly added RuntimeDirectoryPreserve option to start resolved and networkd unprivileged using ambient capabilities.