-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Introduce SocketBind{Allow|Deny}= properties powered by source compiled BPF #17655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/core/dbus-cgroup.c
Outdated
| SD_BUS_PROPERTY("ManagedOOMSwap", "s", property_get_managed_oom_mode, offsetof(CGroupContext, moom_swap), 0), | ||
| SD_BUS_PROPERTY("ManagedOOMMemoryPressure", "s", property_get_managed_oom_mode, offsetof(CGroupContext, moom_mem_pressure), 0), | ||
| SD_BUS_PROPERTY("ManagedOOMMemoryPressureLimitPercent", "s", bus_property_get_percent, offsetof(CGroupContext, moom_mem_pressure_limit), 0), | ||
| SD_BUS_PROPERTY("AllowBindPorts", "as", NULL, offsetof(CGroupContext, allow_bind_ports), 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general there's a preference for passing around structured data, rather than strings to parse - maybe this could be an int, with -1 to indicate "none"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluca IMO -1 semantics is confusing while passing "none" would be consistent with service unit config file.
Also there are IPAddress{Allow|Deny} properties with special values such as "any" and "localhost", I followed that pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, with the DBUS interface we don't really aim for 1:1 mapping with unit config, but for well-structured data passing. And IPAddress is passed around as structured data (array of bytes + unsigned for mask), not as strings:
SD_BUS_PROPERTY("IPAddressAllow", "a(iayu)", property_get_ip_address_access, offsetof(CGroupContext, ip_address_allow), 0),
SD_BUS_PROPERTY("IPAddressDeny", "a(iayu)", property_get_ip_address_access, offsetof(CGroupContext, ip_address_deny), 0),
any/localhost/etc are encoded as special values - eg, 0.0.0.0/0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is integer (i.e.t dbus type specifier) forward compatible?
It keep the initial program simple and to reduce the complexity of this PR, AllowBindPorts= accepts a single integer or none.
Still I'd like to add range support as follow-up, e.g. AllowBindPorts=5555:7777 which will require a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be two integers. You can immediately add a second one, but ignore it for now - that way when it's ready you can use it for ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, please use the correct type here, not strings. dbus is all about type-safety, and strings are about "everything goes", hence break tht design goal of dbus.
now, dbus isn't good at having nice ways to extend things in a compatible way. it usually pays of to think in advance how thing are going to be extended, and then use the appropriate types for that. and if that doesn't work out and one has to change type the usual approach is to introduce a new property that has the new type, and both the old and the new continue to be exposed, showing the same data as much as that's possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw what about ipv4 vs. ipv6? here? eventually people probably want to list ports per protocol...
maybe encode "none" as single entry of value "0" (which is after all not a valid port, i.e. if the only allowed port is an invalid one i think the meaning is clear)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does <protocol>:<port> format, e.g. IPv6:5555 look good? This may be extended to <protocol>:<port_min>-<port_max>, e.g.IPv6:5555-7777 as port range support is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an array of structures of three integers: the address family to use (AF_INET or AF_INET6), and the port start nr, plus number of ports. the dbus type system is easily flexible enough to express that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, dbus is switched to (address_family, nr_ports, port_min) three tuple.
Please, don't. Either libbpf is ready for production and thus provides a stable interface, and then it can be depended upon like any other system library that's already in use, or it isn't and hence it should not be used in core projects like systemd at all. In fact, looking at how/where it's used I'd say it's a good candidate for dynamic loading via dlopen at runtime - please see how other libraries like cryptsetup are handled, eg: https://github.com/systemd/systemd/blob/master/src/shared/cryptsetup-util.c |
a7b9208 to
156cb94
Compare
I think API/ABI compatibility is a lesser concern since it's fully in control of kernel community. That means we can't introduce features which are both mandatory and recent.
Thanks for the pointer, taking a look! |
|
Converting to draft to add feature gate in |
Speaking as a downstream once-removed distro maintainer at $work, in my opinion this is not a problem at all - in fact, it's entirely expected. It's how every other existing dependency works - I'm not seeing anything that indicates this is special in any way to warrant a different, custom treatment for this one option. As long as every requirement is clearly indicated, it's fully expected that optional features do not work if the minimum build and runtime requirements are not met.
|
156cb94 to
05be8a6
Compare
9a6ddef to
a8a33a5
Compare
Listing trade-offs I can think about. Static(+) Ability to introduce new libbpf features is not limited by presence of recent versions in distributions. SharedBasically, swap pros and cons of static: In the patch set I also included libbpf to libsystemd-shared bundle: I think this straightens dynamic linking approach by simplifying libbpf delivery to end users. Wdyt? |
The problem is not so much binary size, although a few folks care about that for certain important and popular use cases, so it's certainly worth mentioning. In my view, the most important factor by some distance is security maintenance. Distributions rely on being able to update a library with a security bug without going around guessing whatever might have embedded copies all over the place. There are over fifty thousand binary packages in Debian, if every single one of them had embedded random libraries it would pretty much be game over. Moreover, the maintainers here can correct me if I'm wrong, but my distinct impression is that the systemd distribution model is not random users installing random things - but you get and use what your distribution gives you. I do not see random end users building their own systemd. Good integration and synergy with the rest of the distro is paramaunt to a functional system. So in my opinion it's not really worth it to worry about what end users might or might not have to do. Developers testing things are qualified enough to look at the documentation, and personally I do not see a reason to believe this is particular one is any different from the other dozen or so optional dependencies. For example, iproute2 just got support for libbpf, as a normal shared library like many others. I maintain it in Debian, and once it's released I'll enable it and ensure the proper requirements are set, and that will be it - no reason to do anything special, as libbpf is just another dependency like libmnl or libelf or libselinux etc etc
There's been a trend to move optional dependencies to dlopen(), to minimize hard requirements for container images and so on. It's the maintainers call how to handle this one of course, but in my view it's a good candidate to be handled that way too. Eg, see: https://github.com/systemd/systemd/blob/master/src/shared/cryptsetup-util.c However, it can of course happen later as a follow-up if the maintainers want to handle it that way, I don't think we need to worry about it now. Current approach looks fine. |
a2fe8cf to
49a734a
Compare
|
CentOS CI logs indicate that libbpf package in ArchLinux was bumped up 0.3, whereas the latest release is 0.2
cc @mrc0mmand: Is 0.3 the latest libbpf but versioned differently? That also adds a point about how libbpf dependent features will work with customly versioned libbpf packages residing in private and corporate repos. |
CentOS CI uses the master branch of the libpf repo, so it's usually a newer version than the latest stable release. |
|
Looking at dlopen, major version of shared lib is always hardcoded in search path: Bumping major in a distro will cause systemd to out-out from codepath depending on a lib. |
Not so far, but this usage is very new. The issue you raised is a valid one, and getting the soname at build time rather than hard-coding, and other issues, are currently being discussed here: #17416 This would be a problem only in case of ABI-but-not-API break (if the soname changes because of an API break, there are code changes requires anyway), which is less common, but of course very much possible, and it very much does happen. |
|
Thanks for the pointer once again! Is there a rule of thumb to separate weak dependencies from critical? The present BPF-powered features in systemd are responsible for security (firewall, device control), same as AllowBind introduced with this PR. IMO graceful degradation shouldn't be the case for security. Present features are implemented with 'assembly' bpf_insn and depend only on kernel. Favoring I would love to replace BPF firewall with C code and add ports support, but what's the point if it can be bypassed. |
Ultimately which way to go is a question for the core maintainers. Your PR is fine at the moment and it can be changed later with regards to this aspect. My take is that folks who want to build minimal images, really mean minimal, no matter what the cost - and to them, all of these features are optional. As long as everything is nicely documented, distro builders are able to make their choices. I expect the vast majority of distros to ship with all of these, exceptions being things like Alpine and minimal builds of Yocto/Buildroot. And as long as the failure mode is appropriate, I do not see a security issue. ie: if the options are set, but cannot be satisfied, there should be an error. For example, if you don't have libcrypsetup available and your unit sets some verity options, it will fail to start when dlopen fails. |
anitazha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial surface level comments. Also I think BPFProgramV2 could be more descriptive; maybe LibBPFProgram to distinguish from the raw BPFProgram?
1efa223 to
8d7a3c2
Compare
core: serialize socket_bind bpf links
Verify that service exited correctly if valid ports are passed to
SocketBind{Allow|Deny}=
Use `ncat` program starting a listening service binding to a specified
port, e.g.
"timeout --preserve-status -sSIGTERM 1s /bin/nc -l -p ${port} -vv"
8d7a3c2 to
7dc1707
Compare
|
Looks nice. |
|
cc @iaguis @mauriciovasquezbernal |
In most of our codebase when we referenced "ipv4" and "ipv6" on the right-hand-side of an assignment, we lowercases it (on the left-hand-side we used CamelCase, and thus "IPv4" and "IPv6"). In particular all across the networkd codebase the various "per-protocol booleans" use the lower-case spelling. Hence, let's use lower-case for SocketBindAllow=/SocketBindDeny= too, just make sure things feel like they belong together better. (This work is not included in any released version, hence let's fix this now, before any fixes in this area would be API breakage) Follow-up for systemd#17655
poettering
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for being late to the party. lgtm, but i found some nits.
| assert(map_fd >= 0); | ||
|
|
||
| LIST_FOREACH(socket_bind_items, item, head) { | ||
| const uint32_t key = i++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this is unused?
| } | ||
|
|
||
| void cgroup_context_remove_socket_bind(CGroupSocketBindItem **head) { | ||
| CGroupSocketBindItem *h; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to innermost scope?
| return log_unit_error_errno(u, SYNTHETIC_ERRNO(EINVAL), | ||
| "Maximum number of socket bind rules=%u is exceeded", SOCKET_BIND_MAX_RULES); | ||
|
|
||
| obj = socket_bind_bpf__open(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i couldn#t find where this is actually defined? (i figure it's generated, but where is that? i am not smart enough...)
anyway, this doesn#t set any error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's auto-generated skeleton, cc @wat-ze-hex for where exactly that logic lives. But yes, on error it returns NULL and doesn't set errno. We have plans to set errno for such cases soon, as part of libbpf 1.0 preparatory work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto-generated *.skel.h is located in build:
find build/ -name "*.skel.h"
build/src/core/bpf/socket_bind/socket-bind.skel.h
The name of output file is set in meson.build:
https://github.com/systemd/systemd/blob/main/src/core/bpf/socket_bind/meson.build#L7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for resurrecting this thread but having fuzzed __bpf_object__open I'm not sure at this point I like the idea of loading binary blobs via libbpf anywhere in systemd: libbpf/libbpf#390. @anakryiko @wat-ze-hex it would be great if the library could be fuzzed continuously. To that end I opened google/oss-fuzz#6608. I'd appreciate it if you could take a look. Thanks!
| if (socket_bind_bpf__load(obj) != 0) | ||
| return log_unit_error_errno(u, errno, "Failed to load BPF object"); | ||
|
|
||
| allow_map_fd = bpf_map__fd(obj->maps.sd_bind_allow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can't fail? this returns a pre-allocated fd and never allocates one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. looking at bpf codebase this can return EINVAL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never fail because obj->maps.sd_bind_allow is never NULL. And after successful socket_bind_bpf__load() return is always a valid FD (>=0).
| bpf_map__name(obj->maps.sd_bind_allow)); | ||
|
|
||
| deny_map_fd = bpf_map__fd(obj->maps.sd_bind_deny); | ||
| assert(deny_map_fd >= 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
| r = update_rules_map(allow_map_fd, allow); | ||
| if (r < 0) | ||
| return log_unit_error_errno( | ||
| u, r, "Failed to put socket bind allow rules into BPF map '%s'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suffix error message with : %m please, so that the error is shown
| if (r < 0) | ||
| return log_unit_error_errno( | ||
| u, r, "Failed to put socket bind deny rules into BPF map '%s'", | ||
| bpf_map__name(obj->maps.sd_bind_deny)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| "Failed to resize BPF map '%s': %m", bpf_map__name(obj->maps.sd_bind_deny)); | ||
|
|
||
| if (socket_bind_bpf__load(obj) != 0) | ||
| return log_unit_error_errno(u, errno, "Failed to load BPF object"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so socket_bind_bpf__load() ets errno, but socket_bind_bpf__open() does not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
socket_bind_bpf__load() doesn't set errno explicitly, so it shouldn't be checked. socket_bind_bpf__load() returns error directly.
| if (!c) | ||
| return false; | ||
|
|
||
| return c->socket_bind_allow != NULL || c->socket_bind_deny != NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per coding style for ptrs we usually don't do explicit comparison with NULL. but doesn#t matter
| char **candidates = STRV_MAKE("ncat", "nc", "netcat"), **c; | ||
| int r = 0; | ||
|
|
||
| STRV_FOREACH(c, candidates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's FOREACH_STRING() btw
|
hmm, i'll post a follow-up PR addressing a bunch of the items above, but see above, some things regarding errno are not clear to me... |
In most of our codebase when we referenced "ipv4" and "ipv6" on the right-hand-side of an assignment, we lowercases it (on the left-hand-side we used CamelCase, and thus "IPv4" and "IPv6"). In particular all across the networkd codebase the various "per-protocol booleans" use the lower-case spelling. Hence, let's use lower-case for SocketBindAllow=/SocketBindDeny= too, just make sure things feel like they belong together better. (This work is not included in any released version, hence let's fix this now, before any fixes in this area would be API breakage) Follow-up for systemd#17655
In most of our codebase when we referenced "ipv4" and "ipv6" on the right-hand-side of an assignment, we lowercases it (on the left-hand-side we used CamelCase, and thus "IPv4" and "IPv6"). In particular all across the networkd codebase the various "per-protocol booleans" use the lower-case spelling. Hence, let's use lower-case for SocketBindAllow=/SocketBindDeny= too, just make sure things feel like they belong together better. (This work is not included in any released version, hence let's fix this now, before any fixes in this area would be API breakage) Follow-up for systemd#17655
@mrc0mmand could it be that the "dlopen" trick prevents the CI (where as far as I know |
It does, but we have a "workaround" for this in Lines 1131 to 1134 in 231c764
|
|
@mrc0mmand thanks! Now I'm puzzled as to why the CI can't catch libbpf/libbpf#391 |
|
Hm, that's interesting, the output on my side looks a bit different, I wonder what's going on: |
|
Ah, I see, it's because the libbpf is not compiled with ASan/UBSan in CIs. After rebuilding libbpf with |
Thinking about it a bit more, I'm not sure workarounds of this type have ever been applied to various tools building images using |
This PR addresses the feature request in #13496 (comment) by adding
AllowBindPorts=unit property.If the property specified a service is allowed to
bind(2)only to a certain set of ports. This prevents from assigning socket address with disallowed port to a socket.Stretch goal of this PR is to improve BPF ecosystem in systemd by making it more developer friendly and to clear path for more features implemented with BPF. Which is why BPF programs behind
AllowBindPorts=feature are compiled from sources in restricted C in contrast with the present bpf instructions 'assembly' approach.The content of PR:
build-bpf.pybuild script generating a C header file with BPF object file represented as C array dce011dbuild-bpf.pysharedandcorelibraries to support loading, attaching, populating BPF maps ed92be7 15ef74fThe property requires compile time clang llvm (for llc) and libbpf shared lib. If any of the requirements are not satisfied, the property is not supported and the dependent code is not included into systemd build.
I'd appreciate feedback on
Closes #13744