This repository was archived by the owner on Nov 15, 2025. It is now read-only.
forked from systemd/systemd
Add RestrictFileSystems= property using LSM BPF #6
Closed
alban
reviewed
Dec 14, 2020
Explicitly document the behavior introduced in systemd#7437: when picking a new UID shift base with "-U", a hash of the machine name will be tried before falling back to fully random UID base candidates.
This commit adds support for disabling the read and write workqueues with the new crypttab options no-read-workqueue and no-write-workqueue. These correspond to the cryptsetup options --perf-no_read_workqueue and --perf-no_write_workqueue respectively.
IPv6 privacy extensions are plural, not singular.
When set to "kernel", systemd is not supposed to touch that sysctl. 5e0534f, part of systemd#17240 forgot to handle that case. Fixes systemd#18003
…kernel network: fix IPv6PrivacyExtensions=kernel
In situations where a service fails to start, systemd suggests that the user run "journalctl -xe" to get details about the failure. While running this command does provide some additional details, most of the information is similar to what was already printed when the service failed. Often the actual reason for the failure can be found in the logs of the service that fails to start. This patch updates the wording to suggest using "-u" to view the service logs instead. Signed-off-by: Sebastiaan van Stijn <[email protected]>
Follow-up for 16c89e6.
…to link Fix bug introduced by 2210191.
…address DenyList= filters provided prefixes, not router address. So, RouteDenyList= should do so for consistency. Fixes 16c89e6.
networkd: add support for prefix allow-list and route allow-list
…_resend() When compiling with CFLAGS='-Werror=maybe-uninitialized -Og' we get a warning about uninitialized "next_timeout" variable. Avoid the warning by adding an (unreachable) "default" label. Fixes: c24288d ("sd-dhcp-client: correct dhcpv4 renew/rebind retransmit timeouts")
Co-authored-by: Alexander Batischev <[email protected]>
man: extend time-{set,sync}.target + systemd-timesyncd/wait-sync docs
* Add `build-bpf` feature gate with 'auto', 'true' and 'false' choices
* Add libbpf [0] dependency
* Search for clang and llc binaries in the build environment.
For libbpf [0], make 0.2.0 [1] the minimum required version. If libbpf is satisfied, set the HAVE_LIBBPF config option to 1.
If the `build-bpf` feature gate is set to 'auto', the feature is enabled only when all of libbpf, clang and llvm are present in the build environment. With 'auto' all dependencies are optional.
If the gate is set to `true`, make all of the libbpf, clang and llvm dependencies mandatory.
If it's set to `false`, set `BUILD_BPF` to false and make the libbpf dependency optional.
The libbpf dependency is dynamic, following the common pattern in systemd.
find_program doesn't allow setting a minimum version the way the `dependency` option does.
The most recent BPF features include BTF, which requires at least LLVM 10; the allow_bind program doesn't use BTF features and builds with clang and llvm 9.0.
Introduce minimalistic set of helpers for bpf programs compiled from
restricted C sources.
Introduce a basic type `struct BPFProgramV2` with 'fd'
and 'attach_type' fields to represent a loaded bpf program.
The BPFProgram struct is not used since:
- v2 methods will use libbpf while v1 uses raw syscalls
- most of its fields are not needed to support BPF programs
compiled from sources
- it lacks an 'attach_type' field
Introduce bpf_object_{} helpers to load bpf programs into kernel, resize
and populate bpf maps, attach program to cgroup hooks.
libbpf dependency must be satisfied to compile the code.
bpf_object_set_inner_map_fd is needed for hash of maps BPF maps and bpf_object_find_program_by_title is needed because bpf_object_get_programs() doesn't return LSM BPF programs, so we need to get it by name.
They were failing in the CI.
Returns the magic number for each filesystem.
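A minimal sketch of what such a name-to-magic lookup could look like. The function name `fs_name_to_magic()` and the table are hypothetical and not taken from the patch; the numeric values are the well-known superblock magic numbers from `<linux/magic.h>`, copied here so the example does not depend on kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table type: one filesystem name and its superblock magic. */
typedef struct {
        const char *name;
        uint32_t magic;
} fs_magic_entry;

/* Values mirror <linux/magic.h>; only a handful are listed for illustration. */
static const fs_magic_entry fs_magic_table[] = {
        { "btrfs",   0x9123683Eu },
        { "cgroup2", 0x63677270u },
        { "ext4",    0x0000EF53u },
        { "proc",    0x00009FA0u },
        { "sysfs",   0x62656572u },
        { "tmpfs",   0x01021994u },
};

/* Stores the magic number for `name` in *ret and returns 0,
 * or returns -1 if the filesystem name is not in the table. */
static int fs_name_to_magic(const char *name, uint32_t *ret) {
        assert(name);
        assert(ret);

        for (size_t i = 0; i < sizeof(fs_magic_table) / sizeof(fs_magic_table[0]); i++)
                if (strcmp(fs_magic_table[i].name, name) == 0) {
                        *ret = fs_magic_table[i].magic;
                        return 0;
                }

        return -1;
}
```

A caller would translate each configured filesystem name this way and reject names for which the lookup fails.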
It hooks into the file_open LSM hook and allows the open only when the
filesystem on which it takes place is present in a BPF map for a particular
cgroup.
The BPF map used is a hash of maps with the following structure:
cgroupID -> (s_magic -> uint32)
The inner map is effectively a set.
When the cgroupID is present in the map, it checks the inner map for the
magic number of the filesystem associated with the file that's being
opened. If that magic number is present it allows the open to succeed,
otherwise it returns -EPERM.
If the cgroupID is not present in the map, it allows the open to
succeed.
The BPF program uses CO-RE (Compile-Once Run-Everywhere) to access
internal kernel structures without needing kernel headers present at
runtime.
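The decision described above can be modelled in plain userspace C. This is only a sketch of the lookup logic, not the BPF program itself: the array-backed `cgroup_entry` and `fs_set` types and the function `model_file_open()` are hypothetical stand-ins for the hash-of-maps BPF map and the LSM hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAGICS 8

/* Stand-in for the inner map: effectively a set of allowed s_magic values. */
typedef struct {
        uint32_t magics[MAX_MAGICS];
        size_t n_magics;
} fs_set;

/* Stand-in for one outer-map entry: cgroupID -> inner set. */
typedef struct {
        uint64_t cgroup_id;
        fs_set allowed;
} cgroup_entry;

/* Outer lookup: returns the inner set for the cgroup, or NULL if absent. */
static const fs_set *outer_lookup(const cgroup_entry *outer, size_t n, uint64_t cgroup_id) {
        for (size_t i = 0; i < n; i++)
                if (outer[i].cgroup_id == cgroup_id)
                        return &outer[i].allowed;
        return NULL;
}

/* Inner lookup: is the magic number a member of the set? */
static int fs_set_contains(const fs_set *s, uint32_t magic) {
        for (size_t i = 0; i < s->n_magics; i++)
                if (s->magics[i] == magic)
                        return 1;
        return 0;
}

/* Returns 0 to allow the open and -EPERM to deny it. */
static int model_file_open(const cgroup_entry *outer, size_t n,
                           uint64_t cgroup_id, uint32_t s_magic) {
        const fs_set *allowed;

        assert(outer || n == 0);

        allowed = outer_lookup(outer, n, cgroup_id);
        if (!allowed)
                return 0; /* cgroupID not in the map: the cgroup is unrestricted */

        return fs_set_contains(allowed, s_magic) ? 0 : -EPERM;
}
```

With one entry that maps cgroupID 42 to the set {ext4, tmpfs}, an open on ext4 from cgroup 42 is allowed, an open on proc from cgroup 42 is denied, and any open from a cgroup without an entry is allowed.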
It uses tools/build-bpf.py to compile the BPF program from the sources.
If systemd#17655 gets merged, there's no need to do this and we can use their test. This removes the bpf_object_get_programs() test because LSM programs are not returned by libbpf.
It returns the cgroupID from a cgroup path.
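One way to obtain a cgroup ID from a path on cgroup v2 is to ask the kernel for the directory's file handle, which for cgroup2 is the 8-byte cgroup ID. The sketch below assumes that approach; the function name `cgroup_path_to_id()` is hypothetical and the patch may do this differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

/* Stores the cgroup ID of the cgroup directory at `path` in *ret.
 * Returns 0 on success or a negative errno-style value on failure. */
static int cgroup_path_to_id(const char *path, uint64_t *ret) {
        /* struct file_handle ends in a variable-length array, so reserve
         * room for the 8 bytes a cgroup2 handle is expected to need. */
        union {
                struct file_handle handle;
                char space[sizeof(struct file_handle) + sizeof(uint64_t)];
        } fh;
        int mnt_id;

        assert(path);
        assert(ret);

        memset(&fh, 0, sizeof(fh));
        fh.handle.handle_bytes = sizeof(uint64_t);

        if (name_to_handle_at(AT_FDCWD, path, &fh.handle, &mnt_id, 0) < 0)
                return -errno;

        /* Anything other than an 8-byte handle is not a cgroup2 ID. */
        if (fh.handle.handle_bytes != sizeof(uint64_t))
                return -EINVAL;

        memcpy(ret, fh.handle.f_handle, sizeof(uint64_t));
        return 0;
}
```

Called with a path such as a unit's directory below /sys/fs/cgroup, the result can be used as the key of the outer map; paths on other filesystems typically fail because their handles have a different size.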
It didn't reflect the current status.
It will be used later.
They link with libcore and libcore is not using libbpf.
This adds 4 functions to implement RestrictFileSystems=
* lsm_bpf_supported() checks if LSM BPF is supported. It checks that cgroupv2 is used, that BPF LSM is enabled, and tries to load the BPF LSM program which makes sure BTF and hash of maps are supported, and BPF LSM programs can be loaded.
* lsm_bpf_setup() loads and attaches the LSM BPF program.
* bpf_restrict_filesystems() populates the hash of maps BPF map with the cgroupID and the set of allowed filesystems.
* cleanup_lsm_bpf() removes a cgroupID entry from the hash of maps.
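As an illustration of the "BPF LSM is enabled" part of the support check: the kernel exposes the active LSMs as a comma-separated list in /sys/kernel/security/lsm, so one plausible check is to look for a "bpf" token there. The helper below is a sketch under that assumption; `lsm_list_contains()` is a hypothetical name and it works on a string the caller has already read from that file.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in the comma-separated
 * list `lsm_list`, e.g. "lockdown,capability,selinux,bpf".
 * A trailing newline, if present, is treated as a separator. */
static bool lsm_list_contains(const char *lsm_list, const char *name) {
        size_t n;

        assert(lsm_list);
        assert(name);

        n = strlen(name);
        if (n == 0)
                return false;

        for (const char *p = lsm_list; *p; ) {
                size_t len = strcspn(p, ",\n"); /* length of the current token */

                if (len == n && strncmp(p, name, n) == 0)
                        return true;

                p += len;
                if (*p)
                        p++; /* skip the separator */
        }

        return false;
}
```

Matching whole tokens rather than substrings avoids false positives from any LSM whose name merely contains "bpf".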
It attaches the LSM BPF program when the system manager starts up. It populates the hash of maps BPF map when services that have RestrictFileSystems= set start. It cleans up the hash of maps when the unit's cgroup is pruned.
Services only have access to filesystems that are listed here. Accepts a list of filesystem names.
libbpf is used in core code now, so we need to add it as dependency for tests.
For distros that ship libbpf 0.2.0.
Author
There's an upstream PR now. Closing.
mauriciovasquezbernal
pushed a commit
that referenced
this pull request
May 18, 2021
C.f. 9793530.
We'd crash when trying to access an already-deallocated object:
Thread no. 1 (7 frames)
#2 log_assert_failed_realm at ../src/basic/log.c:844
#3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
#4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
#5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
#6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
#7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
#8 run at ../src/home/homed.c:47
The source in question is an inotify source, and the messages are:
systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.
on_home_inotify() got called, then manager_watch_home(), which unrefs the existing inotify_event_source. I assume that the source gets dispatched again because it was still in the pending queue.
I can't reproduce the issue (timing?), but this should fix systemd#17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
iaguis
pushed a commit
that referenced
this pull request
Sep 20, 2023
When exiting PID 1 we most likely don't have stdio/stdout open, so the final LSan check would not print any actionable information and would just crash PID 1 leading up to a kernel panic, which is a bit annoying. Let's instead attempt to open /dev/console, and if we succeed redirect LSan's report there. The result is a bit messy, as it's slightly interleaved with the kernel panic, but it's definitely better than not having the stack trace at all: [ OK ] Reached target final.target. [ OK ] Finished systemd-poweroff.service. [ OK ] Reached target poweroff.target. ================================================================= 3 1m 43.251782] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 [ 43.252838] CPU: 2 PID: 1 Comm: systemd Not tainted 6.4.12-200.fc38.x86_64 #1 ==[1==ERR O R :4 3Le.a2k53562] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 [ 43.254683] Call Trace: [ 43.254911] <TASK> [ 43.255107] dump_stack_lvl+0x47/0x60 S[ a 43.n2555i05] panic+t0x192/0x350 izer[ :43.255966 ] do_exit+0x990/0xdb10 etec[ 43.256504] do_group_exit+0x31/0x80 [ 43.256889] __x64_sys_exit_group+0x18/0x20 [ 43.257288] do_syscall_64+0x60/0x90 o_user_mod leaks[ 43.257618] ? syscall_exit_t +0x2b/0x40 [ 43.258411] ? do_syscall_64+0x6c/0x90 1mDirect le[ 43.258755] ak of 21 byte(s)? 
exc_page_fault+0x7f/0x180 [ 43.259446] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 43.259901] RiIP: 0033:0x7f357nb8f3ad4 1 objec[ 43.260354] Ctode: 48 89 (f7 0f 05 c3 sf3 0f 1e fa b8 3b 00 00 00) 0f 05 c3 0f 1f 4 0 00 f3 0f 1e fa 50 58 b8 e7 00 00 00 48 83 ec 08 48 63 ff 0f 051 [ 43.262581] RSP: 002b:00007ffc25872440 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 a RBX: 00007f357be9b218 RCX: 00007f357b8f3ad4m:ffd [ 43.264512] RDX: 0000000000000001 RSI: 00007f357b933b63 RDI: 0000000000000001 [ 43.265355] RBP: 00007f357be9b218 R08: efffffffffffffff R09: 00007ffc258721ef [ 43.266191] R10: 000000000000003f R11: 0000000000000202 R12: 00000fe6ae9e0000 [ 43.266891] R13: 00007f3574f00000 R14: 0000000000000000 R15: 0000000000000007 [ 43.267517] </TASK> #0 0x7f357b8814a8 in strdup (/lib64/libasan.so.8+0x814a8) (BuildId: e5f0a0d511a659fbc47bf41072869139cb2db47f) #1 0x7f3578d43317 in cg_path_decode_unit ../src/basic/cgroup-util.c:1132 #2 0x7f3578d43936 in cg_path_get_unit ../src/basic/cgroup-util.c:1190 #3 0x7f3578d440f6 in cg_pid_get_unit ../src/basic/cgroup-util.c:1234 #4 0x7f35789263d7 in bus_log_caller ../src/shared/bus-util.c:734 #5 0x7f357a9cf10a in method_reload ../src/core/dbus-manager.c:1621 #6 0x7f3578f77497 in method_callbacks_run ../src/libsystemd/sd-bus/bus-objects.c:406 #7 0x7f3578f80dd8 in object_find_and_run ../src/libsystemd/sd-bus/bus-objects.c:1319 systemd#8 0x7f3578f82487 in bus_process_object ../src/libsystemd/sd-bus/bus-objects.c:1439 systemd#9 0x7f3578fe41f1 in process_message ../src/libsystemd/sd-bus/sd-bus.c:3007 systemd#10 0x7f3578fe477b in process_running ../src/libsystemd/sd-bus/sd-bus.c:3049 systemd#11 0x7f3578fe75d1 in bus_process_internal ../src/libsystemd/sd-bus/sd-bus.c:3269 systemd#12 0x7f3578fe776e in sd_bus_process ../src/libsystemd/sd-bus/sd-bus.c:3296 systemd#13 0x7f3578feaedc in io_callback ../src/libsystemd/sd-bus/sd-bus.c:3638 systemd#14 0x7f35791c2f68 in source_dispatch ../src/libsystemd/sd-event/sd-event.c:4187 systemd#15 0x7f35791cc6f9 
in sd_event_dispatch ../src/libsystemd/sd-event/sd-event.c:4808 systemd#16 0x7f35791cd830 in sd_event_run ../src/libsystemd/sd-event/sd-event.c:4869 systemd#17 0x7f357abcd572 in manager_loop ../src/core/manager.c:3244 systemd#18 0x41db21 in invoke_main_loop ../src/core/main.c:1960 systemd#19 0x426615 in main ../src/core/main.c:3125 systemd#20 0x7f3577c49b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9) systemd#21 0x7f3577c49c0a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9) systemd#22 0x408494 in _start (/usr/lib/systemd/systemd+0x408494) (BuildId: fe61e1b0f00b6a36aa34e707a98c15c52f6b960a) SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). [ 43.295912] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 43.297036] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]--- Originally noticed in systemd#28579.
This PR adds the RestrictFileSystems= property. When used, processes
belonging to a service are only able to access the filesystems listed in the
property.

This is implemented by attaching a BPF program to the file_open BPF LSM hook.
The program is attached at boot time and stays attached for the lifetime of
the system. Then, when a service specifying the RestrictFileSystems= property
is started, an entry is added to a global hash of maps BPF map pinned to the
BPF filesystem under /sys/fs/bpf/systemd/lsm_bpf_map. The map stores a set of
filesystem magic numbers per cgroupID. When a process tries to open a file,
the BPF program is executed and checks the cgroup the process is running in.
If an entry for that cgroup is present in the global map, the program checks
whether the filesystem the process is trying to access is present in the set;
if it is not, access is denied.

RestrictFileSystems= is only supported on systems with the LSM BPF hook
enabled and using cgroup2 (unified or hybrid).

This PR makes use of the libbpf framework proposed in systemd#17655. Same as
that PR, it requires clang and llvm at compile time, and the libbpf shared
library.

Thanks to the usage of libbpf, the program can use the CO-RE (Compile-Once
Run-Everywhere) technology so it doesn't require kernel headers at runtime to
access internal kernel structures.