Add RestrictFileSystems= property using LSM BPF #6

iaguis · 2020-12-11T17:51:57Z

This PR adds the RestrictFileSystems= property. When used, processes
belonging to a service are only able to access the filesystems listed in the
property.

This is implemented by attaching a BPF program to the file_open BPF LSM hook.
The program is attached at boot time and stays attached. When a service
specifying the RestrictFileSystems= property is started, an entry is added to
a global BPF hash-of-maps map pinned to the BPF filesystem at
/sys/fs/bpf/systemd/lsm_bpf_map. The map stores a set of filesystem magic
numbers per cgroupID. When a process tries to open a file, the BPF program
runs and looks up the cgroup the process is running in. If the global map has
an entry for that cgroup, the program checks whether the filesystem the
process is trying to access is in the set and, if it isn't, denies access.
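A minimal user-space model of the lookup described above may help; it is not the PR's BPF code. It assumes a flat array in place of the BPF hash of maps, and the names `AllowedSet` and `restrict_fs_check()` are hypothetical. The magic numbers in the usage note are the `<linux/magic.h>` values for ext4, tmpfs and proc.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one outer-map entry: cgroupID -> set of magics. */
#define MAX_MAGICS 8

typedef struct AllowedSet {
        uint64_t cgroup_id;
        size_t n_magics;
        uint32_t magics[MAX_MAGICS];
} AllowedSet;

/* Returns 0 to allow the open, -EPERM to deny it. */
static int restrict_fs_check(const AllowedSet *map, size_t n_entries,
                             uint64_t cgroup_id, uint32_t magic) {
        for (size_t i = 0; i < n_entries; i++) {
                if (map[i].cgroup_id != cgroup_id)
                        continue;

                /* The cgroup is restricted: allow only listed filesystems. */
                for (size_t j = 0; j < map[i].n_magics; j++)
                        if (map[i].magics[j] == magic)
                                return 0;

                return -EPERM;
        }

        /* No entry for this cgroup: the service is unrestricted. */
        return 0;
}
```

For example, with an entry for cgroup 42 allowing only ext4 (0xEF53) and tmpfs (0x01021994), an open on procfs (0x9fa0) from cgroup 42 is denied with -EPERM, while the same open from a cgroup without an entry is allowed.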

RestrictFileSystems= is only supported on systems with the LSM BPF hook
enabled and using cgroup2 (unified or hybrid).

This PR makes use of the libbpf framework proposed in systemd#17655. Like that
PR, it requires clang and llvm at compile time, as well as the libbpf shared
library.

Thanks to the usage of libbpf, the program can use the CO-RE (Compile-Once
Run-Everywhere) technology so it doesn't require kernel headers at runtime to
access internal kernel structures.

This commit adds support for disabling the read and write workqueues with the new crypttab options no-read-workqueue and no-write-workqueue. These correspond to the cryptsetup options --perf-no_read_workqueue and --perf-no_write_workqueue respectively.

In situations where a service fails to start, systemd suggests that the user run "journalctl -xe" to get details about the failure. While running this command does provide some additional details, most of the information is similar to what was already printed when the service failed. Often the actual reason for the failure can be found in the logs of the service that fails to start. This patch updates the wording to suggest using "-u" to view the service logs instead.

Signed-off-by: Sebastiaan van Stijn <[email protected]>

…_resend() When compiling with CFLAGS='-Werror=maybe-uninitialized -Og' we get a warning about uninitialized "next_timeout" variable. Avoid the warning by adding an (unreachable) "default" label. Fixes: c24288d ("sd-dhcp-client: correct dhcpv4 renew/rebind retransmit timeouts")
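The pattern behind that fix, sketched with hypothetical, simplified states (the real function and its states differ): a `switch` over an enum can leave a variable unset when the value lies outside the listed cases, and a `default` branch that cannot return closes that path. systemd code commonly uses its `assert_not_reached()` macro there; plain `abort()` keeps the sketch self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified states; not the actual sd-dhcp-client code. */
typedef enum ClientState {
        CLIENT_RENEWING,
        CLIENT_REBINDING,
} ClientState;

static uint64_t pick_next_timeout(ClientState state, uint64_t t2, uint64_t lease_end) {
        uint64_t next_timeout;

        switch (state) {
        case CLIENT_RENEWING:
                next_timeout = t2;        /* keep renewing until T2 */
                break;
        case CLIENT_REBINDING:
                next_timeout = lease_end; /* keep rebinding until the lease expires */
                break;
        default:
                /* Never reached for valid states. Because abort() does not return,
                 * the compiler can see next_timeout is always set after the switch. */
                abort();
        }

        return next_timeout;
}
```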

Let's link the three man pages together more tightly and explain what the two targets are about, emphasizing local/quick/reliable/approximate vs remote/slow/unreliable/accurate synchronization. Follow-up for: 1431b2f fe934b4

* Add `build-bpf` feature gate with 'auto', 'true' and 'false' choices
* Add libbpf [0] dependency
* Search for clang and llc binaries in the build environment

For libbpf [0], make 0.2.0 [1] the minimum required version. If libbpf is satisfied, set the HAVE_LIBBPF config option to 1.

If the `build-bpf` feature gate is set to 'auto', whether the feature is enabled or disabled is determined by the presence of all of libbpf, clang and llvm in the build environment. With 'auto' all dependencies are optional. If the gate is set to `true`, make all of the libbpf, clang and llvm dependencies mandatory. If it's set to `false`, set `BUILD_BPF` to false and make the libbpf dependency optional.

The libbpf dependency is dynamic, following the common pattern in systemd. find_program doesn't allow setting a minimum version the way the `dependency` function does. The most recent BPF features include BTF, which requires at least LLVM 10; the allow_bind program doesn't use BTF features and builds with clang and llvm 9.0.
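The gate logic above lives in meson in the actual commit; its decision table can be modelled in C as a quick sanity check. This is only an illustration: `BuildBpfGate` and `resolve_build_bpf()` are hypothetical, and "fail" stands in for meson aborting when a dependency declared `required` is missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C encoding of the three `build-bpf` choices. */
typedef enum BuildBpfGate {
        GATE_AUTO,
        GATE_TRUE,
        GATE_FALSE,
} BuildBpfGate;

/* Returns 1 if BPF programs are built, 0 if not, and -1 if configuration
 * should fail because a mandatory dependency is missing. */
static int resolve_build_bpf(BuildBpfGate gate, bool have_libbpf, bool have_clang, bool have_llc) {
        bool all_found = have_libbpf && have_clang && have_llc;

        switch (gate) {
        case GATE_AUTO:  /* everything optional: build only if all are present */
                return all_found ? 1 : 0;
        case GATE_TRUE:  /* everything mandatory */
                return all_found ? 1 : -1;
        case GATE_FALSE: /* never build; libbpf stays optional */
                return 0;
        }

        return -1;
}
```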

Introduce a minimalistic set of helpers for BPF programs compiled from restricted C sources.

Introduce a basic type `struct BPFProgramV2` with 'fd' and 'attach_type' fields to represent a loaded BPF program. The BPFProgram struct is not used since:
- v2 methods will use libbpf while v1 uses raw syscalls
- the majority of its fields are not needed to support BPF programs compiled from sources
- it lacks an 'attach_type' field

Introduce bpf_object_{} helpers to load BPF programs into the kernel, resize and populate BPF maps, and attach programs to cgroup hooks. The libbpf dependency must be satisfied to compile the code.
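A minimal sketch of such a type, assuming only the two fields named above. The helpers `bpf_program_v2_init()` and `bpf_program_v2_done()` are hypothetical, and `attach_type` is kept as a plain `int` (holding an `enum bpf_attach_type` value such as `BPF_LSM_MAC`) so the snippet compiles without `<linux/bpf.h>`.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: field names follow the commit message. */
typedef struct BPFProgramV2 {
        int fd;          /* fd of the loaded program, or -1 if none */
        int attach_type; /* an enum bpf_attach_type value */
} BPFProgramV2;

static void bpf_program_v2_init(BPFProgramV2 *p, int attach_type) {
        p->fd = -1;
        p->attach_type = attach_type;
}

/* Releases the program fd; safe to call more than once. */
static void bpf_program_v2_done(BPFProgramV2 *p) {
        if (p->fd >= 0)
                close(p->fd);
        p->fd = -1;
}
```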

It hooks into the file_open LSM hook. For cgroups that have an entry in a BPF map, it allows an open only when the filesystem where the open takes place is listed for that cgroup.

The BPF map used is a hash of maps with the following structure:

cgroupID -> (s_magic -> uint32)

The inner map is effectively a set. When the cgroupID is present in the map, the program checks the inner map for the magic number of the filesystem associated with the file being opened. If that magic number is present, the open succeeds; otherwise the program returns -EPERM. If the cgroupID is not present in the map, the open succeeds.

The BPF program uses CO-RE (Compile-Once Run-Everywhere) to access internal kernel structures without needing kernel headers present at runtime.

This adds 4 functions to implement RestrictFileSystems=:

* lsm_bpf_supported() checks if LSM BPF is supported. It checks that cgroupv2 is used, that BPF LSM is enabled, and tries to load the BPF LSM program, which makes sure BTF and hash of maps are supported and BPF LSM programs can be loaded.
* lsm_bpf_setup() loads and attaches the LSM BPF program.
* bpf_restrict_filesystems() populates the hash of maps BPF map with the cgroupID and the set of allowed filesystems.
* cleanup_lsm_bpf() removes a cgroupID entry from the hash of maps.
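One thing lsm_bpf_supported() has to establish is whether the BPF LSM is active at all. A common way to check is to look for `bpf` in the comma-separated list in /sys/kernel/security/lsm (which requires securityfs to be mounted). The sketch assumes that approach; `lsm_list_contains()` and `bpf_lsm_enabled()` are hypothetical names, and the PR may check this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `name` is one of the comma-separated entries in `list`,
 * e.g. "lockdown,capability,yama,bpf\n" as read from securityfs. */
static bool lsm_list_contains(const char *list, const char *name) {
        size_t len = strlen(name);
        const char *p = list;

        for (;;) {
                size_t n = strcspn(p, ",\n"); /* length of the current entry */
                if (n == len && strncmp(p, name, len) == 0)
                        return true;
                if (p[n] == '\0')
                        return false;
                p += n + 1; /* skip the separator */
        }
}

/* Returns 1 if the BPF LSM is active, 0 if not, -1 if the list can't be read. */
static int bpf_lsm_enabled(void) {
        char buf[4096];
        FILE *f = fopen("/sys/kernel/security/lsm", "r");
        if (!f)
                return -1;

        size_t n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';

        return lsm_list_contains(buf, "bpf") ? 1 : 0;
}
```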

iaguis · 2021-01-06T16:11:06Z

There's an upstream PR now. Closing.

C.f. 9793530. We'd crash when trying to access an already-deallocated object:

Thread no. 1 (7 frames)
#2 log_assert_failed_realm at ../src/basic/log.c:844
#3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
#4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
#5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
#6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
#7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
#8 run at ../src/home/homed.c:47

The source in question is an inotify source, and the messages are:

systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.

on_home_inotify() got called, then manager_watch_home(), which unrefs the existing inotify_event_source. I assume that the source gets dispatched again because it was still in the pending queue. I can't reproduce the issue (timing?), but this should fix systemd#17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.

When exiting PID 1 we most likely don't have stdio/stdout open, so the final LSan check would not print any actionable information and would just crash PID 1 leading up to a kernel panic, which is a bit annoying. Let's instead attempt to open /dev/console, and if we succeed redirect LSan's report there. The result is a bit messy, as it's slightly interleaved with the kernel panic, but it's definitely better than not having the stack trace at all:

[ OK ] Reached target final.target.
[ OK ] Finished systemd-poweroff.service.
[ OK ] Reached target poweroff.target.
=================================================================
3 1m 43.251782] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 [ 43.252838] CPU: 2 PID: 1 Comm: systemd Not tainted 6.4.12-200.fc38.x86_64 #1 ==[1==ERR O R :4 3Le.a2k53562] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 [ 43.254683] Call Trace: [ 43.254911] <TASK> [ 43.255107] dump_stack_lvl+0x47/0x60 S[ a 43.n2555i05] panic+t0x192/0x350 izer[ :43.255966 ] do_exit+0x990/0xdb10 etec[ 43.256504] do_group_exit+0x31/0x80 [ 43.256889] __x64_sys_exit_group+0x18/0x20 [ 43.257288] do_syscall_64+0x60/0x90 o_user_mod leaks[ 43.257618] ? syscall_exit_t +0x2b/0x40 [ 43.258411] ? do_syscall_64+0x6c/0x90 1mDirect le[ 43.258755] ak of 21 byte(s)? exc_page_fault+0x7f/0x180 [ 43.259446] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 43.259901] RiIP: 0033:0x7f357nb8f3ad4 1 objec[ 43.260354] Ctode: 48 89 (f7 0f 05 c3 sf3 0f 1e fa b8 3b 00 00 00) 0f 05 c3 0f 1f 4 0 00 f3 0f 1e fa 50 58 b8 e7 00 00 00 48 83 ec 08 48 63 ff 0f 051 [ 43.262581] RSP: 002b:00007ffc25872440 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 a RBX: 00007f357be9b218 RCX: 00007f357b8f3ad4m:ffd [ 43.264512] RDX: 0000000000000001 RSI: 00007f357b933b63 RDI: 0000000000000001 [ 43.265355] RBP: 00007f357be9b218 R08: efffffffffffffff R09: 00007ffc258721ef [ 43.266191] R10: 000000000000003f R11: 0000000000000202 R12: 00000fe6ae9e0000 [ 43.266891] R13: 00007f3574f00000 R14: 0000000000000000 R15: 0000000000000007 [ 43.267517] </TASK>
#0 0x7f357b8814a8 in strdup (/lib64/libasan.so.8+0x814a8) (BuildId: e5f0a0d511a659fbc47bf41072869139cb2db47f)
#1 0x7f3578d43317 in cg_path_decode_unit ../src/basic/cgroup-util.c:1132
#2 0x7f3578d43936 in cg_path_get_unit ../src/basic/cgroup-util.c:1190
#3 0x7f3578d440f6 in cg_pid_get_unit ../src/basic/cgroup-util.c:1234
#4 0x7f35789263d7 in bus_log_caller ../src/shared/bus-util.c:734
#5 0x7f357a9cf10a in method_reload ../src/core/dbus-manager.c:1621
#6 0x7f3578f77497 in method_callbacks_run ../src/libsystemd/sd-bus/bus-objects.c:406
#7 0x7f3578f80dd8 in object_find_and_run ../src/libsystemd/sd-bus/bus-objects.c:1319
#8 0x7f3578f82487 in bus_process_object ../src/libsystemd/sd-bus/bus-objects.c:1439
#9 0x7f3578fe41f1 in process_message ../src/libsystemd/sd-bus/sd-bus.c:3007
#10 0x7f3578fe477b in process_running ../src/libsystemd/sd-bus/sd-bus.c:3049
#11 0x7f3578fe75d1 in bus_process_internal ../src/libsystemd/sd-bus/sd-bus.c:3269
#12 0x7f3578fe776e in sd_bus_process ../src/libsystemd/sd-bus/sd-bus.c:3296
#13 0x7f3578feaedc in io_callback ../src/libsystemd/sd-bus/sd-bus.c:3638
#14 0x7f35791c2f68 in source_dispatch ../src/libsystemd/sd-event/sd-event.c:4187
#15 0x7f35791cc6f9 in sd_event_dispatch ../src/libsystemd/sd-event/sd-event.c:4808
#16 0x7f35791cd830 in sd_event_run ../src/libsystemd/sd-event/sd-event.c:4869
#17 0x7f357abcd572 in manager_loop ../src/core/manager.c:3244
#18 0x41db21 in invoke_main_loop ../src/core/main.c:1960
#19 0x426615 in main ../src/core/main.c:3125
#20 0x7f3577c49b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#21 0x7f3577c49c0a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
#22 0x408494 in _start (/usr/lib/systemd/systemd+0x408494) (BuildId: fe61e1b0f00b6a36aa34e707a98c15c52f6b960a)
SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
[ 43.295912] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 43.297036] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]---

Originally noticed in systemd#28579.

iaguis requested a review from alban December 11, 2020 17:51

iaguis force-pushed the iaguis/lsm-bpf branch 2 times, most recently from 020342d to 85b6a49 on December 12, 2020 11:08

alban reviewed Dec 14, 2020

iaguis force-pushed the iaguis/lsm-bpf branch 4 times, most recently from b81c860 to b9715e7 on December 22, 2020 19:28

github-actions bot added the mkosi label Dec 22, 2020

angdraug and others added 21 commits December 23, 2020 10:18

man/systemd-nspawn: document hashing machine name for uid base

68709a6

Explicitly document the behavior introduced in systemd#7437: when picking a new UID shift base with "-U", a hash of the machine name will be tried before falling back to fully random UID base candidates.

network: fix typo

7eeaf72

IPv6 privacy extensions are plural, not singular.

network: fix IPv6PrivacyExtensions=kernel handling

d3ccb1b

When set to "kernel", systemd is not supposed to touch that sysctl. 5e0534f, part of systemd#17240 forgot to handle that case. Fixes systemd#18003

Merge pull request systemd#18069 from flokli/ipv6-privacy-extensions-…

bc1a4d2

…kernel network: fix IPv6PrivacyExtensions=kernel

network: drop redundant TAKE_PTR()

8c86196

Follow-up for 16c89e6.

sd-ndisc: fix indentation

0afa4d5

network: fix condition for checking the provided gateway is assigned …

1cd5267

…to link. Fix bug introduced by 2210191.

network: make RouteDenyList= filter route prefix rather than gateway …

19e334b

…address. DenyList= filters provided prefixes, not router addresses, so RouteDenyList= should do the same for consistency. Fixes 16c89e6.

network: rename DenyList= -> PrefixDenyList=

3f0af4a

networkd: add support for prefix allow-list and route allow-list

de6b6ff

network: introduce RouterAllowList= and RouterDenyList= in [IPv6Accep…

75d2641

…tRA]

test-network: add tests for [IPv6AcceptRA] PrefixDenyList= or friends

635f2a6

Merge pull request systemd#18021 from ssahani/route-allow-list

b945573

networkd: add support for prefix allow-list and route allow-list

shared/dns: fix dlopen_idn return code check

5def1f1

Fixes systemd#18078

man: apply @Minoru's suggestions from code review

57b3b8f

Co-authored-by: Alexander Batischev <[email protected]>

Merge pull request systemd#18048 from poettering/timesync-man-more

d514454

man: extend time-{set,sync}.target + systemd-timesyncd/wait-sync docs

network: set FRA_PROTOCOL to RTPROT_STATIC by default

1e5fd32

Julia Kartseva and others added 3 commits January 6, 2021 13:40

bpf-object: add set inner map fd and find prog by title functions

90ecc23

bpf_object_set_inner_map_fd is needed for hash of maps BPF maps and bpf_object_find_program_by_title is needed because bpf_object_get_programs() doesn't return LSM BPF programs, so we need to get it by name.

iaguis force-pushed the iaguis/lsm-bpf branch from 4a6e0c0 to 07c99f3 on January 6, 2021 12:41

github-actions bot added the hwdb label Jan 6, 2021

iaguis added 20 commits January 6, 2021 17:10

missing: add several filesystems

8574d0c

They were failing in the CI.

stat-util: add fs_type_from_string()

efc77dc

Returns the magic number for each filesystem.
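A sketch of such a name-to-magic lookup, using a handful of constants whose values match `<linux/magic.h>`. The function name `fs_magic_from_name()` and its signature are hypothetical. Note that some filesystems share a magic (ext2, ext3 and ext4 all use 0xEF53), so a real table may map several names to one value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in <linux/magic.h>; only a few entries for illustration. */
static const struct {
        const char *name;
        uint32_t magic;
} fs_magic_table[] = {
        { "btrfs",   0x9123683E }, /* BTRFS_SUPER_MAGIC */
        { "cgroup2", 0x63677270 }, /* CGROUP2_SUPER_MAGIC */
        { "ext4",    0xEF53 },     /* EXT4_SUPER_MAGIC (shared with ext2/ext3) */
        { "proc",    0x9fa0 },     /* PROC_SUPER_MAGIC */
        { "tmpfs",   0x01021994 }, /* TMPFS_MAGIC */
};

/* Hypothetical lookup: stores the magic for `name` in *ret, or returns -EINVAL. */
static int fs_magic_from_name(const char *name, uint32_t *ret) {
        for (size_t i = 0; i < sizeof(fs_magic_table) / sizeof(fs_magic_table[0]); i++)
                if (strcmp(fs_magic_table[i].name, name) == 0) {
                        *ret = fs_magic_table[i].magic;
                        return 0;
                }

        return -EINVAL;
}
```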

bpf: add meson build rules for the restrict_fs program

a7c834c

It uses tools/build-bpf.py to compile the BPF program from the sources.

test: use restrict-fs BPF program to test bpf_object functions

df04d6c

If systemd#17655 gets merged, there's no need to do this and we can use their test. This removes the bpf_object_get_programs() test because LSM programs are not returned by libbpf.

cgroup-util: add cg_path_get_cgroupid()

9a1706d

It returns the cgroupID from a cgroup path.
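A minimal sketch of how a cgroup ID can be obtained from a path, assuming cgroup2: the kernfs file handle that `name_to_handle_at(2)` returns for a cgroup directory is its 64-bit ID, the same value `bpf_get_current_cgroup_id()` returns in a BPF program. The name `cgroup_path_to_id()` is hypothetical; the PR's cg_path_get_cgroupid() may differ in details. The `statfs()` check matters because handles on other filesystems (ext4, for example) can also be 8 bytes long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/vfs.h>

#define CGROUP2_MAGIC_VALUE 0x63677270U /* CGROUP2_SUPER_MAGIC in <linux/magic.h> */

/* Stores the cgroup ID of `path` in *ret. Returns 0 or a negative errno. */
static int cgroup_path_to_id(const char *path, uint64_t *ret) {
        union {
                struct file_handle fh;
                unsigned char space[sizeof(struct file_handle) + sizeof(uint64_t)];
        } h = { .fh.handle_bytes = sizeof(uint64_t) };
        struct statfs sfs;
        int mnt_id;

        if (statfs(path, &sfs) < 0)
                return -errno;
        if ((uint32_t) sfs.f_type != CGROUP2_MAGIC_VALUE)
                return -EINVAL; /* not on cgroup2: a file handle is not a cgroup ID */

        if (name_to_handle_at(AT_FDCWD, path, &h.fh, &mnt_id, 0) < 0)
                return -errno;
        if (h.fh.handle_bytes != sizeof(uint64_t))
                return -EOPNOTSUPP;

        memcpy(ret, h.fh.f_handle, sizeof(uint64_t));
        return 0;
}
```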

exit-status: fix mappings comment

32366ab

It didn't reflect the current status.

exit-status: add EXIT_BPF

5de75b3

It will be used later.

meson: add libbpf deps for systemd and systemd-analyze

c4f68ee

They link with libcore, and libcore is now using libbpf.

core: use LSM BPF functions to implement RestrictFileSystems=

9367152

It attaches the LSM BPF program when the system manager starts up. It populates the hash of maps BPF map when services that have RestrictFileSystems= set start. It cleans up the hash of maps when the unit's cgroup is pruned.

core: add RestrictFileSystems= fragment parser

3e6cb4b

Services only have access to filesystems that are listed here. Accepts a list of filesystem names.

core: add dbus RestrictFileSystems= properties

c77cf8e

test: add libbpf dependency

730fca2

libbpf is used in core code now, so we need to add it as dependency for tests.

mkosi: add libbpf dependency

786e6b5

For distros that ship libbpf 0.2.0.

man: add RestrictFileSystems= documentation

1ab39f4

test: add test-bpf-lsm

6df76d1

README: add missing BPF requirements

6eb3474

README: document LSM BPF requirements

d649a10

test/fuzz: add RestrictFileSystems= directive

6a2525d

iaguis force-pushed the iaguis/lsm-bpf branch from 07c99f3 to 6a2525d on January 6, 2021 16:10

iaguis closed this Jan 6, 2021

18 participants
