Add RestrictNetworkInterfaces= option #7
Regarding your first point: The bpf-firewall code already sets |
Assorted CI tweaks
8e621da to fa0046d
I see. I think I can use a similar approach; my only concern is that this is not scalable, as the bpf-firewall has to be updated each time a new feature using CGROUP_SKB is added. Do you know the reason for not enabling MULTI when it's supported instead of having the additional |
Add a build script to compile bpf source code.

A program in restricted C is compiled into a temporary object file. The script generates a single C header file that defines a `const unsigned char[]` array (hexdump) representing the object file: each octet of the object file is an element of the array.

To generate the C array, the script reads the *.o file in byte mode in chunks of 16 bytes. Each byte is represented as a hexadecimal number with a 0x prefix (hence hexdump), and a chunk of 16 numbers forms a line of the target file, e.g.:

```
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
```

Why hexdump?
- Ready at compile time. No need to distribute *.o files along with the systemd package; the bytecode of the BPF program ships with `#include`.
- Transparent and stdout friendly.

In user space the hexdump may be included as a regular C header and read into `struct bpf_object` [0] with the `bpf_object__open_mem` helper [1].

If built with a custom meson build rule, the target header will reside in the build/ directory (not in the source tree), e.g. the path for allow_bind: `build/src/core/bpf/allow_bind/allow-bind-hexdump.h`

Summing up, the script runs three phases:
* clang to generate LLVM *.bc from restricted C
* llc to compile *.o from *.bc input
* hexdump generation

These phases are logged to stderr for debug purposes. To include BTF debug information, the -g option is passed to clang.

Sample run for the `src/core/bpf/allow_bind` program:

```
./tools/build-bpf.py --llc_exec /usr/bin/llc --clang_exec /usr/bin/clang src/core/bpf/allow_bind/allow-bind.c src/core/bpf/allow_bind/allow-bind-hexdump.h --bpf_hexdump_buffer allow_bind_hexdump_buffer
DEBUG:root:Generating LLVM bitcode *.bc:
DEBUG:root:/usr/bin/clang -Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm -c -D__x86_64__ -I/usr/include/x86_64-linux-gnu/ -I/usr/local/include -I/usr/include -I. src/core/bpf/allow_bind/allow-bind.c -o /tmp/tmpk8sgch4l.bc
DEBUG:root:Compiling BPF object file:
DEBUG:root:/usr/bin/llc -march=bpf -filetype=obj -o /tmp/tmpk6fxj7j_.o /tmp/tmpk8sgch4l.bc
DEBUG:root:Generating hexdump for src/core/bpf/allow_bind/allow-bind.c source from /tmp/tmpk6fxj7j_.o object file
```

[0] https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L61
[1] https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L103
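The hexdump phase described above can be sketched roughly as follows. This is not the actual `tools/build-bpf.py` code: the function names `hexdump_lines` and `render_header` and the exact header layout are assumptions made for illustration. Only the 16-byte chunking and the `0x`-prefixed octets come from the commit message.

```python
def hexdump_lines(data, chunk_size=16):
    """Yield one text line per chunk: comma-separated, 0x-prefixed octets."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield ", ".join("0x{:02x}".format(byte) for byte in chunk)


def render_header(data, array_name):
    """Render a C header body defining `const unsigned char <array_name>[]`.

    The layout (indentation, trailing newline) is a guess, not the
    format emitted by the real script.
    """
    body = ",\n".join("        " + line for line in hexdump_lines(data))
    return "const unsigned char {}[] = {{\n{}\n}};\n".format(array_name, body)


if __name__ == "__main__":
    # The first four octets of any ELF object file: 0x7f 'E' 'L' 'F'.
    print(render_header(b"\x7fELF", "allow_bind_hexdump_buffer"))
```

In the real script the input would be the bytes of the temporary *.o file produced by llc, and the array name would come from the `--bpf_hexdump_buffer` argument shown in the sample run.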
* Add `build-bpf` feature gate with 'auto', 'true' and 'false' choices
* Add libbpf [0] dependency
* Search for clang and llc binaries in the build environment

For libbpf [0], make 0.2.0 [1] the minimum required version. If the libbpf dependency is satisfied, set the HAVE_LIBBPF config option to 1.

If the `build-bpf` feature gate is set to 'auto', the feature is enabled only when all of libbpf, clang and llvm are present in the build environment. With 'auto', all dependencies are optional.

If the gate is set to `true`, all of the libbpf, clang and llvm dependencies become mandatory.

If it's set to `false`, set `BUILD_BPF` to false and make the libbpf dependency optional.

The libbpf dependency is dynamic, following the common pattern in systemd.

find_program doesn't allow setting a minimum version the way `dependency` does. The most recent BPF features include BTF, which requires LLVM 10 at minimum; the allow_bind program doesn't use BTF features and builds with clang and llvm 9.0.
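The three gate settings can be summarized as a small decision function. This is only a model of the behaviour described in the commit message, written in Python rather than meson; `resolve_build_bpf` and its parameters are hypothetical names, and treating a missing dependency under 'true' as a hard error is an interpretation of "mandatory".

```python
def resolve_build_bpf(gate, have_libbpf, have_clang, have_llc):
    """Return whether BPF programs get built for a given gate setting.

    gate is one of 'auto', 'true', 'false' (the choices named in the
    commit message). The have_* flags say whether each dependency was
    found in the build environment.
    """
    deps_ok = have_libbpf and have_clang and have_llc
    if gate == "false":
        # Feature disabled outright; dependencies are not required.
        return False
    if gate == "auto":
        # All dependencies are optional; the feature follows their presence.
        return deps_ok
    if gate == "true":
        # All dependencies are mandatory (assumed here to fail the build).
        if not deps_ok:
            raise RuntimeError("build-bpf=true requires libbpf, clang and llc")
        return True
    raise ValueError("unknown build-bpf value: {!r}".format(gate))
```

For example, `resolve_build_bpf("auto", True, True, False)` evaluates to `False`, because llc is missing and 'auto' silently disables the feature.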
Introduce minimalistic set of helpers for bpf programs compiled from
restricted C sources.
Introduce a basic type `struct BPFProgramV2` with 'fd'
and 'attach_type' fields to represent a loaded bpf program.
The BPFProgram struct is not used since:
- v2 methods will use libbpf while v1 use raw syscalls
- the majority of its fields are not needed to support BPF programs
compiled from sources
- lack of 'attach_type' field
Introduce bpf_object_{} helpers to load bpf programs into kernel, resize
and populate bpf maps, attach program to cgroup hooks.
libbpf dependency must be satisfied to compile the code.
Introduce cgroup_bpf_attach_programs and cgroup_bpf_detach_programs helpers iterating over a set of cgroup-bpf progs defined by BPFProgramV2 type. If libbpf dependency is not satisfied, return -ENOTSUP.
This commit is a reduced version of jkartseva@431d83f only containing the changes related to the bpf API.
The code is composed of two BPF_PROG_TYPE_CGROUP_SKB programs that
are loaded in the cgroup inet ingress and egress hooks
(BPF_CGROUP_INET_{INGRESS|EGRESS}).
The decision to let a packet pass or not is based on a map that contains
the ifindexes of the interfaces. The key 0 of the map is used to signal
whether it's an allow or deny-list.
Signed-off-by: Mauricio Vásquez <[email protected]>
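The pass/drop decision described in this commit can be modeled in user space as below. This is a Python model, not the BPF program itself: a `dict` stands in for the BPF map, and the encoding of key 0 (non-zero value means allow-list; absent or zero means deny-list) is an assumption, since the commit message only says that key 0 signals the list type. Key 0 is free for this purpose because valid interface indexes start at 1.

```python
LIST_TYPE_KEY = 0  # key 0 carries the list type; real ifindexes start at 1


def packet_allowed(iface_map, ifindex):
    """Model of the pass/drop decision for a packet seen on `ifindex`.

    iface_map stands in for the BPF map holding the configured ifindexes.
    Assumed encoding: iface_map[0] != 0 means allow-list; a missing or
    zero entry means deny-list.
    """
    is_allow_list = iface_map.get(LIST_TYPE_KEY, 0) != 0
    listed = ifindex != LIST_TYPE_KEY and ifindex in iface_map
    # Allow-list: pass only listed interfaces.
    # Deny-list: pass everything except listed interfaces.
    return listed if is_allow_list else not listed


if __name__ == "__main__":
    allow_only_2 = {LIST_TYPE_KEY: 1, 2: 1}
    deny_2 = {LIST_TYPE_KEY: 0, 2: 1}
    print(packet_allowed(allow_only_2, 2), packet_allowed(allow_only_2, 3))
    print(packet_allowed(deny_2, 2), packet_allowed(deny_2, 3))
```

The same function would apply to both the ingress and the egress hook, since the commit message describes a single map-based decision for both programs.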
This commit adds the following functions:
- restrict_network_interfaces_supported(): checks if the kernel has the features needed to support the required BPF programs.
- restrict_network_interfaces_install(): loads and attaches the RestrictNetworkInterfaces BPF programs.

Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
Use the previously introduced functions to load and attach the BPF programs when a unit is created. Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
Add a unit test that creates a set of veth pairs to test that RestrictNetworkInterfaces= actually blocks traffic on the given interfaces.

Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
Signed-off-by: Mauricio Vásquez <[email protected]>
fa0046d to 347d336
C.f. 9793530.

We'd crash when trying to access an already-deallocated object:

    Thread no. 1 (7 frames)
    #2 log_assert_failed_realm at ../src/basic/log.c:844
    #3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
    #4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
    #5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
    #6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
    #7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
    #8 run at ../src/home/homed.c:47

The source in question is an inotify source, and the messages are:

    systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
    systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.

on_home_inotify() got called, then manager_watch_home(), which unrefs the existing inotify_event_source. I assume that the source gets dispatched again because it was still in the pending queue.

I can't reproduce the issue (timing?), but this should fix systemd#17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
When exiting PID 1 we most likely don't have stdio/stdout open, so the final LSan check would not print any actionable information and would just crash PID 1 leading up to a kernel panic, which is a bit annoying. Let's instead attempt to open /dev/console, and if we succeed redirect LSan's report there.

The result is a bit messy, as it's slightly interleaved with the kernel panic, but it's definitely better than not having the stack trace at all:

    [ OK ] Reached target final.target.
    [ OK ] Finished systemd-poweroff.service.
    [ OK ] Reached target poweroff.target.
    =================================================================
    3 1m 43.251782] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
    [ 43.252838] CPU: 2 PID: 1 Comm: systemd Not tainted 6.4.12-200.fc38.x86_64 #1
    ==[1==ERR O R :4 3Le.a2k53562] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
    [ 43.254683] Call Trace:
    [ 43.254911] <TASK>
    [ 43.255107] dump_stack_lvl+0x47/0x60
    S[ a 43.n2555i05] panic+t0x192/0x350
    izer[ :43.255966 ] do_exit+0x990/0xdb10
    etec[ 43.256504] do_group_exit+0x31/0x80
    [ 43.256889] __x64_sys_exit_group+0x18/0x20
    [ 43.257288] do_syscall_64+0x60/0x90
    o_user_mod leaks[ 43.257618] ? syscall_exit_t +0x2b/0x40
    [ 43.258411] ? do_syscall_64+0x6c/0x90
    1mDirect le[ 43.258755] ak of 21 byte(s)? exc_page_fault+0x7f/0x180
    [ 43.259446] entry_SYSCALL_64_after_hwframe+0x72/0xdc
    [ 43.259901] RiIP: 0033:0x7f357nb8f3ad4
    1 objec[ 43.260354] Ctode: 48 89 (f7 0f 05 c3 sf3 0f 1e fa b8 3b 00 00 00) 0f 05 c3 0f 1f 4 0 00 f3 0f 1e fa 50 58 b8 e7 00 00 00 48 83 ec 08 48 63 ff 0f 051
    [ 43.262581] RSP: 002b:00007ffc25872440 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
    a RBX: 00007f357be9b218 RCX: 00007f357b8f3ad4m:ffd
    [ 43.264512] RDX: 0000000000000001 RSI: 00007f357b933b63 RDI: 0000000000000001
    [ 43.265355] RBP: 00007f357be9b218 R08: efffffffffffffff R09: 00007ffc258721ef
    [ 43.266191] R10: 000000000000003f R11: 0000000000000202 R12: 00000fe6ae9e0000
    [ 43.266891] R13: 00007f3574f00000 R14: 0000000000000000 R15: 0000000000000007
    [ 43.267517] </TASK>
    #0 0x7f357b8814a8 in strdup (/lib64/libasan.so.8+0x814a8) (BuildId: e5f0a0d511a659fbc47bf41072869139cb2db47f)
    #1 0x7f3578d43317 in cg_path_decode_unit ../src/basic/cgroup-util.c:1132
    #2 0x7f3578d43936 in cg_path_get_unit ../src/basic/cgroup-util.c:1190
    #3 0x7f3578d440f6 in cg_pid_get_unit ../src/basic/cgroup-util.c:1234
    #4 0x7f35789263d7 in bus_log_caller ../src/shared/bus-util.c:734
    #5 0x7f357a9cf10a in method_reload ../src/core/dbus-manager.c:1621
    #6 0x7f3578f77497 in method_callbacks_run ../src/libsystemd/sd-bus/bus-objects.c:406
    #7 0x7f3578f80dd8 in object_find_and_run ../src/libsystemd/sd-bus/bus-objects.c:1319
    #8 0x7f3578f82487 in bus_process_object ../src/libsystemd/sd-bus/bus-objects.c:1439
    #9 0x7f3578fe41f1 in process_message ../src/libsystemd/sd-bus/sd-bus.c:3007
    #10 0x7f3578fe477b in process_running ../src/libsystemd/sd-bus/sd-bus.c:3049
    #11 0x7f3578fe75d1 in bus_process_internal ../src/libsystemd/sd-bus/sd-bus.c:3269
    #12 0x7f3578fe776e in sd_bus_process ../src/libsystemd/sd-bus/sd-bus.c:3296
    #13 0x7f3578feaedc in io_callback ../src/libsystemd/sd-bus/sd-bus.c:3638
    #14 0x7f35791c2f68 in source_dispatch ../src/libsystemd/sd-event/sd-event.c:4187
    #15 0x7f35791cc6f9 in sd_event_dispatch ../src/libsystemd/sd-event/sd-event.c:4808
    #16 0x7f35791cd830 in sd_event_run ../src/libsystemd/sd-event/sd-event.c:4869
    #17 0x7f357abcd572 in manager_loop ../src/core/manager.c:3244
    #18 0x41db21 in invoke_main_loop ../src/core/main.c:1960
    #19 0x426615 in main ../src/core/main.c:3125
    #20 0x7f3577c49b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #21 0x7f3577c49c0a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #22 0x408494 in _start (/usr/lib/systemd/systemd+0x408494) (BuildId: fe61e1b0f00b6a36aa34e707a98c15c52f6b960a)
    SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
    [ 43.295912] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 43.297036] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]---

Originally noticed in systemd#28579.
Add `RestrictNetworkInterfaces=` option to limit the network interfaces a program in a unit can use.

TODO:
- `RestrictNetworkInterfaces`: both programs should use `BPF_F_ALLOW_MULTI` but it's not used by bpf-firewall: systemd@acf7f25.