This repository was archived by the owner on Nov 15, 2025. It is now read-only.


@mauriciovasquezbernal
Member

@mauriciovasquezbernal mauriciovasquezbernal commented Jan 25, 2021

Add RestrictNetworkInterfaces= option to limit the network interfaces a program in a unit can use.

TODO:

@pothos

pothos commented Jan 25, 2021

Regarding your first point: The bpf-firewall code already sets BPF_F_ALLOW_MULTI if custom bpf programs are to be attached or if the cgroup is delegated (flags, few lines above): https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/core/bpf-firewall.c#L710

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/restrict-network-interfaces branch from 8e621da to fa0046d Compare January 25, 2021 20:56
@mauriciovasquezbernal
Member Author

mauriciovasquezbernal commented Jan 25, 2021

> Regarding your first point: The bpf-firewall code already sets BPF_F_ALLOW_MULTI if custom bpf programs are to be attached or if the cgroup is delegated (flags, few lines above): https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/core/bpf-firewall.c#L710

I see. I think I can use a similar approach. My only concern is that this doesn't scale: bpf-firewall has to be updated each time a new feature using CGROUP_SKB is added. Do you know why MULTI isn't simply enabled whenever it's supported, instead of requiring the additional `u->type == UNIT_SLICE || unit_cgroup_delegate(u)` check?

fbuihuu and others added 17 commits January 26, 2021 11:21
Add a build script to compile BPF source code. A program in restricted
C is compiled into a temporary object file.
The script generates a single C header file that defines a
`const unsigned char[]` array (hexdump) representing the object file:
each octet of the object file is an element of the array.
To generate the C array, the script reads the *.o file in binary mode
in chunks of 16 bytes. Each byte is represented as a hexadecimal number
with a 0x prefix (hence hexdump), and a chunk of 16 numbers forms a line
of the target file, e.g.:
```
0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00
```
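
The hexdump step described above can be sketched in a few lines of Python. This is only an illustration, not the code in `tools/build-bpf.py`: the function names `hexdump_lines` and `render_header` and the exact layout of the array declaration are assumptions made for the example.

```python
def hexdump_lines(data: bytes, chunk_size: int = 16) -> list:
    """Format bytes as lines of comma-separated 0x-prefixed hex numbers."""
    lines = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        lines.append(", ".join(f"0x{byte:02x}" for byte in chunk))
    return lines


def render_header(data: bytes, buffer_name: str) -> str:
    """Wrap the hexdump in a C array declaration (layout is an assumption)."""
    body = ",\n".join("    " + line for line in hexdump_lines(data))
    return f"const unsigned char {buffer_name}[] = {{\n{body}\n}};\n"
```

In this sketch an object file would be read with `open(path, "rb").read()` and the result of `render_header` written to the target header.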

Why hexdump?
- Ready at compile time. No need to distribute *.o files along with the
systemd package; the bytecode of the BPF program ships via `#include`.
- Transparent and stdout-friendly.

In user space, the hexdump may be included as a regular C header and
read into `struct bpf_object` [0] with the `bpf_object__open_mem` helper [1].

If built with the custom meson build rule, the target header will reside in
the build/ directory (not in the source tree), e.g. the path for allow_bind:
`build/src/core/bpf/allow_bind/allow-bind-hexdump.h`

Summing up, the script runs three phases:
* clang to generate LLVM *.bc from restricted C
* llc to compile *.o from *.bc input
* hexdump generation
These phases are logged to stderr for debug purposes.

To include BTF debug information, the -g option is passed to clang.

Sample run for `src/core/bpf/allow_bind` program
```
./tools/build-bpf.py --llc_exec /usr/bin/llc --clang_exec /usr/bin/clang
src/core/bpf/allow_bind/allow-bind.c
src/core/bpf/allow_bind/allow-bind-hexdump.h --bpf_hexdump_buffer
allow_bind_hexdump_buffer
DEBUG:root:Generating LLVM bitcode *.bc:
DEBUG:root:/usr/bin/clang -Wno-compare-distinct-pointer-types -O2
-target bpf -emit-llvm -c -D__x86_64__ -I/usr/include/x86_64-linux-gnu/
-I/usr/local/include -I/usr/include -I.
src/core/bpf/allow_bind/allow-bind.c -o /tmp/tmpk8sgch4l.bc
DEBUG:root:Compiling BPF object file:
DEBUG:root:/usr/bin/llc -march=bpf -filetype=obj -o /tmp/tmpk6fxj7j_.o
/tmp/tmpk8sgch4l.bc
DEBUG:root:Generating hexdump for src/core/bpf/allow_bind/allow-bind.c
source from /tmp/tmpk6fxj7j_.o object file
```

[0] https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L61
[1] https://github.com/libbpf/libbpf/blob/master/src/libbpf.h#L103
* Add `build-bpf` feature gate with 'auto', 'true' and 'false' choices
* Add libbpf [0] dependency
* Search for clang and llc binaries in the build environment.

For libbpf [0], make 0.2.0 [1] the minimum required version.
If the libbpf dependency is satisfied, set the HAVE_LIBBPF config option to 1.

If the `build-bpf` feature gate is set to 'auto', the feature is enabled only
if libbpf, clang and llvm are all present in the build environment. With
'auto' all dependencies are optional.
If the gate is set to `true`, all of the libbpf, clang and llvm
dependencies become mandatory.
If it's set to `false`, `BUILD_BPF` is set to false and the libbpf
dependency stays optional.
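
The three gate settings can be summarized as a small decision function. The sketch below is written in Python purely to illustrate the logic; it is not the meson code from this commit, and the name `resolve_build_bpf` and its boolean parameters are hypothetical.

```python
def resolve_build_bpf(gate: str, have_libbpf: bool,
                      have_clang: bool, have_llc: bool) -> bool:
    """Return whether BPF programs get built for a given gate setting."""
    deps_ok = have_libbpf and have_clang and have_llc
    if gate == "false":
        # Feature disabled regardless of what is installed.
        return False
    if gate == "true":
        # All dependencies are mandatory; a missing one is a hard error.
        if not deps_ok:
            raise RuntimeError("build-bpf=true requires libbpf, clang and llc")
        return True
    if gate == "auto":
        # Dependencies are optional; the feature follows their presence.
        return deps_ok
    raise ValueError(f"unknown build-bpf value: {gate!r}")
```

With 'auto', a build environment that lacks any one of the three tools simply ends up with the feature off instead of failing.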

The libbpf dependency is dynamic, following the common pattern in systemd.
find_program doesn't allow setting a minimum version the way the
`dependency` option does. The most recent BPF features include BTF, which
requires LLVM 10 at minimum; the allow_bind program doesn't use BTF
features and builds with clang and LLVM 9.0.
Introduce minimalistic set of helpers for bpf programs compiled from
restricted C sources.

Introduce a basic type `struct BPFProgramV2` with 'fd'
and 'attach_type' fields to represent a loaded bpf program.
The BPFProgram struct is not used since:
- v2 methods will use libbpf while v1 uses raw syscalls
- the majority of its fields are not needed to support a BPF program
compiled from sources
- it lacks an 'attach_type' field

Introduce bpf_object_{} helpers to load bpf programs into kernel, resize
and populate bpf maps, attach program to cgroup hooks.

libbpf dependency must be satisfied to compile the code.
Introduce cgroup_bpf_attach_programs and
cgroup_bpf_detach_programs helpers iterating over a set of cgroup-bpf
progs defined by BPFProgramV2 type.

If libbpf dependency is not satisfied, return -ENOTSUP.
This commit is a reduced version of jkartseva@431d83f
only containing the changes related to the bpf API.
The code consists of two BPF_PROG_TYPE_CGROUP_SKB programs that
are loaded into the cgroup inet ingress and egress hooks
(BPF_CGROUP_INET_{INGRESS|EGRESS}).

The decision to let a packet pass or not is based on a map that contains
the ifindexes of the interfaces. Key 0 of the map signals whether the map
is an allow-list or a deny-list.
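
The decision logic can be modelled in plain Python to make the role of key 0 concrete. This is a user-space model, not the BPF program itself, and the encoding is an assumption: the commit message does not say which value at key 0 means what, so the sketch treats 1 as "allow-list" and 0 or a missing key as "deny-list". The names `ALLOW_LIST_KEY` and `packet_passes` are hypothetical.

```python
ALLOW_LIST_KEY = 0  # Linux ifindexes start at 1, so key 0 is free for a flag


def packet_passes(ifindex_map: dict, ifindex: int) -> bool:
    """Model of the pass/drop decision for a packet seen on `ifindex`."""
    # Assumption: value 1 at key 0 marks an allow-list, anything else a deny-list.
    is_allow_list = ifindex_map.get(ALLOW_LIST_KEY, 0) == 1
    listed = ifindex != ALLOW_LIST_KEY and ifindex in ifindex_map
    # Allow-list: pass only listed interfaces. Deny-list: pass all others.
    return listed if is_allow_list else not listed
```

For example, with `{0: 1, 2: 1}` only traffic on ifindex 2 passes, while with `{0: 0, 2: 1}` everything except ifindex 2 passes.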

Signed-off-by: Mauricio Vásquez <[email protected]>
This commit adds the following functions:
- restrict_network_interfaces_supported(): checks whether the kernel has
the features needed to support the required BPF programs.
- restrict_network_interfaces_install(): loads and attaches the
RestrictNetworkInterfaces BPF programs.

Signed-off-by: Mauricio Vásquez <[email protected]>
Use the previously introduced functions to load and attach the BPF
programs when a unit is created.

Signed-off-by: Mauricio Vásquez <[email protected]>
Add a unit test that creates a set of veth pairs to test that
RestrictNetworkInterfaces= actually blocks traffic on the given interfaces.

Signed-off-by: Mauricio Vásquez <[email protected]>
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/restrict-network-interfaces branch from fa0046d to 347d336 Compare January 26, 2021 15:52
mauriciovasquezbernal pushed a commit that referenced this pull request May 18, 2021
C.f. 9793530.

We'd crash when trying to access an already-deallocated object:

Thread no. 1 (7 frames)
 #2 log_assert_failed_realm at ../src/basic/log.c:844
 #3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
 #4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
 #5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
 #6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
 #7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
 #8 run at ../src/home/homed.c:47

The source in question is an inotify source, and the messages are:

systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.

on_home_inotify() got called, then manager_watch_home(), which unrefs the
existing inotify_event_source. I assume that the source gets dispatched again
because it was still in the pending queue.

I can't reproduce the issue (timing?), but this should
fix systemd#17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
iaguis pushed a commit that referenced this pull request Sep 20, 2023
When exiting PID 1 we most likely don't have stdio/stdout open, so the
final LSan check would not print any actionable information and would
just crash PID 1 leading up to a kernel panic, which is a bit annoying.
Let's instead attempt to open /dev/console, and if we succeed redirect
LSan's report there.

The result is a bit messy, as it's slightly interleaved with the kernel
panic, but it's definitely better than not having the stack trace at
all:

[  OK  ] Reached target final.target.
[  OK  ] Finished systemd-poweroff.service.
[  OK  ] Reached target poweroff.target.

=================================================================
3 1m  43.251782] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   43.252838] CPU: 2 PID: 1 Comm: systemd Not tainted 6.4.12-200.fc38.x86_64 #1
==[1==ERR O R :4 3Le.a2k53562] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[   43.254683] Call Trace:
[   43.254911]  <TASK>
[   43.255107]  dump_stack_lvl+0x47/0x60
S[ a  43.n2555i05]  panic+t0x192/0x350
izer[   :43.255966 ]  do_exit+0x990/0xdb10
etec[   43.256504]  do_group_exit+0x31/0x80
[   43.256889]  __x64_sys_exit_group+0x18/0x20
[   43.257288]  do_syscall_64+0x60/0x90
o_user_mod leaks[   43.257618]  ? syscall_exit_t

+0x2b/0x40
[   43.258411]  ? do_syscall_64+0x6c/0x90
1mDirect le[   43.258755]  ak of 21 byte(s)? exc_page_fault+0x7f/0x180
[   43.259446]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 [   43.259901] RiIP: 0033:0x7f357nb8f3ad4
 1 objec[   43.260354] Ctode: 48 89 (f7 0f 05 c3 sf3 0f 1e fa b8 3b 00 00 00) 0f 05 c3 0f 1f 4 0 00 f3 0f 1e fa 50 58 b8 e7 00 00 00 48 83 ec 08 48 63 ff 0f 051
[   43.262581] RSP: 002b:00007ffc25872440 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
a RBX: 00007f357be9b218 RCX: 00007f357b8f3ad4m:ffd
[   43.264512] RDX: 0000000000000001 RSI: 00007f357b933b63 RDI: 0000000000000001
[   43.265355] RBP: 00007f357be9b218 R08: efffffffffffffff R09: 00007ffc258721ef
[   43.266191] R10: 000000000000003f R11: 0000000000000202 R12: 00000fe6ae9e0000
[   43.266891] R13: 00007f3574f00000 R14: 0000000000000000 R15: 0000000000000007
[   43.267517]  </TASK>

    #0 0x7f357b8814a8 in strdup (/lib64/libasan.so.8+0x814a8) (BuildId: e5f0a0d511a659fbc47bf41072869139cb2db47f)
    #1 0x7f3578d43317 in cg_path_decode_unit ../src/basic/cgroup-util.c:1132
    #2 0x7f3578d43936 in cg_path_get_unit ../src/basic/cgroup-util.c:1190
    #3 0x7f3578d440f6 in cg_pid_get_unit ../src/basic/cgroup-util.c:1234
    #4 0x7f35789263d7 in bus_log_caller ../src/shared/bus-util.c:734
    #5 0x7f357a9cf10a in method_reload ../src/core/dbus-manager.c:1621
    #6 0x7f3578f77497 in method_callbacks_run ../src/libsystemd/sd-bus/bus-objects.c:406
    #7 0x7f3578f80dd8 in object_find_and_run ../src/libsystemd/sd-bus/bus-objects.c:1319
    #8 0x7f3578f82487 in bus_process_object ../src/libsystemd/sd-bus/bus-objects.c:1439
    #9 0x7f3578fe41f1 in process_message ../src/libsystemd/sd-bus/sd-bus.c:3007
    #10 0x7f3578fe477b in process_running ../src/libsystemd/sd-bus/sd-bus.c:3049
    #11 0x7f3578fe75d1 in bus_process_internal ../src/libsystemd/sd-bus/sd-bus.c:3269
    #12 0x7f3578fe776e in sd_bus_process ../src/libsystemd/sd-bus/sd-bus.c:3296
    #13 0x7f3578feaedc in io_callback ../src/libsystemd/sd-bus/sd-bus.c:3638
    #14 0x7f35791c2f68 in source_dispatch ../src/libsystemd/sd-event/sd-event.c:4187
    #15 0x7f35791cc6f9 in sd_event_dispatch ../src/libsystemd/sd-event/sd-event.c:4808
    #16 0x7f35791cd830 in sd_event_run ../src/libsystemd/sd-event/sd-event.c:4869
    #17 0x7f357abcd572 in manager_loop ../src/core/manager.c:3244
    #18 0x41db21 in invoke_main_loop ../src/core/main.c:1960
    #19 0x426615 in main ../src/core/main.c:3125
    #20 0x7f3577c49b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #21 0x7f3577c49c0a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #22 0x408494 in _start (/usr/lib/systemd/systemd+0x408494) (BuildId: fe61e1b0f00b6a36aa34e707a98c15c52f6b960a)

SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
[   43.295912] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   43.297036] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]---

Originally noticed in systemd#28579.


5 participants