This repository was archived by the owner on Nov 15, 2025. It is now read-only.

@sayanchowdhury
Member

Sync v245-flatcar branch to v245-stable

Pull in the changes from v245-stable to fix the issues in the stable build

keszybz and others added 4 commits September 21, 2020 09:28
We never return anything higher than 63, so using "long unsigned"
as the type only confused the reader. (We can still use "long unsigned"
and safe_atolu() to parse the kernel file.)

(cherry picked from commit 864a25d)
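
A minimal sketch of the idea in Python, not the systemd code: the kernel file is still parsed as an unbounded integer, but the value handed back to callers never exceeds 63. The file path is the kernel's usual location for this value; the function names and the choice to clamp (rather than fail) are assumptions made for illustration.

```python
from pathlib import Path

CAP_LIMIT = 63  # the commit message states that nothing higher than 63 is returned


def parse_cap_last_cap(text: str) -> int:
    """Parse the kernel's decimal value and clamp it to 0..63 (clamping is an assumption)."""
    value = int(text.strip())  # wide parse, comparable to parsing into "long unsigned"
    if value < 0:
        raise ValueError(f"negative capability number: {value}")
    return min(value, CAP_LIMIT)


def read_cap_last_cap(path: str = "/proc/sys/kernel/cap_last_cap") -> int:
    """Read the highest capability number known to the running kernel."""
    return parse_cap_last_cap(Path(path).read_text())
```

Because the result always fits in 0..63, a small integer type is enough for the return value, which is the point the commit makes.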
Up to now the capability CAP_SETPCAP was raised implicitly in the
function capability_bounding_set_drop.

This functionality is moved into a new function
(capability_gain_cap_setpcap).

The new function optionally provides the capability set as it was
before raising CAP_SETPCAP.

(cherry picked from commit 57d4d28)
(cherry picked from commit 4f69254)
Desired functionality:
Set securebits for services started as non-root user.

Failure:
Starting the service fails if no ambient capabilities are to be
raised.
... systemd[217941]: ...: Failed to set process secure bits: Operation
not permitted
... systemd[217941]: ...: Failed at step SECUREBITS spawning
/usr/bin/abc.service: Operation not permitted
... systemd[1]: abc.service: Failed with result 'exit-code'.

Reason:
Setting securebits requires the capability CAP_SETPCAP. However, if no
ambient capabilities are to be raised, the securebits are set after
setresuid.
When setresuid is invoked, all capabilities are dropped from the
permitted, effective and ambient capability sets. If the securebit
SECBIT_KEEP_CAPS is set, the permitted capability set is retained, but
the effective and the ambient sets are still cleared.

If ambient capabilities are to be set, the securebit SECBIT_KEEP_CAPS is
added to the securebits configured in the service file, and all of them
are set together before setresuid is executed (in enforce_user).
Before setresuid is executed, the capabilities are the same as for pid1.
This means that all capabilities in the effective, permitted and
bounding set are set. Thus the capability CAP_SETPCAP is in the
effective set and prctl(PR_SET_SECUREBITS, ...) succeeds.
However, if the securebits aren't set before setresuid is invoked, they
are set shortly after the uid change in enforce_user.
This fails because SECBIT_KEEP_CAPS wasn't set before setresuid, so the
effective and permitted sets were cleared. CAP_SETPCAP is therefore not
in the effective set (and can no longer be raised), and
prctl(PR_SET_SECUREBITS, ...) fails with EPERM.

Proposed solution:
The proposed solution consists of three parts:
1. Check in enforce_user, if securebits are configured in the service
   file. If securebits are configured, set SECBIT_KEEP_CAPS
   before invoking setresuid.
2. Don't set any other securebits than SECBIT_KEEP_CAPS in enforce_user,
   but set all requested ones after enforce_user.
   This has the advantage that securebits are set at the same place for
   root and non-root services.
3. Raise CAP_SETPCAP to the effective set (if not already set) before
   setting the securebits to avoid EPERM during the prctl syscall.

To gain CAP_SETPCAP, the function capability_bounding_set_drop is
split into two functions:
- The first one raises CAP_SETPCAP (required for dropping bounding
  capabilities)
- The second drops the bounding capabilities
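
The failure and the proposed ordering can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the systemd implementation: the Process class, its method names and the three-capability universe are hypothetical, and only the rules described in the text above are modelled.

```python
from dataclasses import dataclass

CAP_SETPCAP = "CAP_SETPCAP"
ALL_CAPS = frozenset({CAP_SETPCAP, "CAP_NET_BIND_SERVICE", "CAP_SYS_ADMIN"})


@dataclass
class Process:
    """Hypothetical model of a freshly forked service process (same caps as pid1)."""
    permitted: frozenset = ALL_CAPS
    effective: frozenset = ALL_CAPS
    ambient: frozenset = frozenset()
    keep_caps: bool = False
    securebits: frozenset = frozenset()

    def set_keep_caps(self) -> None:
        # Called before setresuid, while CAP_SETPCAP is still effective.
        self.keep_caps = True

    def setresuid_to_nonroot(self) -> None:
        # Effective and ambient are always cleared; permitted survives
        # only if SECBIT_KEEP_CAPS was set beforehand.
        if not self.keep_caps:
            self.permitted = frozenset()
        self.effective = frozenset()
        self.ambient = frozenset()

    def raise_effective(self, cap: str) -> None:
        if cap not in self.permitted:
            raise PermissionError("EPERM: capability is not in the permitted set")
        self.effective = self.effective | {cap}

    def set_securebits(self, bits) -> None:
        if CAP_SETPCAP not in self.effective:
            raise PermissionError("EPERM: CAP_SETPCAP is not in the effective set")
        self.securebits = frozenset(bits)


def old_order(bits) -> Process:
    """Securebits set after the uid change without any preparation: fails."""
    proc = Process()
    proc.setresuid_to_nonroot()
    proc.set_securebits(bits)  # raises PermissionError
    return proc


def proposed_order(bits) -> Process:
    """The three parts of the proposed solution, in order."""
    proc = Process()
    if bits:
        proc.set_keep_caps()               # part 1: keep the permitted set
    proc.setresuid_to_nonroot()
    if bits:
        proc.raise_effective(CAP_SETPCAP)  # part 3: regain CAP_SETPCAP
        proc.set_securebits(bits)          # part 2: set the requested bits afterwards
    return proc
```

In this model, old_order({"SECBIT_NOROOT"}) raises PermissionError, while proposed_order({"SECBIT_NOROOT"}) returns a process with the requested securebit set.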

Why are ambient capabilities not affected by this change?
Ambient capabilities get cleared during setresuid, no matter if
SECBIT_KEEP_CAPS is set or not.
To raise ambient capabilities for a non-root user, the
requested capability has to be raised in the inheritable set first. Then
the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is
invoked. Afterwards the ambient capability can be raised, because it is
in the inheritable and permitted set.
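
The sequence for ambient capabilities can be sketched the same way. The function below is a hypothetical illustration of the rules stated in this paragraph (ambient is always cleared by setresuid, and a capability can only be raised to ambient when it is in both the permitted and the inheritable set); it is not taken from systemd.

```python
def ambient_after_uid_change(cap: str, keep_caps: bool) -> frozenset:
    """Return the ambient set a non-root service ends up with for one requested capability."""
    permitted = frozenset({cap})
    inheritable = frozenset({cap})  # step 1: raise the capability in the inheritable set
    ambient = frozenset()

    # setresuid to a non-root uid: ambient is cleared in any case,
    # permitted is only retained when SECBIT_KEEP_CAPS was set before.
    if not keep_caps:
        permitted = frozenset()
    ambient = frozenset()

    # Raising to ambient is only allowed if the capability is both
    # permitted and inheritable.
    if cap in permitted and cap in inheritable:
        ambient = ambient | {cap}
    return ambient
```

With keep_caps=True the requested capability ends up in the ambient set; with keep_caps=False the ambient set stays empty.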

Security considerations:
Although the manpage is ambiguous, SECBIT_KEEP_CAPS is cleared during
execve no matter if SECBIT_KEEP_CAPS_LOCKED is set or not. If both are
set, only SECBIT_KEEP_CAPS_LOCKED is set after execve.
Setting SECBIT_KEEP_CAPS in enforce_user in order to be able to set
securebits is no security risk, as the effective and permitted sets are
set to the value of the ambient set during execve if the executed file
has no file capabilities. For details, see man 7 capabilities.
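
The execve rule referred to here can be written out as a small Python function. It follows the transformation formulas given in capabilities(7) for a non-root process; the function and parameter names are assumptions for illustration, and set-user-ID/set-group-ID files are not modelled.

```python
def execve_caps(p_inheritable, p_ambient, p_bounding,
                f_inheritable=frozenset(), f_permitted=frozenset(),
                f_effective=False):
    """Return (permitted, effective, ambient) of a non-root process after execve."""
    # A file with file capabilities counts as privileged and drops the ambient set.
    privileged = bool(f_inheritable or f_permitted)
    new_ambient = frozenset() if privileged else frozenset(p_ambient)
    new_permitted = ((frozenset(p_inheritable) & frozenset(f_inheritable))
                     | (frozenset(f_permitted) & frozenset(p_bounding))
                     | new_ambient)
    new_effective = new_permitted if f_effective else new_ambient
    return new_permitted, new_effective, new_ambient
```

Note that the old permitted set is not an input at all: for a file without file capabilities, both the permitted and the effective set after execve equal the ambient set, so a permitted set retained via SECBIT_KEEP_CAPS does not survive into the executed program.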

Remark:
capability-util.c contains a comment complaining that the capability
CAP_SETPCAP is missing from the effective set after the kernel has
executed /sbin/init. For that reason the code checks whether this
capability has to be raised in the effective set before dropping
capabilities from the bounding set.
If this were true all the time, ambient capabilities couldn't be set
without dropping at least one capability from the bounding set, as the
capability CAP_SETPCAP would be missing and setting SECBIT_KEEP_CAPS
would fail with EPERM.

(cherry picked from commit dbdc409)
(cherry picked from commit ab6fcd9)
@sayanchowdhury sayanchowdhury requested a review from a team November 18, 2020 10:30
@sayanchowdhury sayanchowdhury merged commit 29cb7c4 into v245-flatcar Nov 18, 2020
@sayanchowdhury sayanchowdhury deleted the v245-flatcar-sync branch November 18, 2020 15:09
iaguis pushed a commit that referenced this pull request Nov 26, 2020
This lets the libc/xcrypt allocate as much storage area as it needs.
Should fix systemd#16965:

testsuite-46.sh[74]: ==74==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f3e972e1080 at pc 0x7f3e9be8deed bp 0x7ffce4f28530 sp 0x7ffce4f27ce0
testsuite-46.sh[74]: WRITE of size 131232 at 0x7f3e972e1080 thread T0
testsuite-46.sh[74]:     #0 0x7f3e9be8deec  (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)
testsuite-46.sh[74]:     #1 0x559cd05a6412 in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:818:21
testsuite-46.sh[74]:     #2 0x559cd058fb03 in create_home /systemd-meson-build/../build/src/home/homectl.c:1112:29
testsuite-46.sh[74]:     #3 0x7f3e9b5b3058 in dispatch_verb /systemd-meson-build/../build/src/shared/verbs.c:103:24
testsuite-46.sh[74]:     #4 0x559cd058c101 in run /systemd-meson-build/../build/src/home/homectl.c:3325:16
testsuite-46.sh[74]:     #5 0x559cd058c00a in main /systemd-meson-build/../build/src/home/homectl.c:3328:1
testsuite-46.sh[74]:     #6 0x7f3e9a88b151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
testsuite-46.sh[74]:     #7 0x559cd0583e7d in _start (/usr/bin/homectl+0x24e7d)
testsuite-46.sh[74]: Address 0x7f3e972e1080 is located in stack of thread T0 at offset 32896 in frame
testsuite-46.sh[74]:     #0 0x559cd05a60df in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:789
testsuite-46.sh[74]:   This frame has 6 object(s):
testsuite-46.sh[74]:     [32, 40) 'priv' (line 790)
testsuite-46.sh[74]:     [64, 72) 'np' (line 791)
testsuite-46.sh[74]:     [96, 104) 'salt' (line 809)
testsuite-46.sh[74]:     [128, 32896) 'cd' (line 810)
testsuite-46.sh[74]:     [33152, 33168) '.compoundliteral' <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]:     [33184, 33192) 'new_array' (line 832) <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]: HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
testsuite-46.sh[74]:       (longjmp and C++ exceptions *are* supported)
testsuite-46.sh[74]: SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)

It seems 'struct crypt_data' is 32896 bytes, but libclang_rt wants more, at least 33168?
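
As a reading aid for the frame layout in the report above, the following Python sketch extracts the object ranges and computes their sizes. It assumes the ranges are half-open stack-frame offsets as AddressSanitizer prints them; the helper name and the regular expression are illustrative, and the sample lines are copied from the report.

```python
import re

# Matches entries such as "[128, 32896) 'cd' (line 810)"
FRAME_OBJECT = re.compile(r"\[(\d+), (\d+)\) '([^']+)'")


def object_sizes(report_lines):
    """Map each stack object name to (start offset, end offset, size in bytes)."""
    sizes = {}
    for line in report_lines:
        match = FRAME_OBJECT.search(line)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            sizes[match.group(3)] = (start, end, end - start)
    return sizes


REPORT = [
    "testsuite-46.sh[74]:     [96, 104) 'salt' (line 809)",
    "testsuite-46.sh[74]:     [128, 32896) 'cd' (line 810)",
    "testsuite-46.sh[74]:     [33152, 33168) '.compoundliteral'",
]
```

Read this way, 'cd' spans offsets 128 up to 32896, which is 32768 bytes, and the reported access at offset 32896 is the first byte past its end.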
mauriciovasquezbernal pushed a commit that referenced this pull request May 18, 2021
Cf. 9793530.

We'd crash when trying to access an already-deallocated object:

Thread no. 1 (7 frames)
 #2 log_assert_failed_realm at ../src/basic/log.c:844
 #3 event_inotify_data_drop at ../src/libsystemd/sd-event/sd-event.c:3035
 #4 source_dispatch at ../src/libsystemd/sd-event/sd-event.c:3250
 #5 sd_event_dispatch at ../src/libsystemd/sd-event/sd-event.c:3631
 #6 sd_event_run at ../src/libsystemd/sd-event/sd-event.c:3689
 #7 sd_event_loop at ../src/libsystemd/sd-event/sd-event.c:3711
 #8 run at ../src/home/homed.c:47

The source in question is an inotify source, and the messages are:

systemd-homed[1340]: /home/ moved or renamed, recreating watch and rescanning.
systemd-homed[1340]: Assertion '*_head == _item' failed at src/libsystemd/sd-event/sd-event.c:3035, function event_inotify_data_drop(). Aborting.

on_home_inotify() got called, then manager_watch_home(), which unrefs the
existing inotify_event_source. I assume that the source gets dispatched again
because it was still in the pending queue.

I can't reproduce the issue (timing?), but this should
fix systemd#17824, https://bugzilla.redhat.com/show_bug.cgi?id=1899264.
mauriciovasquezbernal pushed a commit that referenced this pull request May 18, 2021
When trying to calculate the next firing of 'Sun *-*-* 01:00:00', we'd fall
into an infinite loop, because mktime() moves us "backwards":

Before this patch:
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
...

We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.
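
One way such a check could look is sketched below in Python. This is an illustration under assumptions, not the actual patch: the normalizer is injected so that the backwards case from the log above can be reproduced without depending on the system's time zone data, and all function names are hypothetical.

```python
from typing import Callable, Tuple

# (year, month, day, hour, minute, second); tuples of in-range fields compare chronologically
Stamp = Tuple[int, int, int, int, int, int]


def classify(candidate: Stamp, normalize: Callable[[Stamp], Stamp]) -> str:
    """Compare a candidate time with its normalized form."""
    normalized = normalize(candidate)
    if normalized == candidate:
        return "ok"
    return "backwards" if normalized < candidate else "forwards"


def next_valid(start: Stamp, normalize, skip_ahead, max_steps: int = 100) -> Stamp:
    """Search for a time that survives normalization, never retrying a backwards result."""
    candidate = start
    for _ in range(max_steps):
        verdict = classify(candidate, normalize)
        if verdict == "ok":
            return candidate
        if verdict == "forwards":
            candidate = normalize(candidate)   # usual case: adopt the normalized time
        else:
            candidate = skip_ahead(candidate)  # backwards: step past it explicitly
    raise RuntimeError("no valid time found")


def fake_normalize(t: Stamp) -> Stamp:
    # Stand-in for mktime(), mirroring the log line above:
    # 2021-03-29 01:00:00 is mapped back to 2021-03-29 00:00:00.
    if t[:4] == (2021, 3, 29, 1):
        return (2021, 3, 29, 0, 0, 0)
    return t


def one_hour_later(t: Stamp) -> Stamp:
    year, month, day, hour, minute, second = t
    return (year, month, day, hour + 1, minute, second)
```

Without the backwards branch, the search would adopt 00:00:00, advance to 01:00:00 again and repeat forever; with it, next_valid((2021, 3, 29, 1, 0, 0), fake_normalize, one_hour_later) terminates.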

With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
    Next elapse: Sun 2021-03-21 01:00:00 GMT
       (in UTC): Sun 2021-03-21 01:00:00 UTC
       From now: 59min left
       Iter. #2: Sun 2021-04-04 01:00:00 IST
       (in UTC): Sun 2021-04-04 00:00:00 UTC
       From now: 1 weeks 6 days left           <---- note the 2 week jump here
       Iter. #3: Sun 2021-04-11 01:00:00 IST
       (in UTC): Sun 2021-04-11 00:00:00 UTC
       From now: 2 weeks 6 days left
       Iter. #4: Sun 2021-04-18 01:00:00 IST
       (in UTC): Sun 2021-04-18 00:00:00 UTC
       From now: 3 weeks 6 days left
       Iter. #5: Sun 2021-04-25 01:00:00 IST
       (in UTC): Sun 2021-04-25 00:00:00 UTC
       From now: 1 months 4 days left

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1941335.
iaguis pushed a commit that referenced this pull request Sep 20, 2023
When exiting PID 1 we most likely don't have stdio/stdout open, so the
final LSan check would not print any actionable information and would
just crash PID 1 leading up to a kernel panic, which is a bit annoying.
Let's instead attempt to open /dev/console, and if we succeed redirect
LSan's report there.

The result is a bit messy, as it's slightly interleaved with the kernel
panic, but it's definitely better than not having the stack trace at
all:

[  OK  ] Reached target final.target.
[  OK  ] Finished systemd-poweroff.service.
[  OK  ] Reached target poweroff.target.

=================================================================
3 1m  43.251782] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   43.252838] CPU: 2 PID: 1 Comm: systemd Not tainted 6.4.12-200.fc38.x86_64 #1
==[1==ERR O R :4 3Le.a2k53562] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[   43.254683] Call Trace:
[   43.254911]  <TASK>
[   43.255107]  dump_stack_lvl+0x47/0x60
S[ a  43.n2555i05]  panic+t0x192/0x350
izer[   :43.255966 ]  do_exit+0x990/0xdb10
etec[   43.256504]  do_group_exit+0x31/0x80
[   43.256889]  __x64_sys_exit_group+0x18/0x20
[   43.257288]  do_syscall_64+0x60/0x90
o_user_mod leaks[   43.257618]  ? syscall_exit_t

+0x2b/0x40
[   43.258411]  ? do_syscall_64+0x6c/0x90
1mDirect le[   43.258755]  ak of 21 byte(s)? exc_page_fault+0x7f/0x180
[   43.259446]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 [   43.259901] RiIP: 0033:0x7f357nb8f3ad4
 1 objec[   43.260354] Ctode: 48 89 (f7 0f 05 c3 sf3 0f 1e fa b8 3b 00 00 00) 0f 05 c3 0f 1f 4 0 00 f3 0f 1e fa 50 58 b8 e7 00 00 00 48 83 ec 08 48 63 ff 0f 051
[   43.262581] RSP: 002b:00007ffc25872440 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
a RBX: 00007f357be9b218 RCX: 00007f357b8f3ad4m:ffd
[   43.264512] RDX: 0000000000000001 RSI: 00007f357b933b63 RDI: 0000000000000001
[   43.265355] RBP: 00007f357be9b218 R08: efffffffffffffff R09: 00007ffc258721ef
[   43.266191] R10: 000000000000003f R11: 0000000000000202 R12: 00000fe6ae9e0000
[   43.266891] R13: 00007f3574f00000 R14: 0000000000000000 R15: 0000000000000007
[   43.267517]  </TASK>

    #0 0x7f357b8814a8 in strdup (/lib64/libasan.so.8+0x814a8) (BuildId: e5f0a0d511a659fbc47bf41072869139cb2db47f)
    #1 0x7f3578d43317 in cg_path_decode_unit ../src/basic/cgroup-util.c:1132
    #2 0x7f3578d43936 in cg_path_get_unit ../src/basic/cgroup-util.c:1190
    #3 0x7f3578d440f6 in cg_pid_get_unit ../src/basic/cgroup-util.c:1234
    #4 0x7f35789263d7 in bus_log_caller ../src/shared/bus-util.c:734
    #5 0x7f357a9cf10a in method_reload ../src/core/dbus-manager.c:1621
    #6 0x7f3578f77497 in method_callbacks_run ../src/libsystemd/sd-bus/bus-objects.c:406
    #7 0x7f3578f80dd8 in object_find_and_run ../src/libsystemd/sd-bus/bus-objects.c:1319
    #8 0x7f3578f82487 in bus_process_object ../src/libsystemd/sd-bus/bus-objects.c:1439
    #9 0x7f3578fe41f1 in process_message ../src/libsystemd/sd-bus/sd-bus.c:3007
    #10 0x7f3578fe477b in process_running ../src/libsystemd/sd-bus/sd-bus.c:3049
    #11 0x7f3578fe75d1 in bus_process_internal ../src/libsystemd/sd-bus/sd-bus.c:3269
    #12 0x7f3578fe776e in sd_bus_process ../src/libsystemd/sd-bus/sd-bus.c:3296
    #13 0x7f3578feaedc in io_callback ../src/libsystemd/sd-bus/sd-bus.c:3638
    #14 0x7f35791c2f68 in source_dispatch ../src/libsystemd/sd-event/sd-event.c:4187
    #15 0x7f35791cc6f9 in sd_event_dispatch ../src/libsystemd/sd-event/sd-event.c:4808
    #16 0x7f35791cd830 in sd_event_run ../src/libsystemd/sd-event/sd-event.c:4869
    #17 0x7f357abcd572 in manager_loop ../src/core/manager.c:3244
    #18 0x41db21 in invoke_main_loop ../src/core/main.c:1960
    #19 0x426615 in main ../src/core/main.c:3125
    #20 0x7f3577c49b49 in __libc_start_call_main (/lib64/libc.so.6+0x27b49) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #21 0x7f3577c49c0a in __libc_start_main_alias_2 (/lib64/libc.so.6+0x27c0a) (BuildId: 245240a31888ad5c11bbc55b18e02d87388f59a9)
    #22 0x408494 in _start (/usr/lib/systemd/systemd+0x408494) (BuildId: fe61e1b0f00b6a36aa34e707a98c15c52f6b960a)

SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s).
[   43.295912] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   43.297036] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 ]---

Originally noticed in systemd#28579.
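
The "try the console, otherwise leave things alone" fallback described in this commit can be sketched generically in Python. This only shows the descriptor handling; how the sanitizer is told which descriptor to use is deliberately left out, and the function name and parameters are assumptions for illustration.

```python
import os


def redirect_reports(report_fd: int, console_path: str = "/dev/console") -> bool:
    """Point report_fd at console_path if it can be opened; return True on success."""
    try:
        console_fd = os.open(console_path, os.O_WRONLY | os.O_NOCTTY | os.O_CLOEXEC)
    except OSError:
        return False  # console not available: keep whatever report_fd referred to
    if console_fd != report_fd:
        try:
            os.dup2(console_fd, report_fd)  # report_fd now writes to the console
        finally:
            os.close(console_fd)
    return True
```

If opening the console fails, the function changes nothing, which matches the "attempt ... and if we succeed" wording above.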