Skip to content

Conversation

@yuwata
Copy link
Member

@yuwata yuwata commented Nov 16, 2023

No description provided.

@github-actions
Copy link

github-actions bot commented Nov 16, 2023

We had successfully released a new major release. We are no longer in a development freeze phase.
We will try our best to get back to your PR as soon as possible. Thank you for your patience.

@yuwata yuwata force-pushed the assert-return-critical branch 3 times, most recently from 82508a4 to 4677093 Compare November 16, 2023 03:58
src/basic/log.c Outdated
if (e && log_set_ratelimit_kmsg_from_string(e) < 0)
log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e);

e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a bit uneasy about this switch given that it can start crashing systemd left and right :-) Another option would be a compile-time option that could be passed by the CI. (at least that's what NetworkManager did as far as I can remember)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like NM flipped it at runtime to judge from #22406 (comment). I still think it should be a compile-time switch though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was planning to introduce a build time option and explicitly enable it in the CI, since this has a very high potential of screwing up the whole system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would actually be nice if it was possible to control it at runtime too because I can imagine scenarios where it would be useful to crash resolved (or any other program linked against libsystemd) without touching networkd but that runtime switch should be hidden behind a compile time switch probably :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use release mode build when you want to test other software, e.g. NetworkManager?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because assert_return isn't supposed to crash in release mode but I'd want it to crash on the CI. For example here's what NM did according to #22406 (comment)

NM redefines assert_return() to behave like g_return_if_fail()... which under normal conditions only prints a message.
But in NM-CI we run with G_DEBUG=fatal-warnings, which turns this into an abort().

(NM replaced those systemd parts so it's a hypothetical example)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at NM and it seems the dhcpv6 client is still there so maybe it isn't a hypothetical example after all :-) Either way systemd crashes aren't that unusual to me so I don't feel strongly about this. It's just that I'm not sure what is going to happen with this option enabled everywhere if some external code linked against libsystemd starts crashing on CentOS CI once it pulls the latest version of Arch or something like that. I'll leave it to @mrc0mmand to decide :-)

Copy link
Contributor

@evverx evverx Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly

It's kind of hard to set them all to get almost all the systemd components to crash at once.

not as bad as many others

Dunno. I turned this on globally at some point and rolled it back because even in the environment prepared to deal with all sorts of systemd crashes where PID1, journald, resolved, networkd and so on go down on a regular basis it was unbearable. I guess if weird packets are never sent or other weird things like spawning a gazillion of weird jobs are never done it can be somewhat tolerated probably. Either way I think it's a debug feature and it should be available in debug builds only by analogy with the other additional assertions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said it it was in release mode it would make it easier to set it outside of, say, various CIs but I'm not sure who would need it because to, for example, instrument systemd it would still be necessary to compile systemd separately. All in all I'm for the runtime option available in debug builds only but I don't feel strongly about this.

@yuwata yuwata force-pushed the assert-return-critical branch 2 times, most recently from 622cd66 to d3ba7bb Compare November 20, 2023 00:32
@yuwata yuwata added this to the v256 milestone Nov 20, 2023
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I had some pending comments from a long time ago. I'll publish them, maybe they'll still be useful.

src/basic/log.c Outdated
if (e && log_set_ratelimit_kmsg_from_string(e) < 0)
log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e);

e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.

@yuwata yuwata force-pushed the assert-return-critical branch from d3ba7bb to 6df248e Compare December 6, 2023 05:46
yuwata added a commit to yuwata/systemd that referenced this pull request Dec 6, 2023
@yuwata yuwata marked this pull request as ready for review December 6, 2023 05:46
@github-actions github-actions bot added the please-review PR is ready for (re-)review by a maintainer label Dec 6, 2023
@yuwata
Copy link
Member Author

yuwata commented Dec 6, 2023

Let's add env variable and command line option when necessary. PTAL.

@yuwata
Copy link
Member Author

yuwata commented Dec 22, 2023

Could anyone review this, and hopefully set the green label? There may be several rooms to improve our test suites more, but at least this should be a good starting point.

cc @mrc0mmand @evverx @keszybz

@mrc0mmand
Copy link
Member

Could anyone review this, and hopefully set the green label? There may be several rooms to improve our test suites more, but at least this should be a good starting point.

cc @mrc0mmand @evverx @keszybz

I'm a bit confused about the additional macros - would you mind extending the commit message(s) with a rationale behind them and when they should be used? I think I understand what the macros do, but not completely sure why we need them (compared to just making all assert_return() calls fail when -Dmode=developer is used).

yuwata and others added 3 commits December 24, 2023 01:52
These can be used to check if we trigger assert_return()
unexpectedly.

Co-authored-by: Frantisek Sumsal <[email protected]>
Several test cases intentionally trigger assert_return(). So, to avoid
the entire test fails, this introduces several macros that tentatively
make assert_return() not critical.
Triggering assert_return() should be a bug in general, and we should
really fix that.  But, previously, it is hard to notice such bug, as
it was not critical.
This is for making CI or our testing environment fail if we unexpectedly
trigger assert_return(). So, hopefully we can easily find such bugs.
@yuwata
Copy link
Member Author

yuwata commented Dec 23, 2023

Extended commit messages, moved the macros to src/shared/tests.h, and moved them from the first commit to the second commit. Total code change is the same as the previous version.

@mrc0mmand mrc0mmand added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed please-review PR is ready for (re-)review by a maintainer labels Dec 23, 2023
@mrc0mmand
Copy link
Member

Extended commit messages, moved the macros to src/shared/tests.h, and moved them from the first commit to the second commit. Total code change is the same as the previous version.

Thank you!

@yuwata yuwata merged commit b6c424a into systemd:main Dec 23, 2023
@yuwata yuwata deleted the assert-return-critical branch December 23, 2023 18:39
@github-actions github-actions bot removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Dec 23, 2023
bluca pushed a commit to systemd/systemd-stable that referenced this pull request Dec 23, 2023
By making assert_return() critical, we observe the following:
---
 Program received signal SIGABRT, Aborted.
 0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007f01320b0884 in __pthread_kill_implementation ()
    from /lib64/libc.so.6
 #1  0x00007f013205fafe in raise () from /lib64/libc.so.6
 #2  0x00007f013204887f in abort () from /lib64/libc.so.6
 #3  0x00007f01338d02d6 in log_assert_failed (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:948
 #4  0x00007f01338d0457 in log_assert_failed_return (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:967
 #5  0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1399
 #6  0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1462
 #7  0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
 #8  0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
 #9  0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
 #10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
 #11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
 #12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).

(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
bluca pushed a commit to bluca/systemd-stable that referenced this pull request Dec 23, 2023
By making assert_return() critical, we observe the following:
---
 Program received signal SIGABRT, Aborted.
 0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007f01320b0884 in __pthread_kill_implementation ()
    from /lib64/libc.so.6
 systemd#1  0x00007f013205fafe in raise () from /lib64/libc.so.6
 systemd#2  0x00007f013204887f in abort () from /lib64/libc.so.6
 systemd#3  0x00007f01338d02d6 in log_assert_failed (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:948
 systemd#4  0x00007f01338d0457 in log_assert_failed_return (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:967
 systemd#5  0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1399
 systemd#6  0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1462
 systemd#7  0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
 systemd#8  0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
 systemd#9  0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
 systemd#10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
 systemd#11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
 systemd#12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).

(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
(cherry picked from commit 8e70e22)
bluca pushed a commit to systemd/systemd-stable that referenced this pull request Dec 24, 2023
By making assert_return() critical, we observe the following:
---
 Program received signal SIGABRT, Aborted.
 0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007f01320b0884 in __pthread_kill_implementation ()
    from /lib64/libc.so.6
 #1  0x00007f013205fafe in raise () from /lib64/libc.so.6
 #2  0x00007f013204887f in abort () from /lib64/libc.so.6
 #3  0x00007f01338d02d6 in log_assert_failed (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:948
 #4  0x00007f01338d0457 in log_assert_failed_return (
     text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
     file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
     func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
     at ../src/basic/log.c:967
 #5  0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1399
 #6  0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
     ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
     callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
     at ../src/libsystemd/sd-event/sd-event.c:1462
 #7  0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
 #8  0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
 #9  0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
 #10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
 #11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
 #12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).

(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
(cherry picked from commit 8e70e22)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants