-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
test: make assert_return() critical by default #30049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We had successfully released a new major release. We are no longer in a development freeze phase. |
82508a4 to
4677093
Compare
src/basic/log.c
Outdated
| if (e && log_set_ratelimit_kmsg_from_string(e) < 0) | ||
| log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e); | ||
|
|
||
| e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a bit uneasy about this switch given that it can start crashing systemd left and right :-) Another option would be a compile-time option that could be passed by the CI. (at least that's what NetworkManager did as far as I can remember)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like NM flipped it at runtime to judge from #22406 (comment). I still think it should be a compile-time switch though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was planning to introduce a build time option and explicitly enable it in the CI, since this has a very high potential of screwing up the whole system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would actually be nice if it was possible to control it at runtime too because I can imagine scenarios where it would be useful to crash resolved (or any other program linked against libsystemd) without touching networkd but that runtime switch should be hidden behind a compile time switch probably :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use release mode build when you want to test other software, e.g. NetworkManager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because assert_return isn't supposed to crash in release mode but I'd want it to crash on the CI. For example here's what NM did according to #22406 (comment)
NM redefines assert_return() to behave like g_return_if_fail()... which under normal conditions only prints a message.
But in NM-CI we run with G_DEBUG=fatal-warnings, which turns this into an abort().
(NM replaced those systemd parts so it's a hypothetical example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a closer look at NM and it seems the dhcpv6 client is still there so maybe it isn't a hypothetical example after all :-) Either way systemd crashes aren't that unusual to me so I don't feel strongly about this. It's just that I'm not sure what is going to happen with this option enabled everywhere if some external code linked against libsystemd starts crashing on CentOS CI once it pulls the latest version of Arch or something like that. I'll leave it to @mrc0mmand to decide :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly
It's kind of hard to set them all to get almost all the systemd components to crash at once.
not as bad as many others
Dunno. I turned this on globally at some point and rolled it back because even in the environment prepared to deal with all sorts of systemd crashes where PID1, journald, resolved, networkd and so on go down on a regular basis it was unbearable. I guess if weird packets are never sent or other weird things like spawning a gazillion of weird jobs are never done it can be somewhat tolerated probably. Either way I think it's a debug feature and it should be available in debug builds only by analogy with the other additional assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said it it was in release mode it would make it easier to set it outside of, say, various CIs but I'm not sure who would need it because to, for example, instrument systemd it would still be necessary to compile systemd separately. All in all I'm for the runtime option available in debug builds only but I don't feel strongly about this.
4677093 to
6bcfb73
Compare
622cd66 to
d3ba7bb
Compare
keszybz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I had some pending comments from a long time ago. I'll publish them, maybe they'll still be useful.
src/basic/log.c
Outdated
| if (e && log_set_ratelimit_kmsg_from_string(e) < 0) | ||
| log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e); | ||
|
|
||
| e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.
d3ba7bb to
6df248e
Compare
Follow-up for ce5a6d5. Addresses systemd#30049 (comment).
|
Let's add env variable and command line option when necessary. PTAL. |
|
Could anyone review this, and hopefully set the green label? There may be several rooms to improve our test suites more, but at least this should be a good starting point. |
I'm a bit confused about the additional macros - would you mind extending the commit message(s) with a rationale behind them and when they should be used? I think I understand what the macros do, but not completely sure why we need them (compared to just making all |
These can be used to check if we trigger assert_return() unexpectedly. Co-authored-by: Frantisek Sumsal <[email protected]>
Several test cases intentionally trigger assert_return(). So, to avoid the entire test fails, this introduces several macros that tentatively make assert_return() not critical.
Triggering assert_return() should be a bug in general, and we should really fix that. But, previously, it is hard to notice such bug, as it was not critical. This is for making CI or our testing environment fail if we unexpectedly trigger assert_return(). So, hopefully we can easily find such bugs.
eb0d21a to
fce9e8a
Compare
|
Extended commit messages, moved the macros to src/shared/tests.h, and moved them from the first commit to the second commit. Total code change is the same as the previous version. |
Thank you! |
By making assert_return() critical, we observe the following:
---
Program received signal SIGABRT, Aborted.
0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f01320b0884 in __pthread_kill_implementation ()
from /lib64/libc.so.6
#1 0x00007f013205fafe in raise () from /lib64/libc.so.6
#2 0x00007f013204887f in abort () from /lib64/libc.so.6
#3 0x00007f01338d02d6 in log_assert_failed (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:948
#4 0x00007f01338d0457 in log_assert_failed_return (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:967
#5 0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1399
#6 0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1462
#7 0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
#8 0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
#9 0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
#10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
#11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
#12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).
(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
By making assert_return() critical, we observe the following:
---
Program received signal SIGABRT, Aborted.
0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f01320b0884 in __pthread_kill_implementation ()
from /lib64/libc.so.6
systemd#1 0x00007f013205fafe in raise () from /lib64/libc.so.6
systemd#2 0x00007f013204887f in abort () from /lib64/libc.so.6
systemd#3 0x00007f01338d02d6 in log_assert_failed (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:948
systemd#4 0x00007f01338d0457 in log_assert_failed_return (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:967
systemd#5 0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1399
systemd#6 0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1462
systemd#7 0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
systemd#8 0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
systemd#9 0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
systemd#10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
systemd#11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
systemd#12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).
(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
(cherry picked from commit 8e70e22)
By making assert_return() critical, we observe the following:
---
Program received signal SIGABRT, Aborted.
0x00007f01320b0884 in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f01320b0884 in __pthread_kill_implementation ()
from /lib64/libc.so.6
#1 0x00007f013205fafe in raise () from /lib64/libc.so.6
#2 0x00007f013204887f in abort () from /lib64/libc.so.6
#3 0x00007f01338d02d6 in log_assert_failed (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:948
#4 0x00007f01338d0457 in log_assert_failed_return (
text=0x7f01340009e0 "e->state != SD_EVENT_FINISHED",
file=0x7f0133fff403 "src/libsystemd/sd-event/sd-event.c", line=1399,
func=0x7f01340045a0 <__func__.148> "sd_event_add_time")
at ../src/basic/log.c:967
#5 0x00007f0133c7ed83 in sd_event_add_time (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=24054941030, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1399
#6 0x00007f0133c7f725 in sd_event_add_time_relative (e=0x617000022280,
ret=0x610000007e98, clock=7, usec=1000000, accuracy=0,
callback=0x4625b4 <on_announcement_timeout>, userdata=0x610000007e40)
at ../src/libsystemd/sd-event/sd-event.c:1462
#7 0x0000000000464cac in dns_scope_announce (scope=0x610000007e40, goodbye=true) at ../src/resolve/resolved-dns-scope.c:1530
#8 0x0000000000504d08 in link_free (l=0x612000023d40) at ../src/resolve/resolved-link.c:83
#9 0x000000000052dbbd in manager_free (m=0x619000000a80) at ../src/resolve/resolved-manager.c:697
#10 0x0000000000562328 in manager_freep (p=0x7f012f800040) at ../src/resolve/resolved-manager.h:198
#11 0x000000000056315a in run (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:25
#12 0x0000000000563284 in main (argc=1, argv=0x7fff22b06468) at ../src/resolve/resolved.c:99
---
Prompted by systemd/systemd#30049 (comment).
(cherry picked from commit a4be4ad)
(cherry picked from commit 390e942)
(cherry picked from commit 3529def)
(cherry picked from commit 8e70e22)
No description provided.