Skip to content

Assertion and panic, print crash log without generating SIGSEGV#7585

Merged
oranagra merged 2 commits intoredis:unstablefrom
oranagra:assert_no_segfault
Aug 6, 2020
Merged

Assertion and panic, print crash log without generating SIGSEGV#7585
oranagra merged 2 commits intoredis:unstablefrom
oranagra:assert_no_segfault

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Jul 29, 2020

This makes it possible to add tests that generate assertions, and run
them with valgrind, making sure that there are no memory violations
prior to the assertion.

New config options:

  • crash-log-enabled - can be disabled for cleaner core dumps
  • crash-memcheck-enabled - useful for faster termination after a crash
  • use-exit-on-panic - to be used by the test suite so that valgrind can
    detect leaks and memory corruptions

Other changes:

  • Crash log is printed even on system that dont HAVE_BACKTRACE, i.e. in
    both SIGSEGV and assert / panic
  • Assertion and panic won't print registers and code around EIP (which
    was useless), but will do fast memory test (which may still indicate
    that the assertion was due to memory corrpution)

I had to reshuffle code in order to re-use it, so i extracted come code
into function without actually doing any changes to the code:

  • logServerInfo
  • logModulesInfo
  • doFastMemoryTest (with the exception of it being conditional)
  • dumpCodeAroundEIP

changes to the crash report on segfault:

  • logRegisters is called right after the stack trace (before info) done
    just in order to have more re-usable code
  • stack trace skips the first two items on the stack (the crash log and
    signal handler functions)

@oranagra oranagra requested a review from yossigo July 29, 2020 14:47
@guybe7
Copy link
Collaborator

guybe7 commented Jul 29, 2020

@oranagra can you please attach examples of:

  1. crash
  2. assertion

@oranagra
Copy link
Member Author

here are examples using the DEBUG command.
sigsegv.txt
panic.txt
assert.txt

@oranagra
Copy link
Member Author

anyone got any clue why Salvatore bothered to save the assertion info in variables and print them for a second time from within the signal handler? i fear i may be overlooking something.
fa5af01

@oranagra
Copy link
Member Author

One disadvantage of this approach is that it's not possible to generate a core dump on assertions.
maybe we want it conditional (a config that controls if we exit or SIGSEGV)?

src/debug.c Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra consider making this static (maybe a good opportunity to make other func defs in this file static as well).

src/debug.c Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oranagra Did you intentionally move the backtrace down? In extreme cases with stack all messed up it may make a difference (capturing it as early as we can). Not sure....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved it since i wanted to make use of the trace array for the first call to backtrace_symbols_fd and still not do trace+1 for the call second call (cleanup). but ended up using the eip argument as input for the first call, so i can undo this now.

Copy link
Member Author

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo do you have any feedback on the questions i asked i in the conversation?
specifically the reason for server.assert_file, and the disadvantages of not terminating with SIGSEGV

src/debug.c Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved it since i wanted to make use of the trace array for the first call to backtrace_symbols_fd and still not do trace+1 for the call second call (cleanup). but ended up using the eip argument as input for the first call, so i can undo this now.

@yossigo
Copy link
Collaborator

yossigo commented Jul 31, 2020

@oranagra
Being able to dump core on assert is certainly useful. If we do that, triggering a signal and producing the output there (rather than producing the output, then signalling to dump core) has the benefit of switching the stack and reducing chances of corruption.

I think the sole purpose of pushing stuff through struct server is to do exactly that, i.e. pass arguments from assert to the signal handler.

BTW maybe using abort() for assertions would be better, so it's easier to tell assertions from crashes without looking at logs.

@oranagra
Copy link
Member Author

@yossigo If the first thing _serverAssert does is collect stack trace, why would it be better to raise a signal and catch it there? i feel it might be the other way around (the signal messes up the trace a bit).
The only things that _serverAssert does before collecting the stack trace are things that it did before triggering the SIGSEGV too.
Granted, the EIP is captured correctly in case we raise a signal, but for assertions we already know the function that triggered them, what we don't know is which function called it.

in order to silence these assertions in valgraind i must use exit, can't use abort.
but maybe if i add a config/debug flag (possibly even enabled by default) that controls if after an assertion the server is terminated by signal rather than exit, i can use aobrt() instead of writing into NULL pointer.

@oranagra
Copy link
Member Author

ohh, i misunderstood you.. you where talking about corrupting the "caller's" stack for the code dump, not about the stack trace.
well the signal handler in redis is anyway running on the same as the "caller", and the code in the signal handler can corrupt other things (not to mention the memory check, LOL).
what we may want to do is add a config that completely disables the signal handler and crash log, so that code dump is generated immediately.

@oranagra
Copy link
Member Author

oranagra commented Aug 2, 2020

squash-pushed a new commit with some code review fixes are these new config options:

  • crash-log-enabled - can be disabled for cleaner core dumps
  • crash-memcheck-enabled - useful for faster termination after a crash
  • use-exit-on-panic - (undocumented) to be used by the test suite so that valgrind can detect leaks and memory corruptions

@oranagra oranagra marked this pull request as ready for review August 2, 2020 11:51
@oranagra oranagra requested review from guybe7 and yossigo August 2, 2020 11:51
This makes it possible to add tests that generate assertions, and run
them with valgrind, making sure that there are no memory violations
prior to the assertion.

New config options:
- crash-log-enabled - can be disabled for cleaner core dumps
- crash-memcheck-enabled - useful for faster termination after a crash
- use-exit-on-panic - to be used by the test suite so that valgrind can
  detect leaks and memory corruptions

Other changes:
- Crash log is printed even on system that dont HAVE_BACKTRACE, i.e. in
  both SIGSEGV and assert / panic
- Assertion and panic won't print registers and code around EIP (which
  was useless), but will do fast memory test (which may still indicate
  that the assertion was due to memory corrpution)

I had to reshuffle code in order to re-use it, so i extracted come code
into function without actually doing any changes to the code:
- logServerInfo
- logModulesInfo
- doFastMemoryTest (with the exception of it being conditional)
- dumpCodeAroundEIP

changes to the crash report on segfault:
- logRegisters is called right after the stack trace (before info) done
  just in order to have more re-usable code
- stack trace skips the first two items on the stack (the crash log and
  signal handler functions)
this race would only happen when two threads paniced at the same time,
and even then the only consequence is some extra log lines.

race reported in #7391
@oranagra oranagra linked an issue Aug 4, 2020 that may be closed by this pull request
@oranagra oranagra merged commit 81f8524 into redis:unstable Aug 6, 2020
@oranagra oranagra deleted the assert_no_segfault branch August 6, 2020 13:47
@oranagra oranagra mentioned this pull request Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Potential Data Races Detected by Static Code Scanner

3 participants