Implement async-signal-safe lock for DwarfFDECache#34
Implement async-signal-safe lock for DwarfFDECache#34al13n321 merged 1 commit intoClickHouse:masterfrom
Conversation
8729af7 to
7729044
Compare
al13n321
left a comment
There was a problem hiding this comment.
Neat, thanks for the fix!
(I suspect the implementation would be slightly simpler and faster if the two counters were packed into one 64-bit atomic, but it probably doesn't matter.)
| while (true) { | ||
| int expected = 0; | ||
| if (atomic_compare_exchange_weak(&state, &expected, -1)) { | ||
| atomic_fetch_sub(&waiting_writers, 1); |
There was a problem hiding this comment.
Nitpick (EDIT: meh, merging without it):
| atomic_fetch_sub(&waiting_writers, 1); | |
| int current_waiting_writers = atomic_fetch_sub(&waiting_writers, 1); | |
| if (current_waiting_writers <= 0) { | |
| abort(); | |
| } |
|
@al13n321 why is this PR not merged to master yet? |
It'd probably make the correctness of the implementation more plausible if we keep all state (not just a variable called |
|
Turns out DwarfFDECache is supposed to be disabled in clickhouse (using |
I originally wanted to delete this cache too, but I'm worried that it might have a performance impact on stack unwinding. |
There are at least three types of stack unwinding in ClickHouse:
SIGUSR1: Used by the CPU Profiler
SIGUSR2: Used by the Real Profiler
SIGTRIM: Used by system.stack_trace
Jemalloc profile unwinding
Exception unwinding
During stack unwinding, libunwind manipulates the
DwarfFDECachewhich acquires a read-write mutex (RWMutex). This is not async-signal-safe and can lead to serious issues. In particular, a signal handler might try to acquire the mutex while current thread is already in the process of unwinding and trying to acquire the same mutex , potentially leading to corruption of the mutex state and making debugging extremely difficult. A deadlock occurring during stack unwinding typically results in a "stop-the-world" situation.An intuitive way to solve this problem is to block related signals before unwinding stack and unblock them later. However it's almost impossible to unblock signals later after exception unwinding.
Another way is to remove the use of pthread rw lock which is not async-signal-safe. This PR implements a simple lockfree RWMutex which should be good enough for stack unwinding purpose. It has been verified across a cluster of over 100 nodes. Prior to this implementation, the system experienced node deadlocks every 3-5 days. After applying the PR, the cluster has now been stable for two weeks without any reported issues.
Similar issue: ClickHouse/ClickHouse#69904