-
Notifications
You must be signed in to change notification settings - Fork 38.7k
tracing: lock contention analysis #25081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is a templated class, so this also adds the two needed explicit instantiations. Why do this? having tracepoints in the header leads to lots of traces which quickly exhaust the limit. Therefore this moves the implementation into the .cpp file.
Stores the arguments as members so they are available in the destructor. This will be useful later when adding tracepoints. Note that it is remembered as a `char const*`, so this only works for string literals where long lifetime is guaranteed. UniqueLock is only used with the `LOCK`/`LOCK2` macro, so this is fine. Using `std::string` would be an unnecessary overhead.
This enables tracing of locks & lock contention analysis.
Same as `UniqueLock`, moves the implementation of `AnnotatedMixin` into the cpp file. That way we can add tracepoints to the creation and destruction of a mutex.
This bpftrace script can be used to analyze lock contention. It shows code locations that have to wait a long time to actually acquire a lock, and also shows the last location that lock was held, to see who was actually responsible for blocking. When monitoring ends, prints the locations with the longest waiting durations.
|
Interesting, would like to give this a try soon. |
|
This looks interesting. Planning to play around with this, e.g. running this on bitcoind.observer. I think it would be good to document these tracepoints in doc/tracing.md and, if possible, have an interface test similar to these added in e.g. #24358 ( Edit: Oh also, documentation of the bpftrace script in |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Oh, I didn't even know we had something like that... I guess I need to learn more about eBPF and python 😅 |
For info, #24757 would define it by default in debug builds ( |
|
Concept ACK, I like using tracepoints for this kind of analysis instead of our own instrumentation.
No longer inlining the locks will add some overhead to locking even in the no-tracing case. We'll need to benchmark this. |
|
Concept ACK. |
|
Hvaen't reviewed the code yet, but just ran a quick IBD bench on a "contemporary" region of the chain to get a sense of any performance impact. No significant slowdown there. commands index
#25081 vs. $mergebase (absolute)
#25081 vs. $mergebase (relative)
Reproduce withbitcoinperf bench-pr --num-blocks 10_000 --run-id lock-tracing \
--bitcoind-args='-dbcache=10000' --run-count 2 25081 |
|
Concept ACK!
This was one of the main usecases of tracepoints that I had in mind! Very nice, will test.
|
|
I've started playing around with these tracepoints. It seems like there are two ELF notes generated for each Output of `readelf -n ./src/bitcoind`I think, this is caused the templates were using - and that's not something we want to change. E.g. Lines 329 to 333 in f7917c2
Two ELF notes with the same context/provider and name cause problems for me. E.g., when using bcc via the binary path (as we do it in e.g. the interface tests). According to iovisor/bcc#1774 (comment) this works when using the PID, which we can do for the interface tests. Still, supporting hooking into the tracepoints both via PID and binary path would be good, but I'm not sure if that's possible in this case. |
Currently the templates are instantiated only for template <typename PARENT>
AnnotatedMixin<PARENT>::AnnotatedMixin()
{
if constexpr (std::is_same_v<std::mutex, PARENT>) {
TRACE1(sync, mutex_ctor, this);
} else if constexpr (std::is_same_v<std::recursive_mutex, PARENT>) {
TRACE1(sync, recursive_mutex_ctor, this);
} else {
assert(false && "unknown lock used");
}
}Unfortunately currently I don't have much time to work on this though |
|
there are two ELF notes generated for each sync tracepoint
If the template is getting instantiated twice creating two tracepoints, why not just create a standalone function containing the trace? Then perhaps we wouldn't need to move the header-only templates as well.
|
|
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
|
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Hi, I lost interest / don't have time for this unfortunately. You can close it and mark it as up-for-grabs. |
Currently there is a macro
DEBUG_LOCKCONTENTION(seedeveloper-notes.md) to enable logging of contention. The disadvantage of this method is thatbitcoindneeds to be especially compiled with that enabled, and when enabled this quickly produces huge log files, and has a relatively large overhead.This PR introduces some tracepoints and adds a tracing script
lock_contention_analysis.btthat can be started at any time whilebitcoindis running. This has low overhead and gives better analysis possibilities.When the script is attached with
sudo bpftrace contrib/tracing/lock_contention_analysis.bt, you'll get a markdown formatted output that looks like this:When monitoring is stopped (press Ctrl+C), the script prints the source locations where it has waited the longest while the script was running (unfortunately there seems to be no way to format this more nicely):
For this to work I had to add traces to
UniqueLockandAnnotatedMixin, and move the implementation into the cpp file (without moving, due to inlining the maximum number of traces of 512 is quickly exhausted)