Conversation

fjahr (Contributor) commented Oct 29, 2025

This adds a rough way to run specific benchmarks with different numbers of worker threads to see how these impact performance. This is useful for me in the batch validation context and for potential refactorings of checkqueue, but I am not sure it is useful in many other contexts, so I am leaving this as an RFC draft for now to see whether there is any interest in merging something like this.

Example output:

$ build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000 -scale-threads
Running benchmarks with worker threads from 1 to 15

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       43,118,187.50 |               23.19 |    0.1% |      0.99 | `ConnectBlockAllSchnorr_1_threads`
|       29,015,444.33 |               34.46 |    0.2% |      1.10 | `ConnectBlockAllSchnorr_2_threads`
|       22,241,437.50 |               44.96 |    0.1% |      1.09 | `ConnectBlockAllSchnorr_3_threads`
|       17,893,361.00 |               55.89 |    0.3% |      1.09 | `ConnectBlockAllSchnorr_4_threads`
|       15,105,819.50 |               66.20 |    0.2% |      1.09 | `ConnectBlockAllSchnorr_5_threads`
|       13,145,510.38 |               76.07 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_6_threads`
|       11,646,614.62 |               85.86 |    0.2% |      1.06 | `ConnectBlockAllSchnorr_7_threads`
|       10,476,425.89 |               95.45 |    0.2% |      1.07 | `ConnectBlockAllSchnorr_8_threads`
|        9,547,125.00 |              104.74 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_9_threads`
|        9,574,246.27 |              104.45 |    0.6% |      1.08 | `ConnectBlockAllSchnorr_10_threads`
|        9,557,633.40 |              104.63 |    1.1% |      1.08 | `ConnectBlockAllSchnorr_11_threads`
|        9,495,175.00 |              105.32 |    0.8% |      1.08 | `ConnectBlockAllSchnorr_12_threads`
|        9,297,012.50 |              107.56 |    0.3% |      1.07 | `ConnectBlockAllSchnorr_13_threads`
|        9,775,795.45 |              102.29 |    0.7% |      1.11 | `ConnectBlockAllSchnorr_14_threads`
|        9,776,140.18 |              102.29 |    0.9% |      1.06 | `ConnectBlockAllSchnorr_15_threads`
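
For readers who want the gist without reading the diff, here is a simplified standalone sketch of the idea (hypothetical names such as bench_name and max_worker_threads, not the actual bench_bitcoin.cpp changes): the filtered benchmark is re-run once per worker-thread count, with -worker-threads overridden in the node setup args and the count appended to the benchmark name so the rows line up as above.

// Standalone illustration only; a real runner would execute the benchmark
// with these setup args instead of printing the plan.
#include <cstdio>
#include <string>

int main()
{
    const std::string bench_name{"ConnectBlockAllSchnorr"};
    constexpr int max_worker_threads{15}; // the PR scales from 1 to 15

    for (int threads{1}; threads <= max_worker_threads; ++threads) {
        std::printf("%s_%d_threads (setup arg: -worker-threads=%d)\n",
                    bench_name.c_str(), threads, threads);
    }
    return 0;
}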

DrahtBot (Contributor) commented Oct 29, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33740.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34210 (bench: Remove -priority-level= option by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

fjahr (Contributor, Author) commented Oct 29, 2025

If people could test the exact command above, it might be interesting to post the results here. For me at least, 13 worker threads consistently deliver a better result than 15 worker threads. EDIT: After some time I started to see the best results with 14 and 15, so maybe this was nothing.

l0rinc (Contributor) left a comment

Not sure whether the concept of script validation has to creep into the benchmarking parameters; maybe we could generalize it to -thread-count or -par or something. Left a few nits.

if (threads > 0) {
    bool found = false;
    for (auto& arg : current_setup_args) {
        if (arg.starts_with("-worker-threads=") == 0) {

hmmm, this looks like a refactor gone wrong, was this arg.find("-worker-threads=") == 0 originally?

Suggested change
if (arg.starts_with("-worker-threads=") == 0) {
if (arg.starts_with("-worker-threads=")) {

so I guess if (!found) { should also be adjusted after this


Alternatively, instead of finding and overwriting, what if we unconditionally erased and pushed back the actual thread count, something like:

std::vector<std::string> current_setup_args{args.setup_args};
if (threads > 0) {
    std::erase_if(current_setup_args, [](const auto& arg) { return arg.starts_with("-worker-threads="); });
    current_setup_args.push_back(strprintf("-worker-threads=%d", threads));
}
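
As a side note, a self-contained illustration of the erase-then-append behavior (hypothetical argument values, and std::to_string standing in for strprintf): any pre-existing -worker-threads= entry is replaced rather than duplicated.

#include <cstdio>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> setup_args{"-prune=1000", "-worker-threads=2"};
    const int threads{4};

    // C++20: std::erase_if and std::string::starts_with.
    std::erase_if(setup_args, [](const auto& arg) { return arg.starts_with("-worker-threads="); });
    setup_args.push_back("-worker-threads=" + std::to_string(threads));

    for (const auto& arg : setup_args) std::printf("%s\n", arg.c_str());
    // Prints "-prune=1000" and "-worker-threads=4"; the old "=2" entry is gone.
    return 0;
}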

Comment on lines +121 to +125
// Scale from 1 to MAX_SCRIPTCHECK_THREADS
constexpr int MAX_SCRIPTCHECK_THREADS = 15;
for (int i = 1; i <= MAX_SCRIPTCHECK_THREADS; ++i) {
    thread_scales.push_back(i);
}

Script checks are currently the main source of threading, but threading is not necessarily limited to them (e.g. https://github.com/bitcoin/bitcoin/pull/31132/commits or #26966 or #33689 or #32747), so I would suggest untangling multithreading from script validation here. If you insist on this being about script validation (which sounds a bit too specific to include in a general benchmark), ignore this comment.

nit: redundant comment, explains exactly what the code already explains without adding additional context

} else if (m_node.args->IsArgSet("-worker-threads")) {
    worker_threads = m_node.args->GetIntArg("-worker-threads", 2);
    if (worker_threads < 0) {
        throw std::runtime_error("-worker-threads must be non-negative");

What should happen when worker_threads > MAX_SCRIPTCHECK_THREADS?
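
One possible answer, shown only as an illustrative fragment (not something the PR currently does, and assuming MAX_SCRIPTCHECK_THREADS and <iostream> are in scope here): clamp the requested count and warn.

if (worker_threads > MAX_SCRIPTCHECK_THREADS) {
    std::cout << "Warning: -worker-threads=" << worker_threads << " exceeds the maximum of "
              << MAX_SCRIPTCHECK_THREADS << ", clamping." << std::endl;
    worker_threads = MAX_SCRIPTCHECK_THREADS;
}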

std::cout << "Warning: -scale-threads requires -filter to be set. Running with default thread count." << std::endl;
}

std::vector<int> thread_scales;
l0rinc (Contributor) commented Nov 25, 2025

What's the purpose of a vector over a range? Are we planning on testing threads 2,4,8,16 instead of 1,2,3,4,5,6...?
If not, consider making it a range, something like:

constexpr int DEFAULT_MAX_BENCH_THREADS{16};
const auto [start_threads, end_threads]{is_thread_scaling ? std::pair{1, DEFAULT_MAX_BENCH_THREADS} : std::pair{0, 0}};

and maybe iterate as

for (int threads{start_threads}; threads <= end_threads; ++threads) {
    ...
}

    for (const auto& arg : args.setup_args) ret.emplace_back(arg.c_str());
    return ret;
};
bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");

Might be out of scope for this PR, but instead of inlining DEFAULT_BENCH_FILTER here, can we move it over to src/bench/bench.cpp?

Suggested change
bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");
bool is_thread_scaling = args.scale_threads && (args.regex_filter != DEFAULT_BENCH_FILTER);

fjahr (Contributor, Author) commented Nov 26, 2025

> Not sure whether the concept of script validation has to creep into the benchmarking parameters; maybe we could generalize it to -thread-count or -par or something.

Thanks, happy to change this and address the other comments. But aside from that, do you think this is generally interesting enough to be considered for merging? I was unsure, and since I haven't worked much on benchmarking I would like to get some concept-ACK-ish feedback from the benchmarking working group :)
