WIP NOMERGE [bench] gitian builds for OP_IF bench #18352
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Gitian builds
Tested on ARM: Current master (with optimizations): This pull (without the optimization):
@fjahr, it would be nice if you could test this on macOS to compare against #16902 (comment). The bench_bitcoin binary is contained in the
@fjahr Alternatively, it would be nice to know what optimization flags you used to compile #16902 (comment)
Just the defaults of
The PR was mostly to get two gitian builds for osx (
Sorry, misread at first, here are results for Master: Master and this PR:
Hmmm 🤔 Now it seems to even slow it down by a bit. (Master should be the faster one)
Yeah, I had some direct exchanges with @ajtowns as well after my results on #16902 but we could not figure it out. Let me know if you want me to do more specific testing or investigate my system aside from the information I shared so far.
The hardware you are using and the version of the operating system would help to check if this can be reproduced or if it is only happening on your end.
I think the most relevant info is:
Testing on a mac-mini shows the same result: Slower on master (the optimized version) and faster with the old code:
Super weird. Maybe try setting
This replaces the current benchmarking framework with nanobench [1], an MIT-licensed single-header benchmarking library, of which I am the author. This has in my opinion several advantages, especially on Linux:

* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on an Intel i7-8700 CPU @ 3.20GHz.
* accurate: I ran e.g. the benchmark for SipHash_32b 10 times and calculated standard deviation / mean = coefficient of variation:
  * 0.57% CV for the old benchmarking framework
  * 0.20% CV for nanobench
  So the benchmark results with nanobench seem to vary less than with the old framework.
* It automatically determines runtime based on clock precision, no need to specify the number of evaluations.
* measures instructions, cycles, branches, instructions per cycle, branch misses (only on Linux, when performance counters are available)
* output in markdown table format.
* warns about an unstable environment (frequency scaling, turbo, ...)
* For better profiling, it is possible to set the environment variable NANOBENCH_ENDLESS to force endless running of a particular benchmark without the need to recompile. This makes it possible to e.g. run "perf top" and look at hotspots.

Here is an example copy & pasted from the terminal output:

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
|--------:|-------:|-----:|---------:|---------:|----:|---------:|------:|------:|:----------|
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160` |
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1` |
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256` |
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024` |
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b` |
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512` |

[1] https://github.com/martinus/nanobench
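For readers unfamiliar with nanobench, a standalone benchmark written directly against the library looks roughly like the sketch below. This is illustrative only and not code from this PR; the include path and the xorshift workload are placeholder choices.

```cpp
// Minimal nanobench example (illustrative sketch, not part of this PR).
// nanobench is single-header; define ANKERL_NANOBENCH_IMPLEMENT in exactly one
// translation unit before including it so the implementation gets compiled.
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <cstdint>

int main()
{
    uint64_t x = 123456789; // state for a toy xorshift PRNG used as the workload
    ankerl::nanobench::Bench()
        .title("example")
        .run("xorshift64", [&] {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            // prevent the compiler from optimizing the whole body away
            ankerl::nanobench::doNotOptimizeAway(x);
        });
}
```

Running the binary with `NANOBENCH_ENDLESS=xorshift64` set would loop that one benchmark indefinitely, which is what makes attaching `perf top` convenient.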
This adds support to calculate the asymptotic complexity of a benchmark. This is similar to bitcoin#17375, but currently only one asymptote is supported, and I have added support in the benchmark `ComplexMemPool` as an example.

Usage is e.g. like this:

```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```

This runs the benchmark `ComplexMemPool` several times but with different complexityN settings. The benchmark can extract that number and use it accordingly. Here, it's used for `childTxs`. The output is this:

| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark |
|------------:|------:|-----:|-----:|-------:|-------:|----:|------:|:----------|
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool` |
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool` |
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool` |
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool` |
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool` |
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool` |
| 800 | 310,109,077.00 | 3.22 | 0.1% | 1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool` |

| coefficient | err% | complexity |
|------------:|-----:|:-----------|
| 4.78486e-07 | 4.5% | O(n^2) |
| 6.38557e-10 | 21.7% | O(n^3) |
| 3.42338e-05 | 38.0% | O(n log n) |
| 0.000313914 | 46.9% | O(n) |
| 0.0129823 | 114.4% | O(log n) |
| 0.0815055 | 133.8% | O(1) |

The best-fitting curve is O(n^2), so the algorithm seems to scale quadratically with `childTxs` in the range 25 to 800.
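A rough sketch of how a benchmark can pick up the value supplied via `-asymptote` through nanobench's `complexityN()` getter is shown below. The function name, the default size, and the O(n^2) toy workload are made up for illustration; this is not the actual `ComplexMemPool` code, and the small `main()` only mimics what the runner does for each requested value.

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <cstddef>
#include <iostream>

// Illustrative benchmark whose workload scales with n; the runner varies
// complexityN when -asymptote=25,50,100,... is passed on the command line.
static void ExampleComplexity(ankerl::nanobench::Bench& bench)
{
    // Use the complexityN chosen by the runner, or fall back to a fixed size
    // when no asymptote sweep was requested (the 800 default is an example).
    const size_t n = bench.complexityN() > 1 ? static_cast<size_t>(bench.complexityN()) : 800;

    bench.run("ExampleComplexity", [&] {
        // placeholder O(n^2) workload
        size_t acc = 0;
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = 0; j < n; ++j) acc += i * j;
        }
        ankerl::nanobench::doNotOptimizeAway(acc);
    });
}

int main()
{
    ankerl::nanobench::Bench bench;
    for (double n : {25.0, 50.0, 100.0, 200.0}) {
        bench.complexityN(n); // roughly what -asymptote does for each value
        ExampleComplexity(bench);
    }
    // Fit the measurements against O(1), O(n), O(n^2), ... candidate curves,
    // which is where the coefficient/err% table above comes from.
    std::cout << bench.complexityBigO() << std::endl;
}
```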
Instead of `using namespace ankerl::nanobench`, just pull in ankerl::nanobench::Bench. This is the only class needed in the benchmarks anyway.
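Spelled out, this swaps a using-directive for a targeted using-declaration (illustrative snippet; the surrounding header is not shown):

```cpp
// Before: pulls every name from the namespace into scope.
// using namespace ankerl::nanobench;

// After: only the one class the benchmarks actually use.
using ankerl::nanobench::Bench;
```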
-output_csv produces the (legacy) CSV format, and -output_json produces a big JSON file with all the data that has been gathered. Put all args into a separate struct to prevent argument explosion of BenchRunner::RunAll.
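One plausible shape for such an argument struct is sketched below; the field names and types are illustrative guesses, not necessarily what the PR uses.

```cpp
#include <string>
#include <vector>

namespace benchmark {

// Bundle the runner options into one struct so BenchRunner::RunAll() takes a
// single parameter instead of an ever-growing argument list (field names are
// illustrative).
struct Args {
    bool is_list_only{false};       // only list benchmark names, don't run them
    std::vector<double> asymptote;  // complexityN values to sweep, from -asymptote
    std::string output_csv;         // file for (legacy) CSV output, from -output_csv
    std::string output_json;        // file for the full nanobench JSON dump, from -output_json
    std::string regex_filter;       // which benchmarks to run, from -filter
};

} // namespace benchmark
```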
This reverts commit e6e622e.
This reverts commit d0e8f4d.
Force-pushed from 6895fba to 991966f
Gitian builds
🐙 This pull request conflicts with the target branch and needs rebase.
With the new bench framework I get this: This pull with the last two commits dropped: This pull:
So it does seem to get slightly faster, which might point to a problem with the current bench framework.
Get gitian builds for two revisions, with and without #16902