Micro-benchmarking framework part 1 #3858
Conversation
This PR can be reviewed in two ways:
daira left a comment
I did not review the benchmarking code (under src/bench), but I reviewed the overall diff of other files and saw only minor changes of behaviour.
```diff
-        random_auth.username = strprintf("%i", insecure_rand());
-        random_auth.password = strprintf("%i", insecure_rand());
+        static std::atomic_int counter;
+        random_auth.username = random_auth.password = strprintf("%i", counter++);
```
I don't understand this change. How is this random at all? What was the intent of proxy.randomize_credentials?
See this comment in the top post of the PR (bitcoin/bitcoin#8914):
- The use of `insecure_rand` in `ConnectThroughProxy` has been replaced by an atomic integer counter. The only goal here is to have a different credentials pair for each connection to go on a different Tor circuit; it does not need to be random nor unpredictable.
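A minimal sketch of that replacement, assuming a `ProxyCredentials`-style struct with `username`/`password` fields like the one `ConnectThroughProxy` fills in; `std::to_string` stands in for `strprintf`, and the only property relied on is that each connection gets a distinct pair so Tor isolates it on its own circuit:

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for the credentials struct filled by ConnectThroughProxy.
struct ProxyCredentials {
    std::string username;
    std::string password;
};

// Each call hands out a distinct username/password pair. Distinctness, not
// randomness or unpredictability, is what makes Tor place each connection
// on a separate circuit.
ProxyCredentials NextCircuitCredentials()
{
    static std::atomic_int counter{0};
    ProxyCredentials auth;
    auth.username = auth.password = std::to_string(counter++);
    return auth;
}
```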
```diff
         return;

-    if (insecure_rand() >= nCheckFrequency)
+    if (GetRand(std::numeric_limits<uint32_t>::max()) >= nCheckFrequency)
```
This is technically a semantic change because `insecure_rand()` can return 0xFFFFFFFF, and `GetRand(std::numeric_limits<uint32_t>::max())` can't.
It probably doesn't matter given the intended semantics of nCheckFrequency.
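For context, a sketch of the two distributions under the assumption (matching the upstream contract) that `GetRand(nMax)` draws uniformly from the half-open range `[0, nMax)` while `insecure_rand()` covered the closed range `[0, 0xFFFFFFFF]`; the standard-library generators below are only illustrative stand-ins:

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Stand-in for insecure_rand(): uniform over the closed range [0, 0xFFFFFFFF],
// so `draw >= nCheckFrequency` can still fire even when nCheckFrequency is
// 0xFFFFFFFF.
uint32_t closed_range_draw(std::mt19937& g)
{
    return std::uniform_int_distribution<uint32_t>(
        0, std::numeric_limits<uint32_t>::max())(g);
}

// Stand-in for GetRand(nMax): uniform over the half-open range [0, nMax),
// so with nMax == 0xFFFFFFFF the same comparison can never fire at that
// boundary value.
uint64_t half_open_draw(std::mt19937_64& g, uint64_t nMax)
{
    return std::uniform_int_distribution<uint64_t>(0, nMax - 1)(g);
}
```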
☔ The latest upstream changes (presumably #3926) made this pull request unmergeable. Please resolve the merge conflicts.
Benchmarking framework, loosely based on Google's micro-benchmarking library (https://github.com/google/benchmark).

Why not use the Google Benchmark framework? Because adding Even More Dependencies isn't worth it. If we get a dozen or three benchmarks and need nanosecond-accurate timings of threaded code, then switching to the full-blown Google Benchmark library should be considered.

The benchmark framework is hard-coded to run each benchmark for one wall-clock second, and then spits out .csv-format timing information to stdout. It is left as an exercise for later (or maybe never) to add command-line arguments to specify which benchmark(s) to run, how long to run them for, how to format results, etc. Again, see the Google Benchmark framework for where that might end up.

See src/bench/MilliSleep.cpp for a sanity-test benchmark that just benchmarks 'sleep 100 milliseconds.'

To compile and run benchmarks: cd src; make bench

Sample output:
```
Benchmark,count,min,max,average
Sleep100ms,10,0.101854,0.105059,0.103881
```
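For illustration, a sketch of what such a sanity-test benchmark looks like with this framework, modeled on the `src/bench/MilliSleep.cpp` example mentioned above; the exact include paths and the `MilliSleep` helper are assumptions about the cherry-picked tree:

```cpp
#include "bench/bench.h"   // benchmark::State and the BENCHMARK() macro
#include "utiltime.h"      // MilliSleep()

// The framework keeps invoking the loop body until roughly one wall-clock
// second has elapsed, then reports count/min/max/average for the run.
static void Sleep100ms(benchmark::State& state)
{
    while (state.KeepRunning()) {
        MilliSleep(100);
    }
}

// Registers the benchmark so `make bench` picks it up.
BENCHMARK(Sleep100ms);
```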
Avoid calling gettimeofday every time through the benchmarking loop, by keeping track of how long each loop takes and doubling the number of iterations done between time checks when they take less than 1/16th of the total elapsed time.
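A sketch of that adaptive loop, with illustrative names and a `std::chrono` clock standing in for gettimeofday; the real `bench.cpp` logic differs in detail but follows the same shape:

```cpp
#include <chrono>
#include <cstdint>

// Only consult the clock once every (countMask + 1) iterations, and double
// that interval whenever the work between two clock checks took less than
// 1/16th of the total elapsed time so far.
struct LoopState {
    uint64_t count = 0;
    uint64_t countMask = 1;  // clock is checked when (count & countMask) == 0
    std::chrono::steady_clock::time_point beginTime = std::chrono::steady_clock::now();
    std::chrono::steady_clock::time_point lastTime = beginTime;

    bool KeepRunning(std::chrono::seconds maxElapsed = std::chrono::seconds(1))
    {
        if ((count++ & countMask) != 0) return true;  // fast path: no clock call
        const auto now = std::chrono::steady_clock::now();
        const auto elapsed = now - beginTime;
        if ((now - lastTime) * 16 < elapsed) {
            countMask = countMask * 2 + 1;            // checks too cheap: double the interval
        }
        lastTime = now;
        return elapsed < maxElapsed;
    }
};
```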
- ensure header namespaces and end comments are correct
- add missing header end comments
- ensure minimal formatting (add newlines etc.)

Zcash: left out change to src/policy/policy.h which we don't yet have.
Add benchmarks for the cryptographic hash algorithms:
- RIPEMD160
- SHA1
- SHA256
- SHA512

Continues work on #7883.
Previously the benchmark code used an integer division (%) with a non-constant in the inner loop. This is quite slow on many processors, especially ones like ARM that lack a hardware divide. Even on fairly recent x86_64 like Haswell an integer division can take something like 100 cycles, making it comparable to the runtime of SipHash. This change avoids the division by using bitmasking instead, which was especially easy since the count was only ever increased by doubling.

This change also restarts the timing when the execution time was very low; this avoids min times of zero in cases where one execution ends up below the timer resolution. It also reduces the impact of the overhead on the final result.

The formatting of the prints is changed to not use scientific notation, to make it more machine readable (in particular, gnuplot croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists all the floating point divisions out of the semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro just to guarantee that it gets inlined. It might also make sense to just save out the intermediate counts and times and get the floating point completely out of the timing loop (because e.g. on hardware without a fast hardware FPU like some ARM it will still be slow enough to distort the results). I haven't done either of these in this commit.
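A sketch contrasting the two checks; names are illustrative, and the point is just that a doubling-only interval can be stored as an all-ones mask so the test compiles to a single AND instead of a divide:

```cpp
#include <cstdint>

// Before: one hardware divide per iteration just to decide whether to look
// at the clock; ~100 cycles on some CPUs, and worse where there is no
// hardware divider at all.
bool TimeToCheck_Modulo(uint64_t count, uint64_t interval)
{
    return count % interval == 0;
}

// After: the interval only ever grows by doubling, so it can be kept as a
// mask of all ones (interval - 1) and the test is a single AND + compare.
bool TimeToCheck_Mask(uint64_t count, uint64_t countMask)
{
    return (count & countMask) == 0;
}
```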
There are only a few uses of `insecure_random` outside the tests. This PR replaces uses of insecure_random (and its accompanying global state) in the core code with a FastRandomContext that is automatically seeded on creation.

This is meant to be used for inner loops. The FastRandomContext can be in the outer scope, or the class itself, then rand32() is used inside the loop. Useful e.g. for pushing addresses in CNode or the fee rounding, or randomization for coin selection.

As a context is created per purpose, it gets rid of cross-thread unprotected shared usage of a single set of globals; this should also get rid of the potential race conditions.

- I'd say TxMempool::check is not called enough to warrant using a special fast random context, so this is switched to GetRand() (open for discussion...)
- The use of `insecure_rand` in ConnectThroughProxy has been replaced by an atomic integer counter. The only goal here is to have a different credentials pair for each connection to go on a different Tor circuit; it does not need to be random nor unpredictable.
- To avoid having a FastRandomContext on every CNode, the context is passed into PushAddress as appropriate.

There remains an insecure_random for test usage in `test_random.h`.

Zcash: Resolved conflicts with the following files:
- src/addrman.cpp
- src/main.cpp
- src/net.cpp
- src/net.h
- src/policy/fees.cpp
- src/policy/fees.h
- src/random.cpp
- src/test/merkle_tests.cpp
- src/test/net_tests.cpp
- src/test/prevector_tests.cpp
- src/test/sighash_tests.cpp
- src/test/skiplist_tests.cpp
- src/test/test_bitcoin.cpp
- src/test/versionbits_tests.cpp
- src/wallet/test/crypto_tests.cpp
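A sketch of the per-purpose pattern described above, assuming the upstream `FastRandomContext` interface with an automatically seeded constructor and a `rand32()` method; the shuffle below is only an illustrative caller, not code from this PR:

```cpp
#include <utility>
#include <vector>

#include "random.h"  // FastRandomContext (assumed interface)

// One context per purpose/scope: seeded once on construction, then cheap
// rand32() calls inside the hot loop, with no globals shared across threads.
void ShuffleExample(std::vector<int>& items)
{
    FastRandomContext rng;  // automatically seeded on creation
    for (size_t i = items.size(); i > 1; --i) {
        std::swap(items[i - 1], items[rng.rand32() % i]);
    }
}
```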
Make sure that the count is zero modulo the new mask before scaling, otherwise the time until the next measurement triggers will be only half as long as accounted for. This caused the 'min time' to be potentially off by as much as 100%.
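A sketch of that guard, assuming the doubling scheme from the earlier commits; adopting the larger mask only when the count is already aligned to it keeps the accounting of iterations-per-measurement honest:

```cpp
#include <cstdint>

// Only accept the doubled mask if the current count is already zero modulo
// the new interval; otherwise the very next clock check would arrive after
// roughly half the assumed number of iterations, skewing the minimum time.
void MaybeGrowMask(uint64_t count, uint64_t& countMask)
{
    const uint64_t newMask = countMask * 2 + 1;
    if ((count & newMask) == 0) {
        countMask = newMask;
    }
}
```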
This adds cycle min/max/avg to the statistics. Supported on x86 and x86_64 (natively through rdtsc), as well as Linux (perf syscall).
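A minimal sketch of the x86/x86_64 cycle-counter read; the helper name is illustrative, the inline assembly is standard `rdtsc` usage, and the Linux perf-syscall fallback is omitted:

```cpp
#include <cstdint>

#if defined(__i386__) || defined(__x86_64__)
// Read the time-stamp counter; cheap enough to bracket every benchmark
// iteration for min/max/avg cycle statistics.
static inline uint64_t cpu_cycles()
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
#else
// Elsewhere a perf_event_open-based counter (Linux) or a zero stub is used.
static inline uint64_t cpu_cycles() { return 0; }
#endif
```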
…ce files. Zcash: Only apply changes to src/bench/bench.cpp
…c numbers, fixed scoping of vectors (and memory movement component of benchmark).
The initialization order of global data structures in different implementation units is undefined. Making use of this is essentially gambling on what the linker does, the so-called [Static initialization order fiasco](https://isocpp.org/wiki/faq/ctors#static-init-order). In this case it apparently worked on Linux but failed on OpenBSD and FreeBSD. To ensure the registration structure is created on first use, make it local to a function. Fixes #8910.
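A sketch of the construct-on-first-use pattern applied here, assuming the registry is a name-to-function map; the exact container and function types in `bench.cpp` may differ:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using BenchFunction = std::function<void()>;

// A function-local static is constructed the first time any translation
// unit registers a benchmark, independent of global-initialization order
// across object files.
static std::map<std::string, BenchFunction>& BenchmarkMap()
{
    static std::map<std::string, BenchFunction> benchmarks;
    return benchmarks;
}

// Called from the static registration object that the BENCHMARK() macro emits.
void RegisterBenchmark(const std::string& name, BenchFunction func)
{
    BenchmarkMap().emplace(name, std::move(func));
}
```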
Zcash: Only applied changes to src/bench/
Avoid static analyzer warnings regarding "Function call argument
is a pointer to uninitialized value" in cases where we are
intentionally using such arguments.
This is achieved by using ...
`f(b.begin(), b.end())` (`std::array<char, N>`)
... instead of ...
`f(b, b + N)` (`char b[N]`)
Rationale:
* Reduce false positives by guiding static analyzers regarding our
intentions.
Before this commit:
```
$ clang-tidy-3.5 -checks=* src/bench/base58.cpp
bench/base58.cpp:23:9: warning: Function call argument is a pointer to uninitialized value [clang-analyzer-core.CallAndMessage]
EncodeBase58(b, b + 32);
^
$ clang-tidy-3.5 -checks=* src/bench/verify_script.cpp
bench/verify_script.cpp:59:5: warning: Function call argument is a pointer to uninitialized value [clang-analyzer-core.CallAndMessage]
key.Set(vchKey, vchKey + 32, false);
^
$
```
After this commit:
```
$ clang-tidy-3.5 -checks=* src/bench/base58.cpp
$ clang-tidy-3.5 -checks=* src/bench/verify_script.cpp
$
```
Zcash: Only applied changes to src/bench/base58.cpp
We were saving a div by caching the inverse as a float, but this ended up requiring an int -> float -> int conversion, which takes almost as much time as the difference between float mul and div. There are lots of other more pressing issues with the bench framework which probably require simply removing the adaptive iteration count stuff anyway.
std::chrono removes portability issues. Rather than storing doubles, store the untouched time_points. Then convert to nanoseconds for display. This allows for maximum precision, while keeping results comparable between differing hardware/operating systems. Also, display full nanosecond counts rather than sub-second floats.
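A sketch of that bookkeeping: keep the raw `time_point`s while the benchmark runs and convert to whole nanoseconds only when printing; the clock choice and struct below are illustrative:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

using bench_clock = std::chrono::high_resolution_clock;

// Store the untouched time_points during the run...
struct Sample {
    bench_clock::time_point begin;
    bench_clock::time_point end;
};

// ...and only convert to an integer nanosecond count for display, keeping
// full precision and avoiding sub-second floating-point output.
void PrintSample(const Sample& s)
{
    const int64_t ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(s.end - s.begin).count();
    std::printf("%lld ns\n", static_cast<long long>(ns));
}
```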
@zkbot r+

📌 Commit 204384c has been approved by

⌛ Testing commit 204384c with merge aa225ebb0b7232706681cde62c0bd8c56fd11969...

💔 Test failed - pr-merge
Transient failure, sigh. @zkbot retry
Micro-benchmarking framework part 1

Cherry-picked from the following upstream PRs:
- bitcoin/bitcoin#6733
- bitcoin/bitcoin#6770
- bitcoin/bitcoin#6892
  - Excluding changes to `src/policy/policy.h` which we don't have yet.
- bitcoin/bitcoin#7934
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#8039
- bitcoin/bitcoin#8107
- bitcoin/bitcoin#8115
- bitcoin/bitcoin#8914
  - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later.
- bitcoin/bitcoin#9200
- bitcoin/bitcoin#9202
  - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts.
- bitcoin/bitcoin#9281
  - Only changes to `src/bench/bench.cpp`
- bitcoin/bitcoin#9498
- bitcoin/bitcoin#9712
- bitcoin/bitcoin#9547
- bitcoin/bitcoin#9505
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#9792
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#10272
- bitcoin/bitcoin#10395
  - Only changes to `src/bench/`
- bitcoin/bitcoin#10735
  - Only changes to `src/bench/base58.cpp`
- bitcoin/bitcoin#10963
- bitcoin/bitcoin#11303
  - Only the benchmark backend change.
- bitcoin/bitcoin#11562
- bitcoin/bitcoin#11646
- bitcoin/bitcoin#11654

This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet.

This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
Process question - there was an unresolved question from my review: #3858 (comment). Also the review was from before the merge policy was relaxed, so I was partly relying on the fact that there would need to be another review. Should this have been considered to satisfy the merge policy? (As it happens, this merge broke the macOS build: #4309. It was a rather large set of changes so I suspect that it should have needed two reviews.)
Sorry, I thought I'd clicked "submit" on my reply, but hadn't 😞 Per my now-visible comment (#3858 (comment)), it's explained by upstream.
Hmm, this is a good point (ideally reviews would be fungible, but in practice there will have been this implicit reliance for past reviews). For PRs with prior reviews, we should check with the reviewers to ensure they are happy with the review level.
Worst-case block verification benchmarks

The micro-benchmark framework from #3858 is used to measure the time to verify various transaction components. These are leveraged by a script that simulates the verification cost of different compositions of full blocks.
…tr4d FastRandomContext improvements and switch to ChaCha20

Backport of bitcoin/bitcoin#9792

Commits are recorded here in stack order:
- pick bitcoin/bitcoin@4fd2d2fc97
- pick bitcoin/bitcoin@16329224e7
- pick bitcoin/bitcoin@e04326fe66
- pick bitcoin/bitcoin@663fbae777 -- already applied in #3858, skip
- pick bitcoin/bitcoin@c21cbe61c6 -- already applied in #3858, skip