Skip to content

Conversation

@gmaxwell
Copy link
Contributor

Previously the benchmark code used an integer division (%) with
a non-constant in the inner-loop. This is quite slow on many
processors, especially ones like ARM that lack a hardware divide.

Even on fairly recent x86_64 like haswell an integer division can
take something like 100 cycles-- making it comparable to the
runtime of siphash.

This change avoids the division by using bitmasking instead. This
was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was
very low this avoids mintimes of zero in cases where one execution
ends up below the timer resolution. It also reduces the impact of
the overhead on the final result.

The formatting of the prints is changed to not use scientific
notation make it more machine readable (in particular, gnuplot
croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists out all the floating point divisions out of the
semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro
just to guarantee that it gets inlined. It might also make sense
to just save out the intermediate counts and times and get the
floating point completely out of the timing loop (because e.g.
on hardware without a fast hardware FPU like some ARM it will
still be slow enough to distort the results). I haven't done
either of these in this commit.

@fanquake
Copy link
Member

Master

Benchmark,count,min,max,average
RIPEMD160,415,0.00242996,0.00253913,0.00246098
RollingBloom-refresh,1,0.000801,0.000801,0.000801
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000107,0.000107,0.000107
RollingBloom-refresh,1,0.000131,0.000131,0.000131
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom-refresh,1,0.00011,0.00011,0.00011
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000129,0.000129,0.000129
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000114,0.000114,0.000114
RollingBloom-refresh,1,0.000134,0.000134,0.000134
RollingBloom-refresh,1,0.000109,0.000109,0.000109
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.000108,0.000108,0.000108
RollingBloom-refresh,1,0.00012,0.00012,0.00012
RollingBloom-refresh,1,0.000111,0.000111,0.000111
RollingBloom,1441791,6.77479e-07,1.19209e-06,7.49941e-07
SHA1,575,0.00178885,0.00195658,0.0018124
SHA256,239,0.00424147,0.00436625,0.00429098
SHA512,383,0.00258148,0.00269461,0.00262054
Sleep100ms,10,0.100642,0.105008,0.103793
Trig,46137343,0,9.53674e-07,2.26862e-08

This PR

#Benchmark,count,min,max,average
RIPEMD160,448,0.001245033173334,0.002638196945190,0.002461894814457
RollingBloom-refresh,1,0.000635000000000,0.000635000000000,0.000635000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000204000000000,0.000204000000000,0.000204000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000120000000000,0.000120000000000,0.000120000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000112000000000,0.000112000000000,0.000112000000000
RollingBloom-refresh,1,0.000127000000000,0.000127000000000,0.000127000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000109000000000,0.000109000000000,0.000109000000000
RollingBloom-refresh,1,0.000134000000000,0.000134000000000,0.000134000000000
RollingBloom-refresh,1,0.000113000000000,0.000113000000000,0.000113000000000
RollingBloom-refresh,1,0.000111000000000,0.000111000000000,0.000111000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom-refresh,1,0.000110000000000,0.000110000000000,0.000110000000000
RollingBloom-refresh,1,0.000107000000000,0.000107000000000,0.000107000000000
RollingBloom-refresh,1,0.000108000000000,0.000108000000000,0.000108000000000
RollingBloom,1310720,0.000000364444583,0.000000838096198,0.000000766858648
SHA1,640,0.000909024336207,0.001938136418660,0.001843086257577
SHA256,256,0.002209486499909,0.008500099182129,0.004300644621253
SHA512,384,0.001319904176016,0.002813005447388,0.002615700786312
Sleep100ms,10,0.205592155456543,0.210056066513062,0.104166316986084
Trig,67108864,0.000000014997003,0.000000015448112,0.000000015188842

@laanwj laanwj added the Tests label May 30, 2016
@laanwj
Copy link
Member

laanwj commented May 30, 2016

utACK

@gmaxwell
Copy link
Contributor Author

There is a bug in the min/max handling (see the sleep 100 timings). I'll fix.

Previously the benchmark code used an integer division (%) with
 a non-constant in the inner-loop.  This is quite slow on many
 processors, especially ones like ARM that lack a hardware divide.

Even on fairly recent x86_64 like haswell an integer division can
 take something like 100 cycles-- making it comparable to the
 runtime of siphash.

This change avoids the division by using bitmasking instead. This
 was especially easy since the count was only increased by doubling.

This change also restarts the timing when the execution time was
 very low this avoids mintimes of zero in cases where one execution
 ends up below the timer resolution. It also reduces the impact of
 the overhead on the final result.

The formatting of the prints is changed to not use scientific
 notation make it more machine readable (in particular, gnuplot
 croaks on the non-fixedpoint, and it doesn't sort correctly).

This also hoists out all the floating point divisions out of the
 semi-hot path because it was easy to do so.

It might be prudent to break out the critical test into a macro
 just to guarantee that it gets inlined.  It might also make sense
 to just save out the intermediate counts and times and get the
 floating point completely out of the timing loop (because e.g.
 on hardware without a fast hardware FPU like some ARM it will
 still be slow enough to distort the results). I haven't done
 either of these in this commit.
@laanwj laanwj merged commit 63ff57d into bitcoin:master May 31, 2016
laanwj added a commit that referenced this pull request May 31, 2016
63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)
codablock pushed a commit to codablock/dash that referenced this pull request Dec 21, 2017
…t loop.

63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)
andvgal pushed a commit to energicryptocurrency/gen2-energi that referenced this pull request Jan 6, 2019
…t loop.

63ff57d Avoid integer division in the benchmark inner-most loop. (Gregory Maxwell)
zkbot added a commit to zcash/zcash that referenced this pull request Jan 24, 2020
Micro-benchmarking framework part 1

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6733
- bitcoin/bitcoin#6770
- bitcoin/bitcoin#6892
  - Excluding changes to `src/policy/policy.h` which we don't have yet.
- bitcoin/bitcoin#7934
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#8039
- bitcoin/bitcoin#8107
- bitcoin/bitcoin#8115
- bitcoin/bitcoin#8914
  - Required resolving several merge conflicts in code that had been refactored upstream. The changes were simple enough that I decided it was okay to impose merge conflicts on pulling in those refactors later.
- bitcoin/bitcoin#9200
- bitcoin/bitcoin#9202
  - Adds support for measuring CPU cycles, which is later removed in an upstream PR after the refactor. I am including it to reduce future merge conflicts.
- bitcoin/bitcoin#9281
  - Only changes to `src/bench/bench.cpp`
- bitcoin/bitcoin#9498
- bitcoin/bitcoin#9712
- bitcoin/bitcoin#9547
- bitcoin/bitcoin#9505
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#9792
  - Just the benchmark, not the performance improvements.
- bitcoin/bitcoin#10272
- bitcoin/bitcoin#10395
  - Only changes to `src/bench/`
- bitcoin/bitcoin#10735
  - Only changes to `src/bench/base58.cpp`
- bitcoin/bitcoin#10963
- bitcoin/bitcoin#11303
  - Only the benchmark backend change.
- bitcoin/bitcoin#11562
- bitcoin/bitcoin#11646
- bitcoin/bitcoin#11654

This pulls in all changes to the micro-benchmark framework prior to December 2017, when it was rewritten. The rewrite depends on other upstream PRs we have not pulled in yet.

This does not pull in all benchmarks prior to December 2017. It leaves out benchmarks that either test code we do not have yet (except for the `FastRandomContext` refactor, which I decided to pull in), or would require rewrites to work with our changes to the codebase.
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Jun 8, 2020
3f3edde [Bench] Use PIVX address in Base58Decode test (random-zebra)
5a1be90 [Travis] Disable benchmark framework for trusty test (random-zebra)
1bd89ac Initialize recently introduced non-static class member lastCycles to zero in constructor (random-zebra)
ec60671 Require a steady clock for bench with at least micro precision (random-zebra)
84069ce bench: prefer a steady clock if the resolution is no worse (random-zebra)
38367b1 bench: switch to std::chrono for time measurements (random-zebra)
a24633a Remove countMaskInv caching in bench framework (random-zebra)
9e9bc22 Restore default format state of cout after printing with std::fixed/setprecision (random-zebra)
3dd559d Avoid static analyzer warnings regarding uninitialized arguments (random-zebra)
e85f224 Replace boost::function with std::function (C++11) (random-zebra)
98c0857 Prevent warning: variable 'x' is uninitialized (random-zebra)
7f0d4b3 FastRandom benchmark (random-zebra)
d9fa0c6 Add prevector destructor benchmark (random-zebra)
e1527ba Assert that what might look like a possible division by zero is actually unreachable (random-zebra)
e94cf15 bench: Fix initialization order in registration (random-zebra)
151c25f Basic CCheckQueue Benchmarks (random-zebra)
51aedbc Use std:thread:hardware_concurrency, instead of Boost, to determine available cores (random-zebra)
d447613 Use real number of cores for default -par, ignore virtual cores (random-zebra)
9162a56 [Refactoring] Removed using namespace <xxx> from bench/ sources (random-zebra)
5c07f67 bench: Add support for measuring CPU cycles (random-zebra)
41ce1ed bench: Fix subtle counting issue when rescaling iteration count (random-zebra)
68ea794 Avoid integer division in the benchmark inner-most loop. (random-zebra)
3fa4f27 bench: Added base58 encoding/decoding benchmarks (random-zebra)
4442118 bench: Add crypto hash benchmarks (random-zebra)
a5179b6 [Trivial] ensure minimal header conventions (random-zebra)
8607d6b Support very-fast-running benchmarks (random-zebra)
4aebb60 Simple benchmarking framework (random-zebra)

Pull request description:

  Introduces the benchmarking framework, loosely based on google's micro-benchmarking library (https://github.com/google/benchmark), ported from Bitcoin, up to 0.16.
  The benchmark framework is hard-coded to run each benchmark for one wall-clock second,
  and then spits out .csv-format timing information to stdout.

  Backported PR:
  - bitcoin#6733
  - bitcoin#6770
  - bitcoin#6892
  - bitcoin#8039
  - bitcoin#8107
  - bitcoin#8115
  - bitcoin#9200
  - bitcoin#9202
  - bitcoin#9281
  - bitcoin#6361
  - bitcoin#10271
  - bitcoin#9498
  - bitcoin#9712
  - bitcoin#9547
  - bitcoin#9505 (benchmark only. Rest was in #1557)
  - bitcoin#9792 (benchmark only. Rest was in #643)
  - bitcoin#10272
  - bitcoin#10395 (base58 only)
  - bitcoin#10963
  - bitcoin#11303 (first commit)
  - bitcoin#11562
  - bitcoin#11646
  - bitcoin#11654

  Current output of `src/bench/bench_pivx`:
  ```
  #Benchmark,count,min(ns),max(ns),average(ns),min_cycles,max_cycles,average_cycles
  Base58CheckEncode,131072,7697,8065,7785,20015,20971,20242
  Base58Decode,294912,3305,3537,3454,8595,9198,8981
  Base58Encode,180224,5498,6020,5767,14297,15652,14994
  CCheckQueueSpeed,320,3159960,3535173,3352787,8216030,9191602,8717388
  CCheckQueueSpeedPrevectorJob,96,9184484,11410840,10823070,23880046,29668680,28140445
  FastRandom_1bit,320,3143690,4838162,3199156,8173726,12579373,8317941
  FastRandom_32bit,60,17097612,17923669,17367440,44454504,46602306,45156079
  PrevectorClear,3072,334741,366618,346731,870340,953224,901516
  PrevectorDestructor,2816,344233,368912,357281,895022,959187,928948
  RIPEMD160,288,3404503,3693917,3577774,8851850,9604334,9302363
  SHA1,384,2718128,2891558,2802513,7067238,7518184,7286652
  SHA256,176,6133760,6580005,6239866,15948035,17108376,16223916
  SHA512,240,4251468,4358706,4313463,11054006,11332826,11215186
  Sleep100ms,10,100221470,100302411,100239073,260580075,260790726,260625870
  ```

  NOTE: Not all the tests have been pulled yet (as we might not have the code being tested, or it  would require rewrites to work with our different code base), but the framework is updated to December 2017.

ACKs for top commit:
  Fuzzbawls:
    ACK 3f3edde

Tree-SHA512: c283311a9accf6d2feeb93b185afa08589ebef3f18b6e86980dbc3647b9845f75ac9ecce2f1b08738d25ceac36596a2c89d41e4dbf3b463502aa695611aa1f8e
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants