-
Notifications
You must be signed in to change notification settings - Fork 38.8k
random: add benchmarks and drop unnecessary Shuffle function #30396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
|
I've also benchmarked Deniel Lemire's nearly divisionless random integer generation (which libstdc++'s For |
d37b7a3 to
e8985f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is determinism of std::shuffle controlled/controllable? If we don't control it maybe it would be better to keep our custom one to ease reproducibility of test failures, despite it being slower.
I see now that you are passing the proper context. 👍
src/bench/random.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
251438 seems to be a fine hook for manhole covers. :)
Worth documenting the choice of seed initialization value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the alphabetic position of the letters of "bench" (2-5-14-3-8) concatenated, but it's really just an arbitrary constant.
Also rename the benchmark names to match the operation names
Benchmarks show it is no longer faster with modern standard C++ libraries, and the debug-mode failure due to self-move has been fixed as well.
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran benchmarks on da28a26 with..
Clang, non-debug, Linux
$ src/bench/bench_bitcoin --filter=".*Random.*"
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 12.84 | 77,883,564.55 | 0.2% | 103.12 | 47.37 | 2.177 | 15.66 | 4.2% | 0.01 | `FastRandom_Shuffle100`
| 10.35 | 96,651,871.86 | 0.1% | 116.75 | 38.15 | 3.060 | 9.38 | 0.0% | 0.01 | `FastRandom_rand32`
| 19.03 | 52,549,720.82 | 0.2% | 214.50 | 70.12 | 3.059 | 15.75 | 0.0% | 0.01 | `FastRandom_rand64`
| 14.56 | 68,674,993.41 | 0.2% | 149.17 | 53.67 | 2.779 | 14.49 | 1.2% | 0.01 | `FastRandom_randbits`
| 1.69 | 591,854,532.68 | 0.0% | 11.38 | 6.23 | 1.826 | 1.26 | 0.2% | 0.01 | `FastRandom_randbool`
| 12.11 | 82,569,289.00 | 0.1% | 95.33 | 44.68 | 2.133 | 13.70 | 4.4% | 0.01 | `FastRandom_randrange100`
| 13.55 | 73,808,837.05 | 0.6% | 108.46 | 49.99 | 2.170 | 14.63 | 4.4% | 0.01 | `FastRandom_randrange1000`
| 17.80 | 56,164,658.15 | 0.1% | 156.81 | 65.65 | 2.389 | 18.39 | 3.3% | 0.20 | `FastRandom_randrange1000000`
| 13.08 | 76,440,558.82 | 0.2% | 134.02 | 48.23 | 2.779 | 12.42 | 0.7% | 0.01 | `FastRandom_stdshuffle100`
| 7.55 | 132,409,503.40 | 0.9% | 52.83 | 27.85 | 1.897 | 10.39 | 6.3% | 0.01 | `InsecureRandom_Shuffle100`
| 1.70 | 589,139,570.89 | 0.2% | 19.50 | 6.25 | 3.118 | 1.50 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 1.11 | 901,858,137.81 | 0.0% | 7.00 | 4.09 | 1.711 | 0.00 | 0.0% | 0.01 | `InsecureRandom_rand64`
| 2.10 | 475,829,327.28 | 0.4% | 20.89 | 7.75 | 2.697 | 2.48 | 0.0% | 0.01 | `InsecureRandom_randbits`
| 1.40 | 714,972,427.11 | 0.1% | 12.25 | 5.16 | 2.374 | 1.02 | 0.0% | 0.01 | `InsecureRandom_randbool`
| 6.46 | 154,870,239.99 | 0.3% | 39.00 | 23.80 | 1.639 | 8.14 | 7.7% | 0.01 | `InsecureRandom_randrange100`
| 7.08 | 141,237,665.66 | 0.4% | 39.80 | 26.10 | 1.525 | 7.97 | 7.9% | 0.01 | `InsecureRandom_randrange1000`
| 6.49 | 154,065,899.22 | 0.6% | 43.93 | 23.80 | 1.846 | 7.80 | 7.0% | 0.07 | `InsecureRandom_randrange1000000`
| 3.61 | 276,805,356.11 | 0.1% | 34.38 | 13.33 | 2.580 | 3.98 | 0.3% | 0.01 | `InsecureRandom_stdshuffle100`FastRandom - std::shuffle slower by ~2%.
InsecureRandom - std::shuffle is ~2.1x faster, the case for migrating is strong here.
GCC equivalent
$ src/bench/bench_bitcoin --filter=".*Random.*"
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 10.90 | 91,706,416.37 | 0.4% | 73.17 | 40.22 | 1.819 | 8.91 | 7.6% | 0.01 | `FastRandom_Shuffle100`
| 10.29 | 97,154,304.49 | 0.1% | 115.31 | 37.96 | 3.037 | 8.94 | 0.0% | 0.01 | `FastRandom_rand32`
| 18.97 | 52,727,508.94 | 0.4% | 210.63 | 69.79 | 3.018 | 15.88 | 0.0% | 0.01 | `FastRandom_rand64`
| 11.56 | 86,507,143.42 | 0.5% | 122.79 | 42.62 | 2.881 | 10.55 | 0.4% | 0.01 | `FastRandom_randbits`
| 1.68 | 594,607,147.44 | 0.1% | 12.32 | 6.20 | 1.987 | 1.25 | 0.2% | 0.01 | `FastRandom_randbool`
| 9.99 | 100,126,193.36 | 0.4% | 61.84 | 36.83 | 1.679 | 6.91 | 9.1% | 0.01 | `FastRandom_randrange100`
| 11.50 | 86,963,003.45 | 0.3% | 75.50 | 42.40 | 1.781 | 7.89 | 8.1% | 0.01 | `FastRandom_randrange1000`
| 15.36 | 65,104,420.98 | 0.4% | 122.93 | 56.09 | 2.192 | 11.41 | 5.2% | 0.17 | `FastRandom_randrange1000000`
| 14.70 | 68,050,253.74 | 0.0% | 138.64 | 54.21 | 2.558 | 11.02 | 0.1% | 0.01 | `FastRandom_stdshuffle100`
| 8.32 | 120,167,896.55 | 0.4% | 51.58 | 30.66 | 1.682 | 7.84 | 8.3% | 0.01 | `InsecureRandom_Shuffle100`
| 1.09 | 920,716,470.48 | 0.0% | 13.00 | 4.01 | 3.244 | 1.00 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 0.88 | 1,133,795,846.45 | 0.1% | 7.00 | 3.25 | 2.153 | 0.00 | 57.0% | 0.01 | `InsecureRandom_rand64`
| 1.66 | 601,775,499.66 | 1.2% | 19.86 | 6.11 | 3.251 | 1.98 | 0.0% | 0.01 | `InsecureRandom_randbits`
| 0.56 | 1,771,130,575.63 | 0.1% | 4.28 | 2.08 | 2.057 | 1.00 | 0.0% | 0.01 | `InsecureRandom_randbool`
| 6.97 | 143,428,585.12 | 0.2% | 30.93 | 25.70 | 1.203 | 5.84 | 10.5% | 0.01 | `InsecureRandom_randrange100`
| 7.25 | 137,895,833.08 | 0.2% | 32.18 | 26.75 | 1.203 | 5.78 | 10.6% | 0.01 | `InsecureRandom_randrange1000`
| 7.37 | 135,732,504.89 | 0.4% | 37.62 | 26.87 | 1.400 | 5.81 | 9.2% | 0.08 | `InsecureRandom_randrange1000000`
| 5.81 | 172,220,492.06 | 0.1% | 33.82 | 21.42 | 1.579 | 3.09 | 0.3% | 0.01 | `InsecureRandom_stdshuffle100`FastRandom - std::shuffle shows a performance loss of 26%. :/
InsecureRandom - std::shuffle wins with ~1.4x speed multiplier.
(Side note: std::shuffle seems much better than Shuffle when it comes to branch misses in this configuration (already significantly better in Clang)).
GCC in debug mode
$ src/bench/bench_bitcoin --filter=".*Random.*"
Warning, results might be unstable:
* DEBUG defined
Recommendations
* Make sure you compile for Release
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 96.61 | 10,350,962.32 | 0.4% | 844.51 | 356.24 | 2.371 | 124.47 | 0.5% | 0.01 | `FastRandom_Shuffle100`
| 182.54 | 5,478,307.47 | 0.0% | 1,703.98 | 673.49 | 2.530 | 204.73 | 0.1% | 0.01 | `FastRandom_rand32`
| 342.44 | 2,920,180.80 | 0.0% | 3,197.03 | 1,263.38 | 2.531 | 370.47 | 0.1% | 0.01 | `FastRandom_rand64`
| 192.05 | 5,207,050.93 | 0.1% | 1,756.26 | 708.80 | 2.478 | 213.54 | 0.3% | 0.01 | `FastRandom_randbits`
| 16.01 | 62,471,637.02 | 0.1% | 148.23 | 59.00 | 2.513 | 23.84 | 0.1% | 0.01 | `FastRandom_randbool`
| 87.24 | 11,461,986.84 | 0.6% | 758.66 | 321.66 | 2.359 | 108.08 | 0.6% | 0.01 | `FastRandom_randrange100`
| 110.16 | 9,077,576.97 | 0.5% | 965.24 | 405.83 | 2.378 | 132.06 | 0.5% | 0.01 | `FastRandom_randrange1000`
| 186.39 | 5,365,223.88 | 0.1% | 1,676.54 | 687.82 | 2.437 | 215.45 | 0.3% | 2.05 | `FastRandom_randrange1000000`
| 215.54 | 4,639,483.16 | 0.2% | 1,906.59 | 794.16 | 2.401 | 229.95 | 0.1% | 0.01 | `FastRandom_stdshuffle100`
| 53.91 | 18,549,333.04 | 0.3% | 457.13 | 198.59 | 2.302 | 79.37 | 0.8% | 0.01 | `InsecureRandom_Shuffle100`
| 22.03 | 45,393,295.79 | 0.1% | 215.50 | 81.27 | 2.652 | 31.50 | 0.0% | 0.01 | `InsecureRandom_rand32`
| 22.72 | 44,019,773.42 | 0.0% | 220.00 | 83.84 | 2.624 | 24.00 | 0.0% | 0.01 | `InsecureRandom_rand64`
| 27.54 | 36,307,742.75 | 0.6% | 244.77 | 101.11 | 2.421 | 37.63 | 1.0% | 0.01 | `InsecureRandom_randbits`
| 10.95 | 91,358,394.49 | 0.1% | 101.70 | 40.40 | 2.518 | 18.42 | 0.1% | 0.01 | `InsecureRandom_randbool`
| 44.06 | 22,693,987.66 | 0.3% | 371.40 | 162.65 | 2.283 | 63.01 | 0.9% | 0.01 | `InsecureRandom_randrange100`
| 45.33 | 22,061,595.98 | 0.2% | 382.82 | 167.30 | 2.288 | 64.26 | 0.9% | 0.01 | `InsecureRandom_randrange1000`
| 50.17 | 19,933,844.35 | 0.2% | 438.63 | 184.96 | 2.372 | 71.38 | 0.8% | 0.55 | `InsecureRandom_randrange1000000`
| 52.96 | 18,880,998.91 | 0.2% | 418.53 | 195.25 | 2.144 | 56.76 | 0.0% | 0.01 | `InsecureRandom_stdshuffle100`FastRandom - std::shuffle is less than half the speed. :/ A lot more instructions & branches.
InsecureRandom - Negligible difference.
Conclusion so far
InsecureRandom is only used in tests, so this would probably be a win for developers/testers/fuzzing but not really for end users. Maybe other systems/configurations show a clearer win.
Also ran make check & test/functional/test_runner.py on 6ecda04 in GCC debug without any failures. Confirming that the self-assign issue with std::shuffle doesn't appear to be an issue. (Haven't tried to create a configuration that triggers the panic though).
|
@hodlinator FWIW, my motivation here is just FastRandomContext non-debug performance. I did notice that I do expect InsecureRandomContext will get used for non-testing purposes eventually, but that's not a huge motivation now. To explain what we're seeing here, worked out for a shuffle of 100 elements:
In FastRandomContext, the cost of a If we really wanted, it would be possible to use the technique that For posterity, this is sufficient to make 32-bit +
+ template<std::integral I>
+ I randrange(I range) noexcept
+ {
+ if constexpr (std::numeric_limits<I>::digits <= 32) {
+ uint32_t x = rand32();
+ uint64_t m = uint64_t{x} * uint32_t(range);
+ uint32_t l = uint32_t(m);
+ if (l < uint32_t(range)) {
+ uint32_t t = (0 - range) % range;
+ while (l < t) {
+ x = rand32();
+ m = uint64_t{x} * uint32_t(range);
+ l = uint32_t(m);
+ }
+ }
+ return m >> 32;
+ }
+ return RandomMixin<InsecureRandomContext>::randrange(range);
+ } |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 6ecda04
Thanks for taking the time to add valuable context!!
(I read up a bit, understand you implemented a 32-bit version of Daniel Lemire's nearly divisionless algo. Nice).
I did notice that
std::shuffleis a bit slower than the custom Shuffle on my system too (AMD 5950X, GCC 13.2)
Maybe worth adding some nuance to the PR summary which currently says "now that it appears that standard library std::shuffle beats it.".
# rand64() invocations
Shufflewould userandrange100 times, which for the ranges involved would invokerand64on average ~13.005 times
Agreed, verified with added instrumentation (rough reasoning through: range of 100 > 0b1100100 > 7 binary digits wide. 64 bits / 7 bits => ~9 randbits() calls per rand64()... with some extra randbits() because of if (ret <= maxval) in randrange() failing => 100 / 9 + extra = ~13).
std::shufflewould invokerand64~100 times directly.
nit: With the added instrumentation my recent testing shows it to be =50 times - as the ranges are small enough std::shuffle() will perform 2 swaps for each rand64(). This goes for both Clang & GCC.
As there is no use case for efficient
InsecureRandomContext::randrangeright now that doesn't seem worth it, but that can change.
Seems like it could make fuzz-tests more efficient even now and hence free up resources for more test coverage. But agree it's probably minor.
(Bigger ranges converge)
I'm also on GCC 13.2. CPU: Intel 9th gen Coffee Lake. Did some test with bigger ranges and performance seems to converge between Shuffle & std::shuffle for FastRandom.
| ns/number | number/s | err% | ins/number | cyc/number | IPC | bra/number | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 12.01 | 83,247,932.44 | 0.4% | 86.95 | 44.32 | 1.962 | 9.88 | 6.8% | 0.01 | `FastRandom_Shuffle1000`
| 14.61 | 68,427,974.21 | 0.1% | 138.35 | 53.91 | 2.566 | 10.95 | 0.0% | 0.01 | `FastRandom_stdshuffle1000`
| 9.65 | 103,602,289.99 | 0.3% | 52.55 | 35.61 | 1.475 | 7.78 | 7.9% | 0.01 | `InsecureRandom_Shuffle1000`
| 5.67 | 176,274,334.46 | 0.0% | 33.53 | 20.92 | 1.603 | 3.01 | 0.0% | 0.01 | `InsecureRandom_stdshuffle1000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 13.99 | 71,471,645.86 | 0.4% | 107.03 | 51.56 | 2.076 | 11.44 | 6.2% | 0.01 | `FastRandom_Shuffle10000`
| 14.68 | 68,116,488.28 | 0.1% | 138.32 | 54.15 | 2.554 | 10.94 | 0.0% | 0.01 | `FastRandom_stdshuffle10000`
| 8.60 | 116,299,489.79 | 0.1% | 55.08 | 31.70 | 1.738 | 7.92 | 8.3% | 0.01 | `InsecureRandom_Shuffle10000`
| 5.66 | 176,764,575.63 | 0.1% | 33.50 | 20.87 | 1.606 | 3.00 | 0.0% | 0.01 | `InsecureRandom_stdshuffle10000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 15.24 | 65,617,271.52 | 0.3% | 123.32 | 56.05 | 2.200 | 12.64 | 5.0% | 0.02 | `FastRandom_Shuffle100000`
| 15.44 | 64,762,393.26 | 0.1% | 138.31 | 56.89 | 2.431 | 10.94 | 0.0% | 0.02 | `FastRandom_stdshuffle100000`
| 8.41 | 118,884,436.01 | 0.3% | 56.50 | 30.95 | 1.826 | 7.93 | 7.4% | 0.01 | `InsecureRandom_Shuffle100000`
| 5.79 | 172,767,668.95 | 0.0% | 33.50 | 21.35 | 1.569 | 3.00 | 0.0% | 0.01 | `InsecureRandom_stdshuffle100000`
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 16.44 | 60,831,272.72 | 0.4% | 134.57 | 60.60 | 2.220 | 13.41 | 4.4% | 0.18 | `FastRandom_Shuffle1000000`
| 16.49 | 60,636,239.08 | 0.0% | 138.31 | 60.82 | 2.274 | 10.94 | 0.0% | 0.18 | `FastRandom_stdshuffle1000000`
| 8.41 | 118,931,462.65 | 0.2% | 56.74 | 31.00 | 1.830 | 7.81 | 6.9% | 0.09 | `InsecureRandom_Shuffle1000000`
| 6.23 | 160,482,706.30 | 0.6% | 33.50 | 22.98 | 1.458 | 3.00 | 0.0% | 0.07 | `InsecureRandom_stdshuffle1000000`
dergoegge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 6ecda04
|
ACK 6ecda04 |
|
For fuzzing, just to keep it in mind, there is (obviously) no guarantee by the standard as to how
So going forward, I guess one can only assume to surely reproduce a fuzz crash by using the exact same stdlib (and compiler). For context there is also a compiler behavior mismatch discussed in #29043. |
|
post merge ACK. My local bench results had comparable performance as well |
This is a strong counter-point. I guess other stdlib-differences like |
|
I don't think there are any It may make sense to add something like the old |
I hope all code touched in this pull request is covered by fuzz tests. If not, then that should be fixed ;) For example, I don't have a strong feeling either way, because if an effort is made to have uniform fuzz behavior across compilers and stdlibs, it will probably be too hard. |
|
Ported to the CMake-based build system in hebasto#264. |
Hmm, for some reason I only considered calls to |
This adds benchmarks for various operations on
FastRandomContextandInsecureRandomContext, and then removes the ad-hocShufflefunctions, now that it appears that standard librarystd::shufflehas comparable performance. The other reason for keepingShuffle, namely the fact that libstdc++ used self-move (which debug mode panics on) has been fixed as well (see #29625 (comment)).