-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Make prevector::resize() and other prevector operations much faster #12549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Nice speedup! Concept ACK |
|
This exists in #11988 no? |
|
It looks like this branch is failing because This isn't quite the same as #11988 because that still uses placement new. |
|
If it's not hugely different consider reviewing #11988 and suggesting changes to make it better instead. |
|
That's a good idea, I'll see if I can cherry-pick @AkioNak's commit into my branch while I'm testing the GCC 4.8 compatibility issues. |
|
This new branch cherry-picks @AkioNak 's commit from #11988 and then adds my memset optimization. I constructed this PR such that it has three commits:
I have a script that benchmarks the performance for each of the three commits. Here's the summarized data. The first commit is considered to have a baseline speedup of 1x, and then each of the subsequent commits shows the speedup from the base: Here's the raw data I generated the above from: |
src/prevector.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use ptrdiff_t here and in fill? Would allow you to assert against the negative case rather than have a very large value.
Not that the negative case is possible with the current code, but it could be protective going forward.
src/bench/prevector.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think diganostic message can not be omitted on c++11.
src/bench/prevector.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think diganostic message can not be omitted on c++11.
|
Ping @sipa who wrote this code. |
3bc0cad to
cb1dad1
Compare
|
I've addressed the feedback I've gotten so far, and expanded the other benchmarks to test both trivial and non-trivial types. One thing that I also found interesting while doing this change was the way the current code is written to not use SFINAE. On GCC 7.3 (what I have locally) the compiler is smart enough to do the type checks at compile time and not generate runtime checks, meaning it generates optimal code (i.e. the same code it would generate using SFINAE). On GCC 4.8 it's smart enough to do that for trivial types, but does not generate optimal code in the nontrivial path. This is probably fine because the main use of prevector is to hold |
|
I believe your first commit won't compile on GCC 4.8. Also, if you're going to use IS_TRIVIALLY_CONSTRUCTIBLE outside of prevector.h, perhaps it's better to move it elsewhere (like util.h or compat.h)? |
|
Good catch, I only tested the last commit. I'll rebase this branch so all of the commits build cleanly, and move the macro definition. |
This prepares for a series of two additional commits which optimize prevector performance.
In prevector.h, the code which like item_ptr(size()) apears in the loop. Both item_ptr() and size() judge whether values are held directly or indirectly, but in most cases it is sufficient to make that judgement once outside the loop. This PR adds 2 private function fill() which has the loop to initialize by specified value (or iterator of the other prevector's element), but don't call item_ptr() in their loop. Other functions(assign(), constructor, operator=(), insert()) that has similar loop, call fill() instead of original loop. Also, resize() was changed like fill(), but it calls the default constructor for that element each time.
|
Here are the latest results, tested under both GCC 7.3 and GCC 4.8. For both compilers I ran the full test suite and benchmarks for each of the three commits in this PR. GCC 4.8GCC 7.3It's interesting that GCC 4.8 generates better code in some cases. Anyway, hopefully this is enough data to convince people that this is a good change. |
|
utACK This is a good optimization @AkioNak / @eklitzke! I didn't have a hack for getting cross platform support for trivial constructors when I wrote #9505, so glad to see trivial types handled more optimally now (noting this to clarify that there was no other reason not to do this that I came across in #9505) I would only nit that you should amend the line: to make it clear you are only referring to the Benchmark, not to the actual destructor (there should be no performance change under this patchset, and it would be good for you to demonstrate that -- ironically it means I probably under-reported the performance advantage when I added the trivial_destructor commit, because I counted the resizes). |
src/compat.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant with the same in prevector.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, pushing a rebase that fixes this.
Further optimize prevector::resize() (which is called by a number of other prevector methods) to use memset to initialize memory when the prevector contains trivial types.
|
Updated profile for |
|
@sipa please never mind. |
|
utACK 5aad635 |
… much faster 5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke) e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura) f0e7aa7 Add new prevector benchmarks. (Evan Klitzke) Pull request description: This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()` which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg This branch improves things significantly. For trivial types, the new results for the prevector benchmark are: * `PrevectorClearTrivial` which tests `prevector::clear()` becomes 24.6x faster * `PrevectorDestructorTrivial` which tests `prevector::~prevector()` becomes 20.5x faster * `PrevectorResizeTrivial` which tests `prevector::resize()` becomes 20.3x faster Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above. The pull request here consists of a series of three commits: * The first adds new benchmarks but does not change the prevector code. * The second is from @AkioNak , and merges some prevector optimizations he submitted in #11988 * The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
|
I just discovered this causes a warning in GCC 8, which added a new If GCC 8 is released without fixing this we can add an explicit |
…rations much faster 5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke) e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura) f0e7aa7 Add new prevector benchmarks. (Evan Klitzke) Pull request description: This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()` which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg This branch improves things significantly. For trivial types, the new results for the prevector benchmark are: * `PrevectorClearTrivial` which tests `prevector::clear()` becomes 24.6x faster * `PrevectorDestructorTrivial` which tests `prevector::~prevector()` becomes 20.5x faster * `PrevectorResizeTrivial` which tests `prevector::resize()` becomes 20.3x faster Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above. The pull request here consists of a series of three commits: * The first adds new benchmarks but does not change the prevector code. * The second is from @AkioNak , and merges some prevector optimizations he submitted in bitcoin#11988 * The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
d856989 warnings: Compiler warning on memset usage for non-trivial type (Lenny Maiorani) 4cb188e Drop defunct IS_TRIVIALLY_CONSTRUCTIBLE handling from prevector.h (Ben Woosley) c412d5f Reduce redundant code of prevector and speed it up (random-zebra) e6741d0 Add new prevector benchmarks. (random-zebra) Pull request description: Backports from Bitcoin a series of optimizations for `prevector` operations, especially around resizing, which takes a significative portion of the time spent by of `ReadBlockFromDisk()`. Running the benchmark tool before the patch: ``` #Benchmark, count, min, max, avg, minc, maxc, avgc ----------- ------ ---- ---- ---- ----- ----- ---- PrevectorClearTrivial, 3328, 312631, 329431, 320311, 812854, 856536, 832823 PrevectorDestructorTrivial, 3328, 317011, 353344, 325141, 824243, 918711, 845382 PrevectorResizeTrivial, 3328, 321333, 327672, 324713, 835482, 851963, 844269 ``` and after: ``` #Benchmark, count, min, max, avg, minc, maxc, avgc ----------- ------ ---- ---- ---- ----- ----- ---- PrevectorClearTrivial, 65536, 15726, 16368, 16067, 40889, 42559, 41775 PrevectorDestructorTrivial, 65536, 15865, 16416, 16141, 41249, 42684, 41968 PrevectorResizeTrivial, 65536, 16036, 16320, 16213, 41696, 42433, 42156 ``` shows that `prevector::clear()`, `prevector::~prevector()`, and `prevector::resize()` become (on avg) **20X** faster. Based on: - bitcoin#12549 - bitcoin#14651 - bitcoin#14715 ACKs for top commit: Fuzzbawls: utACK d856989 furszy: Quite cool. utACK d856989 Tree-SHA512: 8060458bfd5373448d83a3d905fbc95379cfc1c1aed19b750c98b06a1a6a4ba85a6715865388752a5bd579676e2b2e26d3584ec5b26d51e9ba1380bc4fa73f33
This branch optimizes various
prevectoroperations, especially resizing vectors. While profiling theloadblkthread I noticed that a lot of time was being spent inprevector::resize()which led to this work. I have some data here indicating that it takes up 37% of the time inReadBlockFromDisk(): https://monad.io/readblockfromdisk.svgThis branch improves things significantly. For trivial types, the new results for the prevector benchmark are:
PrevectorClearTrivialwhich testsprevector::clear()becomes 24.6x fasterPrevectorDestructorTrivialwhich testsprevector::~prevector()becomes 20.5x fasterPrevectorResizeTrivialwhich testsprevector::resize()becomes 20.3x fasterNote that in practice it looks like the prevector is only used to contain
unsigned chartypes, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data forReadBlockFromDisk()I linked to above.The pull request here consists of a series of three commits:
prevector::resize()to usememset()when the prevector contains trivially constructible types