-
Notifications
You must be signed in to change notification settings - Fork 38.8k
9% less memory: make SaltedOutpointHasher noexcept #16957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
9% less memory: make SaltedOutpointHasher noexcept #16957
Conversation
|
Concept ACK. Very exciting. Will start some benchmarks tonight. |
|
Concept ACK, good found |
|
Concept ACK |
|
Wow, awesome. ACK |
|
Little big PR! Concept ACK. |
|
Is it slower? |
|
Yes it seems to be slower, by 1.6%. But I'm not sure how much of this is random test variation. I think the lower memory usage more than offsets this: If you increase dbcache by about 9% memory usage will be about the same as before but I'm pretty sure runtime will be faster because of better caching |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
I did another run with the change and
This time synchronization was about 5% faster than master and still used 5% less memory than master. Here is a graph with all the results: |
|
ACK 9538fce097a4acbb669e94770bce4ab8bff62031 -- diff looks correct and Nice find! I believe there are similar opportunities waiting to be found: we underuse Somewhat related previous discussions about the pros/cons of
|
|
ACK 9538fce097a4acbb669e94770bce4ab8bff62031, this is trivially correct, had no idea that no_except could make so much of a concrete difference in generated code. |
|
Nice! |
|
First bench run coincides with @martinus' findings: |
Curious if this applies to libc++ as well as libstdc++ |
|
I'd prefer if the doxygen comment was extended a bit on the effects of the noexcept keyword. Maybe throw some links in there (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/libstdc++/manual/manual/unordered_associative.html#containers.unordered.cache) ? |
@sdaftuar and I have tried reading through the libc++ implementation of hash tables to find the equivalent caching clause to what's in libstdc++, but so far we haven't been able to find it. Does anyone know where this happens in libc++ (clang)? I'll report back soon with clang-based benches. |
Ah, I didn't know this was documented, thanks for the link. I've added a comment |
If the hash is not noexcept, unorderd_map has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding noexcept prevents this caching. In my experiments with -reindex-chainstate -stopatheight=594000, memory usage has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate.
e4a6668 to
67d9990
Compare
|
Tested ACK 67d9990 Clang results are in and they're consistent with what we've seen so far (benched IBD 500,000-535,000 at dbcache=4000). 2019-09-SaltedOutpointHasher-noexcept vs. master (absolute)
2019-09-SaltedOutpointHasher-noexcept vs. master (relative)
|
|
We should also bump the default |
I think that's a good idea. I'll rerun a bench comparing master:-dbcache=450 (default) and martinus:-dbcache=500 to ensure the latter undershoots the former in terms of memory usage. |
|
It's worth pointing out that when memory is not a limiting factor (e.g. a miner sets I still think this change is worthwhile because using |
|
I'm somewhat confused by the "increase dbcache for equivalent performance" thing - shouldn't DynamicUsage(const std::unordered_map<X, Y, Z>& m) take into account the new slightly lower memory usage and thus increase the capacity of the map? Is there some way to get the performance back with some reasonable pre-allocation of the map? |
Nah unfortunately I don't think so - because our memusage estimator optimistically assumes that each node in the hashmap consumes only as much memory as it takes to store the pair of key and value, plus each bucket pointer.
I've tried this with a call to |
The problem was that DynamicMemUsage was wrong before, it underestimated memory usage. Now it's more correct. I think it might be possible to regain the performance loss by caching the hash ourselves. We have some padding in the maps' key, and it might be possible to use these bytes to store the hash value (or at least 4 bytes of it) without having to increase the memory. |
|
Ah, I’m too tired and misread the dropping of Z there. I suppose maybe it makes sense to fix it (or note in the release notes that people may want to slightly increase their dbcache).
… On Sep 26, 2019, at 09:17, jamesob ***@***.***> wrote:
shouldn't DynamicUsage(const std::unordered_map<X, Y, Z>& m) take into account the new slightly lower memory usage and thus increase the capacity of the map?
Nah unfortunately I don't think so - because our memusage estimator optimistically assumes that each node in the hashmap consumes only as much memory as it takes to store the pair of key and value, plus each bucket pointer.
Is there some way to get the performance back with some reasonable pre-allocation of the map?
I've tried this with a call to cacheCoins.reserve(...) in init and didn't see much of a difference, but might be worth investigating again.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
|
Interestingly, the difference isn't pronounced near default dbcache values. This bench compares master at dbcache=450 (default) to this branch at dbcache=500. Master's runtime is 1.042x longer with this configuration. Maybe the runtime difference isn't as pronounced since this is only for a relatively small subset of the chain (35,000 blocks). In any case, it at least shows that a new dbcache default of 500 with this change still consumes less memory than master with the current default. |
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl) Pull request description: If the hash is not `noexcept`, `unorderd_map` has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate. | | runtime h:mm:ss | max RSS kbyte | |---------------------------------------|-----------------|--------------| | master | 4:13:59 | 7696728 | | 2019-09-SaltedOutpointHasher-noexcept | 4:18:11 | 6971412 | | change | +1.65% | -9,42% | Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept  ACKs for top commit: jamesob: Tested ACK 67d9990 Tree-SHA512: 9c44e3cca993b5a564dd61ebd2926b9c4a238609ea4d283514c018236f977d935e35a384dd4696486fd3d78781dd2ba190bb72596e20a5e931042fa465872a0b
|
It seems safe to merge this for 0.19. It's clear that there are still opportunities left for optimization in and around the UTXO hash data structure, let's look further at that for 0.20. |
|
Does this need release notes with potential action items for node operators and/or the default bumped? |
|
I'd say the default should be bumped from 450 to 500, and Node operators can increase dbcache by 10% without needing more RAM than before |
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl) Pull request description: If the hash is not `noexcept`, `unorderd_map` has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate. | | runtime h:mm:ss | max RSS kbyte | |---------------------------------------|-----------------|--------------| | master | 4:13:59 | 7696728 | | 2019-09-SaltedOutpointHasher-noexcept | 4:18:11 | 6971412 | | change | +1.65% | -9,42% | Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept  ACKs for top commit: jamesob: Tested ACK bitcoin@67d9990 Tree-SHA512: 9c44e3cca993b5a564dd61ebd2926b9c4a238609ea4d283514c018236f977d935e35a384dd4696486fd3d78781dd2ba190bb72596e20a5e931042fa465872a0b
Summary: 67d99900b0d770038c9c5708553143137b124a6c make SaltedOutpointHasher noexcept (Martin Ankerl) Pull request description: If the hash is not `noexcept`, `unorderd_map` has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate. | | runtime h:mm:ss | max RSS kbyte | |---------------------------------------|-----------------|--------------| | master | 4:13:59 | 7696728 | | 2019-09-SaltedOutpointHasher-noexcept | 4:18:11 | 6971412 | | change | +1.65% | -9,42% | Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept  Backport of Core [[bitcoin/bitcoin#16957 | PR16957]] Release notes were backported from Core [[bitcoin/bitcoin#17422 | PR17422]] Test Plan: `ninja check check-functional` for sanity Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D7996
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl) Pull request description: If the hash is not `noexcept`, `unorderd_map` has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate. | | runtime h:mm:ss | max RSS kbyte | |---------------------------------------|-----------------|--------------| | master | 4:13:59 | 7696728 | | 2019-09-SaltedOutpointHasher-noexcept | 4:18:11 | 6971412 | | change | +1.65% | -9,42% | Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept  ACKs for top commit: jamesob: Tested ACK bitcoin@67d9990 Tree-SHA512: 9c44e3cca993b5a564dd61ebd2926b9c4a238609ea4d283514c018236f977d935e35a384dd4696486fd3d78781dd2ba190bb72596e20a5e931042fa465872a0b
67d9990 make SaltedOutpointHasher noexcept (Martin Ankerl) Pull request description: If the hash is not `noexcept`, `unorderd_map` has to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Adding `noexcept` prevents this caching. In my experiments with `-reindex-chainstate -stopatheight=594000`, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate. | | runtime h:mm:ss | max RSS kbyte | |---------------------------------------|-----------------|--------------| | master | 4:13:59 | 7696728 | | 2019-09-SaltedOutpointHasher-noexcept | 4:18:11 | 6971412 | | change | +1.65% | -9,42% | Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept  ACKs for top commit: jamesob: Tested ACK bitcoin@67d9990 Tree-SHA512: 9c44e3cca993b5a564dd61ebd2926b9c4a238609ea4d283514c018236f977d935e35a384dd4696486fd3d78781dd2ba190bb72596e20a5e931042fa465872a0b



If the hash is not
noexcept,unorderd_maphas to assume that it can throw an exception. Thus when rehashing care needs to be taken. libstdc++ solves this by simply caching the hash value, which increases memory of each node by 8 bytes. Addingnoexceptprevents this caching. In my experiments with-reindex-chainstate -stopatheight=594000, memory usage (maximum resident set size) has decreased by 9.4% while runtime has increased by 1.6% due to additional hashing. Additionally, memusage::DynamicUsage() is now more accurate and does not underestimate.Comparison of progress masters vs. 2019-09-SaltedOutpointHasher-noexcept
