-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Switch BlockMap to use an unordered_set under the hood #19677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/validation.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this kosher 100% of the time as a replacement for nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix itself in the next rebase?
c61c74e to
e3d661e
Compare
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
fe727c3 to
c80841e
Compare
c80841e to
99a1274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK!
This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex
Indeed.
99a1274 to
0aa7e94
Compare
|
Thanks for the review! Converted GetBlockHash to be a const ref, good suggestion! I've removed the WIP status here as I eliminated the earlier bug. |
0aa7e94 to
714aea5
Compare
714aea5 to
fdc2f15
Compare
|
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
if there is a concept ACK from someone who would merge it on this approach I will rebase it, but since it touches a lot of places I don't have time to rebase it continually. |
|
Yeah, it is non-trivial to review so hard to tell. I wonder if this tiny layers can be peeled off of this or if this needs to be one chunk. Maybe the |
|
hmmm... it's possible? but why? I see two split-up paths: Make phashblock m_hash_block and keep the map structure firstThis uses 24 bytes more memory at first, but doesn't change the lookup data structure. Next, we could change the data structure to an indirect_map type and change the key to a pointer. Then, we could change to a set. Make the data structure a set first and keep phashblockThis changes the data structure, and changes all the lookup sites. No extra storage, no savings. Uses of phashblock remain consistent, although it's just a self-pointer. phashblock could become an inline method too, so you don't change the memory usage, but do need to add (). Eventually switch to using m_hash_block. |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Is this still relevant after #24050? |
|
yes, it's completely orthogonal to #24050 |
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. I started making the same change after reviewing #24050 and #24199. As noted in the PR description, C++20 will allow a simpler implementation of this because it adds unordered_set find and count method overloads that can accept block hashes, instead of CBlockIndex pointers (heterogenous lookups, https://www.cppstories.com/2021/heterogeneous-access-cpp20/). But since the find method is mostly wrapped anyway and not called directly, this is not a big deal.
| size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); } | ||
| // Helper for querying by hash | ||
| // e.g., map.find(map.hash_function()(h)) | ||
| CBlockIndex* operator()(const uint256& hash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Refactor BlockMap to use an unordered_set instead of an unordered_map" (fdc2f15)
I was confused by the idx.hash_function()(hash) stuff initially, and think it could be clearer if you gave this method a different name like FakeBlockIndex instead of operator(). idx.hash_function().FakeBlockIndex(hash) would give a clue that the method is returning a fake block index pointer, not taking the hash of a hash, and also avoid overloading the same operator in two ways that do different things conceptually.
Also I guess you are using a shared mock CBlockIndex instance rather than creating temporary CBlockIndex objects for performance reasons, but it it would be interesting to know if taking a more straightforward with temporary mock objects would actually hurt performance:
struct FakeBlockIndex : public CBlockIndex
{
FakeBlockIndex(const uint256& hash) { m_hash_block = hash; }
FakeBlockIndex* pointer() { return this; }
};Since it would allow simplifying:
- BlockMap::const_iterator it = m_blockman.m_block_index.find(m_blockman.m_block_index.hash_function()(hashAssumeValid));
+ BlockMap::const_iterator it = m_blockman.m_block_index.find(FakeBlockIndex(hashAssumeValid).pointer());and:
struct BlockHasher
{
// this used to call `GetCheapHash()` in uint256, which was later moved; the
// cheap hash function simply calls ReadLE64() however, so the end result is
// identical
- mutable CBlockIndex mock;
- BlockHasher() : mock() {};
size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
- // Helper for querying by hash
- // e.g., map.find(map.hash_function()(h))
- CBlockIndex* operator()(const uint256& hash) {
- mock.m_hash_block = hash;
- return &mock;
- }
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would since CBlockIndex is a very big struct?
but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would since CBlockIndex is a very big struct?
but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere...
I would probably not do the abstract base thing, since the reason for my suggestion was to simplify code, and trying to overload find that way would seem to add more complexity. Current approach here does seems fine. If dropping the hasher method would be bad for performance, I'd just rename it from operator() to FakeBlockIndex or something for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably won't have time to work on this until mid april FYI, but i can pick it up sometime then.
if you'd like to pick it off and rebase it and push it through i would review it + ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. I'm working on some other CBlockIndex changes so might want to do this. No hurry though!
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Closing as this has needed rebase for more than 2 years. Feel free to reopen if you get a chance to work on this again in the future, thanks! |
|
Mark up for grabs? |
Currently we use an
unordered_map<uint256, CBlockIndex*>for our BlockMap. This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex. It would be conceptually simpler if we used aunordered_map<CBlockIndex>and modifiedCBlockIndexto directly own the uint256 for the phashBlock.That's what this patch attempts doing. The only additional complexity is where we do want to query the map by key, the key equivalence stuff is only in C++20, so we add a helper that lets us construct a mock element easily for using with find.
The result is nicer since we get rid of an additional indirection for accessing the phashblock and we get rid of a lot of std::pair de-structuring. We also save a bit of memory per index (I haven't computed it precisely, but my guess is we save something like 24 bytes total per index by eliminating 1 pointer, which saves 8 bytes, and then could save us something like 16 bytes of padding). We also no longer can be in an inconsistent state where phashblock does not point to the entries own hash (although modifying phashblock once it is inserted into the map would be invalid -- it must be removed, modified, and reinserted).
Future work can likely further improve on this by making CBlockIndex owned* by the unordered_set directly, which will eliminate even more indirection/pointer chasing and make a lot of the code using BlockMap simpler (I don't think we ever rely on CBlockIndex being non-owned anywhere, nor should we have reason to in the future).