Switch BlockMap to use an unordered_set under the hood #19677

JeremyRubin · 2020-08-07T05:22:57Z

Currently we use an unordered_map<uint256, CBlockIndex*> for our BlockMap. This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex. It would be conceptually simpler if we used an unordered_set<CBlockIndex*> and modified CBlockIndex to directly own the uint256 for the phashBlock.

That's what this patch attempts to do. The only additional complexity is where we do want to query the map by key: the key equivalence stuff is only in C++20, so we add a helper that lets us easily construct a mock element to use with find.
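A minimal sketch of the mock-element lookup described above. `Hash256`, `Index`, `IndexHasher`, `IndexEqual`, `IndexSet` and `Lookup` are hypothetical stand-ins for illustration, not the names or code from this patch; the cheap hash simply reads the first 8 bytes of the stored hash.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>

// Hypothetical stand-ins for uint256 and CBlockIndex.
using Hash256 = std::array<std::uint8_t, 32>;

struct Index {
    Hash256 m_hash_block{}; // the entry owns its hash directly
    int nHeight{0};
};

struct IndexHasher {
    // Scratch element that is overwritten on every query (hence `mutable`).
    mutable Index mock;

    // Cheap hash: interpret the first 8 bytes of the stored hash.
    std::size_t operator()(const Index* ptr) const {
        std::uint64_t v;
        std::memcpy(&v, ptr->m_hash_block.data(), sizeof(v));
        return static_cast<std::size_t>(v);
    }
    // Helper for querying by hash: returns a pointer to the scratch element.
    Index* operator()(const Hash256& hash) const {
        mock.m_hash_block = hash;
        return &mock;
    }
};

struct IndexEqual {
    bool operator()(const Index* a, const Index* b) const {
        return a->m_hash_block == b->m_hash_block;
    }
};

using IndexSet = std::unordered_set<Index*, IndexHasher, IndexEqual>;

// Look up an entry by hash; returns nullptr when it is absent.
Index* Lookup(const IndexSet& set, const Hash256& hash) {
    // hash_function() returns the hasher by value, so the mock pointer is
    // only valid until the end of this full expression.
    auto it = set.find(set.hash_function()(hash));
    return it == set.end() ? nullptr : *it;
}
```

Note that `hash_function()` returns a copy of the hasher, so in this sketch the mock pointer must not outlive the `find` expression it is passed to.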

The result is nicer since we get rid of an additional indirection for accessing the phashblock and we get rid of a lot of std::pair de-structuring. We also save a bit of memory per index (I haven't computed it precisely, but my guess is we save something like 24 bytes total per index by eliminating 1 pointer, which saves 8 bytes, and then could save us something like 16 bytes of padding). We also can no longer be in an inconsistent state where phashblock does not point to the entry's own hash (although modifying phashblock once it is inserted into the map would be invalid -- it must be removed, modified, and reinserted).
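The remove, modify, reinsert rule can be sketched as follows. `Item`, `ItemSet`, `Contains` and `Rekey` are hypothetical names, and a 64-bit `key` stands in for the block hash; the sketch assumes `e` is the pointer actually stored in the set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

struct Item { std::uint64_t key; }; // hypothetical; `key` stands in for the block hash

struct ItemHasher {
    std::size_t operator()(const Item* i) const { return std::hash<std::uint64_t>{}(i->key); }
};
struct ItemEqual {
    bool operator()(const Item* a, const Item* b) const { return a->key == b->key; }
};
using ItemSet = std::unordered_set<Item*, ItemHasher, ItemEqual>;

// Query by key using a local probe element.
bool Contains(const ItemSet& set, std::uint64_t key) {
    Item probe{key};
    return set.find(&probe) != set.end();
}

// Change an entry's key without corrupting the set: erase it while the old
// key is still in place, modify it, then reinsert it under the new key.
// Returns false if `e` was not found, or if the new key is already taken
// (in which case `e` is left out of the set).
bool Rekey(ItemSet& set, Item* e, std::uint64_t new_key) {
    assert(e != nullptr);
    if (set.erase(e) == 0) return false;
    e->key = new_key;
    return set.insert(e).second;
}
```

Writing to `e->key` while the pointer is still in the set would leave the element in a bucket that no longer matches its hash, which is why the erase comes first.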

Future work can likely further improve on this by making CBlockIndex owned* by the unordered_set directly, which will eliminate even more indirection/pointer chasing and make a lot of the code using BlockMap simpler (I don't think we ever rely on CBlockIndex being non-owned anywhere, nor should we have reason to in the future).

* See https://en.cppreference.com/w/cpp/container/unordered_set; it will still be safe to use pointers for e.g. pprev:

     References and pointers to data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated.
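The guarantee quoted above can be checked with a small sketch: take the address of an element, insert enough elements to force rehashing, and confirm the address is unchanged. `AddressSurvivesRehash` is a hypothetical name and the set stores plain `int` values rather than CBlockIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Returns true if the address of the first element is unchanged after many
// further insertions (which force the table to grow and rehash).
bool AddressSurvivesRehash() {
    std::unordered_set<int> set;
    const int* first = &*set.insert(0).first;

    for (int i = 1; i < 10000; ++i) set.insert(i);
    set.rehash(set.bucket_count() * 4); // explicit rehash for good measure

    // Iterators may have been invalidated, but the element itself did not move.
    return first == &*set.find(0) && *first == 0;
}
```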

JeremyRubin · 2020-08-07T05:24:29Z

src/validation.cpp

is this kosher 100% of the time as a replacement for nullptr?

This should fix itself in the next rebase?

DrahtBot · 2020-08-07T07:15:57Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19865 (scripted-diff: Restore AssertLockHeld after #19668, remove LockAssertion by ryanofsky)
#19556 (Remove mempool global by MarcoFalke)
#19438 (Introduce deploymentstatus by ajtowns)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

promag

Concept ACK!

This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex

Indeed.

src/chain.h

JeremyRubin · 2020-08-10T19:29:26Z

Thanks for the review! Converted GetBlockHash to be a const ref, good suggestion!

I've removed the WIP status here as I eliminated the earlier bug.

src/bench/rpc_blockchain.cpp

src/chain.h

src/test/fuzz/chain.cpp

DrahtBot · 2020-09-07T09:15:09Z

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

DrahtBot · 2021-12-15T11:21:33Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

JeremyRubin · 2021-12-15T20:19:12Z

if there is a concept ACK from someone who would merge it on this approach I will rebase it, but since it touches a lot of places I don't have time to rebase it continually.

maflcko · 2021-12-15T20:38:15Z

Yeah, it is non-trivial to review so hard to tell.

I wonder if tiny layers can be peeled off of this or if this needs to be one chunk. Maybe the phashBlock change can be split out?

JeremyRubin · 2021-12-15T20:45:01Z

hmmm...

it's possible? but why?

I see two split-up paths:

1. Make phashblock m_hash_block and keep the map structure first
   - This uses 24 bytes more memory at first, but doesn't change the lookup data structure.
   - Next, we could change the data structure to an indirect_map type and change the key to a pointer.
   - Then, we could change to a set.

2. Make the data structure a set first and keep phashblock
   - This changes the data structure, and changes all the lookup sites.
   - No extra storage, no savings.
   - Uses of phashblock remain consistent, although it's just a self-pointer.
   - phashblock could become an inline method too, so you don't change the memory usage, but do need to add ().
   - Eventually switch to using m_hash_block.

DrahtBot · 2022-03-21T13:07:10Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

maflcko · 2022-03-21T14:04:15Z

Is this still relevant after #24050?

JeremyRubin · 2022-03-21T14:47:52Z

yes, it's completely orthogonal to #24050

ryanofsky

Concept ACK. I started making the same change after reviewing #24050 and #24199. As noted in the PR description, C++20 will allow a simpler implementation of this because it adds unordered_set find and count method overloads that can accept block hashes instead of CBlockIndex pointers (heterogeneous lookups, https://www.cppstories.com/2021/heterogeneous-access-cpp20/). But since the find method is mostly wrapped anyway and not called directly, this is not a big deal.
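A sketch of what the C++20 heterogeneous lookup could look like, assuming transparent hash and equality functors. `Entry`, `TransparentHash`, `TransparentEqual`, `EntrySet` and `FindByHash` are hypothetical names, and a 64-bit value stands in for the block hash. The feature-test macro keeps the sketch compiling before C++20 by falling back to a probe element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

struct Entry { std::uint64_t hash; int height; }; // hypothetical stand-in for CBlockIndex

// `is_transparent` opts the functors in to heterogeneous lookup (C++20).
struct TransparentHash {
    using is_transparent = void;
    std::size_t operator()(const Entry* e) const { return std::hash<std::uint64_t>{}(e->hash); }
    std::size_t operator()(std::uint64_t h) const { return std::hash<std::uint64_t>{}(h); }
};
struct TransparentEqual {
    using is_transparent = void;
    bool operator()(const Entry* a, const Entry* b) const { return a->hash == b->hash; }
    bool operator()(const Entry* a, std::uint64_t h) const { return a->hash == h; }
    bool operator()(std::uint64_t h, const Entry* a) const { return a->hash == h; }
};
using EntrySet = std::unordered_set<Entry*, TransparentHash, TransparentEqual>;

// Look up an entry by hash; returns nullptr when it is absent.
Entry* FindByHash(const EntrySet& set, std::uint64_t h) {
#if defined(__cpp_lib_generic_unordered_lookup)
    auto it = set.find(h);      // C++20: query by hash directly, no mock element
#else
    Entry probe{h, 0};          // earlier standards: fall back to a probe element
    auto it = set.find(&probe);
#endif
    return it == set.end() ? nullptr : *it;
}
```

Both hash overloads must produce the same value for an entry and for its hash, otherwise the heterogeneous query would look in the wrong bucket.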

ryanofsky · 2022-03-21T19:26:58Z

src/validation.h

+    size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
+    // Helper for querying by hash
+    // e.g., map.find(map.hash_function()(h))
+    CBlockIndex* operator()(const uint256& hash) {


In commit "Refactor BlockMap to use an unordered_set instead of an unordered_map" (fdc2f15)

I was confused by the idx.hash_function()(hash) stuff initially, and think it could be clearer if you gave this method a different name like FakeBlockIndex instead of operator(). idx.hash_function().FakeBlockIndex(hash) would give a clue that the method is returning a fake block index pointer, not taking the hash of a hash, and also avoid overloading the same operator in two ways that do different things conceptually.

Also I guess you are using a shared mock CBlockIndex instance rather than creating temporary CBlockIndex objects for performance reasons, but it would be interesting to know if taking a more straightforward approach with temporary mock objects would actually hurt performance:

struct FakeBlockIndex : public CBlockIndex
{
    FakeBlockIndex(const uint256& hash) { m_hash_block = hash; }
    FakeBlockIndex* pointer() { return this; }
};

Since it would allow simplifying:

-    BlockMap::const_iterator it = m_blockman.m_block_index.find(m_blockman.m_block_index.hash_function()(hashAssumeValid));
+    BlockMap::const_iterator it = m_blockman.m_block_index.find(FakeBlockIndex(hashAssumeValid).pointer());

and:

 struct BlockHasher
 {
     // this used to call `GetCheapHash()` in uint256, which was later moved; the
     // cheap hash function simply calls ReadLE64() however, so the end result is
     // identical
-    mutable CBlockIndex mock;
-    BlockHasher() : mock() {};
     size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
-    // Helper for querying by hash
-    // e.g., map.find(map.hash_function()(h))
-    CBlockIndex* operator()(const uint256& hash) {
-        mock.m_hash_block = hash;
-        return &mock;
-    }
 };

I think it would since CBlockIndex is a very big struct?

but maybe we can make CBlockIndex have an abstract base class with just m_hash_block and then inherit from that for both? but then you have overhead everywhere...

I would probably not do the abstract base thing, since the reason for my suggestion was to simplify code, and trying to overload find that way would seem to add more complexity. The current approach here does seem fine. If dropping the hasher method would be bad for performance, I'd just rename it from operator() to FakeBlockIndex or something for clarity.
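For comparison, the temporary-object alternative discussed above could be sketched like this. `Node`, `FakeNode`, `NodeSet` and `LookupNode` are hypothetical stand-ins, with a 64-bit value in place of uint256. The trade-off raised in the thread still applies: each lookup constructs a full object, which may be costly when the real struct is large.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

struct Node { // hypothetical stand-in for CBlockIndex
    std::uint64_t m_hash_block{0};
    int nHeight{0};
};

// Temporary probe object, constructed once per lookup.
struct FakeNode : public Node {
    explicit FakeNode(std::uint64_t hash) { m_hash_block = hash; }
    Node* pointer() { return this; }
};

struct NodeHasher {
    std::size_t operator()(const Node* n) const { return std::hash<std::uint64_t>{}(n->m_hash_block); }
};
struct NodeEqual {
    bool operator()(const Node* a, const Node* b) const { return a->m_hash_block == b->m_hash_block; }
};
using NodeSet = std::unordered_set<Node*, NodeHasher, NodeEqual>;

// Look up an entry by hash; returns nullptr when it is absent.
Node* LookupNode(const NodeSet& set, std::uint64_t hash) {
    // The temporary FakeNode lives until the end of this full expression,
    // so the pointer handed to find() stays valid for the whole call.
    auto it = set.find(FakeNode(hash).pointer());
    return it == set.end() ? nullptr : *it;
}
```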

i probably won't have time to work on this until mid april FYI, but i can pick it up sometime then.

if you'd like to pick it off and rebase it and push it through i would review it + ack.

Good to know. I'm working on some other CBlockIndex changes so might want to do this. No hurry though!

DrahtBot · 2022-07-25T07:03:36Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

glozow · 2022-09-26T10:01:14Z

Closing as this has needed rebase for more than 2 years. Feel free to reopen if you get a chance to work on this again in the future, thanks!

Rspigler · 2022-09-26T22:34:29Z

Mark up for grabs?

fanquake added Refactoring Validation labels Aug 7, 2020

JeremyRubin force-pushed the refactor-blockmap-blockset branch from c61c74e to e3d661e Compare August 7, 2020 06:05

JeremyRubin force-pushed the refactor-blockmap-blockset branch 3 times, most recently from fe727c3 to c80841e Compare August 7, 2020 10:27

This was referenced Aug 7, 2020

Introduce deploymentstatus #19438

Merged

assumeutxo #15606

Closed

JeremyRubin force-pushed the refactor-blockmap-blockset branch from c80841e to 99a1274 Compare August 7, 2020 17:12

promag reviewed Aug 10, 2020

JeremyRubin force-pushed the refactor-blockmap-blockset branch from 99a1274 to 0aa7e94 Compare August 10, 2020 17:20

JeremyRubin changed the title ~~WIP: Switch BlockMap to use an unordered_set under the hood~~ Switch BlockMap to use an unordered_set under the hood Aug 10, 2020

promag reviewed Aug 11, 2020

JeremyRubin force-pushed the refactor-blockmap-blockset branch from 0aa7e94 to 714aea5 Compare August 11, 2020 17:04

Refactor BlockMap to use an unordered_set instead of an unordered_map

fdc2f15

JeremyRubin force-pushed the refactor-blockmap-blockset branch from 714aea5 to fdc2f15 Compare August 11, 2020 17:05

This was referenced Aug 31, 2020

Remove mempool global #19556

Merged

scripted-diff: Restore AssertLockHeld after #19668, remove LockAssertion #19865

Closed

DrahtBot added the Needs rebase label Sep 7, 2020

maflcko mentioned this pull request Jan 3, 2022

Discussion: Upgrading to C++20 #23363

Closed

ryanofsky reviewed Mar 21, 2022

glozow closed this Sep 26, 2022

glozow added the Up for grabs label Sep 27, 2022

bitcoin locked and limited conversation to collaborators Sep 27, 2023
