validation: ensure assumevalid is always used during reindex #31615
Conversation
I had an offline conversation with @mzumsande. During the reindex, we can apply assumevalid for the following cases:
Then we download headers from peers in parallel to ActivateBestChain(). I'll put this in draft while I implement and test this over the next few days.
Force-pushed 57bd31e to 89ca265
Force-pushed 89ca265 to fda93df
I'm not sure I can confirm all the details; however, this does seem to solve the issue I had, reported in #31494 (i.e. reindex(-chainstate) after a partial IBD always enabling script verification). To test it, I added a debug log for fScriptChecks and ran:

```sh
cmake -B build -DCMAKE_BUILD_TYPE=Debug >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 \
&& mkdir -p demo && build/bin/bitcoind -datadir=demo -stopatheight=1000 >/dev/null && build/bin/bitcoind -datadir=demo -reindex -stopatheight=100 >/dev/null \
&& build/bin/bitcoind -datadir=demo -reindex-chainstate -stopatheight=10 | grep "script verification"
```

which logged:

```
[warning] fScriptChecks is true
[warning] fScriptChecks is false
```

I also checked the new functional test:

```
2025-01-30T15:08:22.897000Z TestFramework (INFO): Success
2025-01-30T15:08:31.343000Z TestFramework (INFO): Restarting node while reindexing..
2025-01-30T15:08:32.551000Z TestFramework (INFO): Testing reindex with -assumevalid
2025-01-30T15:09:23.963000Z TestFramework (INFO): Testing reindex with assumevalid block in block index and minchainwork > best_header chainwork
2025-01-30T15:09:52.592000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/Users/lorinc/IdeaProjects/bitcoin/test/functional/test_framework/test_framework.py", line 135, in main
    self.run_test()
  File "/Users/lorinc/IdeaProjects/bitcoin/build/test/functional/feature_reindex.py", line 203, in run_test
    self.assume_valid()
```

So from me it's an approach ACK, but someone else needs to validate the implementation details and the corner cases that can result from this.
I agree too.
I'll look into this!
Force-pushed fda93df to 98dce62
mzumsande
left a comment
Concept ACK
I think the current approach would work.
Even though it means that if the user provided a bad / non-existent assumevalid hash, they could skip script validation during a reindex when they wouldn't have in the original run, I'd say that's not a major problem, because a reindex should only be necessary in case of a db corruption, which should not be remotely triggerable.
So it will hopefully make the reindex a bit faster in case of a database corruption, while not changing the assumevalid security assumption very much, and it does respect -assumevalid=0.
Though it would be good to have some experienced contributors chime in that were around when assumevalid was introduced (#9484).
Thanks for the reviews @l0rinc and @mzumsande. Based on your feedback, I have opened the PR for general review.
Needs rebase, if still relevant.
Force-pushed 98dce62 to cb29bc3
src/node/blockstorage.cpp (outdated):

```cpp
        return;
    }
}
update_all_chainstate();
```
This means that in case of a reindex, ActivateBestChain() will now be called twice from ImportBlocks, but that shouldn't be a problem because the second call just won't do any work.
Force-pushed cb29bc3 to 11a9d55
Rebased on master 5e6dbfd and addressed reviews.
stratospher
left a comment
ACK 11a9d55e.
Agree that applying the reindex-assumevalid combo can be useful in the special cases mentioned above to speed up reindexing when we don't have all blocks.
A behaviour change during assumevalid+reindex now would be:
- we'd skip script checks even when the assumevalid block index entry doesn't exist (e.g. we put assumevalid=somegarbage or a blockhash with a typo) - this was not the case on master, where we skip script checks only when we know that the assumevalid block index entry exists.

I don't think it's a problem, since assumevalid already puts responsibility on the user, and we'd need tampered block files on disk from some external source (in which case the user shouldn't be using assumevalid in the first place). Just wanted to mention it; curious what others think about this or any other possible behaviour change.
@l0rinc Can you share steps to reproduce? Did the previous reindex use signature validation? Was your node pruned?
Not pruned, but did many reindex-chainstates (it's an rpi4b benchmarking server), maybe a few
While just restarting the node with this commit didn't help for some reason, doing a reindex with it does seem to disable signature verification successfully. So while it does seem to work around the issue described in #33336 (comment), it doesn't seem to solve the underlying problem of reindex ignoring the later headers.
src/validation.cpp (outdated):

```cpp
// The test against the minimum chain work prevents the skipping when denied access to any chain at
// least as good as the expected chain.
// During a reindex, skip the minimumchainwork check because the previous IBD run may have been interrupted
// before it could connect enough blocks to reach the minimumchainwork.
```
The minimumchainwork should match the assumevalid block, so if minimumchainwork isn't reached the assumevalid block won't be present and this code block won't be executed.
So this is only relevant for reindexing when you're manually choosing an older assumevalid block than the default, and aren't also adjusting the minchainwork. (If you choose a more recent assumevalid block, then the default minchainwork will be satisfied anyway)
I think it still makes sense to account for this scenario since minimumchainwork and assumevalid block are set separately. I'll update the comment to explain that minimumchainwork only needs to be skipped if you set an older assumevalid block without also reducing the minimumchainwork.
Force-pushed 6923417 to d7e4854
Rebased on master and added a comment explaining why we bother to skip minimumchainwork, even though we expect minimumchainwork to match the assumevalid block height most of the time.
src/validation.cpp (outdated):

```cpp
        fScriptChecks = (GetBlockProofEquivalentTime(*m_chainman.m_best_header, *pindex, *m_chainman.m_best_header, params.GetConsensus()) <= 60 * 60 * 24 * 7 * 2);
    }
} else if (m_chainman.m_blockman.IsReindexing()) {
    // During a reindex, the assumed valid block may not be in the index
```
While the change does seem to fix my personal experience of unexpected script validations, I don't yet understand why reindex does this in the first place - can we fix that instead of adding extra conditions to the script validation checks? Seems a bit hacky and only solves part of the problem, doesn't it?
Looks like you are suggesting we do something more invasive. Can you clarify what you mean?
I do not believe something more invasive is required since this simple fix works, but I'm happy to hear what others think.
Not necessarily, but it bothers me that we (or at least I) don't understand what's causing the problem in the first place. The current solution feels like a work-around to me - still better than what we had, but arguably more complicated, and it doesn't seem to fix the underlying problem of why reindex is missing headers in the first place. And are we sure it's a reindex-only bug, or can it happen for reindex-chainstate or IBD as well? I haven't dug too deep into this, but I want it to be fixed and was wondering if the apparent lack of interest in the reviews is because others also felt it's a workaround. Thanks for working on this.
Not necessarily, but it bothers me that we (or at least I) don't understand what's causing the problem in the first place
I tried to describe this in the linked issue, I think the problem is understood. To rephrase it:
Normal IBD has the order: 1) download the entire header chain until we have crossed minchainwork, 2) start downloading blocks.
=> assumevalid will be applied, because we have a header with minchainwork and the assumevalid block in our index.
During reindex, we 1) delete all existing headers, 2) read blocks and headers of just the blocks we have saved on disk, 3) start connecting blocks - without ever downloading the full header chain from peers.
So if we didn't have the assumevalid block / a minchainwork full block on disk before starting the reindex (because we didn't get that far in our previous IBD), assumevalid will not be applied.
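The two orderings can be contrasted with a toy model (the function, heights, and names below are purely illustrative, not Bitcoin Core's actual logic):

```python
# Toy model (illustrative names and heights, not Bitcoin Core's actual
# logic) of why assumevalid applies during IBD but can be lost on reindex.

def assumevalid_applies(indexed_header_heights, assumevalid_height):
    """assumevalid can only be applied if the assumevalid block's header
    is present in the (rebuilt) block index."""
    return assumevalid_height in indexed_header_heights

# Normal IBD: the full header chain is synced first, so the assumevalid
# header (say height 800000) is always in the index before connecting.
ibd_index = set(range(800_001))
assert assumevalid_applies(ibd_index, 800_000)

# Reindex after an interrupted IBD: only headers of blocks already on
# disk (say up to height 300000) are rebuilt, so the assumevalid header
# is missing and script checks are re-enabled for every block.
reindex_index = set(range(300_001))
assert not assumevalid_applies(reindex_index, 800_000)
```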
Concept ACK
Force-pushed d7e4854 to 06ab084
Force-pushed 06ab084 to 6270f21
-reindex deletes the block tree db and attempts to rebuild it based on local block files, without attempting to re-request headers for which we don't have the full block. This might cause:
- the best known chain to have a chainwork less than the minimumchainwork
- the assumedvalid block to not be in the block index, if the previous IBD was interrupted before downloading it

Also move the feature_reindex.py test to the "Tests less than 2m" section in test_runner.py, because it is now much slower and has runtimes similar to other tests in that section.
Force-pushed 6270f21 to 55daf0e
Rebased on master.
Concept ACK: reindex validation behaviour being different to IBD validation behaviour is undesirable and should be avoided.
Slight approach NACK: I think this is probably the smallest diff we can get to improve reindex behaviour, but I think there are drawbacks that outweigh the benefits:
- it still doesn't make IBD and reindex validation behaviour the same: with the current approach blocks that are NOT part of the assumevalid chain will still be validated as assumevalid. While it doesn't seem likely that this can currently be abused by an attacker, I'm not certain and I'd prefer avoiding having to think about it now and in the future.
- this makes validation slightly more complex by adding another edge case to solve a problem which should only (?) occur in case of corruption, most likely caused by faulty hardware. While I'm not against improving reindex behaviour, I'm not convinced it's important enough to complicate overall validation logic. If the user has fixed their hardware faults, just running a fresh IBD doesn't seem like a big deal for these rare occurrences?
Generally speaking, I think we should move towards IBD and reindex validation behaviour converging rather than diverging. Three alternative approaches, sorted by decreasing feasibility, include:
- Log an error that assumevalid is enabled but reindex could not find the specified assumevalid header, print the highest available header in the block index, inform the user that they can restart with -assumevalid=0 or -assumevalid=<best_available_header> or do a fresh IBD, and then shut down the node.
- Rearchitect our block index storage so that we don't need reindex, or so that it doesn't wipe the assumevalid state. Relevant work is being done in #32427, and @cfields shared some relevant thoughts offline too.
- Generalize validation logic so that there is no difference between IBD and reindex, making it possible to do p2p during reindex.
My preference for this PR would be to implement approach 1) to not change logic but just increase visibility for the user, and then actually improve things by improving our architecture (approaches 2) and/or 3)) rather than patching things up.
I agree with this approach, but we would still have to update the
I'm not sure how important it is for
That seems sensible. I'm currently unsure what the best approach is here, but informing in logging seems like the minimum we can do.
I don't think it is. Reindex being fully offline to me seems like an artifact of our current startup and locking logic, rather than a stated goal. Note also that my suggestions don't mean reindex requires a network connection, just that it doesn't preclude it, so headers can be synced if the user allows network connections.
For my benchmarking needs

Now that #33336 was merged, we're getting part of that message already, which should help a lot in this situation. Thanks for taking care of this issue; I agree with the concerns of @stickies-v, but I personally would either keep
hodlinator
left a comment
Agree this issue would be good to fix!
However also agree with #31615 (review) that logging a warning/error and doing some kind of higher level change instead of patching things up here would be preferable.
Higher level proposal
Add an earlier phase for -reindex where it rebuilds the headers chain/BlockIndex from blocks on disk, and if we're still below minimumchainwork, perform headerssync to populate the index prior to this phase of processing blocks.
A condition for exiting this new phase would be that we fulfill minimumchainwork, that way we shouldn't (ever?) have an issue where best header is below minimum chainwork when we reach ConnectBlock().
There may be a concern that some of the headers taken from blocks on disk somehow got corrupted, leading to us filling the block index with garbage. But if we verify headers connect by prev-hash and obey thresholds and difficulty adjustments before feeding them to the index we should be okay (same as we do for headerssync).
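This proposed phase ordering can be sketched as a toy model (hypothetical names, work simplified to one unit per header; not actual Core code):

```python
# Toy sketch (hypothetical names, not Bitcoin Core code) of the proposed
# extra reindex phase: rebuild headers from disk, then headers-sync from
# peers until minimumchainwork is met, and only then connect blocks.
# Chain work is simplified to one unit per header.

def reindex_header_phase(disk_headers, peer_headers, min_chain_work):
    # Phase 1: rebuild the header index from blocks we have on disk.
    index = list(disk_headers)
    # Phase 2: if still below minimumchainwork, headers-sync from peers
    # before any block connection takes place.
    while len(index) < min_chain_work and peer_headers:
        index.append(peer_headers.pop(0))
    if len(index) < min_chain_work:
        raise RuntimeError("could not reach minimumchainwork")
    # Phase 3 (connecting blocks) may now begin safely.
    return index
```

With 100 headers on disk and a minimumchainwork of 250 units, the sync phase would pull 150 more headers from peers before block connection starts, so the best header is never below minimum chainwork when ConnectBlock() runs.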
Alternative
An alternative to the above would be to have a completely different fast path for -reindex where if the block exists on disk, we only do minimal hashing to verify that the entire block data is not corrupt. This assumes we do more complete validation before storing blocks to disk (and doesn't account for switching between binaries with different consensus rules).
This would however enable a malicious party to give invalid blocks data to a victim while offline, "here, I optimized IBD for you". But it would still require an eclipse attack / network split to maintain.
Sidebar
(There was an issue with the prior 661a32305c13fdffc7dc4420aa44eb503739fc12 where if we halt IBD before reaching the assumevalid block, and restart with -reindex, it looks like we would have disabled script validation even though the block was younger than 2 weeks.
661a323#diff-97c3a52bc5fad452d82670a7fd291800bae20c7bc35bb82686c2c0a4ea7b5b98R2462
This has been fixed in newer pushes AFAICS).
```python
assert_equal(node.getblock(best_blockhash)['height'], block_info['height'])
assert_equal(best_blockhash, block_info['hash'])

self.log.info("Success")
```
nit: Few tests report success, the test framework always outputs "Tests successful" if the test(s) it runs succeed already, so seems unnecessary.
```
₿ git grep 'self.log.info("Success")'
test/functional/feature_pruning.py: self.log.info("Success")
test/functional/feature_pruning.py: self.log.info("Success")
test/functional/feature_pruning.py: self.log.info("Success")
test/functional/feature_pruning.py: self.log.info("Success")
test/functional/feature_pruning.py: self.log.info("Success")
test/functional/feature_reindex.py: self.log.info("Success")
test/functional/feature_reindex.py: self.log.info("Success")
```
```diff
-} else if (it->second.GetAncestor(pindex->nHeight) != pindex) {
+} else if (is_present_in_index && it->second.GetAncestor(pindex->nHeight) != pindex) {
+    script_check_reason = (pindex->nHeight > it->second.nHeight) ? "block height above assumevalid height" : "block not in assumevalid chain";
 } else if (m_chainman.m_best_header->GetAncestor(pindex->nHeight) != pindex) {
```
(Prior to this PR, all if's from here on down expected the current block to be an ancestor of the assumed valid block).
```cpp
bool is_present_in_index{it != m_blockman.m_block_index.end()};
bool is_reindexing{m_chainman.m_blockman.IsReindexing()};
```
nit: if we keep these, const bool might be nicer, although the scope is small.
```diff
-} else if (it->second.GetAncestor(pindex->nHeight) != pindex) {
+} else if (is_present_in_index && it->second.GetAncestor(pindex->nHeight) != pindex) {
+    script_check_reason = (pindex->nHeight > it->second.nHeight) ? "block height above assumevalid height" : "block not in assumevalid chain";
 } else if (m_chainman.m_best_header->GetAncestor(pindex->nHeight) != pindex) {
     script_check_reason = "block not in best header chain";
 } else if (m_chainman.m_best_header->nChainWork < m_chainman.MinimumChainWork()) {
```
remark:
Didn't get this at first - if our block is an ancestor of AssumedValidBlock(), why would either of:
a) it not being part of the best header chain
b) the best header chain being below minimum chainwork
...make us turn on script checking for this assumevalid ancestor?
It's been this way since assumevalid was introduced #9484 / e440ac7.
I guess a) could be to cover the case where a node runner is instructed by a malicious party to run with an -assumevalid hash from a bogus chain fork which has less PoW. If you're off the main fork, we need to make sure you're not on a fork that contains invalid blocks, so we perform full validation.
b) Could be to handle network eclipse attacks in combination with -assumevalid where the malicious party is too lazy to also make users lower the -minimumchainwork setting.
Maybe we could add some clarifying comments to that effect?
From what I can infer, we use MinimumChainWork to set an expected minimum chain strength. This protects the node from any chain that is not at least as good as the expected chain, especially when it is eclipsed. Since the assumevalid block is typically set near the MinimumChainWork height, it would seem that we could remove the MinimumChainWork check here by checking that the assumevalid block is an ancestor of (or is) the best_header. But in the case that we are given a malicious assumevalid block hash, we can still choose the expected chain by setting our MinimumChainWork appropriately. I see this as the last defence against such an attack; the attacker would also have to convince you to reduce your MinimumChainWork.
I wouldn't mind adding a comment stating this. Reviewers can confirm if it correctly reflects the intent of the code.
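For illustration, the layered checks discussed in this thread could be sketched like this (plain Python with lists of block ids standing in for chains; all names hypothetical, not the actual validation.cpp logic):

```python
# Illustrative sketch (not Bitcoin Core's actual code) of the layered
# conditions that must all hold before script checks may be skipped.
# A chain is modelled as a list of block ids from genesis to tip.

def can_skip_script_checks(block, assumevalid_block, best_header_chain,
                           best_chain_work, min_chain_work):
    # 1. The assumevalid block must be on the best known header chain:
    #    guards against a bogus -assumevalid hash from a low-work fork.
    if assumevalid_block not in best_header_chain:
        return False
    # 2. The block being connected must be an ancestor of (or equal to)
    #    the assumevalid block.
    av_prefix = best_header_chain[:best_header_chain.index(assumevalid_block) + 1]
    if block not in av_prefix:
        return False
    # 3. The best header chain must carry at least the minimum chain
    #    work: the last defence if the user was eclipsed and also fed
    #    a malicious assumevalid hash.
    if best_chain_work < min_chain_work:
        return False
    return True

chain = ["genesis", "a", "b", "av", "c"]
assert can_skip_script_checks("b", "av", chain, 100, 90)
assert not can_skip_script_checks("c", "av", chain, 100, 90)  # above assumevalid
assert not can_skip_script_checks("b", "av", chain, 80, 90)   # below minchainwork
```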
Thanks for the reviews and suggestions @l0rinc, @stickies-v, @hodlinator, @mzumsande, @TheCharlatan, @stratospher.
assumevalid may be ignored during reindex for the following reasons:
This PR fixes this by skipping script verification during reindex if assumevalid is enabled.
EDIT: While the minimumchainwork is typically set to match the assumevalid block height, users may adjust the assumevalid block without also adjusting the minimumchainwork; scenario 2 from the list can occur if the assumevalid block is set to an older block without also reducing the minimumchainwork.