-
Notifications
You must be signed in to change notification settings - Fork 38.6k
consensus: Remove checkpoints (take 2) #31649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consensus: Remove checkpoints (take 2) #31649
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31649. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
If we go through with this, perhaps it's worth keeping the |
|
Concept ACK |
|
You can delete |
5417243 to
a694e1d
Compare
|
Updated the |
|
Concept ACK. |
src/net_processing.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems worth keeping, at least for a few versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed seems better to keep this. Perhaps mention the last mainnet checkpoint height, since MinimumChainWork() is far above that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that if there were code to keep for now it would be this. I ultimately decided to remove it because the purpose it serves no longer seems to apply if we remove checkpoints. With the headers sync logic, iiuc, it would be fine for our node in IBD to respond to GETHEADERS because the only headers we could have would belong to a chain with enough work. There isn't a risk anymore of sending a bogus chain to a peer and getting disconnected.
This behavior was first introduced in #6172 and then changed to MinimumChainWork() in #24178. Some more discussion can be found in #21106 (comment) and #21106 (comment).
Let me know if there's something I'm missing here, wrt to the logic of why it can be safely removed. If we do keep it, then I should at least update the comment (to explain why the check is still there?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way I can see this still being a problem is:
- Someone started a node with 23.x or earlier (before p2p: Implement anti-DoS headers sync #25717)
- Started being fed a low-work headers chain.
- Upgraded to 30.x(?) or later (with this PR)
- Never made any headers sync progress since.
That seems pretty unlikely, so I think it's fine to remove this if/when we get rid of checkpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made sense to me without being familiar with how the anti-DoS sync works in details, asked on IRC, Suhas points out it's still possible for a node to have checkpoint-violating headers to serve:
11:00 darosior: that's not quite true, because an adversary can feed you the honest chain during pre-sync, and then a bogus one during redownload
11:00 darosior: so you'd detect it quickly and disconnect but be left with headers that are possibly checkpoint-violating (if your node isn't enforcing checkpoints)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we respond to GETHEADERS as soon as we're listening. It seems like a good idea in general to not answer until we have MinimumChainWork worth of headers. (using an empty headers message to ensure they don't hang up on us)
Even without a malicious peer involved, we might be only part way in obtaining headers from an honest peer. If another peer then wants headers from us, IIUC we'd be wasting their time because we can't send them enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more discussion on IRC and IRL.
For this to have an effect, it means an attacker needs to get us to accept blocks on an attack chain that has more work than minchainwork (at the time it is accepted), which should be very expensive. So I don't think it's a necessity to keep the check for the purpose of preventing being disconnected by checkpoint-enabled nodes.
On the other hand, there seems to be little reason not to do the check. It's cheap, and it doesn't seem unreasonable to avoid offering peers to sync from us before we're at a point where we can offer enough that we'd accept it ourselves, like @Sjors points out.
I'd vote to keep it, but update the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason to keep this check might be the following:
If we are in IBD and our chain is below minchainwork, we don't send headers to inbound peers, so they don't get to know our chain and won't attempt to download blocks from us (e.g. if they are even further behind in IBD).
I think that while in IBD it makes sense to use the bandwidth for downloading the chain instead of serving others. Though admittedly this isn't a very realistic scenario because we don't self-advertise during IBD and are unlikely to get inbound peers anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I kept the check and adjusted the comment in a separate commit. Let me know if it makes sense or can be improved in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the check in makes sense to me and can be removed later with little cost if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
The remaining "checkpoint" references in the code seem unrelated.
src/consensus/validation.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To guard against BLOCK_HEADER_LOW_WORK changing its numeric value now (to 8 instead of previous 9), we could hard-code the previous values for the remaining ones
enum class BlockValidationResult {
BLOCK_RESULT_UNSET = 0, //!< initial value. Block has not yet been rejected
BLOCK_CONSENSUS = 1, //!< invalid by consensus rules (excluding any below reasons)
BLOCK_CACHED_INVALID = 2, //!< this block was cached as being invalid and we didn't store the reason why
BLOCK_INVALID_HEADER = 3, //!< invalid proof of work or time too old
BLOCK_MUTATED = 4, //!< the block's data didn't match the data committed to by the PoW
BLOCK_MISSING_PREV = 5, //!< We don't have the previous block the checked one is built on
BLOCK_INVALID_PREV = 6, //!< A block this one builds on is invalid
BLOCK_TIME_FUTURE = 7, //!< block timestamp was > 2 hours in the future (or our clock is bad)
// BLOCK_CHECKPOINT = 8, //!< Deprecated and removed
BLOCK_HEADER_LOW_WORK = 9 //!< the block header may be on a too-little-work chain
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if these are stored to disk as part of the block index, or exposed via RPC. Otherwise changing their numeric values shouldn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from the change to ipc_test.cpp below it seems that eventually (#10102) these values will be sent over IPC. At that point we have to fix their integer values. Might as well do it now? cc @ryanofsky
Should be a separate commit, since we would not want to revert that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these values will be sent over IPC. At that point we have to fix their integer values. Might as well do it now?
Why would they need to be fixed? IPC is only an internal interface. It is currently experimental and in draft and there are no plans for the interface to support mismatching server/client binaries. I'd say the code should be left as-is until there are plans (with support) to even support that. Until then, changing the code for that reason seems like pointless churn to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no plans for the interface to support mismatching server/client binaries
That tends to happen without a plan, but indeed we can wait.
part of the block index, or exposed via RPC
^ this still need to be clarified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of the block index, or exposed via RPC
I don't believe either of these to be the case.
src/init.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice improvement. I've been running my node with -nocheckpoints=1 for years, so now it will just throw a warning rather than abort launch. (which I tested)
|
Concept ACK If we go ahead with this, right after the v29 branch-off would seem like a good time. I plan to expand the alternate mainnet chain in #31583 to 11111+ blocks and have ~p2p_dos_header_tree.py` use that instead. That would make it easier to revert the checkpoints even after testnet3 is completely dropped. (update: actually that test is hard to replace without also replaying the real mainnet) |
a694e1d to
9e91e53
Compare
mzumsande
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #25725 the approach was to remove just the actual checkpoints, but leave the supporting code to be removed in the next / a later release (#25725 (comment)). Did you consider this, instead of doing it all in one commit?
dergoegge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 9e91e536a45e3fe802eb8b38ccc893ef894e72e7
The "new" headers pre-sync (shipped in v24.0) protects us against the same denial-of-service issues that the checkpoints were solving in the past. Therefore it is reasonable to remove them, especially considering the testing the pre-sync has undergone and the amount of time that has passed.
Even under the assumption that there is a bug in the headers pre-sync logic (i.e. it's somehow still possible to submit low work headers), the checkpoints only raise the cost of attack slightly to an ever shrinking fraction of the cost of mining a block at the tip (see https://bitcoincore.org/en/2024/09/18/disclose-headers-oom/).
If we are not comfortable with removing the checkpoints, then more of them should be added, as the protections they provide will converge towards zero otherwise.
Philosophically it is nice to get rid of the checkpoints, as they are one of the only things in this code base that dictate what the canonical chain is, which isn't a position we want to be in (this concern was also part of the design considerations for features such as assumevalid and assumeutxo).
I think we could reasonably ship this removal in v29.0, as I don't see what additional assurances waiting even longer would give us.
Merged pull requests tend to get additional attention, e.g. in the Optech Newsletter. Perhaps this will jolt someones memory, and there wouldn't be a rush to revert close to a release. Perhaps someone is sitting on an exploit and wants to collect a bug bounty for it. For it to be a bug, it needs to be merged. That said, there's no such bounty and we probably don't want to incentivise that type of behaviour. |
The headers presync logic (only downloading headers that lead to a chain with sufficient work, implemented in #25717) should be enough to prevent memory DoS using low-work headers. Therefore, we no longer have any use for checkpoints.
All checkpoints and checkpoint logic are removed in a single commit, to make it easy to revert if necessary.
Some previous discussion can be found in #25725. The conclusion at the time was that more testing of the presync logic was needed. Now that we have unit, functional, and fuzz tests for this logic, it seems safe to move forward with checkpoint removal.