-
Notifications
You must be signed in to change notification settings - Fork 38.7k
index: During sync, commit best block after indexing #25074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Concept ACK |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 75f53f1
pk-b2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 75f53f1
|
I wonder if it would make sense to always re-index the initial best block too, just to avoid accidentally running with such corruption undetected. |
src/index/base.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current_time is stale here. Maybe instead of moving this, it'd be better to just SetBestBlockIndex(pindex->pprev);?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25074 (comment)
current_timeis stale here. Maybe instead of moving this, it'd be better to justSetBestBlockIndex(pindex->pprev);?
Yeah probably the reason this bug was introduced is that original author was trying to avoid current_time being stale. But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect, and seems like it would be fragile even if current index subclasses are ok with it.
I think using stale rewind time is ok and probably better than the alternatives, even if it seems a little wonky. This is all replaced anyway in 1819bdb from #24230, as Martin noted in #24230 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect,
I don't see how it would even be incorrect... After the rewind, pindex is changed to the block one beyond that which the index should be current to...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it would even be incorrect... After the rewind,
pindexis changed to the block one beyond that which the index should be current to...?
Oops, you're right. I didn't notice NextSyncBlock was returning the next block after the fork. So I think your pindex->pprev suggestion is good and makes sense, and probably a little better than the current fix, though I think both fixes are fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to Luke's suggestion and adjusted the commit message, thanks.
While this is simpler, current_time being a bit stale shouldn't ever be a serious problem in my opinion - I think of updating the best block / committing it every 30s as a non-essential but nice-to-have feature that may save users some time in case of shutdowns, if we removed this block completely, it shouldn't break anything.
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 75f53f1e1fc6917eac40e51ce7b6bec2717bd526. Nice catch. This just changes the index sync thread to commit after writing instead of committing before writing, so the block referenced in the commit that is supposed to have been written is actually written.
src/index/base.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #25074 (comment)
current_timeis stale here. Maybe instead of moving this, it'd be better to justSetBestBlockIndex(pindex->pprev);?
Yeah probably the reason this bug was introduced is that original author was trying to avoid current_time being stale. But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect, and seems like it would be fragile even if current index subclasses are ok with it.
I think using stale rewind time is ok and probably better than the alternatives, even if it seems a little wonky. This is all replaced anyway in 1819bdb from #24230, as Martin noted in #24230 (comment)
|
Concept ACK |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
75f53f1 to
9d4509b
Compare
|
I pushed an update, implementing Luke's suggestion.
It does get detected for the coinstatsindex, leading to an init error - that's how I encountered this, playing around with uprobes that send kill signals to bitcoind (branch). But other indexes may not. |
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 9d4509b64ee333ef4587c77fb0d52e8e720a28c4, but it would be good to update commit message & PR description.
I was thinking probably it would be possible write a unit test for this with a mock BaseIndex subclass that overrides the CommitInternal() and WriteBlock() methods and checks that commit is called with the last block written, not the next block about to be written during sync. I don't think this is necessary however, since the bug is pretty theoretical and seems unlikely to happen in practice.
re: #25074 (comment)
This rearranges the order during the index sync phase to do the periodic update of the best block (
Commit()) only after actually indexing this block (WriteBlock()). The previous order would leave the index database temporarily in an inconsistent state - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block. Indexing the block ahead of committing the best block is the typical order and not a problem - at worst, we'd have to index this block again.
This PR description is out of date and would be good to fix because it will become part of the merge history.
Committing a block prior to indexing would leave the index database in an inconsistent state until it is indexed, which could corrupt the index in case of a unclean shutdown. Thus commit its predecessor. Co-authored-by: Luke Dashjr <[email protected]>
9d4509b to
7171ebc
Compare
|
9d4509b -> 7171ebc: Updated the commit message and the PR description.
I think that having a test-only mock index is a really interesting idea, and that it could help with increasing the test coverage. I'd be interested to look into this as a follow-up, with the goal of improving the test coverage of the base index logic more broadly (not only with respect to this issue, I think if the mock index class tested only this it might be a bit overkill). |
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 7171ebc. Looks great! Just commit message changes since last review
Committing a block prior to indexing would leave the index database in an inconsistent state until it is indexed, which could corrupt the index in case of a unclean shutdown. Thus commit its predecessor. Co-authored-by: Luke Dashjr <[email protected]> Github-Pull: bitcoin#25074 Rebased-From: 7171ebc
…xing 7171ebc index: Don't commit a best block before indexing it during sync (Martin Zumsande) Pull request description: This changes the periodic commit of the best block during the index sync phase to use the already indexed predecessor of the current block index, instead of committing the current one that will only be indexed (by calling `WriteBlock()`) after committing the best block. The previous code would leave the index database in an inconsistent state until the block is actually indexed - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block. ACKs for top commit: ryanofsky: Code review ACK 7171ebc. Looks great! Just commit message changes since last review Tree-SHA512: a008de511dd6a1731b7fdf6a90add48d1e53f7f7d6402672adb83e362677fc5b9f5cd021d3111728cb41d73f1b9c2140db79d7e183df0ab359cda8c01b0ef928
This changes the periodic commit of the best block during the index sync phase to use the already indexed predecessor of the current block index, instead of committing the current one that will only be indexed (by calling
WriteBlock()) after committing the best block.The previous code would leave the index database in an inconsistent state until the block is actually indexed - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block.