Skip to content

Conversation

@mzumsande
Copy link
Contributor

@mzumsande mzumsande commented May 6, 2022

This changes the periodic commit of the best block during the index sync phase to use the already indexed predecessor of the current block index, instead of committing the current one that will only be indexed (by calling WriteBlock()) after committing the best block.

The previous code would leave the index database in an inconsistent state until the block is actually indexed - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block.

@theStack
Copy link
Contributor

theStack commented May 6, 2022

Concept ACK

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 75f53f1

Copy link

@pk-b2 pk-b2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 75f53f1

@luke-jr
Copy link
Member

luke-jr commented May 9, 2022

I wonder if it would make sense to always re-index the initial best block too, just to avoid accidentally running with such corruption undetected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_time is stale here. Maybe instead of moving this, it'd be better to just SetBestBlockIndex(pindex->pprev);?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25074 (comment)

current_time is stale here. Maybe instead of moving this, it'd be better to just SetBestBlockIndex(pindex->pprev);?

Yeah probably the reason this bug was introduced is that original author was trying to avoid current_time being stale. But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect, and seems like it would be fragile even if current index subclasses are ok with it.

I think using stale rewind time is ok and probably better than the alternatives, even if it seems a little wonky. This is all replaced anyway in 1819bdb from #24230, as Martin noted in #24230 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect,

I don't see how it would even be incorrect... After the rewind, pindex is changed to the block one beyond that which the index should be current to...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it would even be incorrect... After the rewind, pindex is changed to the block one beyond that which the index should be current to...?

Oops, you're right. I didn't notice NextSyncBlock was returning the next block after the fork. So I think your pindex->pprev suggestion is good and makes sense, and probably a little better than the current fix, though I think both fixes are fine.

Copy link
Contributor Author

@mzumsande mzumsande May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to Luke's suggestion and adjusted the commit message, thanks.
While this is simpler, current_time being a bit stale shouldn't ever be a serious problem in my opinion - I think of updating the best block / committing it every 30s as a non-essential but nice-to-have feature that may save users some time in case of shutdowns, if we removed this block completely, it shouldn't break anything.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 75f53f1e1fc6917eac40e51ce7b6bec2717bd526. Nice catch. This just changes the index sync thread to commit after writing instead of committing before writing, so the block referenced in the commit that is supposed to have been written is actually written.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #25074 (comment)

current_time is stale here. Maybe instead of moving this, it'd be better to just SetBestBlockIndex(pindex->pprev);?

Yeah probably the reason this bug was introduced is that original author was trying to avoid current_time being stale. But pindex->pprev would be the wrong best block index to use in the Rewind() case above. It would be pointing to parent of the rewind block instead of the block that was actually rewound to. I guess this would be better than current behavior, but still technically incorrect, and seems like it would be fragile even if current index subclasses are ok with it.

I think using stale rewind time is ok and probably better than the alternatives, even if it seems a little wonky. This is all replaced anyway in 1819bdb from #24230, as Martin noted in #24230 (comment)

@naumenkogs
Copy link
Contributor

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented May 12, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@mzumsande mzumsande force-pushed the 202205_index_sync_order branch from 75f53f1 to 9d4509b Compare May 13, 2022 00:08
@mzumsande
Copy link
Contributor Author

I pushed an update, implementing Luke's suggestion.

I wonder if it would make sense to always re-index the initial best block too, just to avoid accidentally running with such corruption undetected.

It does get detected for the coinstatsindex, leading to an init error - that's how I encountered this, playing around with uprobes that send kill signals to bitcoind (branch). But other indexes may not.
Since this should not be possible unless there is a logic bug in the index code, checking on startup that there is an indexed entry for the best block / aborting otherwise may be another way instead of attempting to recover ?

Copy link
Member

@luke-jr luke-jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 9d4509b64ee333ef4587c77fb0d52e8e720a28c4, but it would be good to update commit message & PR description.

I was thinking probably it would be possible write a unit test for this with a mock BaseIndex subclass that overrides the CommitInternal() and WriteBlock() methods and checks that commit is called with the last block written, not the next block about to be written during sync. I don't think this is necessary however, since the bug is pretty theoretical and seems unlikely to happen in practice.

re: #25074 (comment)

This rearranges the order during the index sync phase to do the periodic update of the best block (Commit()) only after actually indexing this block (WriteBlock()). The previous order would leave the index database temporarily in an inconsistent state - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block. Indexing the block ahead of committing the best block is the typical order and not a problem - at worst, we'd have to index this block again.

This PR description is out of date and would be good to fix because it will become part of the merge history.

Committing a block prior to indexing would leave the index database
in an inconsistent state until it is indexed, which could corrupt the
index in case of a unclean shutdown. Thus commit its predecessor.

Co-authored-by: Luke Dashjr <[email protected]>
@mzumsande mzumsande force-pushed the 202205_index_sync_order branch from 9d4509b to 7171ebc Compare May 19, 2022 11:21
@mzumsande
Copy link
Contributor Author

mzumsande commented May 19, 2022

9d4509b -> 7171ebc: Updated the commit message and the PR description.

I was thinking probably it would be possible write a unit test for this with a mock BaseIndex subclass that overrides the CommitInternal() and WriteBlock() methods and checks that commit is called with the last block written, not the next block about to be written during sync. I don't think this is necessary however, since the bug is pretty theoretical and seems unlikely to happen in practice.

I think that having a test-only mock index is a really interesting idea, and that it could help with increasing the test coverage. I'd be interested to look into this as a follow-up, with the goal of improving the test coverage of the base index logic more broadly (not only with respect to this issue, I think if the mock index class tested only this it might be a bit overkill).

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 7171ebc. Looks great! Just commit message changes since last review

@fanquake fanquake merged commit e18fd47 into bitcoin:master May 19, 2022
@mzumsande mzumsande deleted the 202205_index_sync_order branch May 19, 2022 13:26
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request May 22, 2022
Committing a block prior to indexing would leave the index database
in an inconsistent state until it is indexed, which could corrupt the
index in case of a unclean shutdown. Thus commit its predecessor.

Co-authored-by: Luke Dashjr <[email protected]>

Github-Pull: bitcoin#25074
Rebased-From: 7171ebc
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 28, 2022
…xing

7171ebc index: Don't commit a best block before indexing it during sync (Martin Zumsande)

Pull request description:

  This changes the periodic commit of the best block during the index sync phase to use the already indexed predecessor of the current block index, instead of committing the current one that will only be indexed (by calling `WriteBlock()`) after committing the best block.

  The previous code would leave the index database in an inconsistent state until the block is actually indexed - if an unclean shutdown happened at just this point in time, the index could get corrupted because at next startup, we'd assume that we have already indexed this block.

ACKs for top commit:
  ryanofsky:
    Code review ACK 7171ebc. Looks great! Just commit message changes since last review

Tree-SHA512: a008de511dd6a1731b7fdf6a90add48d1e53f7f7d6402672adb83e362677fc5b9f5cd021d3111728cb41d73f1b9c2140db79d7e183df0ab359cda8c01b0ef928
@bitcoin bitcoin locked and limited conversation to collaborators May 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants