Skip to content

Conversation

@fjahr
Copy link
Contributor

@fjahr fjahr commented Aug 27, 2023

While doing manual testing on assumeutxo this week I managed to put the coindb into an inconsistent state twice. For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process. It's an edge case but I wouldn't be surprised if this does happen more often when assumeutxo gets used more widely because there might be multiple flushes happening during loading of the UTXO set in the beginning and users may think something is going wrong because of the unexpected wait or they forgot some configs and want to start over quickly.

The problem is, when this happens at first the node starts up normally until it's time to flush again and then it hits an assert that the user can not understand.

2023-08-25T16:31:09Z [httpworker.0] [snapshot] 52000000 coins loaded (43.30%, 6768 MB)
2023-08-25T16:31:16Z [httpworker.0] Cache size (7272532192) exceeds total space (7256510300)
2023-08-25T16:31:16Z [httpworker.0] FlushSnapshotToDisk: flushing coins cache (7272 MB) started
Assertion failed: (old_heads[0] == hashBlock), function BatchWrite, file txdb.cpp, line 126.
Abort trap: 6

We should at least log an error message that gives users a hint of what the problem is and what they can do to resolve it. I am keeping this separate from the assumeutxo project since this issue can also happen during any regular flush.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 27, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK jonatack, ryanofsky, jamesob, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@fjahr fjahr mentioned this pull request Aug 27, 2023
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK be1b6d5122355e2eff6bde4efd88a51c2740761e

@fjahr fjahr force-pushed the 202308-coindb-error branch from be1b6d5 to df60de7 Compare August 27, 2023 18:16
@jonatack
Copy link
Member

ACK df60de7

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK df60de7

This is an improvement, but I would think just killing the process should not put the coindb in an inconsistent state that would require a reindex. Am I wrong about that, or is there more work that could be done here to debug the issue and update the database atomically?

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK df60de7

@achow101
Copy link
Member

achow101 commented Sep 1, 2023

ACK df60de7

@achow101 achow101 merged commit df98a12 into bitcoin:master Sep 1, 2023
@luke-jr
Copy link
Member

luke-jr commented Sep 5, 2023

For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process.

It shouldn't... That would be a bug.

Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 8, 2023
…in inconsistent state

df60de7 log: Print error message when coindb is in inconsistent state (Fabian Jahr)

Pull request description:

  While doing manual testing on assumeutxo this week I managed to put the coindb into an inconsistent state twice. For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process. It's an edge case but I wouldn't be surprised if this does happen more often when assumeutxo gets used more widely because there might be multiple flushes happening during loading of the UTXO set in the beginning and users may think something is going wrong because of the unexpected wait or they forgot some configs and want to start over quickly.

  The problem is, when this happens at first the node starts up normally until it's time to flush again and then it hits an assert that the user can not understand.

  ```
  2023-08-25T16:31:09Z [httpworker.0] [snapshot] 52000000 coins loaded (43.30%, 6768 MB)
  2023-08-25T16:31:16Z [httpworker.0] Cache size (7272532192) exceeds total space (7256510300)
  2023-08-25T16:31:16Z [httpworker.0] FlushSnapshotToDisk: flushing coins cache (7272 MB) started
  Assertion failed: (old_heads[0] == hashBlock), function BatchWrite, file txdb.cpp, line 126.
  Abort trap: 6
  ```

  We should at least log an error message that gives users a hint of what the problem is and what they can do to resolve it. I am keeping this separate from the assumeutxo project since this issue can also happen during any regular flush.

ACKs for top commit:
  jonatack:
    ACK df60de7
  achow101:
    ACK df60de7
  ryanofsky:
    Code review ACK df60de7
  jamesob:
    Code review ACK df60de7

Tree-SHA512: b546aa0b0323ece2962867a29c38e014ac83ae8f1ded090da2894b4ff2450c05229629c7e8892f7b550cf7def4038a0b4119812e548e11b00c60b1dc3d4276d2
@bitcoin bitcoin locked and limited conversation to collaborators Sep 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants