Log explicit error message when coindb is found in inconsistent state #28350
jonatack
left a comment
utACK be1b6d5122355e2eff6bde4efd88a51c2740761e
be1b6d5 to df60de7
ACK df60de7
ryanofsky
left a comment
Code review ACK df60de7
This is an improvement, but I would think just killing the process should not put the coindb in an inconsistent state that would require a reindex. Am I wrong about that, or is there more work that could be done here to debug the issue and update the database atomically?
jamesob
left a comment
Code review ACK df60de7
ACK df60de7
It shouldn't... That would be a bug.
…in inconsistent state

df60de7 log: Print error message when coindb is in inconsistent state (Fabian Jahr)

Pull request description:

While doing manual testing on assumeutxo this week I managed to put the coindb into an inconsistent state twice. For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process. It's an edge case but I wouldn't be surprised if this does happen more often when assumeutxo gets used more widely because there might be multiple flushes happening during loading of the UTXO set in the beginning and users may think something is going wrong because of the unexpected wait or they forgot some configs and want to start over quickly.

The problem is, when this happens at first the node starts up normally until it's time to flush again and then it hits an assert that the user can not understand.

```
2023-08-25T16:31:09Z [httpworker.0] [snapshot] 52000000 coins loaded (43.30%, 6768 MB)
2023-08-25T16:31:16Z [httpworker.0] Cache size (7272532192) exceeds total space (7256510300)
2023-08-25T16:31:16Z [httpworker.0] FlushSnapshotToDisk: flushing coins cache (7272 MB) started
Assertion failed: (old_heads[0] == hashBlock), function BatchWrite, file txdb.cpp, line 126.
Abort trap: 6
```

We should at least log an error message that gives users a hint of what the problem is and what they can do to resolve it. I am keeping this separate from the assumeutxo project since this issue can also happen during any regular flush.

ACKs for top commit:
jonatack: ACK df60de7
achow101: ACK df60de7
ryanofsky: Code review ACK df60de7
jamesob: Code review ACK df60de7

Tree-SHA512: b546aa0b0323ece2962867a29c38e014ac83ae8f1ded090da2894b4ff2450c05229629c7e8892f7b550cf7def4038a0b4119812e548e11b00c60b1dc3d4276d2
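For readers who want to see the shape of the change, here is a minimal, self-contained C++ sketch of the "log first, then assert" idea. It is not the code from df60de7. Only the asserted condition `old_heads[0] == hashBlock` comes from the failure output quoted in this PR. `BlockHash`, `DescribeInconsistency`, `CheckHeadsBeforeWrite`, the message wording, and the assumption that an interrupted flush leaves exactly two recorded heads are all hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a block hash; used only to keep the sketch short.
using BlockHash = std::string;

// Returns a human-readable explanation if the recorded heads disagree with the
// block hash about to be written, and std::nullopt otherwise.
// Assumption: old_heads is either empty (no interrupted flush on record) or
// holds exactly two entries left behind by an interrupted flush.
std::optional<std::string> DescribeInconsistency(const std::vector<BlockHash>& old_heads,
                                                 const BlockHash& hash_block)
{
    if (old_heads.size() == 2 && old_heads[0] != hash_block) {
        // Illustrative wording, not the message added by the PR.
        return std::string{"The coins database is in an inconsistent state, likely due to "
                           "a crash or an interrupted shutdown during a flush. "
                           "A reindex may be required."};
    }
    return std::nullopt;
}

// Prints the explanation before the hard stop, so the user sees a hint
// instead of only a bare assertion failure.
void CheckHeadsBeforeWrite(const std::vector<BlockHash>& old_heads, const BlockHash& hash_block)
{
    if (const auto error = DescribeInconsistency(old_heads, hash_block)) {
        std::cerr << "Error: " << *error << std::endl;
    }
    // Same condition as the assertion in the quoted failure output.
    assert(old_heads.size() != 2 || old_heads[0] == hash_block);
}
```

Calling `CheckHeadsBeforeWrite` with matching heads returns silently. With mismatched heads it prints the explanation to stderr and then hits the same assertion as before, so the process still stops but the user has a hint about the cause.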