-
Notifications
You must be signed in to change notification settings - Fork 38.8k
[validation] Crash if disconnecting a block fails #15305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[validation] Crash if disconnecting a block fails #15305
Conversation
|
Concept ACK. |
|
Concept ACK. The test seems a little fragile (an unrelated change to rename undo files or merge them into block files would break it), but I don't see a better way to do it. |
06e19c7 to
daf4654
Compare
|
Thanks @MarcoFalke I took your change to the test, hopefully it passes travis now. If it's too flaky though I'll just drop it from this PR. |
daf4654 to
be15557
Compare
|
Updated this PR to get rid of the g_abort_node global, and instead change DisconnectTip to no longer call AbortNode itself, and instead leave it to the caller to decide how to handle failures. This slightly changes the behavior of invalidateblock and RewindBlockIndex -- I believe it's an improvement but reviewers should take note. |
|
The appveyor test failure appears to be an unrelated, long-standing bug in the |
be15557 to
dafca40
Compare
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
dafca40 to
daf67ad
Compare
test/functional/feature_abortnode.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is the lambda needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use wait_until_stopped instead:
bitcoin/test/functional/test_framework/test_node.py
Lines 304 to 305 in f792395
| def wait_until_stopped(self, timeout=BITCOIND_PROC_WAIT_TIMEOUT): | |
| wait_until(self.is_node_stopped, timeout=timeout) |
|
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK daf67ad1d3cd6de67776fa26d49e6b1ba66aae7d. Or slightly tested. Confirmed that when the fix is reverted, wait_until in the test times out.
src/validation.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[validation] Crash if disconnecting a block fails" (dafa0b7e1615a4e0985ece5d5f044f14753edcf6)
Note: I was initially confused by the change to this line which removes an existing abort call. But this is just cleanup. Aborting here would be redundant with the new abort call below, which is broader and covers more failure cases.
If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain.
daf67ad to
a47df13
Compare
|
Added a comment as @ryanofsky requested. |
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK a47df13. Only change since last review is new comment.
|
utACK a47df13 |
promag
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.
Left 2 nits and a suggestion to improve the test
Here's
@@ -11,6 +11,7 @@
"""
from test_framework.test_framework import BitcoinTestFramework
+from test_framework.test_node import ErrorMatch
from test_framework.util import wait_until, get_datadir_path, connect_nodes
import os
@@ -42,7 +43,9 @@ class AbortNodeTest(BitcoinTestFramework):
self.log.info("Waiting for crash")
wait_until(lambda: self.nodes[0].is_node_stopped(), timeout=60)
self.log.info("Node crashed - now verifying restart fails")
- self.nodes[0].assert_start_raises_init_error()
+ self.nodes[0].assert_start_raises_init_error(expected_msg='Corrupted block database detected', match=ErrorMatch.PARTIAL_REGEX)
+
+ self.start_node(0, ['-reindex'])
if __name__ == '__main__':
AbortNodeTest().main()
test/functional/feature_abortnode.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use wait_until_stopped instead:
bitcoin/test/functional/test_framework/test_node.py
Lines 304 to 305 in f792395
| def wait_until_stopped(self, timeout=BITCOIND_PROC_WAIT_TIMEOUT): | |
| wait_until(self.is_node_stopped, timeout=timeout) |
| # Connecting to a node with a more work chain will trigger a reorg | ||
| # attempt. | ||
| self.nodes[1].generate(3) | ||
| with self.nodes[0].assert_debug_log(["Failed to disconnect block"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, use ' throughout.
|
Thanks for the review @promag; I'll leave the test nits alone for now, someone can pick up in a future PR if there's interest. |
|
utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but not crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in. |
|
Anything else needed here? |
fanquake
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK a47df13
I'll leave the test nits alone for now, someone can pick up in a future PR if there's interest.
Agree
even if there are other cases I'd argue we should crash in.
@TheBlueMatt Did you want to open an issue listing these cases, or just dump out your thoughts?
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes #14341. ACKs for top commit: practicalswift: utACK a47df13 TheBlueMatt: utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in. ryanofsky: utACK a47df13. Only change since last review is new comment. promag: ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon. fanquake: ACK a47df13 Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes bitcoin#14341. ACKs for top commit: practicalswift: utACK a47df13 TheBlueMatt: utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in. ryanofsky: utACK a47df13. Only change since last review is new comment. promag: ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon. fanquake: ACK a47df13 Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
fa912a8 doc: move-only ActivateBestChain doxygen comment to header (MarcoFalke) fa99efd doc: ActivateBestChainStep return value (MarcoFalke) Pull request description: It will always return true, unless a system error such as #15305 occurred ACKs for top commit: laanwj: ACK fa912a8 Tree-SHA512: d439da844a467f9705014b946d7d987fb62cb63fe6a325b2fdbbb73a6578fc0ade3f60892044f02face43948204fc4e3c9fa70d108233d4ca8eef27984059689
fa912a8 doc: move-only ActivateBestChain doxygen comment to header (MarcoFalke) fa99efd doc: ActivateBestChainStep return value (MarcoFalke) Pull request description: It will always return true, unless a system error such as bitcoin#15305 occurred ACKs for top commit: laanwj: ACK fa912a8 Tree-SHA512: d439da844a467f9705014b946d7d987fb62cb63fe6a325b2fdbbb73a6578fc0ade3f60892044f02face43948204fc4e3c9fa70d108233d4ca8eef27984059689
Summary: a47df13471e3168e2e02023fb20cdf2414141b36 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0f730cfd60eeba3694ff3c283ce2c0c8ee [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes #14341. --- Backport of Core [[bitcoin/bitcoin#15305 | PR15305]] Test Plan: ninja check-all Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D6511
Summary: a47df13471e3168e2e02023fb20cdf2414141b36 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0f730cfd60eeba3694ff3c283ce2c0c8ee [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes #14341. --- Backport of Core [[bitcoin/bitcoin#15305 | PR15305]] Test Plan: ninja check-all Reviewers: #bitcoin_abc, deadalnix Reviewed By: #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D6511
c623146 test: back port assert_debug_log(expected_msg) (furszy) b0a0227 test: adding wait for rpc connection in assert_start_raises_init_error() (furszy) a5f853d [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 6788850 [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: Coming from upstream bitcoin#15305 with some adaptations. > If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. ACKs for top commit: Fuzzbawls: ACK c623146 random-zebra: ACK c623146 and merging... Tree-SHA512: 4a9bf69910d2f8c3c4bb1520aed5fcc78e5340a6da4eec787d21051c4afaaab4d26a030d0890e7bf267bcbe008dc7122c4b922821cec99a1db1a2e50cef8ed3f
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes bitcoin#14341. ACKs for top commit: practicalswift: utACK a47df13 TheBlueMatt: utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in. ryanofsky: utACK a47df13. Only change since last review is new comment. promag: ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon. fanquake: ACK a47df13 Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar) 4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar) Pull request description: If we're unable to disconnect a block during normal operation, then that is a failure of our local system (such as disk failure) or the chain that we are on (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that we're trying to validate. We should abort rather than stay on a less work chain. Fixes bitcoin#14341. ACKs for top commit: practicalswift: utACK a47df13 TheBlueMatt: utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in. ryanofsky: utACK a47df13. Only change since last review is new comment. promag: ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon. fanquake: ACK a47df13 Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
If we're unable to disconnect a block during normal operation, then that is a
failure of our local system (such as disk failure) or the chain that we are on
(eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
we're trying to validate.
We should abort rather than stay on a less work chain.
Fixes #14341.