Skip to content

Conversation

@sdaftuar
Copy link
Member

If we're unable to disconnect a block during normal operation, then that is a
failure of our local system (such as disk failure) or the chain that we are on
(eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
we're trying to validate.

We should abort rather than stay on a less work chain.

Fixes #14341.

@promag
Copy link
Contributor

promag commented Feb 2, 2019

Concept ACK.

@gmaxwell
Copy link
Contributor

gmaxwell commented Feb 2, 2019

Concept ACK. The test seems a little fragile (an unrelated change to rename undo files or merge them into block files would break it), but I don't see a better way to do it.

@sdaftuar sdaftuar force-pushed the 2019-01-disconnect-failure-shutdown branch from 06e19c7 to daf4654 Compare February 5, 2019 18:06
@sdaftuar
Copy link
Member Author

sdaftuar commented Feb 5, 2019

Thanks @MarcoFalke I took your change to the test, hopefully it passes travis now. If it's too flaky though I'll just drop it from this PR.

@sdaftuar sdaftuar force-pushed the 2019-01-disconnect-failure-shutdown branch from daf4654 to be15557 Compare February 5, 2019 21:12
@sdaftuar
Copy link
Member Author

sdaftuar commented Feb 5, 2019

Updated this PR to get rid of the g_abort_node global, and instead change DisconnectTip to no longer call AbortNode itself, and instead leave it to the caller to decide how to handle failures.

This slightly changes the behavior of invalidateblock and RewindBlockIndex -- I believe it's an improvement but reviewers should take note.

@sdaftuar
Copy link
Member Author

sdaftuar commented Feb 7, 2019

The appveyor test failure appears to be an unrelated, long-standing bug in the wallet_txn_doublespend.py test (fixed in #15360).

@sdaftuar sdaftuar force-pushed the 2019-01-disconnect-failure-shutdown branch from be15557 to dafca40 Compare February 8, 2019 18:45
@DrahtBot
Copy link
Contributor

DrahtBot commented Feb 14, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@sdaftuar sdaftuar force-pushed the 2019-01-disconnect-failure-shutdown branch from dafca40 to daf67ad Compare March 8, 2019 14:18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is the lambda needed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use wait_until_stopped instead:

def wait_until_stopped(self, timeout=BITCOIND_PROC_WAIT_TIMEOUT):
wait_until(self.is_node_stopped, timeout=timeout)

@sipa
Copy link
Member

sipa commented Mar 25, 2019

Concept ACK

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK daf67ad1d3cd6de67776fa26d49e6b1ba66aae7d. Or slightly tested. Confirmed that when the fix is reverted, wait_until in the test times out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "[validation] Crash if disconnecting a block fails" (dafa0b7e1615a4e0985ece5d5f044f14753edcf6)

Note: I was initially confused by the change to this line which removes an existing abort call. But this is just cleanup. Aborting here would be redundant with the new abort call below, which is broader and covers more failure cases.

sdaftuar added 2 commits June 5, 2019 05:05
If we're unable to disconnect a block during normal operation, then that is a
failure of our local system (such as disk failure) or the chain that we are on
(eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
we're trying to validate.

We should abort rather than stay on a less work chain.
@sdaftuar sdaftuar force-pushed the 2019-01-disconnect-failure-shutdown branch from daf67ad to a47df13 Compare June 5, 2019 09:19
@sdaftuar
Copy link
Member Author

sdaftuar commented Jun 5, 2019

Added a comment as @ryanofsky requested.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK a47df13. Only change since last review is new comment.

@practicalswift
Copy link
Contributor

utACK a47df13

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.

Left 2 nits and a suggestion to improve the test
Here's

@@ -11,6 +11,7 @@
 """

 from test_framework.test_framework import BitcoinTestFramework
+from test_framework.test_node import ErrorMatch
 from test_framework.util import wait_until, get_datadir_path, connect_nodes
 import os

@@ -42,7 +43,9 @@ class AbortNodeTest(BitcoinTestFramework):
             self.log.info("Waiting for crash")
             wait_until(lambda: self.nodes[0].is_node_stopped(), timeout=60)
         self.log.info("Node crashed - now verifying restart fails")
-        self.nodes[0].assert_start_raises_init_error()
+        self.nodes[0].assert_start_raises_init_error(expected_msg='Corrupted block database detected', match=ErrorMatch.PARTIAL_REGEX)
+
+        self.start_node(0, ['-reindex'])

 if __name__ == '__main__':
     AbortNodeTest().main()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use wait_until_stopped instead:

def wait_until_stopped(self, timeout=BITCOIND_PROC_WAIT_TIMEOUT):
wait_until(self.is_node_stopped, timeout=timeout)

# Connecting to a node with a more work chain will trigger a reorg
# attempt.
self.nodes[1].generate(3)
with self.nodes[0].assert_debug_log(["Failed to disconnect block"]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, use ' throughout.

@sdaftuar
Copy link
Member Author

sdaftuar commented Jul 8, 2019

Thanks for the review @promag; I'll leave the test nits alone for now, someone can pick up in a future PR if there's interest.

@TheBlueMatt
Copy link
Contributor

utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but not crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in.

@sdaftuar
Copy link
Member Author

Anything else needed here?

Copy link
Member

@fanquake fanquake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a47df13

I'll leave the test nits alone for now, someone can pick up in a future PR if there's interest.

Agree

even if there are other cases I'd argue we should crash in.

@TheBlueMatt Did you want to open an issue listing these cases, or just dump out your thoughts?

@fanquake fanquake merged commit a47df13 into bitcoin:master Jul 25, 2019
fanquake added a commit that referenced this pull request Jul 25, 2019
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes #14341.

ACKs for top commit:
  practicalswift:
    utACK a47df13
  TheBlueMatt:
    utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in.
  ryanofsky:
    utACK a47df13. Only change since last review is new comment.
  promag:
    ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.
  fanquake:
    ACK a47df13

Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 29, 2019
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes bitcoin#14341.

ACKs for top commit:
  practicalswift:
    utACK a47df13
  TheBlueMatt:
    utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in.
  ryanofsky:
    utACK a47df13. Only change since last review is new comment.
  promag:
    ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.
  fanquake:
    ACK a47df13

Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
laanwj added a commit that referenced this pull request Sep 16, 2019
fa912a8 doc: move-only ActivateBestChain doxygen comment to header (MarcoFalke)
fa99efd doc: ActivateBestChainStep return value (MarcoFalke)

Pull request description:

  It will always return true, unless a system error such as #15305 occurred

ACKs for top commit:
  laanwj:
    ACK fa912a8

Tree-SHA512: d439da844a467f9705014b946d7d987fb62cb63fe6a325b2fdbbb73a6578fc0ade3f60892044f02face43948204fc4e3c9fa70d108233d4ca8eef27984059689
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 16, 2019
fa912a8 doc: move-only ActivateBestChain doxygen comment to header (MarcoFalke)
fa99efd doc: ActivateBestChainStep return value (MarcoFalke)

Pull request description:

  It will always return true, unless a system error such as bitcoin#15305 occurred

ACKs for top commit:
  laanwj:
    ACK fa912a8

Tree-SHA512: d439da844a467f9705014b946d7d987fb62cb63fe6a325b2fdbbb73a6578fc0ade3f60892044f02face43948204fc4e3c9fa70d108233d4ca8eef27984059689
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 10, 2020
Summary:
a47df13471e3168e2e02023fb20cdf2414141b36 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0f730cfd60eeba3694ff3c283ce2c0c8ee [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes #14341.

---

Backport of Core [[bitcoin/bitcoin#15305 | PR15305]]

Test Plan:
  ninja check-all

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D6511
ftrader pushed a commit to bitcoin-cash-node/bitcoin-cash-node that referenced this pull request Aug 17, 2020
Summary:
a47df13471e3168e2e02023fb20cdf2414141b36 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0f730cfd60eeba3694ff3c283ce2c0c8ee [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes #14341.

---

Backport of Core [[bitcoin/bitcoin#15305 | PR15305]]

Test Plan:
  ninja check-all

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D6511
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Apr 9, 2021
c623146 test: back port assert_debug_log(expected_msg) (furszy)
b0a0227 test: adding wait for rpc connection in assert_start_raises_init_error() (furszy)
a5f853d [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
6788850 [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  Coming from upstream bitcoin#15305 with some adaptations.

  > If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.
  We should abort rather than stay on a less work chain.

ACKs for top commit:
  Fuzzbawls:
    ACK c623146
  random-zebra:
    ACK c623146 and merging...

Tree-SHA512: 4a9bf69910d2f8c3c4bb1520aed5fcc78e5340a6da4eec787d21051c4afaaab4d26a030d0890e7bf267bcbe008dc7122c4b922821cec99a1db1a2e50cef8ed3f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes bitcoin#14341.

ACKs for top commit:
  practicalswift:
    utACK a47df13
  TheBlueMatt:
    utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in.
  ryanofsky:
    utACK a47df13. Only change since last review is new comment.
  promag:
    ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.
  fanquake:
    ACK a47df13

Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 13, 2021
a47df13 [qa] Test disconnect block failure -> shutdown (Suhas Daftuar)
4433ed0 [validation] Crash if disconnecting a block fails (Suhas Daftuar)

Pull request description:

  If we're unable to disconnect a block during normal operation, then that is a
  failure of our local system (such as disk failure) or the chain that we are on
  (eg CVE-2018-17144), but cannot be due to failure of the (more work) chain that
  we're trying to validate.

  We should abort rather than stay on a less work chain.

  Fixes bitcoin#14341.

ACKs for top commit:
  practicalswift:
    utACK a47df13
  TheBlueMatt:
    utACK a47df13. Didn't bother to review the test in detail, it looked fine. Debated whether invalidateblock should ever crash the node, but *not* crashing in the case of hitting a pruned block (which is the only change here) is clearly better, even if there are other cases I'd argue we should crash in.
  ryanofsky:
    utACK a47df13. Only change since last review is new comment.
  promag:
    ACK a47df13, it takes awhile to quit (RPC connection timeouts) but that's unrelated - hope to fix that soon.
  fanquake:
    ACK a47df13

Tree-SHA512: 4dec8cef6e7dbbe513c138fc5821a7ceab855e603ece3c16185b51a3830ab7ebbc844a28827bf64e75326f45325991dcb672f13bd7baede53304f27289c4af8d
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failing to disconnect a block should result in node shutdown

10 participants