Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented May 15, 2012

Immediately issue a "getblocks", instead of a "getdata" (which will trigger the relevant "inv" to be sent anyway), and only do so when the previous set of invs led us into a known and attached part of the block tree.

This patch has been tested on a (constructed) blockchain that was effectively stuck.

@rebroad
Copy link
Contributor

rebroad commented May 15, 2012

I suspected getblocks was probably the answer :)

It would be very nice to see a mini white paper on what this does and how it works....

Immediately issue a "getblocks", instead of a "getdata" (which will
trigger the relevant "inv" to be sent anyway), and only do so when
the previous set of invs led us into a known and attached part of
the block tree.
@gmaxwell
Copy link
Contributor

K. Tested resync from start several times. Tested partial resync. Tested recovery from fork with reorg on non-stuck node. Tested recovery from a forkmode stuck node. Tested with loadblocks. Make sure it wasn't bloating up the chain with a ton of copies of extra block .... I can't break it, so I'm pulling.

gmaxwell added a commit that referenced this pull request May 16, 2012
Hopefully final fix for the stuck blockchain issue
@gmaxwell gmaxwell merged commit 462f5d9 into bitcoin:master May 16, 2012
@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

Just out of interest, does the initial getblocks (that's sent to the first peer upon starting the node) also cause the recovery from afork stuck node? Is it that this change is to enable it to become unstuck without restarting the node? Or did even restarting not fix things?

@gmaxwell
Copy link
Contributor

Even restarting did not fix the particular issue this fix was needed to address— but normal nodes probably can't get into that state. (The nodes in question were ones that got stuck due to incorrectly rejecting the correct chain because e.g. of premature BIP16 enforcement)

@sipa
Copy link
Member Author

sipa commented May 17, 2012

A bit more elaborate: if you were running an 0.6 RC, you would have code that used the old BIP16 switchover date. The date passed, but you did not update your software. Suddenly someone sends an invalid BIP transaction (so, one that is valid according to the traditional rules, but not according to the BIP16 rules). On the main network BIP16 validation is not active, so the transaction gets accepted. However, your old RC enforces BIP16 validation, so it considers this transaction invalid. This only happens after downloading the block that contains it, and adding it to the tree in the database. A few hundred blocks are added on top of this one, all in your database, but this chain does not become the best chain (as it is considers invalid).

Finally, you upgrade your software, and you now have the correct BIP16 switchover date. The correct chain is already downloaded in your block database, but it is not marked as the active best chain. At startup, your node sends a getblocks from its current best tip (which is one block before the one that contained the invalid BIP16 transactions) to the top of the chain. The peer answers by sending 500 invs back, and remembers to request 500 more when the last of those is downloaded. However, we already have the first 500, so not one is requested, and nothing happens. We must somehow make the peer send us the rest of the invs, as that is our only means for reconnecting that chain to the current best block. Earlier versions of this patch forced a getdata of that 500th block, this one sends a getblocks immediately.

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

I understand the explanation so far but it still doesn't explain how the new getblocks achieves that, and why upon receiving the very latest block, that that doesn't fix it, nor why it can't be coded to re-evaluate the last 6 or so blocks in the last known valid again to see if they are still valid by any new rules. The last solution would be better, IMHO, as it wouldn't increase network traffic, unlike this fix (kludge?).

Sent from my Nokia phone
-----Original Message-----
From: Pieter Wuille
Sent: 17/05/2012 11:10:55
Subject: Re: [bitcoin] Hopefully final fix for the stuck blockchain issue (#1315)

A bit more elaborate: if you were running an 0.6 RC, you would have code that used the old BIP16 switchover date. The date passed, but you did not update your software. Suddenly someone sends an invalid BIP transaction (so, one that is valid according to the traditional rules, but not according to the BIP16 rules). On the main network BIP16 validation is not active, so the transaction gets accepted. However, your old RC enforces BIP16 validation, so it considers this transaction invalid. This only happens after downloading the block that contains it, and adding it to the tree in the database. A few hundred blocks are added on top of this one, all in your database, but this chain does not become the best chain (as it is considers invalid).

Finally, you upgrade your software, and you now have the correct BIP16 switchover date. The correct chain is already downloaded in your block database, but it is not marked as the active best chain. At startup, your node sends a getblocks from its current best tip (which is one block before the one that contained the invalid BIP16 transactions) to the top of the chain. The peer answers by sending 500 invs back, and remembers to request 500 more when the last of those is downloaded. However, we already have the first 500, so not one is requested, and nothing happens. We must somehow make the peer send us the rest of the invs, as that is our only means for reconnecting that chain to the current best block. Earlier versions of this patch forced a getdata of that 500th block, this one sends a getblocks immediately.


Reply to this email directly or view it on GitHub:
#1315 (comment)

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

Sorry, meant to say, that it could re-check the invalid again upon start up, perhaps by giving a command line option or perhaps automatically whenever the invalid again is longer by 6 blocks or more.

Sent from my Nokia phone
-----Original Message-----
From: Pieter Wuille
Sent: 17/05/2012 11:10:55
Subject: Re: [bitcoin] Hopefully final fix for the stuck blockchain issue (#1315)

A bit more elaborate: if you were running an 0.6 RC, you would have code that used the old BIP16 switchover date. The date passed, but you did not update your software. Suddenly someone sends an invalid BIP transaction (so, one that is valid according to the traditional rules, but not according to the BIP16 rules). On the main network BIP16 validation is not active, so the transaction gets accepted. However, your old RC enforces BIP16 validation, so it considers this transaction invalid. This only happens after downloading the block that contains it, and adding it to the tree in the database. A few hundred blocks are added on top of this one, all in your database, but this chain does not become the best chain (as it is considers invalid).

Finally, you upgrade your software, and you now have the correct BIP16 switchover date. The correct chain is already downloaded in your block database, but it is not marked as the active best chain. At startup, your node sends a getblocks from its current best tip (which is one block before the one that contained the invalid BIP16 transactions) to the top of the chain. The peer answers by sending 500 invs back, and remembers to request 500 more when the last of those is downloaded. However, we already have the first 500, so not one is requested, and nothing happens. We must somehow make the peer send us the rest of the invs, as that is our only means for reconnecting that chain to the current best block. Earlier versions of this patch forced a getdata of that 500th block, this one sends a getblocks immediately.


Reply to this email directly or view it on GitHub:
#1315 (comment)

@sipa
Copy link
Member Author

sipa commented May 17, 2012

This fix will - over the course of an entire blockchain syncup - maybe cause 50 kilobytes extra communication. What you suggest is also possible, but harder and with less guarantees, in my opinion. You'd need to traverse the entire blockchain database and find stale chains, and re-evaluate them all?

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

Doesn't this fix also increase data transfer even after the node has caught up? I thought it does getblocks upon receipt of every block, doesn't it?

To re-evaluate the invalid again it would only need to re-evaluate one block upon start-up in the example you give. The first block in the longest invalid chain.

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

Also, technically, i'd say this current fix requires a BIP.

@sipa
Copy link
Member Author

sipa commented May 17, 2012

In normal operation, this patch does nothing. It only sends out a getblocks when an inv is received with blocks that are already known and part of the block tree. During normal operation, this never happens, as you only request invs for the part after the main chain. And the block-sync process has never been well-formalized, though the responses to the network requests are. Those aren't changed however.

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

I think you are incorrect to say it doesn't happen during normal operation. This is not my experience. When i new block arrives on the network, let's say 8 nodes announce it in invs. My node will getdata it from the first one, download it and ProcessBlock it usually well before the last connected peer sends invs for it, so with this code with each new block, the slowest peers to announce it will receive the new getblocks in this fix. As the network gets bigger, this could get worse. It could also get less worse if ProcessBlocks takes longer due to larger blocks.

@gmaxwell
Copy link
Contributor

While previously testing this I specifically looked for excess requests during normal operations and didn't see any. Either I made a mistake or just had unlucky timing— or it's something about the peer mix thats triggering it, because I see ones now— about 1769 of them on 05/16.

Actually, they seem to be being caused in high volume by specific peers. E.g. I have a couple which are each responsible for several hundred of them.

@rebroad
Copy link
Contributor

rebroad commented May 17, 2012

I've added a fix to this, in my current bitcoin-ParallelBlockDownload branch (the 3rd commit of pull #1326).... I still think the ideal solution is to do it without using the network though....

coblee pushed a commit to litecoin-project/litecoin that referenced this pull request Jul 17, 2012
Hopefully final fix for the stuck blockchain issue
lateminer pushed a commit to lateminer/bitcoin that referenced this pull request May 6, 2020
449872b [Build] Remove Windows 32 bit build, coming from btc@faf666f8148eeb305a9c4f78459aff2c7268016b (furszy)

Pull request description:

  Coming from [btc@15939](bitcoin#15939).

  Based on bitcoin#1315.

ACKs for top commit:
  Fuzzbawls:
    ACK 449872b
  random-zebra:
    ACK 449872b and merging...

Tree-SHA512: 39c8785ade1202c09c76d964fdc3c739f9e162fec5c9b2991ee9a0a60c4935485c3822be82fd2e279fe536e049b3b1e4a9d5137145505adc09feea850420d8f0
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants