net: Better askfor request management #4831

laanwj · 2014-09-03T09:38:45Z

Instead of naively scheduling requests every two minutes into (potentially) the far future with mapAlreadyAskedFor, actively keep track of requests and of which nodes offer the requested items
Better handles the case where nodes announce an inv and then go away or stop responding
Maintains state separately from CNode, which makes it easier to review and understand on its own, and keeps changes localized if the logic ever needs to be extended
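To make the intended behaviour concrete, here is a minimal, self-contained sketch of per-item request tracking. It is not the code in this PR. `AskForTracker`, `ItemState`, the string item ids and the "lowest peer id first" selection policy are all hypothetical. The sketch models the listed test cases: after a timeout the request moves to a different announcer, a disconnect of the current peer triggers an immediate re-request, and the item is dropped once its last announcer is gone.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-ins: NodeId for a peer id, a string for an inv hash.
using NodeId = int64_t;
using ItemId = std::string;

// Per-item request state: who announced it, who we are asking, until when.
struct ItemState {
    std::set<NodeId> announcers;   // peers that announced this item
    std::optional<NodeId> current; // peer currently being asked, if any
    int64_t deadline = 0;          // time after which we give up on `current`
};

class AskForTracker {
public:
    explicit AskForTracker(int64_t timeout) : timeout_(timeout) {}

    // Record an announcement; request immediately if nobody is being asked.
    void Announce(const ItemId& item, NodeId peer, int64_t now) {
        ItemState& st = items_[item];
        st.announcers.insert(peer);
        if (!st.current) Request(st, now);
    }

    // The item arrived: stop tracking it.
    void Received(const ItemId& item) { items_.erase(item); }

    // Retry timed-out requests from a different announcer.
    void Tick(int64_t now) {
        for (auto it = items_.begin(); it != items_.end();) {
            ItemState& st = it->second;
            if (st.current && now >= st.deadline) {
                st.announcers.erase(*st.current); // never re-ask the same peer
                st.current.reset();
                Request(st, now);
            }
            // No announcers left: discard the request entirely.
            it = st.announcers.empty() ? items_.erase(it) : std::next(it);
        }
    }

    // A peer disconnected: forget it and re-request where it was current.
    void Disconnected(NodeId peer, int64_t now) {
        for (auto it = items_.begin(); it != items_.end();) {
            ItemState& st = it->second;
            st.announcers.erase(peer);
            if (st.current == peer) {
                st.current.reset();
                Request(st, now);
            }
            it = st.announcers.empty() ? items_.erase(it) : std::next(it);
        }
    }

    std::optional<NodeId> CurrentPeer(const ItemId& item) const {
        auto it = items_.find(item);
        if (it == items_.end()) return std::nullopt;
        return it->second.current;
    }
    bool Tracking(const ItemId& item) const { return items_.count(item) > 0; }

private:
    void Request(ItemState& st, int64_t now) {
        if (st.announcers.empty()) return;
        st.current = *st.announcers.begin(); // simplest policy: lowest peer id
        st.deadline = now + timeout_;
        // A real node would queue a getdata to *st.current here.
    }

    int64_t timeout_;
    std::map<ItemId, ItemState> items_;
};
```

Keeping all of this in one map owned by the module, rather than in CNode, is what lets the three failure cases be handled in a few lines each.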

I also think it can serve as an example of how to 'peel off' a single concern from net/main.

Uses CTimeoutCondition from @ashleyholman (#4230).

Cases to test:

Announce and don't reply to getdata, make sure retry chooses different node after REQUEST_TIMEOUT
Node disconnects and was the current node requested from - should immediately try to get from different node
Last node that had announced a certain transaction goes away - request should be discarded
I've tested these cases using custom pynode clients, as well as by running this with full debugging on my node to see how it behaves.
Premature flushing if more than 1000 getdatas queued to a node (this is a very unlikely path but it needs to be tested)

TODO:

Slim down debug logging (this is initially useful for testing and checking, but too much in production)
Make sure FlushGetdata's handling of CNode locking is correct
Don't request from nodes whose send buffer is full

Known issues:

Doesn't currently group getdata requests (sends one getdata request per item instead of one getdata for many invs). This wouldn't be too difficult to fit in, but I'm not sure it is worth the complexity, as usually only one inv is sent at a time (done now)
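Grouping and the "premature flushing" test case fit together naturally: items for a peer are queued and sent as one getdata, except that the queue is flushed early once it reaches a size limit. The sketch below assumes the 1000-item limit mentioned in the test cases; `GetdataQueue` and the `sent` log are hypothetical and stand in for actually pushing a message to a CNode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-peer queue that batches inv hashes into getdata messages.
class GetdataQueue {
public:
    explicit GetdataQueue(size_t maxPerMessage) : max_(maxPerMessage) {}

    // Queue an item; returns true if this forced an early (premature) flush.
    bool Queue(const std::string& hash) {
        pending_.push_back(hash);
        if (pending_.size() >= max_) {
            Flush();
            return true;
        }
        return false;
    }

    // Send everything queued as a single getdata (recorded in `sent` here).
    void Flush() {
        if (pending_.empty()) return;
        sent.push_back(pending_);
        pending_.clear();
    }

    std::vector<std::vector<std::string>> sent; // one entry per getdata message

private:
    size_t max_;
    std::vector<std::string> pending_;
};
```

In normal operation `Flush()` would be called once per processing round, so a single announced inv still goes out immediately as a one-item getdata.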

Supersedes: #4828 #4547 #4827

rebroad · 2014-09-04T04:30:16Z

src/netaskfor.cpp

"tx" instead of "netaskfor"? And should this be added to the command line syntax messages in init.cpp?

Could be called 'tx' but I tried to keep it as general as possible. It requests inventory items.
ACK on adding it to command-line syntax.

SergioDemianLerner · 2014-09-05T00:30:08Z

src/netaskfor.cpp

What is the trailing .first->second for?

That's a leftover from the commented code above it. Basically, the CNodeAskForState had some initial assignments here so a reference to it was stored. This is not necessary at the moment so both the commented code and the .first->second should go.
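For readers unfamiliar with the idiom: `std::map::insert` returns a `std::pair<iterator, bool>`, so `.first` is the iterator to the new or already-present element and `->second` is its mapped value. A minimal illustration follows, with `NodeState` as a hypothetical stand-in for CNodeAskForState. When the returned reference isn't used, the trailing `.first->second` does nothing, which is why it can go.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical per-node state, standing in for CNodeAskForState.
struct NodeState {
    int requestsInFlight = 0;
};

// insert() never overwrites an existing entry; either way, .first points at
// the element for nodeId and ->second is a reference to its NodeState.
NodeState& GetOrCreateState(std::map<int, NodeState>& states, int nodeId) {
    return states.insert(std::make_pair(nodeId, NodeState())).first->second;
}
```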

SergioDemianLerner · 2014-09-05T00:57:47Z

This is excellent. Could we unify the handling and scheduling of transaction getdata/invs with block getdata/invs?
Sooner or later block fetching will require a similar method to withstand malicious block invs.
Maybe that can be done later on top of this patch.

sipa · 2014-09-05T01:01:44Z

We are actually already doing something very similar for blocks, and the headersfirst branch extends it (#4468). It's a bit more complicated there, as we want a moving window of block fetching to limit how far out of order blocks can arrive.
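The "moving window" idea can be sketched as follows: only heights between the lowest block still missing and a fixed distance beyond it are eligible for download, so no block can arrive more than that distance ahead of the chain being connected. This is an illustration, not the headers-first code; `BlocksToRequest`, its parameters and the window size are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Choose block heights to request, never more than `window` beyond the
// lowest height we still lack, skipping blocks already held or in flight.
std::vector<int> BlocksToRequest(int lowestMissing, int bestHeaderHeight,
                                 const std::set<int>& inFlight,
                                 const std::set<int>& have, int window,
                                 size_t maxCount) {
    std::vector<int> result;
    int end = std::min(bestHeaderHeight, lowestMissing + window - 1);
    for (int h = lowestMissing; h <= end && result.size() < maxCount; ++h) {
        if (!inFlight.count(h) && !have.count(h)) result.push_back(h);
    }
    return result;
}
```

The window only advances when `lowestMissing` does, which is the extra coupling to chain state that makes block fetching harder to share with the tx fetcher.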

sipa · 2014-09-05T01:08:05Z

Big +1 on code organization: implementing independent pieces of protocol handling should definitely move to separate files, with separate data structures and separate locks (other examples that afaik could easily be converted into this style are ping/pong and alerts).

It's a bit unfortunate that the block fetching and tx fetching code are separated but implement similar functionality. It's a strict improvement though as they were already separate (originally my fault I guess, as I didn't touch the tx handling code when rewriting the block handling part), and unifying them now would probably interfere with other changes.

laanwj · 2014-09-05T10:08:02Z

@SergioDemianLerner As I see it, blocks handling is much more closely bound to main/core than this (basically independent) inventory item fetcher, making it enough of a separate concern to warrant being a different module. I'm not against unifying them if it can be done sanely, of course. But block handling is essentially different especially after headers-first.

rebroad · 2014-09-05T11:06:24Z

Malicious block invs? What is one of those?

rebroad · 2014-09-06T00:23:17Z

Currently, the logic is such that large orphan txs are requested repeatedly and ignored repeatedly:

2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=1
2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=11
2014-09-06 00:01:47 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=27
2014-09-06 00:01:48 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=23
2014-09-06 00:01:49 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=12
2014-09-06 00:01:50 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=26
2014-09-06 00:01:51 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=24
2014-09-06 00:01:53 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=20

Is it possible to make the node remember recently ignored txs so that they don't keep being requested? This was working in the previous implementation thanks to #4542
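One way a caller could avoid re-requesting is to remember recently ignored hashes in a bounded set and check it before asking. The sketch below is hypothetical (`RecentRejects` and its FIFO eviction are assumptions, not what #4542 does); a probabilistic filter would trade exactness for less memory.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Remembers the last `capacity` rejected item hashes so the caller can skip
// asking for them again. The oldest entry is evicted first.
class RecentRejects {
public:
    explicit RecentRejects(size_t capacity) : capacity_(capacity) {}

    void Add(const std::string& hash) {
        if (capacity_ == 0 || set_.count(hash)) return;
        if (order_.size() == capacity_) {
            set_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(hash);
        set_.insert(hash);
    }

    bool Contains(const std::string& hash) const { return set_.count(hash) > 0; }

private:
    size_t capacity_;
    std::deque<std::string> order_;      // insertion order, for eviction
    std::unordered_set<std::string> set_; // fast membership checks
};
```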

laanwj · 2014-09-06T02:36:36Z

@rebroad This should simply work with #4542? It doesn't touch any part of the same code.

As I see it, the logic of whether to request something is outside scope of this module. If you ask netaskfor to retrieve a transaction for you, it will do so, until being told to stop :) (or until it runs out of peers)

rebroad · 2014-09-06T07:25:30Z

@laanwj oh.. yes, you are right. Ok, I'll re-merge 4542 in that case (which didn't touch large orphan txs anyway I've just noticed!).

laanwj · 2014-09-06T08:21:06Z

src/netaskfor.cpp

This should skip nodes whose send buffer is full.

What is the correct way to check this? This would be something like pnode->nSendSize < SendBufferSize(), but I'm not sure what lock is needed for that.

In theory you would need pnode->cs_vSend, but if you can argue that the system keeps working correctly even if the test returns the right result only most of the time, put a big comment, and use no lock...
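Both options can be sketched side by side. One caveat specific to C++: an unlocked read of a plain (non-atomic) field that another thread writes is a data race and therefore undefined behaviour, so the lock-free variant below makes the counter `std::atomic`. The value can still be stale by the time it is acted on, which is the "right most of the time" trade-off. `Peer`, `PickPeer` and the `limit` parameter (standing in for `SendBufferSize()`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical peer with a send queue guarded by cs_vSend.
struct Peer {
    int id;
    std::mutex cs_vSend;              // guards the actual send queue
    std::atomic<size_t> nSendSize{0}; // bytes queued; atomic so lock-free reads are defined
    explicit Peer(int i) : id(i) {}
};

// Exact check: take the same lock that writers of the send queue hold.
bool SendBufferFullLocked(Peer& p, size_t limit) {
    std::lock_guard<std::mutex> lock(p.cs_vSend);
    return p.nSendSize.load() >= limit;
}

// Approximate check without the lock. Acceptable only because a wrong answer
// merely delays a request or sends one to a busy peer, never corrupts state.
bool SendBufferFullApprox(const Peer& p, size_t limit) {
    return p.nSendSize.load(std::memory_order_relaxed) >= limit;
}

// Pick the first announcer whose send buffer is not full; -1 if none.
int PickPeer(const std::vector<Peer*>& announcers, size_t limit) {
    for (Peer* p : announcers)
        if (!SendBufferFullApprox(*p, limit)) return p->id;
    return -1;
}
```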

laanwj · 2014-09-09T15:19:35Z

Looks like the timeout could indeed be reduced.

2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7543
2014-09-09 15:15:27 ThreadHandleAskFor: processing item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698
2014-09-09 15:15:27 QueueGetdata: Requesting item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 from node 7543 (first request)
2014-09-09 15:15:27 FlushGetdata: peer=7543 getdata tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6150
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=633
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7455
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7018
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=330
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7471
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6306
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=978
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7436
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7596
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7504
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6215
2014-09-09 15:15:27 Completed: tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 peer=7543

In the vast majority of cases the transaction is returned within a few seconds, while invs for it are still coming in from other nodes.
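Because alternative announcers keep arriving while the first request is outstanding, a shorter timeout mostly affects how quickly one of them is tried. A processing thread only needs the earliest pending deadline to know when to wake up, which a min-heap provides. This sketch is hypothetical (`DeadlineQueue`, `PopExpired`); entries for items that have already arrived would be skipped by the caller when popped.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Min-heap of (deadline, item): top() is always the next deadline to handle.
using Deadline = std::pair<int64_t, std::string>;
using DeadlineQueue =
    std::priority_queue<Deadline, std::vector<Deadline>, std::greater<Deadline>>;

// Pop every entry whose deadline has passed, earliest first.
std::vector<std::string> PopExpired(DeadlineQueue& q, int64_t now) {
    std::vector<std::string> expired;
    while (!q.empty() && q.top().first <= now) {
        expired.push_back(q.top().second);
        q.pop();
    }
    return expired;
}
```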

laanwj · 2014-09-10T09:07:50Z

src/netaskfor.cpp

There is nothing in place at the moment to make sure that data items are requested in the same order that they're announced (std::multimap orders entries by key, not by insertion order). There probably needs to be.

OTOH when requesting from multiple nodes (or when timeouts are involved) there is no guarantee that the response will come in the same order as the requests. So maybe it's not an issue.
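If announcement order does turn out to matter, one option is to key pending items by an increasing sequence number instead of by hash. (Since C++11, `std::multimap` keeps insertion order only among entries with equal keys, so distinct hashes still come out sorted.) `OrderedPending` below is a hypothetical sketch of that approach.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Pending requests keyed by a sequence number, so iteration follows
// announcement order rather than hash order.
class OrderedPending {
public:
    void Add(const std::string& hash) {
        if (seqOf_.count(hash)) return; // already pending: keep first position
        seqOf_[hash] = next_;
        bySeq_[next_] = hash;
        ++next_;
    }

    void Remove(const std::string& hash) {
        auto it = seqOf_.find(hash);
        if (it == seqOf_.end()) return;
        bySeq_.erase(it->second);
        seqOf_.erase(it);
    }

    std::vector<std::string> InOrder() const {
        std::vector<std::string> out;
        for (const auto& entry : bySeq_) out.push_back(entry.second);
        return out;
    }

private:
    uint64_t next_ = 0;
    std::map<uint64_t, std::string> bySeq_; // sequence -> hash
    std::map<std::string, uint64_t> seqOf_; // hash -> sequence
};
```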

BitcoinPullTester · 2014-09-18T10:37:30Z

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4831_9cbaaf61ade4b91469f3d728795ec83859c25192/ for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

sipa · 2014-09-20T02:20:46Z

Going to test this.

sipa · 2014-09-29T03:52:46Z

Works without problems (even in valgrind, after running for a few days). I didn't actually check whether it fetches/relays things correctly, though.

Diapolo · 2014-09-29T08:43:35Z

src/netaskfor.cpp

Nit: This should go below our headers, so just flip this with the block below.

Diapolo · 2014-09-29T08:46:46Z

Just a general question: why do most/all comments begin with ///?

fanquake · 2014-09-29T08:58:23Z

@Diapolo It's so that they'll be picked up by doxygen. It doesn't recognise comments starting with //
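For reference, Doxygen picks up several comment forms, including `///`, `//!` and `/** ... */`, while plain `//` comments are skipped. The functions and the constant below are hypothetical and exist only to carry the comments.

```cpp
#include <cassert>

/// Hypothetical limit, documented with a triple-slash comment.
static const unsigned int MAX_QUEUED_ITEMS = 1000;

//! Qt-style single-line doc comment, also picked up by Doxygen.
int Twice(int x) { return 2 * x; }

/** Block form of a Doxygen comment.
 *  @param x value to square
 *  @return x * x
 */
int Square(int x) { return x * x; }

// A plain "//" comment like this one is ignored by Doxygen.
int Identity(int x) { return x; }
```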

sipa · 2014-11-17T16:27:29Z

I hope you pick this up soon after 0.10 :)

rebroad · 2015-06-25T16:10:05Z

Needs rebase - is this still in progress?

jgarzik · 2015-07-23T18:12:45Z

Ping. Needs refresh.

I think the general consensus is that we want this, but it needs more review? It seems to have positive noises in the security discussion and on here, but no ACKs.

gmaxwell · 2015-09-06T08:46:53Z

I tested this previously (at some version...) but it got put down after being punted out of 0.10. Seems to have been forgotten. Let's unforget it.

dcousens · 2015-09-07T13:25:36Z

concept ACK

jgarzik · 2015-09-15T17:14:37Z

concept ACK - let this not be forgotten

sipa · 2015-09-15T17:34:32Z

Perhaps this will be pushed back until after Cory's P2P refactor?

laanwj · 2015-09-18T03:58:22Z

Needs to be picked back up at some point, I'm not sure when.
Certainly don't want to interfere with Cory's libevent work.

rebroad reviewed Sep 4, 2014

laanwj force-pushed the 2014_09_request_handling branch from ff901f6 to 6256fdb September 4, 2014 07:36

laanwj mentioned this pull request Sep 4, 2014

prevent peer flooding request queue for an inv #4547

Closed

laanwj force-pushed the 2014_09_request_handling branch from 9b13825 to a2afa1d September 4, 2014 12:22

SergioDemianLerner reviewed Sep 5, 2014

laanwj reviewed Sep 6, 2014

laanwj reviewed Sep 10, 2014

laanwj mentioned this pull request Sep 10, 2014

Only need to subtract 1 (originally time was in seconds). #4828

Closed

This was referenced Sep 18, 2014

Remove tx from AlreadyAskedFor list once we receive it, not when we process it. #4460

Merged

Improve askfor logging (show time it will be requested by) #4827

Closed

ashleyholman and others added 3 commits September 18, 2014 12:14

sync: Add CTimeoutCondition

b9e5772

A condition that can wait for a specified timeout. This is useful when it is known in advance that events have to be processed at some time in the future.
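A minimal sketch of such a condition using only the standard library; it is not the CTimeoutCondition from the commit, whose implementation is not shown here. `TimeoutCondition` and its methods are hypothetical. The worker waits until either it is notified or the deadline passes, and the predicate overload of `wait_until` guards against spurious wakeups.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

class TimeoutCondition {
public:
    // Block until Notify() is called or `deadline` passes.
    // Returns true if woken by a notification, false on timeout.
    bool WaitUntil(std::chrono::steady_clock::time_point deadline) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool notified = cv_.wait_until(lock, deadline, [this] { return signaled_; });
        signaled_ = false; // consume the notification
        return notified;
    }

    void Notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

A request-handling thread would typically call `WaitUntil` with the earliest pending request deadline and be notified early whenever a new inv arrives.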

net: add StartThreads and StopThreads signals

7ea2476

This allows modules to maintain their own threads.
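The pattern can be sketched with a plain callback list standing in for a real signals library such as boost::signals2; the exact signal type the commit uses is not shown here. `Signal`, `NetSignals` and `AskForModule` are hypothetical. The point is that net code fires the signals without knowing which modules are listening.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Minimal stand-in for a signal: callbacks invoked in registration order.
struct Signal {
    std::vector<std::function<void()>> slots;
    void connect(std::function<void()> f) { slots.push_back(std::move(f)); }
    void operator()() const {
        for (const auto& f : slots) f();
    }
};

struct NetSignals {
    Signal StartThreads; // fired once networking starts
    Signal StopThreads;  // fired before networking shuts down
};

// A module hooks its own thread management into the net signals.
struct AskForModule {
    bool running = false;
    void Register(NetSignals& sig) {
        sig.StartThreads.connect([this] { running = true; /* would spawn its thread */ });
        sig.StopThreads.connect([this] { running = false; /* would join its thread */ });
    }
};
```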

net: Better inventory request management

9cbaaf6

laanwj force-pushed the 2014_09_request_handling branch from a2afa1d to 9cbaaf6 September 18, 2014 10:23

Diapolo reviewed Sep 29, 2014

sipa mentioned this pull request Oct 19, 2014

Move message processing to new 'procmsg' module. #4646

Closed

laanwj added this to the 0.11.0 milestone Nov 3, 2014

laanwj added the P2P label Nov 3, 2014

ajweiss mentioned this pull request Dec 18, 2014

Prevent DOS attacks on in-flight data structures #5507

Merged

laanwj modified the milestones: 0.11.0, 0.12.0 May 1, 2015

morcos mentioned this pull request Sep 16, 2015

Prevent peer flooding inv request queue (redux) #6306

Closed

laanwj closed this Sep 26, 2015

gmaxwell mentioned this pull request Nov 23, 2015

Prevent peer flooding inv request queue (redux) (redux) #7079

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

11 participants