Skip to content

Conversation

@laanwj
Copy link
Member

@laanwj laanwj commented Sep 3, 2014

  • Instead of naively planning requests every two minutes into (potentially) the far future w/ mapAlreadyAskedFor, actively keep track of requests, and which nodes offer the requested items
  • Better handles the case where nodes announce an inv and go away, or stop responding
  • Maintains state separate from CNode, which makes it easier to review and understand separately from the rest, as well as localizes changes if extension of the logic is necessary

I also think it can serve as an example of how to 'peel off' a single concern from net/main.

Uses CTimeoutCondition from @ashleyholman (#4230).

Cases to test:

  • Announce and don't reply to getdata, make sure retry chooses different node after REQUEST_TIMEOUT
  • Node disconnects and was the current node requested from - should immediately try to get from different node
  • Last node that had announced a certain transaction goes away - request should be discarded
    I've tested these cases using custom pynode clients, as well as by running this with full debugging on my node to see how it behaves,
  • Premature flushing if more than 1000 getdatas queued to a node (this is a very unlikely path but it needs to be tested)

TODO:

  • Slim down debug logging (this is initially useful for testing and checking, but too much in production)
  • Make sure FlushGetdata's handling of CNode locking is correct
  • Don't request from nodes whose sendbuffer is full

Known issues:

  • Doesn't currently group getdata requests (sends a getdata request per item instead of one getdata for many invs) - wouldn't be too difficult to fit in but I'm not sure it is worth the complexity, as usually only one inv is sent at a time (done now)

Supercedes: #4828 #4547 #4827

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"tx" instead of "netaskfor"? And should this be added to the command line syntax messages in init.cpp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be called 'tx' but I tried to keep it as general as possible. It requests inventory items.
ACK on adding it to command-line syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the trailing .first->second for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a leftover from the commented code above it. Basically, the CNodeAskForState had some initial assignments here so a reference to it was stored. This is not necessary at the moment so both the commented code and the .first->second should go.

@SergioDemianLerner
Copy link
Contributor

This is excellent. Could we unify the handling and scheduling of transactions getdata/invs with block getdata/invs ?
Because sooner or later block fetching will require a similar method to withstand malicious block invs.
Maybe that can be done later on top of this patch.

@sipa
Copy link
Member

sipa commented Sep 5, 2014

We are actually doing already something very similar for blocks, and the headersfirst branch extends it (#4468). It's a bit more complicated there, as we want a moving window of block fetching to limit out-of-orderness in which blocks arrive.

@sipa
Copy link
Member

sipa commented Sep 5, 2014

Big +1 on code organization: implementing independent pieces of protocol handling should definitely move to separate files, with separate data structures and separate locks (other examples that afaik could easily be converted into this style are ping/pong and alerts).

It's a bit unfortunate that the block fetching and tx fetching code are separated but implement similar functionality. It's a strict improvement though as they were already separate (originally my fault I guess, as I didn't touch the tx handling code when rewriting the block handling part), and unifying them now would probably interfere with other changes.

@laanwj
Copy link
Member Author

laanwj commented Sep 5, 2014

@SergioDemianLerner As I see it, blocks handling is much more closely bound to main/core than this (basically independent) inventory item fetcher, making it enough of a separate concern to warrant being a different module. I'm not against unifying them if it can be done sanely, of course. But block handling is essentially different especially after headers-first.

@rebroad
Copy link
Contributor

rebroad commented Sep 5, 2014

Malicious block invs? What is one of those?

@rebroad
Copy link
Contributor

rebroad commented Sep 6, 2014

Currently, the logic is such that large orphan txs are requested repeatedly and ignored repeatedly:-

2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=1
2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=11
2014-09-06 00:01:47 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=27
2014-09-06 00:01:48 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=23
2014-09-06 00:01:49 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=12
2014-09-06 00:01:50 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=26
2014-09-06 00:01:51 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=24
2014-09-06 00:01:53 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=20

Is it possible to make the node remember recently ignored txs so that they don't keep being requested? This was working in the previous implementation thanks to #4542

@laanwj
Copy link
Member Author

laanwj commented Sep 6, 2014

@rebroad This should simply work with #4542? It doesn't touch any part of the same code.

As I see it, the logic of whether to request something is outside scope of this module. If you ask netaskfor to retrieve a transaction for you, it will do so, until being told to stop :) (or until it runs out of peers)

@rebroad
Copy link
Contributor

rebroad commented Sep 6, 2014

@laanwj oh.. yes, you are right. Ok, I'll re-merge 4542 in that case (which didn't touch large orphan txs anyway I've just noticed!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should skip nodes whose sendbuffer is full.

What is the correct way to check this? This would be something like pnode->nSendSize < SendBufferSize(), but I'm not sure what lock is needed for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory you would need pnode->cs_vSend, but if you can argue that the system keeps working correctly even if the test returns the right result only most of the time, put a big comment, and use no lock...

@laanwj
Copy link
Member Author

laanwj commented Sep 9, 2014

Looks like the timeout could indeed be reduced.

2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7543
2014-09-09 15:15:27 ThreadHandleAskFor: processing item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698
2014-09-09 15:15:27 QueueGetdata: Requesting item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 from node 7543 (first request)
2014-09-09 15:15:27 FlushGetdata: peer=7543 getdata tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6150
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=633
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7455
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7018
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=330
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7471
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6306
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=978
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7436
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7596
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7504
2014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6215
2014-09-09 15:15:27 Completed: tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 peer=7543

In by-far most cases the transaction is returned in a few seconds. At the same time, invs for it are still coming in from other nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing in place at the moment to make sure that data items are requested in the same order that they're announced (std::multimap does not preserve insertion order). There probably needs to be.

OTOH when requesting from multiple nodes (or when timeouts are involved) there is no guarantee that the response will come in the same order as the requests. So maybe it's not an issue.

ashleyholman and others added 3 commits September 18, 2014 12:14
A condition that can wait for a specified timeout, this is useful when
it is known in advance that events have to be processed at some time in the
future.
This allows modules to maintain their own threads.
@laanwj laanwj force-pushed the 2014_09_request_handling branch from a2afa1d to 9cbaaf6 Compare September 18, 2014 10:23
@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4831_9cbaaf61ade4b91469f3d728795ec83859c25192/ for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@sipa
Copy link
Member

sipa commented Sep 20, 2014

Going to test this.

@sipa
Copy link
Member

sipa commented Sep 29, 2014

Works without problems (even in valgrind, after running for a few days). I didn't actually check whether it fetches/relays things correctly, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should go below our headers, so just flip this with the block below.

@Diapolo
Copy link

Diapolo commented Sep 29, 2014

Just a general question, why are most/all comments beginning with ///?

@fanquake
Copy link
Member

@Diapolo It's so that they'll be picked up by doxygen. It doesn't recognise comments starting with //

@laanwj laanwj added this to the 0.11.0 milestone Nov 3, 2014
@laanwj laanwj added the P2P label Nov 3, 2014
@sipa
Copy link
Member

sipa commented Nov 17, 2014

I hope you pick this up soon after 0.10 :)

@rebroad
Copy link
Contributor

rebroad commented Jun 25, 2015

Needs rebase - is this still in progress?

@jgarzik
Copy link
Contributor

jgarzik commented Jul 23, 2015

Ping. Needs refresh.

I think the general consensus is that we want this, but needs more review? Seems to have positive noises in the security discussion and on here, but no ACKs.

@gmaxwell
Copy link
Contributor

gmaxwell commented Sep 6, 2015

I tested this previously (at some version...) but it got put down after being punted out of 0.10. Seems to have been forgotten. Lets unforget it.

@dcousens
Copy link
Contributor

dcousens commented Sep 7, 2015

concept ACK

@jgarzik
Copy link
Contributor

jgarzik commented Sep 15, 2015

concept ACK - let this not be forgotten

@sipa
Copy link
Member

sipa commented Sep 15, 2015 via email

@laanwj
Copy link
Member Author

laanwj commented Sep 18, 2015

Needs to be picked back up at some point, I'm not sure when.
Certainly don't want to interfere with Cory's libevent work.

@laanwj laanwj closed this Sep 26, 2015
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.