Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented Jun 22, 2013

Ping automatically every 2 minutes (unconditionally), instead of of after 30 minutes of no sending, for latency measurement and keep-alive. Also, disconnect if no reply arrives within 5 minutes, instead of after 90 minutes of inactivity.

This should make detection of stalled connections much faster.

@sipa
Copy link
Member Author

sipa commented Jun 22, 2013

@mikehearn Would this interact badly with BitcoinJ wallets? How frequently do they send pings?
(or anyone with an alternate implementation)

@mikehearn
Copy link
Contributor

bitcoinj sends pings every two seconds, it's much more aggressive than this patch. We could certainly reduce it. Ping times are used to pick which peer to download from.

@jgarzik
Copy link
Contributor

jgarzik commented Jun 23, 2013

The disconnect logic seems like it would negatively impact testnet.

@sipa
Copy link
Member Author

sipa commented Jun 23, 2013

Inverted the logic, so it now becomes a ping if nothing has been received for a while, rather than sent.

Tested that it indeed detects broken connections within one minute.

@mikehearn
Copy link
Contributor

Change looks good to me, although it will have the effect of disconnecting any nodes that don't respond to pings with pongs (or some other message). As pong messages were added in protocol version 60000 it would effectively EOL clients older than that and this may deserve an announcement somewhere.

However bitcoinj clients will be fine with it.

@gmaxwell
Copy link
Contributor

gmaxwell commented Jul 7, 2013

So, this causes nodes to impose an upper bound on their peers response latency. Effectively, someone who will take a minute to respond can't participate in the network at all. This may adversely impact some anonymity protocols which cause high latency— for many kinds of Bitcoin usage the latency isn't all that critical. I'd suggest that such users should be using an alternative transport, except the bitcoin p2p protocol is already reasonably well designed for high latency usage (at least its highly asynchronous).

Some of the other side effects of timeouts is that they can reduce link stability during a DOS attack, and they can preclude some kinds of node high availability schemes which might cause tens of seconds of unresponsiveness (e.g. during a state resync or hardware migration). Connection slot filling attacks are more effective if you can overload a peer temporarily and make them drop all their good connections.

Some other protocols, like BGP, negotiate the heart-beating— 30, 90 being a common set of parameters in that case— to address the problem of potentially imposing too fast a response requirement. But since our network protocol is mostly stateless (though— it has become less so with bloom-filtering, and never was completely: e.g. inv caching) there is less of an issue as restarts aren't so bad.

I'd instead prefer a longer timeout and separately having peer rotation code that periodically slays the least recently active node from node out of the set which been heard from in >60 seconds. E.g. imposing the shorter timeout but only when we could potentially better use the slot.

TCP's design targets a maximum segment lifetime of 2 minutes. Many operating systems have an SO_KEEPALIVE timeout of around 10 minutes. Somewhere in that range

@mikehearn
Copy link
Contributor

It would be easy to adjust the timeout for onion addresses, but really, I'm not sure we want nodes in the network that can't respond within a minute. Slow nodes can have terrible effects on the user experience for SPV wallets. They will automatically drop to the bottom of the preference list in bitcoinj and not be used for much except observing broadcasts, for that reason.

That said, just dropping the slowest peer when we run out of slots on the bitcoind side indeed sounds reasonable.

@jgarzik
Copy link
Contributor

jgarzik commented Aug 25, 2013

How about a 5-minute timeout, and get this merged? Maybe 1 minute is too short, but 90 is far too long.

@sipa
Copy link
Member Author

sipa commented Oct 14, 2013

Suggestion: send pings every 2 minutes (even in case something was received recently, with the new ping-response-time-measurement that means we always get some useful latency information), and disconnect after receiving nothing for 5 minutes.

@gmaxwell
Copy link
Contributor

@sipa ACK unconditional 2 minute ping plus 5 minute disconnect.

@sipa
Copy link
Member Author

sipa commented Oct 14, 2013

Rebased & updated.

@gavinandresen
Copy link
Contributor

ACK.

@sipa
Copy link
Member Author

sipa commented Oct 14, 2013

There may be a bug in this code; will investigate later.

EDIT: fixed.

@jgarzik
Copy link
Contributor

jgarzik commented Oct 15, 2013

ACK

Optional nit: "5 * 60" is more self-documenting than "300", and the previous code used the "M * 60" notation. All modern compilers will automatically convert the more human-readable M*60 at compile time, so there is no added runtime cost.

@sipa
Copy link
Member Author

sipa commented Oct 15, 2013

Moved/added the ping/timeout constants to net.h (where they can be accessed by both net & main), and changed the timeout logic a bit: either there is an unanswered ping >5 minutes old, or there has been no message received at all for 5 minutes.

src/net.h Outdated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: Perhaps indent the comments to be in line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sipa
Copy link
Member Author

sipa commented Oct 16, 2013

Currently running this patch myself on bitcoin.sipa.be. I see a surprisingly high number of ping timeouts and inactivity timeouts (every 10 minutes or so, very irregularly). I'll investigate whether these are actual connections that go dead (in which case this patch is actually useful...) or a problem with the disconnection logic.

@luke-jr
Copy link
Member

luke-jr commented Feb 21, 2014

@sipa What did you find?

@sipa
Copy link
Member Author

sipa commented May 10, 2014

Rebased, and made some changes (better reporting, and don't enforce 5-minute receive timeout on pre-BIP31 peers).

I'm testing it again myself.

@sipa
Copy link
Member Author

sipa commented May 10, 2014

Strange, I have one peer (/Satoshi:0.8.6/, proto v70001) which seems to disconnect when it receives a ping message, and then immediately reconnects.

@sipa
Copy link
Member Author

sipa commented May 11, 2014

Seems the majority of timeouts detected are ping timeouts from getaddr.bitnodes.io (which apparently only recently implemented ping/pong - though probably not yet deployed).

Apart from that I do quite some others, from various clients and IPs, every ~10 minutes on average the past hours. These look like genuine connection problems.

@ayeowch
Copy link
Contributor

ayeowch commented May 11, 2014

@sipa I have just deployed the ping/pong update to getaddr.bitnodes.io. The crawler should be responding to incoming ping properly now.

@sipa
Copy link
Member Author

sipa commented May 11, 2014

@ayeowch Cool, good to know. I'm not seeing any pongs yet, though...

@ayeowch
Copy link
Contributor

ayeowch commented May 11, 2014

@sipa Hmm.. If the crawler (subver=/getaddr.bitnodes.io:0.1/) is already connected to your node. Try sending a ping with bitcoind ping. I did get pong from the crawler by doing this.

@sipa
Copy link
Member Author

sipa commented May 11, 2014

2014-05-11 10:27:53 Sending ping to [2a01:4f8:202:81b1::2]:48980 /getaddr.bitnodes.io:0.1/

No pong afterwards (while I do receive pongs from other nodes).

@sipa
Copy link
Member Author

sipa commented May 31, 2014

For reference: the problem with bitnodes.io was lingering connections from a crawler that didn't support BIP31 (but advertized a protocol version that did), which was fixed on their side now.

I'm seeing occassional disconnects because of the new ping timeouts, but I believe these are correct.

Any objections to merging?

@laanwj
Copy link
Member

laanwj commented Jun 5, 2014

@sipa No objections to pinging more, but I'm a bit wary about the 'disconnect nodes faster' part of this. Going from 90 minutes to 5 minutes is a big change. It wouldn't be the first time that something is introduced that can hang bitcoind for a few minutes.

@leofidus
Copy link

leofidus commented Jun 6, 2014

I think there are already scenarios where network pings are not answered within 5 minutes. For example the importprivkey rpc command can lock cs_main for many minutes during a rescan. If during that time a network message arrives which also requires a lock on cs_main (for example a harmless inv), this locks the network thread until the rescan is complete. This leaves the peer unable to answer pings for more than 10 minutes.

@sipa
Copy link
Member Author

sipa commented Jun 6, 2014

I think it's reasonable that peers disconnect you during that time. You can't use their service, and they can't use yours.

@laanwj
Copy link
Member

laanwj commented Jun 6, 2014

To me it just doesn't feel very robust to have the network fall apart when somehow 5 minutes of lag are introduced.

@sipa
Copy link
Member Author

sipa commented Jun 8, 2014

@laanwj What timeout would you find acceptable?

@laanwj
Copy link
Member

laanwj commented Jun 9, 2014

Let's pick uhm... 20 minutes. That slices the current inactivity timeout in four already. We can always lower the value later.

@gmaxwell
Copy link
Contributor

gmaxwell commented Jun 9, 2014

@wumpus TCP itself will fail over long before that if the network itself is delayed. A node thats holding up our connection but not responding for >5 minutes is really broken as far as we should care. It can reconnect when it can start responding again. (and yes, in some cases bitocoind is 'broken', but thats okay... reconnecting is fairly cheap).

The flip side of a long timeout is that we'll leave connections up to busted peers— keeping ourselves offline (or our resources consumed) for up to the timeout— when we could otherwise have useful connections up.

Maybe it would make sense to have a conservative upper limit for now, I certantly wouldn't want to delay pulling this further to debate the timeout. I'm happy with anything 5 minutes or longer but would prefer closer to 5 minutes. Future peer rotation could prioritize punting long idle connection, which would address the problem that you might sit partitioned from the network for many minutes with no working connections.

@sipa
Copy link
Member Author

sipa commented Jun 9, 2014

Increased the timeout to 20 minutes.

@gmaxwell
Copy link
Contributor

gmaxwell commented Jun 9, 2014

@sipa The commit message is now wrong.

... instead of after 30 minutes of no sending, for latency measurement
and keep-alive. Also, disconnect if no reply arrives within 20 minutes,
instead of 90 of inactivity (for peers supporting the 'pong' message).
@sipa
Copy link
Member Author

sipa commented Jun 9, 2014

Fixed.

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/f1920e86063d0ed008c6028d8223b0e21889bf75 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@laanwj
Copy link
Member

laanwj commented Jun 9, 2014

@gmaxwell I just want to be careful here.

@sipa Looks good to me now.

@laanwj laanwj merged commit f1920e8 into bitcoin:master Jun 12, 2014
laanwj added a commit that referenced this pull request Jun 12, 2014
f1920e8 Ping automatically every 2 minutes (unconditionally) (Pieter Wuille)
@cozz
Copy link
Contributor

cozz commented Jun 13, 2014

Just tested on mainnet:

  • start with maxconnections=1
  • initial pingtime under a second
  • wait for block sync to start
  • ping
  • pingtime=140s

The problem is the node is busy sending us blocks, so does not respond to the ping.
I think we should delay the ping when blocks are in flight.

ping

@gmaxwell
Copy link
Contributor

@cozz I don't understand your concern. Yes, pings will take a lot of time in the case, but it will not be disconnected.

@cozz
Copy link
Contributor

cozz commented Jun 13, 2014

Well, I thought if the pingtime is used for peer-selection later, then you probably dont want to ping, if you know that the ping will be much slower as normal, because blocks are in flight.
I dont have any concerns for now, just saying that if we are going to judge a peer by its periodically pingtime, a slow ping does not necessarily mean a slow node. In my case the node is a good node, just busy.

Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Apr 5, 2019
* Split out GetInstantSendLockHashByTxid from GetInstantSendLockByTxid

* Filter ISLOCK messages based on provided filter
@bitcoin bitcoin deleted a comment May 11, 2019
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.