[FIX] Fix retry config option#827
Conversation
There was a problem hiding this comment.
5 seconds seems very low for anything to change in circustances. I see the other project modified it to 30 seconds and that might work better with geth which limits incoming connection attempts from an IP to 30 seconds as well.
Another option would be to set this to 30 seconds and connect-max-retries to 0, so by default it won't waste time since we know the most probable reason for not connecting is TooManyPeers and just want to move on quickly to another node.
|
Okay I see that if the remote peer actually disconnected then this retry doesn't kick in. So it only applies if we failed to reach the other node completely? |
There was a problem hiding this comment.
Based on RLPxConnectionHandler it seems like the retry logic only kicks in if connection or sending a message fails, not when a disconnect message is received.
I think 5 seconds may be too low; perhaps it should be differentiated between initial connection failing vs a write failing to an already established connection, so you can quickly dismiss false leads and try to give more time to recover from temporary glitches with nodes that worked before but are perhaps restarting. But if this has a positive impact then I don't see much harm in it.
|
This reconnect kicks in when rlpx connection failed either:
If we received disconnect then we then we just finish up. One case it maybe useful to have the reconnect set to Thats why i thought
Overall as i mention whole mechanism should be improved a little, but taking into account upcoming release this change semed as low hanging fruit to speed up peer connection establishment. |
Description
Our
PeerActoruses this option to retry outgoing connection when initial try fails. Setting it to1 minutemeans that this actor sits and do nothing for 1 minute, while takingPeerManagerActoroutgoing connection slot. It can slow down acquiring new connections. In our other project we also tweaked that option.In longer perspective we should probably delete this mechanism, and make
PeerManagerActorone thing responsible for initiating connections and re-connection.