
Conversation

@vishesh vishesh commented Jun 28, 2019

This patch adds various fixes and improvements to make clients cheaper. The gist is that we try to close unused connections by tweaking our connectionMonitor. Combined with previous PRs, which removed the failure monitor and tweaked monitorClientInfo to remove the continuous OpenDatabaseRequest, we no longer have to keep a connection to the ClusterController open until the proxies fail. The same applies to any other unused peer.

Still testing this and still need to figure out the right values for the idle-timeout knobs, but in real-world testing so far it is behaving as expected.
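Roughly, the client-side idea is that a connection is worth keeping only while something still references the peer or traffic has been seen recently. A minimal sketch of that bookkeeping, using hypothetical names rather than the PR's actual types (the real state lives in FlowTransport's Peer and is driven by ConnectionMonitor with knob-controlled timeouts):

#include <chrono>

// Hypothetical peer bookkeeping, for illustration only.
struct PeerState {
	int peerReferences = 0;                          // outstanding RequestStreams, replies, etc.
	std::chrono::steady_clock::time_point lastTraffic{};
};

// Decide whether the connection monitor should close an idle connection.
bool shouldCloseIdleConnection(const PeerState& peer,
                               std::chrono::seconds idleTimeout) {
	if (peer.peerReferences > 0)
		return false;                                // still in use by someone
	auto idleFor = std::chrono::steady_clock::now() - peer.lastTraffic;
	return idleFor > idleTimeout;                    // unused long enough: close it
}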

@vishesh vishesh requested a review from etschannen June 28, 2019 20:59
@vishesh vishesh added this to the 6.2 milestone Jul 1, 2019
@vishesh vishesh force-pushed the task/cheap-clients branch 3 times, most recently from dbfd79b to b629aac on July 9, 2019 16:38
vishesh added 11 commits July 9, 2019 14:24
The constructor of FlowReceiver, which handles reference counting of
peerReferences, relied on calling a virtual method from the constructor,
whose behaviour isn't correct. This patch bubbles down the result of that
virtual method from the derived constructor to the base constructor.
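As a side note, this is the usual C++ rule that virtual calls made while the base class is still being constructed dispatch to the base's own version. A minimal sketch of the fix described here, with invented names standing in for FlowReceiver and its derived class:

#include <iostream>

// Invented stand-ins for FlowReceiver and a derived receiver type.
class ReceiverBase {
public:
	// The base constructor takes the already-computed answer instead of
	// calling a virtual method, which would not dispatch to the derived
	// override during base construction.
	explicit ReceiverBase(bool addPeerReference) {
		if (addPeerReference)
			std::cout << "adding peer reference" << std::endl;
	}
};

class NetReceiver : public ReceiverBase {
public:
	// The derived constructor knows its own answer and bubbles it down.
	NetReceiver() : ReceiverBase(/*addPeerReference=*/true) {}
};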
When simulation ends, all the actors are cancelled, and destructions
which rely on `globals` may not have access to the right globals (they
see the default simulator process's globals instead). This patch calls
destroy on each process individually, after we context-switch to that
process, so that the globals accessed in a destructor are the process's
own.

This issue arose when trying to get `Peer::peerReferences` in
NetNotifiedQueue, resulting in decrementing the reference count of
peers in the FlowTransport object of '0.0.0.0'.
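In plain C++ the problem and the fix look roughly like the sketch below; the Process/Connection types and the global pointer are stand-ins for the simulator's per-process globals, not the patch's actual code:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the simulator's per-process globals.
struct Process { std::string name; };
Process* g_currentProcess = nullptr;

struct Connection {
	~Connection() {
		// Only correct if the owning process is "current" when this runs.
		std::cout << "destroyed on " << g_currentProcess->name << std::endl;
	}
};

struct SimProcess {
	Process process;
	std::vector<std::unique_ptr<Connection>> connections;
};

int main() {
	std::vector<SimProcess> processes(2);
	processes[0].process.name = "proc-A";
	processes[1].process.name = "proc-B";
	processes[0].connections.push_back(std::make_unique<Connection>());
	processes[1].connections.push_back(std::make_unique<Connection>());

	// "Context switch" to each process before destroying its objects,
	// mirroring what the patch does via g_simulator.onProcess().
	for (auto& p : processes) {
		g_currentProcess = &p.process;
		p.connections.clear();
	}
}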
RequestStream adds another count to peerReferences, which means that as
long as ConnectionMonitor is alive we will never reach peerReferences=0,
potentially keeping unnecessary connections alive.
This patch makes two changes to connection monitoring:

1. Connection monitoring on the client side will check whether the
connection has stayed idle for some time. If the connection is unused
for a while, we close it. There is some subtlety here, as ping messages
are themselves connection traffic. We get around this by making it a
two-phase process: first check for idle reliable traffic, then disable
pings and check for idle unreliable traffic (a condensed sketch follows
after this list).

2. Connection monitoring of clients from the server will no longer send
pings to clients. Instead, it monitors the received bytes and closes the
connection after a certain period of inactivity.
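A condensed sketch of that two-phase client-side check, simplified to plain C++ with invented names (the real logic is a Flow actor inside ConnectionMonitor and uses knob-controlled timeouts):

// Hypothetical per-connection statistics; names are illustrative only.
struct ConnStats {
	double reliableBytesReceived;    // application (reliable) traffic since last check
	double unreliableBytesReceived;  // pings and other unreliable traffic since last check
};

enum class MonitorPhase { CheckReliableIdle, CheckUnreliableIdle };

// One step of the two-phase idle check, run periodically by the monitor.
// Phase 1 looks only at reliable traffic, because our own pings would
// otherwise keep the connection looking "busy". Phase 2 disables pings
// and waits to see whether any traffic at all still arrives; if not,
// the connection is closed. Returns true when the connection should close.
bool stepIdleCheck(MonitorPhase& phase, const ConnStats& stats, bool& pingsEnabled) {
	if (phase == MonitorPhase::CheckReliableIdle) {
		if (stats.reliableBytesReceived == 0) {
			pingsEnabled = false;                    // stop generating our own traffic
			phase = MonitorPhase::CheckUnreliableIdle;
		}
		return false;                                // keep the connection for now
	}
	// Phase 2: with pings off, any remaining traffic must come from the peer.
	return stats.unreliableBytesReceived == 0;
}

The point of the first phase is that the monitor's own pings must not count as evidence that the connection is in use.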
This will not initiate a request to get a new set of proxies unless we
know for a fact that the endpoint has indeed failed, rather than just
because the connection to the Peer was closed while sitting idle.
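Conceptually, and again with invented names rather than the client's actual code path, the decision is simply:

// Only ask for a fresh set of proxies when the endpoint is known to have
// failed, not merely because an idle connection to the peer was closed.
bool shouldRequestNewProxies(bool endpointKnownFailed, bool connectionClosedAsIdle) {
	if (connectionClosedAsIdle && !endpointKnownFailed)
		return false;                 // we closed it ourselves; the proxy may be fine
	return endpointKnownFailed;
}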
…data

* This allows the client to keep monitoring peer connections while the
connection stays open, so that there is no period of "uncertainty" as
there was with the previous no-monitoring approach.

* Use a multiplier for the incoming connection idle timeout.

* Update the idle connection timeout values and the leaked connection
timeout in the simulator.
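A hypothetical illustration of the multiplier mentioned in the second bullet (the actual knob names and values are defined in the codebase's knobs and may differ): the server side tolerates a longer idle period than the client, so a well-behaved client is the one to close the connection first.

// Hypothetical knob values, for illustration only.
constexpr double CONNECTION_IDLE_TIMEOUT = 180.0;     // seconds a client keeps an idle connection
constexpr double INCOMING_IDLE_MULTIPLIER = 1.5;      // server-side slack over the client timeout

// Incoming (server-side) connections get a longer idle window so the client
// closes first and the server rarely tears down a live client's connection.
constexpr double INCOMING_CONNECTION_IDLE_TIMEOUT =
	CONNECTION_IDLE_TIMEOUT * INCOMING_IDLE_MULTIPLIER;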
Instead, try pinging the client and let that decide whether the client
is alive or not. Ideally it should always fail, since a well-behaved
client would have closed the connection.
Potentially for cases where it goes up to 1 immediately.
It gets us out of the ACTOR without ever clearing systemActors, and lets
the simulator call exit().

ACTOR static Future<Void> trackLeakedConnection( Sim2Conn* self ) {
	wait( g_simulator.onProcess( self->process ) );
	// SOMEDAY: Make this value variable? Dependent on buggification status?
Contributor

This comment seems to be obsolete.

Contributor Author

Oops. Created a PR to remove it. #1822
