
Conversation

@vishesh vishesh commented Jun 28, 2019

This patch adds various fixes and improvements to make clients cheaper. The gist is that we try to close unused connections by tweaking our connectionMonitor. Combined with previous PRs, which removed the failure monitor and tweaked monitorClientInfo to remove the continuous OpenDatabaseRequest, we no longer have to keep a connection to the ClusterController open until the proxies fail. The same applies to any other unused peer.

Still testing this and still need to figure out the right values for the idle-timeout knobs, but in real-world testing so far it is behaving as expected.
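Roughly, the client-side idea is that a connection is worth keeping only while something still references the peer or traffic has been seen recently. A minimal sketch of that bookkeeping, using hypothetical names rather than the PR's actual types (the real state lives in FlowTransport's Peer and is driven by ConnectionMonitor with knob-controlled timeouts):

#include <chrono>

// Hypothetical peer bookkeeping, for illustration only.
struct PeerState {
	int peerReferences = 0;                          // outstanding RequestStreams, replies, etc.
	std::chrono::steady_clock::time_point lastTraffic{};
};

// Decide whether the connection monitor should close an idle connection.
bool shouldCloseIdleConnection(const PeerState& peer,
                               std::chrono::seconds idleTimeout) {
	if (peer.peerReferences > 0)
		return false;                                // still in use by someone
	auto idleFor = std::chrono::steady_clock::now() - peer.lastTraffic;
	return idleFor > idleTimeout;                    // unused long enough: close it
}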

@vishesh vishesh requested a review from etschannen June 28, 2019 20:59
@vishesh vishesh added this to the 6.2 milestone Jul 1, 2019
@vishesh vishesh force-pushed the task/cheap-clients branch 3 times, most recently from dbfd79b to b629aac on July 9, 2019 16:38
vishesh added 11 commits July 9, 2019 14:24
The constructor of FlowReceiver, which handles reference counting of
peerReferences, relied on calling a virtual method from the constructor,
whose behaviour isn't correct. This patch bubbles down the result of that
virtual method from the derived constructor to the base constructor.
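As a side note, this is the usual C++ rule that virtual calls made while the base class is still being constructed dispatch to the base's own version. A minimal sketch of the fix described here, with invented names standing in for FlowReceiver and its derived class:

#include <iostream>

// Invented stand-ins for FlowReceiver and a derived receiver type.
class ReceiverBase {
public:
	// The base constructor takes the already-computed answer instead of
	// calling a virtual method, which would not dispatch to the derived
	// override during base construction.
	explicit ReceiverBase(bool addPeerReference) {
		if (addPeerReference)
			std::cout << "adding peer reference" << std::endl;
	}
};

class NetReceiver : public ReceiverBase {
public:
	// The derived constructor knows its own answer and bubbles it down.
	NetReceiver() : ReceiverBase(/*addPeerReference=*/true) {}
};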
When simulation ends, all the actors are cancelled, and destructions
which rely on `globals` may not have access to the right globals (they
see the default simulator process's globals instead). This patch calls
destroy on each process individually, after we context-switch to that
process, so that the globals accessed in a destructor are the process's
own.

This issue arose when trying to get `Peer::peerReferences` in
NetNotifiedQueue, resulting in decrementing the reference count of
peers in the FlowTransport object of '0.0.0.0'.
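In plain C++ the problem and the fix look roughly like the sketch below; the Process/Connection types and the global pointer are stand-ins for the simulator's per-process globals, not the patch's actual code:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the simulator's per-process globals.
struct Process { std::string name; };
Process* g_currentProcess = nullptr;

struct Connection {
	~Connection() {
		// Only correct if the owning process is "current" when this runs.
		std::cout << "destroyed on " << g_currentProcess->name << std::endl;
	}
};

struct SimProcess {
	Process process;
	std::vector<std::unique_ptr<Connection>> connections;
};

int main() {
	std::vector<SimProcess> processes(2);
	processes[0].process.name = "proc-A";
	processes[1].process.name = "proc-B";
	processes[0].connections.push_back(std::make_unique<Connection>());
	processes[1].connections.push_back(std::make_unique<Connection>());

	// "Context switch" to each process before destroying its objects,
	// mirroring what the patch does via g_simulator.onProcess().
	for (auto& p : processes) {
		g_currentProcess = &p.process;
		p.connections.clear();
	}
}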
RequestStream adds another count to peerReferences, which means that as
long as ConnectionMonitor is alive we will never reach peerReferences=0,
potentially keeping unnecessary connections alive.
This patch makes two changes to connection monitoring:

1. Connection monitoring on the client side will check whether the
connection has stayed idle for some time. If the connection is unused
for a while, we close it. There is some subtlety here, as ping messages
are themselves connection traffic. We get around this by making it a
two-phase process: first check for idle reliable traffic, then disable
pings and check for idle unreliable traffic (a condensed sketch follows
after this list).

2. Connection monitoring of clients from the server will no longer send
pings to clients. Instead, it monitors the received bytes and closes the
connection after a certain period of inactivity.
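A condensed sketch of that two-phase client-side check, simplified to plain C++ with invented names (the real logic is a Flow actor inside ConnectionMonitor and uses knob-controlled timeouts):

// Hypothetical per-connection statistics; names are illustrative only.
struct ConnStats {
	double reliableBytesReceived;    // application (reliable) traffic since last check
	double unreliableBytesReceived;  // pings and other unreliable traffic since last check
};

enum class MonitorPhase { CheckReliableIdle, CheckUnreliableIdle };

// One step of the two-phase idle check, run periodically by the monitor.
// Phase 1 looks only at reliable traffic, because our own pings would
// otherwise keep the connection looking "busy". Phase 2 disables pings
// and waits to see whether any traffic at all still arrives; if not,
// the connection is closed. Returns true when the connection should close.
bool stepIdleCheck(MonitorPhase& phase, const ConnStats& stats, bool& pingsEnabled) {
	if (phase == MonitorPhase::CheckReliableIdle) {
		if (stats.reliableBytesReceived == 0) {
			pingsEnabled = false;                    // stop generating our own traffic
			phase = MonitorPhase::CheckUnreliableIdle;
		}
		return false;                                // keep the connection for now
	}
	// Phase 2: with pings off, any remaining traffic must come from the peer.
	return stats.unreliableBytesReceived == 0;
}

The point of the first phase is that the monitor's own pings must not count as evidence that the connection is in use.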
This will not initiate a request to get a new set of proxies unless we
know for a fact that the endpoint has indeed failed, rather than just
because the connection to the Peer was closed while sitting idle.
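Conceptually, and again with invented names rather than the client's actual code path, the decision is simply:

// Only ask for a fresh set of proxies when the endpoint is known to have
// failed, not merely because an idle connection to the peer was closed.
bool shouldRequestNewProxies(bool endpointKnownFailed, bool connectionClosedAsIdle) {
	if (connectionClosedAsIdle && !endpointKnownFailed)
		return false;                 // we closed it ourselves; the proxy may be fine
	return endpointKnownFailed;
}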
…data

* This allows the client to keep monitoring peer connections while the
connection stays open, so that there is no period of "uncertainty" as
there was with the previous no-monitoring approach.

* Use a multiplier for the incoming connection idle timeout.

* Update the idle connection timeout values and the leaked connection
timeout in the simulator.
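A hypothetical illustration of the multiplier mentioned in the second bullet (the actual knob names and values are defined in the codebase's knobs and may differ): the server side tolerates a longer idle period than the client, so a well-behaved client is the one to close the connection first.

// Hypothetical knob values, for illustration only.
constexpr double CONNECTION_IDLE_TIMEOUT = 180.0;     // seconds a client keeps an idle connection
constexpr double INCOMING_IDLE_MULTIPLIER = 1.5;      // server-side slack over the client timeout

// Incoming (server-side) connections get a longer idle window so the client
// closes first and the server rarely tears down a live client's connection.
constexpr double INCOMING_CONNECTION_IDLE_TIMEOUT =
	CONNECTION_IDLE_TIMEOUT * INCOMING_IDLE_MULTIPLIER;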
Instead, try pinging the client and let that decide whether the client
is alive or not. Ideally it should always fail, since a well-behaved
client would have closed the connection.
Potentially for cases where it goes up to 1 immediately.
It gets us out of the ACTOR without ever clearing systemActors, and lets
the simulator call exit().

ACTOR static Future<Void> trackLeakedConnection( Sim2Conn* self ) {
	wait( g_simulator.onProcess( self->process ) );
	// SOMEDAY: Make this value variable? Dependent on buggification status?
Contributor

This comment seems to be obsolete.

Contributor Author

Oops. Created a PR to remove it. #1822
