Description
Establishing a connection takes at least 2 cycles of the event loop: the first to accept the connection and the second to read the first query from the client. If Redis is running very slowly, the time between these two steps might exceed the client's timeout, causing the client to tear down the connection and try to connect again. This can cause a significant amount of churn and, in some cases, a connection storm that can dramatically impact the performance of the Redis process.
For TLS, this is even worse, since a TLS connection needs multiple round trips to be established, and those round trips are CPU expensive because of the public/private key crypto involved. We should prefer to fully establish some connections and let the remaining ones retry.
My suggestion is that we "limit" the number of inflight connections (accepted but not yet established) at any given time to some value, let's say 1000 (since that is the current accept count), to bound the impact of this issue. This implicitly prioritizes some clients so that they are established more quickly and can start serving traffic. @yossigo, I would appreciate your thoughts on this.
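To make the proposal concrete, here is a rough sketch of the bookkeeping in Python. It is a toy simulation, not Redis code: `InflightLimiter`, `accept_cycle`, and the `deque` standing in for the listen backlog are all hypothetical names, and "inflight" is assumed to mean accepted but not yet past the first query (or TLS handshake).

```python
from collections import deque


class InflightLimiter:
    """Tracks connections that were accepted but have not yet sent a first query.

    Hypothetical helper for illustration only; not part of Redis.
    """

    def __init__(self, max_inflight=1000):
        if max_inflight < 1:
            raise ValueError("max_inflight must be positive")
        self.max_inflight = max_inflight
        self.inflight = set()

    def budget(self):
        # Number of additional connections we may accept right now.
        return self.max_inflight - len(self.inflight)

    def admit(self, conn_id):
        # Record a newly accepted connection if there is room for it.
        if self.budget() <= 0:
            return False
        self.inflight.add(conn_id)
        return True

    def release(self, conn_id):
        # Call when the connection becomes established (first query read)
        # or is closed before getting that far.
        self.inflight.discard(conn_id)


def accept_cycle(limiter, backlog):
    """One event-loop cycle: accept from the backlog only while budget remains.

    Connections left in the backlog are not touched; their clients either
    wait or time out and retry.
    """
    accepted = []
    while backlog and limiter.budget() > 0:
        conn_id = backlog.popleft()
        limiter.admit(conn_id)
        accepted.append(conn_id)
    return accepted


limiter = InflightLimiter(max_inflight=3)
backlog = deque(range(5))             # five clients waiting to connect
first = accept_cycle(limiter, backlog)   # accepts three, leaves two pending
limiter.release(first[0])             # one connection reads its first query
second = accept_cycle(limiter, backlog)  # one slot freed, so one more accept
```

The point of the cap is that the server spends its limited cycles finishing handshakes it has already started instead of accepting connections whose clients will time out before being served.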