
Manage number of new connections per cycle#12178

Merged
madolson merged 9 commits into redis:unstable from AshMosh:ReduceNumberOfConnectionsAccepted
Jan 2, 2024
Conversation

@AshMosh
Copy link
Contributor

@AshMosh AshMosh commented May 15, 2023

There are situations (especially with TLS) in which the engine gets too occupied managing a large number of new connections. Existing connections may time out while the server is processing the new connections' initial TLS handshakes, which may cause yet more new connections to be established, perpetuating the problem. To better manage the tradeoff between new connection rate and other workloads, this change adds new configs to manage the maximum number of new connections per event loop cycle, instead of using a predetermined number (currently 1000).

This change introduces two new configurations, max-new-connections-per-cycle and max-new-tls-connections-per-cycle. The default for TCP connections is 10 per cycle, and the default for TLS connections is 1 per cycle.
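As a sketch, the two settings described above would appear in redis.conf roughly as follows (values are the defaults stated in this PR; consult the shipped redis.conf for the authoritative comments):

```conf
# Cap how many new plain TCP/Unix connections are accepted per
# event loop cycle (default per this PR: 10).
max-new-connections-per-cycle 10

# Cap how many new TLS connections are accepted per cycle; TLS
# handshakes are expensive, so the default is 1.
max-new-tls-connections-per-cycle 1
```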


Copy link
Contributor

@madolson madolson left a comment


This was originally discussed here: #10524. The main issue we are trying to prevent is TLS connection storms, which aren't being protected by default here. My preference would have just been to update the tlsAcceptHandler to only do 1 per cycle and leave TCP as is. I would like feedback from @yossigo though.

@madolson madolson linked an issue May 15, 2023 that may be closed by this pull request
@JimB123
Copy link
Contributor

JimB123 commented May 16, 2023

I believe the focus here should not be solely on TLS. TLS is quite obvious, but it is definitely not the only consideration.

Please read the discussion here first. All of my discussion/argument is still completely on-point: #10524

This boils down to a prioritization issue. When the event loop is saturated (CPU is at 100% at the micro level), a prioritization is required. Do we delay new connections, or do we delay existing work? It's a binary decision. We must bias towards processing existing work - otherwise we risk timeouts/retries. By increasing the number of connections accepted per cycle beyond 1, we are choosing to bias towards accepting connections - which is the wrong decision.

Side note: it's also valid to question if there are cases where the accept count should be reduced to something LESS THAN 1. This would bias us even more to existing work. In practice I have found this unnecessary and IMO it would add unnecessary complexity.

The issue with accept count is quite obvious in a TLS storm. Each accept implies the need for expensive negotiation messages. When we accept too many, the negotiation can't complete without timeout. Retries happen. Eventually we get to a point where NONE of the connections complete negotiation, and the engine is saturated with connection attempts. By prioritizing the negotiation over new accepts, SOME of the connections will complete without timeout. Over a short period of time, a TLS storm will dissipate.

REAL WORLD SCENARIOS that I have witnessed:

  1. We've been talking about TLS storms. I've personally witnessed TLS storms occurring when the connections-per-cycle has been adjusted down to 16. Once connections-per-cycle is reduced to 1, the problem is effectively eliminated. You could try to "tune" this to a number between 1 and 16, however IMO there's not much value to this. A higher setting does not result in significantly different performance, and just increases risk of storm in unexpected conditions.
  2. KEYS. We know that the KEYS command is dangerous. I witnessed a system where a large number of clients were attaching and immediately issuing a KEYS command. This command is expensive (like TLS negotiation) and once several clients were simultaneously trying to execute KEYS, they all started timing out. The event loop lengthened - to MINUTES. Redis was executing these expensive commands, only to find that the client had timed out before receiving the response. The clients immediately retried resulting in the same type of "storm" situation. By adjusting connections-per-cycle to 1, it allowed the KEYS commands that were in-flight to complete. This immediately made the event loop more responsive. Rather than 100% timeout/failure, the system was now responsive and successfully processing some of the commands.
  3. CLUSTER INFO. Some clients issue the CLUSTER INFO command immediately after connecting to collect topology information. Especially on large clusters, this can be an expensive command. I have witnessed a system where the event loop was saturated with CLUSTER INFO. The resultant behavior was identical to the TLS storm or the KEYS scenario above. The event loop slowed. 100% of the CLUSTER INFO commands timed out. Once connections-per-cycle was adjusted to 1, the system immediately recovered. The event loop became responsive. Connections were established.

In summary, if the event loop isn't saturated, this change makes no difference. The event loop spins quickly and connections are established. It's only in the case that the event loop is saturated, that connections-per-cycle has any impact. The impact is immediate and beneficial.

P.S. If it were up to me, I'd remove the configuration and inner connection loop completely. Over the last 2 years, I've seen no evidence that using a value greater than 1 is ever warranted. But I agree that a tunable parameter is the less controversial option.
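The per-cycle cap being debated can be sketched as follows. This is an illustrative C model with a simulated backlog and hypothetical names (`try_accept`, `accept_cycle`, `MAX_ACCEPTS_PER_CYCLE`), not the actual Redis accept handler:

```c
#include <assert.h>

/* Hypothetical cap on accepts per event-loop cycle. */
#define MAX_ACCEPTS_PER_CYCLE 10

static int pending = 0;  /* simulated TCP backlog depth */

/* Stand-in for accept(2): -1 means the backlog is drained (EWOULDBLOCK). */
static int try_accept(void) {
    if (pending <= 0) return -1;
    pending--;
    return 100 + pending;  /* fake fd */
}

/* One event-loop cycle's accept handler: take at most `cap` connections,
 * then return so reads/writes for existing clients can be served. */
static int accept_cycle(int cap) {
    int accepted = 0;
    while (accepted < cap && try_accept() != -1)
        accepted++;
    return accepted;
}
```

With a cap of 10, a burst of 25 queued connections drains over three cycles instead of monopolizing one cycle, which is the prioritization tradeoff the discussion above is about.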

@madolson
Copy link
Contributor

But I agree that a tunable parameter is the less controversial option.

As a note, this is often a more controversial option here compared to our internal AWS fork since it's a new config we have to expose, document, and maintain.

@yossigo
Copy link
Collaborator

yossigo commented May 17, 2023

In summary, if the event loop isn't saturated, this change makes no difference. The event loop spins quickly and connections are established. It's only in the case that the event loop is saturated, that connections-per-cycle has any impact. The impact is immediate and beneficial.

@JimB123 Can you confirm this is also the case when there's very large spike of inbound connections? I'm concerned the extra syscalls will reduce accept rate and potentially have an impact on the TCP backlog.

@JimB123
Copy link
Contributor

JimB123 commented May 23, 2023

@JimB123 Can you confirm this is also the case when there's very large spike of inbound connections? I'm concerned the extra syscalls will reduce accept rate and potentially have an impact on the TCP backlog.

@yossigo You are correct that there are more syscalls. If accepting connections was the only measure of success, you would be right to be concerned. However, the measure of success should be accepting connections AND doing work. When originally working on this, I did some rough measurements of pure event loop speed and found the event loop to spin much faster than any reasonable connection rate. In the TLS case, the extra overhead is very small in comparison to the TLS negotiation.

If I was to TRY to show a degradation in overall performance/throughput, I'd set up a scenario as follows:

  • No TLS
  • No AUTH
  • Each connection does a single simple GET command and then disconnects

I think in this case, there might be some measurable performance delta. However, I don't think we should optimize for such an anti-pattern.

@yossigo
Copy link
Collaborator

yossigo commented May 24, 2023

@JimB123 I wasn't suggesting to optimize for it, but use it as a test case to understand the worst case impact. Anyway, based on running memtier_benchmark --reconnect-interval 1 -n 5000 -x 3 it seems like we can expect around 2% worst case impact:

unstable:

Totals      13162.03         0.00     11964.29        12.56837        11.26300        29.82300        39.42300       558.46 

This PR:

Totals      12871.28         0.00     11700.00        12.85873        11.58300        30.20700        37.63100       546.12 

@JimB123
Copy link
Contributor

JimB123 commented May 24, 2023

Very nice @yossigo. 2%, worst case, is better than I was expecting. Of course, in practice, we're very unlikely to see any throughput impact. The connection (& disconnection) rate would have to be extremely high, and CPU would have to be saturated.

@madolson
Copy link
Contributor

@yossigo @JimB123 So I'm still going to restate that my preference is we should just update these values to 1 and not have a config.

@yossigo
Copy link
Collaborator

yossigo commented May 24, 2023

@madolson I'd play it safe and leave it configurable for now.

@madolson
Copy link
Contributor

@yossigo in that case are we okay with a default of 1? Otherwise people will need to be impacted before they know to fix it.

@yossigo
Copy link
Collaborator

yossigo commented May 28, 2023

@madolson Yes.

@madolson madolson added the state:major-decision (Requires core team consensus) and approval-needed (Waiting for core team approval to be merged) labels May 30, 2023
@AshMosh AshMosh force-pushed the ReduceNumberOfConnectionsAccepted branch 2 times, most recently from 087b5cb to d7d748d Compare June 1, 2023 07:11
@AshMosh AshMosh requested a review from JimB123 June 1, 2023 16:19
@AshMosh AshMosh requested a review from madolson June 4, 2023 06:02
Copy link
Contributor

@madolson madolson left a comment


@redis/core-team A small change to help prevent connection storms in certain circumstances.

@oranagra
Copy link
Member

i have one more concern.
during loading, we process events only once in a while (e.g. every 2mb of rdb data); when that happens, we spin the event loop 4 times:

 * It calls the event loop in order to process a few events. Specifically we
 * try to call the event loop 4 times as long as we receive acknowledge that
 * some event was processed, in order to go forward with the accept, read,
 * write, close sequence needed to serve a client.

this is necessary in order to remain responsive to PING and INFO.
with this change, it means we'll only be responsive to one (or actually two) newly connected clients.
i.e. if we have more than one script running redis-cli ping to watch us, some can hang / fail and conclude redis is dead (maybe some kind of watchdog that will try to restart redis).

this isn't normally a problem since 2mb of rdb data is probably processed quite fast, and we'll get many of these calls per second, but there are cases where the rate of such calls to processEventsWhileBlocked drops.
let's imagine for a second that it drops to one call per 10 seconds.
so maybe we want to limit the rate of connections per second, rather than per event loop cycle?
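The per-second alternative suggested here could look roughly like a fixed-window limiter. This is a minimal sketch with hypothetical names (`accept_limiter`, `limiter_allow`), not anything from the Redis source:

```c
#include <assert.h>

/* Sketch of a per-second accept limiter, independent of how often the
 * event loop (or processEventsWhileBlocked) happens to run. */
typedef struct {
    long window_start_ms; /* start of the current 1-second window */
    int  accepted;        /* accepts so far in this window */
    int  limit;           /* max accepts per second */
} accept_limiter;

/* Returns 1 if one more connection may be accepted at time now_ms. */
static int limiter_allow(accept_limiter *l, long now_ms) {
    if (now_ms - l->window_start_ms >= 1000) { /* window rolled over */
        l->window_start_ms = now_ms;
        l->accepted = 0;
    }
    if (l->accepted >= l->limit) return 0;
    l->accepted++;
    return 1;
}
```

Because the budget is tied to wall-clock time rather than loop iterations, a slow processEventsWhileBlocked cadence would not starve new clients the way a per-cycle cap can.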

@soloestoy
Copy link
Contributor

@JimB123 I wasn't suggesting to optimize for it, but use it as a test case to understand the worst case impact. Anyway, based on running memtier_benchmark --reconnect-interval 1 -n 5000 -x 3 it seems like we can expect around 2% worst case impact:

unstable:

Totals      13162.03         0.00     11964.29        12.56837        11.26300        29.82300        39.42300       558.46 

This PR:

Totals      12871.28         0.00     11700.00        12.85873        11.58300        30.20700        37.63100       546.12 

I think we need to re-benchmark it; 13162.03 qps on unstable is too slow for Redis.

@oranagra
Copy link
Member

we discussed this in a core-team meeting.

we dismissed the concern of processEventsWhileBlocked, considering it an edge case that would only cause issues when several types of abuse are combined (a monitoring software that re-connects each time, several of them, and a very slow rate of processEventsWhileBlocked calls).

we conceptually approved this pending on a re-benchmark (on a better hardware?) of the impact on clients that re-connect for each command they send.

@madolson
Copy link
Contributor

@AshMosh Can you help with the performance testing above? Ideally we want to test a more real-world scenario than the laptop run Yossi did.

@murphyjacob4
Copy link

With max-new-connections-per-cycle=5, reconnect-interval=10:

| | QPS | P50 Latency | P95 Latency | P99 Latency | P99.9 Latency | P99.99 Latency | P99.999 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client with no reconnections | 53874 | 1.14300 | 1.70300 | 2.25500 | 3.27900 | 9.02300 | 36.60700 |
| Client with heavy reconnections | 49322 | 1.15900 | 1.96700 | 2.55900 | 4.44700 | 9.08700 | 36.86300 |

With max-new-connections-per-cycle=10, reconnect-interval=10:

| | QPS | P50 Latency | P95 Latency | P99 Latency | P99.9 Latency | P99.99 Latency | P99.999 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client with no reconnections | 55293 | 1.11100 | 1.69500 | 2.20700 | 3.29500 | 9.59900 | 36.86300 |
| Client with heavy reconnections | 50382 | 1.12700 | 1.90300 | 2.41500 | 3.85500 | 9.79100 | 36.86300 |

I also reran with reconnect-interval = 1 for a more extreme connection storm simulation:

With max-new-connections-per-cycle=5, reconnect-interval=1:

| | QPS | P50 Latency | P95 Latency | P99 Latency | P99.9 Latency | P99.99 Latency | P99.999 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client with no reconnections | 72802 | 0.83900 | 1.29500 | 1.59900 | 3.13500 | 16.51100 | 53.50300 |
| Client with heavy reconnections | 11564 | 4.73500 | 7.16700 | 8.95900 | 15.80700 | 40.44700 | 55.80700 |

With max-new-connections-per-cycle=10, reconnect-interval=1:

| | QPS | P50 Latency | P95 Latency | P99 Latency | P99.9 Latency | P99.99 Latency | P99.999 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client with no reconnections | 65281 | 1.02300 | 1.42300 | 2.11100 | 3.75900 | 22.52700 | 53.50300 |
| Client with heavy reconnections | 14730 | 2.97500 | 5.59900 | 11.90300 | 18.81500 | 39.16700 | 70.65500 |

With max-new-connections-per-cycle=1000, reconnect-interval=1:

| | QPS | P50 Latency | P95 Latency | P99 Latency | P99.9 Latency | P99.99 Latency | P99.999 Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Client with no reconnections | 57948 | 0.85500 | 2.35100 | 3.37500 | 5.05500 | 22.91100 | 47.10300 |
| Client with heavy reconnections | 15155 | 2.35100 | 5.47100 | 13.31100 | 19.83900 | 37.88700 | 69.11900 |

@madolson
Copy link
Contributor

So I am a bit inclined to update my position and say we should update the PR and use a default value of 10. It seems to still provide most of the protective benefit against TLS-based connection storms while causing minimal detriment to workloads that open a lot of connections. @soloestoy Thoughts?

@JimB123
Copy link
Contributor

JimB123 commented Nov 16, 2023

Anything larger than 1 risks timeouts on your real workload and a death loop.
We WANT connections to timeout/retry under these conditions. We don't want regular commands to start failing.

Set up a scenario with a high connection rate, using TLS, and a workload with small timeouts for regular commands. Make sure you drive CPU to 100%.

@madolson
Copy link
Contributor

madolson commented Dec 12, 2023

Decision for now is to split into two separate configs:

max-new-tcp-connections-per-cycle
max-new-tls-connections-per-cycle

With two different default values. Default of 1 for TLS and 10 for TCP.

The motivation was that we wanted to be as defensive as possible about not introducing a performance regression for TCP-based workloads that use a single connection + single command pattern (like with PHP).

@madolson madolson force-pushed the ReduceNumberOfConnectionsAccepted branch from 63b85da to 4b50a2f Compare December 13, 2023 00:51
Copy link
Contributor

@madolson madolson left a comment


@redis/core-team Any other concerns? I've updated the top comment as well.

@soloestoy
Copy link
Contributor

I have reconsidered, and if we are concerned about the potential time consumption of accept, perhaps we can set a time threshold for accept, such as 1000 microseconds. This way, we don't need to differentiate between TLS and TCP or worry about the number of accepted connections. Using time as a limit can be a better approach.
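The time-threshold idea can be modeled with a simulated clock; `accept_with_budget`, `fake_clock_us`, and the per-accept costs below are all hypothetical, chosen only to illustrate the behavior:

```c
#include <assert.h>

static long fake_clock_us = 0;  /* simulated monotonic clock */
static long now_us(void) { return fake_clock_us; }

/* Accept from a backlog of `backlog` connections, where each accept
 * (including any handshake work) costs `cost_us` simulated microseconds,
 * stopping once `budget_us` has elapsed. Returns the number accepted. */
static int accept_with_budget(long budget_us, long cost_us, int backlog) {
    long start = now_us();
    int accepted = 0;
    while (backlog > 0 && now_us() - start < budget_us) {
        fake_clock_us += cost_us; /* simulate the accept/handshake work */
        accepted++;
        backlog--;
    }
    return accepted;
}
```

With a 1000 us budget, expensive TLS-like accepts at 100 us each stop after 10, while cheap TCP-like accepts at 10 us each can drain a 50-deep backlog, which is the appeal of a time limit over a fixed count: it self-tunes to the actual cost without distinguishing TLS from TCP.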

@soloestoy
Copy link
Contributor

#12178 (comment) @murphyjacob4 are the results based on this PR (limiting to 1 or 10) or on unstable?

@soloestoy
Copy link
Contributor

Additionally, Redis initially accepted only one connection per cycle, which was later increased to 1000 starting from version 2.8; see more details in f3d3c60. Does anyone have knowledge about this history?

As mentioned in the commit description, this change was made to optimize scenarios where a connect-disconnect pattern is frequently used. In this scenario, accepting a new connection should be also considered a high-priority event, even if the CPU usage is at 100%. Therefore, if we intend to reduce the default value now, we need to exercise caution and carefully consider the implications.

@oranagra
Copy link
Member

personally, i don't like the use of plaintext. none of the other configs we have that differ between TLS and plain TCP have a plaintext marking (not to mention that it's not necessarily "text" that is transmitted, so i don't like that term at all for plain TCP).
i'd go with max-new-connections-per-cycle and max-new-tls-connections-per-cycle-tls.

@madolson
Copy link
Contributor

madolson commented Dec 13, 2023

I have reconsidered, and if we are concerned about the potential time consumption of accept, perhaps we can set a time threshold for accept, such as 1000 microseconds. This way, we don't need to differentiate between TLS and TCP or worry about the number of accepted connections. Using time as a limit can be a better approach.

The problem isn't in the accept per se, it's that we're signing up for future work with the TLS negotiation. I think a better alternative, which I've mentioned before, is to limit the number of TLS connections in the handshaking state. That bottlenecks incoming TCP connections much less, since they complete accept immediately.

@oranagra I didn't like tcp either, since it also applies to unix, so I'm fine just differentiating it on tls. Your suggestion has tls twice, I still prefer the original name max-new-tls-connections-per-cycle as opposed to adding tls as a suffix though.
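The handshake-state limit mentioned above could be sketched as a simple counter gate. All names here (`MAX_HANDSHAKING`, `tls_handshake_try_start`, `tls_handshake_done`) are hypothetical, not Redis source:

```c
#include <assert.h>

#define MAX_HANDSHAKING 8   /* illustrative concurrent-handshake limit */

static int handshaking = 0; /* connections currently mid-handshake */

/* A newly accepted TLS connection asks to begin its handshake.
 * Returns 1 if it may proceed now, 0 if it must stay queued. */
static int tls_handshake_try_start(void) {
    if (handshaking >= MAX_HANDSHAKING) return 0;
    handshaking++;
    return 1;
}

/* A handshake finished (successfully or not), freeing a slot. */
static void tls_handshake_done(void) {
    if (handshaking > 0) handshaking--;
}
```

Under this scheme accept(2) itself is never throttled; only the expensive negotiation phase is bounded, so plain TCP clients are unaffected by a TLS storm.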

@soloestoy soloestoy added the release-notes (indication that this issue needs to be mentioned in the release notes) label Jan 2, 2024
@madolson madolson merged commit c3f8b54 into redis:unstable Jan 2, 2024
@madolson madolson removed the approval-needed (Waiting for core team approval to be merged) label Jan 2, 2024
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025

Labels

release-notes (indication that this issue needs to be mentioned in the release notes), state:major-decision (Requires core team consensus)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Better limit the rate of establishing connections at a given time

9 participants