
Output buffer soft limit can postpone pipelined clients #7911

Closed
oranagra wants to merge 7 commits into redis:unstable from oranagra:obuf-soft-limit-backpressure

Conversation

@oranagra (Member) commented Oct 14, 2020

Till now, the soft limit was meaningless without its time limit,
and a time limit of 0 meant the next unix time second.
Now when a client reaches the soft limit, we'll stop executing its
pipelined commands and continue only when the obuf goes back
below the limit.
This puts back-pressure on the client, and at the very least lets us
stop the pipeline loop and get a chance to write some of the response
to the socket instead of triggering eviction.

The other change in this PR is that client obuf limits can now take
negative numbers, which are treated as an automatic percentage of the
maxmemory setting (0, i.e. unlimited, if maxmemory isn't set).
This is now used by default so that normal clients have a soft limit (back-pressure)
of 10% of maxmemory, with no time limit (they won't get disconnected by default).

todo:

  • add more tests, specifically for the percentage feature.
  • add some introspection feature so we'll be able to see when this is the cause of latency.

Till now, the soft limit was meaningless without its time limit.
Now when a client reaches the soft limit, we'll stop executing its
pipelined commands and continue in the next event loop cycle.
This lets us write at least part of that response buffer to the socket
instead of triggering eviction before the next command (of the same
client, or other clients).

This commit also adds a default value for normal client's soft limit.
@oranagra (Member, Author)

@redis/core-team there's still work to be done here to implement better back-pressure (not just delaying the client to the next event loop cycle, but possibly more, e.g. waiting for its output buffer to shrink).
But i wanted to get your feedback on the new use of the output buffer soft limit.
is that a good idea, or should we find another way to trigger this mechanism?

@ShooterIT (Member)

I have no say here, just an interest.
I think this is a good idea. We don't increase the risk of closing the client; we only delay handling the client until the next event loop cycle, and I think that may also mitigate the risk of a disconnect for reaching the client output buffer limit.

@yossigo (Collaborator) commented Oct 15, 2020

@oranagra I think this is a good idea (especially executed fully like you describe), but it may have some undesired side effects.

For example, consider a client that does bursts of GET for large keys followed by DELs. Throttling down query buffer handling may end up delaying those deletes that would actually contribute nothing to the output buffer and may even ease up on memory.

@madolson (Contributor)

I also agree that it makes sense.

@yoav-steinberg (Contributor) left a comment

@oranagra following our talk:
If I understand this correctly, what you're doing here is prioritizing writing response data over processing new pipelined commands while the client is over the soft limit.
This might improve the memory pressure a bit, but also might not. It doesn't create any real back-pressure on the client and also doesn't explicitly free memory.
I'm not sure it's worth the added complexity of the soft limit feature.

Co-authored-by: yoav-steinberg <[email protected]>
@oranagra oranagra marked this pull request as ready for review April 28, 2021 13:19
@oranagra (Member, Author)

@yoav-steinberg responding to your last comment, let's see if we can add better back-pressure.
currently the back-pressure is to postpone the next command to the next event loop, i think this by itself is in many cases very beneficial.
the only other possibilities i see are:

  1. add an option for a negative soft-limit timeout which would mean that we delay the next command by x milliseconds.
  2. add an option (maybe a soft-limit timeout of -1), which would mean that we don't resume commands until the output buffer is back below the soft-limit threshold.

i'm afraid that the second option would cause some (rare) cases of deadlock, in case the client won't read the responses until it's done sending the requests, so in some way the first option can be a good compromise: a very big negative value would be similar to the second option.
so that if for example you set it to -1000, we'll resume either after 1 second, or earlier if the buffers are again below the threshold.

WDYT?

@yoav-steinberg (Contributor)

1. add an option for a negative soft-limit timeout which would mean that we delay the next command by `x` milliseconds.

Maybe a negative time will mean back-pressure (stop processing new commands) for T time and if the COB remains above the soft limit after that time we disconnect. A positive T will leave the current implementation as is. A value of 0 will effectively be the same as the hard limit.

now the obuf limit can be set to a negative number which means a
percentage of maxmemory; this is used by default for normal clients.

a soft limit without a timeout postpones execution of commands until the
buffer goes back below the limit, so with a time limit we'll also
disconnect the client if it didn't go back below the limit for long
enough, and without a time limit, it's just back-pressure.
@oranagra (Member, Author)

I've updated the concept (and the top comment).

  • push-back is now until the obuf goes back to below the limit (not just to the next event loop).
  • limits can be set to a percentage of the maxmemory (default is 10% for normal clients).
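A hypothetical redis.conf line matching the defaults described above (syntax assumed to follow the existing client-output-buffer-limit directive: class, hard limit, soft limit, soft seconds):

```
client-output-buffer-limit normal 0 -10 0
```

Here the hard limit stays 0 (disabled), the -10 soft limit would stand for 10% of maxmemory, and the trailing 0 time limit would mean back-pressure only, no disconnect. This is a sketch of the concept, not the PR's confirmed syntax.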

@yoav-steinberg (Contributor) left a comment

LGTM
Still need to add tests for verifying:

  • COB doesn't go (much) over the soft limit during pipeline. (verify back-pressure).
  • Percentage calculation of maxmemory is correct and works.

@oranagra (Member, Author) commented May 3, 2021

@redis/core-team please have a look at the top comment and tell me if you like this feature, and if you have any comments about the code or configuration.
then i'll write more tests for it and we can merge it for redis 7.0

@oranagra oranagra added the approval-needed Waiting for core team approval to be merged label May 3, 2021
@madolson (Contributor) left a comment

Approach seems sane to me

src/blocked.c Outdated
* event loop. */
int count = listLength(server.unblocked_clients);

while (listLength(server.unblocked_clients) && count--) {
Comment from a Contributor:

Side note: Regarding line 134/139 below, how is it possible that the client could be blocked? We've just unblocked the client, and put it onto a queue for processing. We haven't yet processed a command. I don't think it can become blocked until after calling processPendingCommandAndResetClient. What am I missing?

I see the comment about "the code is conceptually more correct this way", but it doesn't seem that is true.

Reply from the Member Author:

don't know, but it very much seems to serve no purpose. there's an explicit commit doing just that: 2bc1527
need a deeper look in order to see what the reason is, maybe put an assertion and run the test suite.
but anyway, not in the scope of this PR.

Reply from a Contributor:

The comment even says it's unnecessary but "conceptually more correct"; someone and I argued about it recently and we decided to just leave it. The fact it was brought up again means we should probably just chuck it out.

oranagra added 2 commits May 5, 2021 15:50
* Separate between the ready unthrottled clients and the blocked.c
  mechanisms
* When a client gets throttled, remove it from the event loop, and re-add
  it only when it's ready to resume processing.
* Move the un-throttling code from writeToClient to its callers to
  avoid thread safety issues.
}

/* Process clients that were previously throttled. */
void processUnthrottledClients() {
@madolson (Contributor) commented May 13, 2021

I'm struggling to understand the modularity issue that Jim brought up. To me these just seem like a type of blocked client, similar to paused clients. This code is nearly identical to processUnblockedClients(), and I don't understand the semantic difference between these two types.

@oranagra oranagra mentioned this pull request Aug 26, 2021
8 tasks
@oranagra oranagra removed the approval-needed Waiting for core team approval to be merged label Oct 17, 2021
@soloestoy (Contributor) commented Oct 29, 2021

If I understand correctly, the idea is that when a client's output buffer exceeds the soft limit, we delay executing its pipelined commands to keep the client output buffer from growing.

It makes sense from the memory POV, but in other scenarios users are very sensitive to latency (most redis users, I think). I'm very worried about that; in our production environment, latency is what users are most concerned about.

Moreover, does this PR implement a monitoring method? I didn't look deep into it. If we really want this feature, we need a monitoring method to tell users how much time was added by the soft-limit delay, because it's not easy to figure out what impacted latency in a complex production environment.

BTW, it seems an old value of client-output-buffer-limit can trigger the new mechanism. I think we need an explicit way to keep users from misusing it, maybe a new config.

@oranagra (Member, Author)

@soloestoy this PR is currently a bit stale, and i doubt i'll find time to promote it anytime soon, so it's likely not to make it into 7.0 anyway.
The idea behind this PR was to prevent data loss due to key eviction, but now that we have client-eviction (maxmemory-clients), this concern is somewhat covered (although not by default configuration).

you have a good point about some need for introspection (monitor) to understand what's causing latency. this PR doesn't handle it yet, but i'll add a todo for it at the top so if we'll promote this some day it'll not get forgotten.

You are right that an old value of 0 100 0 would be a valid config in the current code of the PR.
such a value (timeout of 0) is a very odd configuration, i think we can afford to change its meaning.
i.e. IIRC in the past it would just disconnect the client at the next cron (not very useful), and now it just delays the client.

regarding your main concern (of causing latency):
i do think that when we proceed with this we want this to be enabled by default (unlike client-eviction), but since the main purpose of this is to cause back-pressure to prevent key-eviction, maybe we can change it to only kick in when the server approaches the maxmemory limit?
i.e. the users that are concerned about latency are probably more concerned about losing (all) their data, so that's what we should aim for to achieve by default.

any thoughts?

@soloestoy (Contributor) commented Nov 1, 2021

so it's likely not to make it into 7.0 anyway.

Agree. Since we already have client-eviction, we can wait for user feedback after 7.0 is released, and then decide if we need more work on this feature.

@CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@oranagra oranagra closed this by deleting the head repository Aug 7, 2024

Labels

release-notes: indication that this issue needs to be mentioned in the release notes
state:major-decision: Requires core team consensus

Projects

Status: Backlog

Development


9 participants