
Output buffer soft limit can postpone pipelined clients #7911

Closed
oranagra wants to merge 7 commits into redis:unstable from oranagra:obuf-soft-limit-backpressure

Conversation

@oranagra (Member) commented Oct 14, 2020

Till now, the soft limit was meaningless without its time limit,
and a time limit of 0 meant the next unix time second.
Now when a client reaches the soft limit, we'll stop executing its
pipelined commands and continue only when the obuf goes back
below the limit.
This puts back-pressure on the client, and at the very least lets us
stop the pipeline loop and get a chance to write some of the response
to the socket instead of triggering eviction.

The other change in this PR is that client obuf limits can now take
negative numbers, which are treated as an automatic percentage of the
maxmemory setting (0, i.e. unlimited, if maxmemory isn't set).
This is now used by default so that normal clients have a soft limit (back-pressure)
of 10% of maxmemory, with no time limit (they won't get disconnected by default).

todo:

  • add more tests, specifically for the percentage feature.
  • add some introspection feature so we'll be able to see when this is the cause of latency.

Till now, the soft limit was meaningless without its time limit.
Now when a client reaches the soft limit, we'll stop executing its
pipelined commands and continue in the next event loop cycle.
This lets us write at least part of that response buffer to the socket
instead of triggering eviction before the next command (of the same
client, or other clients).

This commit also adds a default value for normal client's soft limit.
@oranagra (Member, Author)

@redis/core-team there's still work to be done here to implement better back-pressure (not just delaying the client to the next event loop cycle, but possibly more, e.g. waiting for its output buffer to shrink).
But i wanted to get your feedback on the new use of the output buffer soft limit.
is that a good idea, or should we find another way to trigger this mechanism?

@ShooterIT (Member)

I have no say here, just an interest.
I think this is a good idea. We don't increase the risk of closing the client; we only delay handling the client until the next event loop cycle, and I think that may also mitigate the risk of a disconnect for reaching the client output buffer limit.

@yossigo (Collaborator) commented Oct 15, 2020

@oranagra I think this is a good idea (especially executed fully like you describe), but it may have some undesired side effects.

For example, consider a client that does bursts of GET for large keys followed by DELs. Throttling down query buffer handling may end up delaying those deletes that would actually contribute nothing to the output buffer and may even ease up on memory.

@madolson (Contributor)

I also agree that it makes sense.

@yoav-steinberg (Contributor) left a comment

@oranagra following our talk:
If I understand this correctly, what you're doing here is prioritizing writing response data over processing new pipelined commands while the client is over the soft limit.
This might improve the memory pressure a bit, but also might not. It doesn't create any real back-pressure on the client and also doesn't explicitly free memory.
I'm not sure it's worth the added complexity of the soft limit feature.

Co-authored-by: yoav-steinberg <[email protected]>
@oranagra oranagra marked this pull request as ready for review April 28, 2021 13:19
@oranagra (Member, Author)

@yoav-steinberg responding to your last comment, let's see if we can add better back-pressure.
currently the back-pressure is to postpone the next command to the next event loop, i think this by itself is in many cases very beneficial.
the only other possibilities i see are:

  1. add an option for a negative soft-limit timeout which would mean that we delay the next command by x milliseconds.
  2. add an option (maybe a soft-limit timeout of -1), which would mean that we don't resume commands until the output buffer is back below the soft-limit threshold.

i'm afraid that the second option would cause some (rare) cases of deadlock, in case the client won't read the responses until it's done sending the requests, so in some way the first option can be a good compromise: a very big negative value would be similar to the second option.
so that if for example you set it to -1000, we'll resume either after 1 second, or earlier if the buffers are again below the threshold.

WDYT?

@yoav-steinberg (Contributor)

1. add an option for a negative soft-limit timeout which would mean that we delay the next command by `x` milliseconds.

Maybe a negative time will mean back-pressure (stop processing new commands) for T time and if the COB remains above the soft limit after that time we disconnect. A positive T will leave the current implementation as is. A value of 0 will effectively be the same as the hard limit.

now the obuf limit can be set to a negative number which means a
percentage of maxmemory; this is used by default for normal clients.

a soft limit without a timeout postpones execution of commands until the
buffer goes back below the limit, so with a time limit we'll also
disconnect the client if it didn't go back below the limit for long
enough, and without a time limit, it's just back-pressure.
@oranagra (Member, Author)

I've updated the concept (and the top comment).

  • push-back is now until the obuf goes back to below the limit (not just to the next event loop).
  • limits can be set to a percentage of the maxmemory (default is 10% for normal clients).
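A hypothetical redis.conf line matching the defaults described above (syntax assumed to follow the existing client-output-buffer-limit directive: class, hard limit, soft limit, soft seconds):

```
client-output-buffer-limit normal 0 -10 0
```

Here the hard limit stays 0 (disabled), the -10 soft limit would stand for 10% of maxmemory, and the trailing 0 time limit would mean back-pressure only, no disconnect. This is a sketch of the concept, not the PR's confirmed syntax.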

@yoav-steinberg (Contributor) left a comment

LGTM
Still need to add tests for verifying:

  • COB doesn't go (much) over the soft limit during pipeline. (verify back-pressure).
  • Percentage calculation of maxmemory is correct and works.

@oranagra (Member, Author) commented May 3, 2021

@redis/core-team please have a look at the top comment and tell me if you like this feature, and if you have any comments about the code or configuration.
then i'll write more tests for it and we can merge it for redis 7.0

@oranagra oranagra added the approval-needed Waiting for core team approval to be merged label May 3, 2021
@madolson (Contributor) left a comment

Approach seems sane to me

src/blocked.c Outdated
* event loop. */
int count = listLength(server.unblocked_clients);

while (listLength(server.unblocked_clients) && count--) {
Comment from a Contributor:

Side note: Regarding line 134/139 below, how is it possible that the client could be blocked? We've just unblocked the client, and put it onto a queue for processing. We haven't yet processed a command. I don't think it can become blocked until after calling processPendingCommandAndResetClient. What am I missing?

I see the comment about "the code is conceptually more correct this way", but it doesn't seem that is true.

Reply from the Member Author:

don't know, but it very much seems to serve no purpose. there's an explicit commit doing just that: 2bc1527
need a deeper look in order to see what the reason is, maybe put an assertion and run the test suite.
but anyway, not in the scope of this PR.

Reply from a Contributor:

The comment even says it's unnecessary but "conceptually more correct"; someone and I argued about it recently and we decided to just leave it. The fact it was brought up again means we should probably just chuck it out.

oranagra added 2 commits May 5, 2021 15:50
* Separate between the ready unthrottled clients and the blocked.c
  mechanisms
* When a client gets throttled, remove it from the event loop, and re-add
  it only when it's ready to resume processing.
* Move the un-throttling code from writeToClient to its callers to
  avoid thread safety issues.
}

/* Process clients that were previously throttled. */
void processUnthrottledClients() {
@madolson (Contributor) commented May 13, 2021

I'm struggling to understand the modularity issue that Jim brought up. To me these just seem like a type of blocked client, similar to paused clients. This code is nearly identical to processUnblockedClients(), and I don't understand the semantic difference between these two types.

@oranagra oranagra mentioned this pull request Aug 26, 2021
8 tasks
@oranagra oranagra removed the approval-needed Waiting for core team approval to be merged label Oct 17, 2021
@soloestoy (Contributor) commented Oct 29, 2021

If I understand correctly, the idea is that when a client's output buffer exceeds the soft limit, we delay executing its pipelined commands to keep the client output buffer from growing.

It makes sense from the memory POV, but in other scenarios users are very sensitive to latency (most redis users, I think). I'm very worried about that; in our production environment, latency is what users are most concerned about.

Moreover, does this PR implement a monitoring method? I didn't look deep into it. If we really want this feature, we need a monitoring method to tell users how much time was added by the soft-limit delay, because it's not easy to figure out what impacted latency in a complex production environment.

BTW, it seems an old value of client-output-buffer-limit can trigger the new mechanism. I think we need an explicit way to keep users from misusing it, maybe a new config.

@oranagra (Member, Author)

@soloestoy this PR is currently a bit stale, and i doubt i'll find time to promote it anytime soon, so it's likely not to make it into 7.0 anyway.
The idea behind this PR was to prevent data loss due to key eviction, but now that we have client-eviction (maxmemory-clients), this concern is somewhat covered (although not by default configuration).

you have a good point about some need for introspection (monitor) to understand what's causing latency. this PR doesn't handle it yet, but i'll add a todo for it at the top so if we'll promote this some day it'll not get forgotten.

You are right that an old value of 0 100 0 would be a valid config in the current code of the PR.
such a value (timeout of 0) is a very odd configuration, i think we can afford to change its meaning.
i.e. IIRC in the past it would just disconnect the client at the next cron (not very useful), and now it just delays the client.

regarding your main concern (of causing latency):
i do think that when we proceed with this we want this to be enabled by default (unlike client-eviction), but since the main purpose of this is to cause back-pressure to prevent key-eviction, maybe we can change it to only kick in when the server approaches the maxmemory limit?
i.e. the users that are concerned about latency are probably more concerned about losing (all) their data, so that's what we should aim for to achieve by default.

any thoughts?

@soloestoy (Contributor) commented Nov 1, 2021

so it's likely not to make it into 7.0 anyway.

Agree. Since we already have client-eviction, we can wait for user feedback after 7.0 is released, and then decide if we need more work on this feature.

@CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@oranagra oranagra closed this by deleting the head repository Aug 7, 2024

Labels

release-notes: indication that this issue needs to be mentioned in the release notes
state:major-decision: Requires core team consensus

Projects

Status: Backlog

Development


9 participants