Output buffer soft limit can postpone pipelined clients #7911
oranagra wants to merge 7 commits into redis:unstable from oranagra:obuf-soft-limit-backpressure
Conversation
Till now, the soft limit was meaningless without its time limit. Now when a client reaches the soft limit, we'll stop executing its pipelined commands and continue in the next event loop cycle. This lets us write at least part of that response buffer to the socket instead of triggering eviction before the next command (of the same client, or other clients). This commit also adds a default value for normal clients' soft limit.
@redis/core-team there's still work to be done here to implement better back pressure (not just delaying the client to the next event loop cycle, but possibly waiting for its output buffer to shrink).
Have no rights, just interest.
@oranagra I think this is a good idea (especially executed fully like you describe), but it may have some undesired side effects. For example, consider a client that does bursts of
I also agree that it makes sense. |
yoav-steinberg
left a comment
@oranagra following our talk:
If I understand this correctly, what you're doing here is prioritizing writing response data over processing new pipelined commands while the client is over the soft limit.
This might improve the memory pressure a bit, but also might not. It doesn't create any real back pressure on the client and also doesn't explicitly free memory.
I'm not sure it's worth the added complexity of the soft limit feature.
Co-authored-by: yoav-steinberg <[email protected]>
@yoav-steinberg responding to your last comment, let's see if we can add a better back pressure.
i'm afraid that the second option would cause some (rare) cases of deadlock, in case the client doesn't read the responses until it's done sending the requests, so in some way the first option can be a good compromise. a very big negative value would be similar to the second option. WDYT?
Maybe a negative time will mean back-pressure (stop processing new commands) for T time, and if the COB remains above the soft limit after that time we disconnect. A positive T will leave the current implementation as is. A value of 0 will effectively be the same as the hard limit.
now the obuf limit can be set to a negative number, which means a percentage of maxmemory; this is used by default for normal clients. A soft limit without a timeout postpones execution of commands until the buffer goes back below the limit, so with a time limit we'll also disconnect the client if it didn't go back below the limit for long enough, and without a time limit it's just back pressure.
I've updated the concept (and the top comment).
yoav-steinberg
left a comment
LGTM
Still need to add tests for verifying:
- COB doesn't go (much) over the soft limit during pipeline. (verify back-pressure).
- Percentage calculation of maxmemory is correct and works.
@redis/core-team please have a look at the top comment and tell me if you like this feature, and if you have any comments about the code or configuration.
madolson
left a comment
Approach seems sane to me
src/blocked.c
Outdated
```c
     * event loop. */
    int count = listLength(server.unblocked_clients);

    while (listLength(server.unblocked_clients) && count--) {
```
Side note: Regarding line 134/139 below, how is it possible that the client could be blocked? We've just unblocked the client, and put it onto a queue for processing. We haven't yet processed a command. I don't think it can become blocked until after calling processPendingCommandAndResetClient. What am I missing?
I see the comment about "the code is conceptually more correct this way", but that doesn't seem to be true.
don't know, but it very much seems to serve no purpose. there's an explicit commit doing just that: 2bc1527
need a deeper look to see what the reason is, maybe put an assertion and run the test suite.
but anyway, not in the scope of this PR.
The comment even says it's unnecessary but it's "conceptually more correct", someone and I argued about it recently and we decided to just leave it. The fact it was brought up again means we should probably just chuck it out.
* Separate the ready unthrottled clients from the blocked.c mechanisms.
* When a client gets throttled, remove it from the event loop, and re-add it only when it's ready to resume processing.
* Move the un-throttling code from writeToClient to its callers to avoid thread safety issues.
```c
}

/* Process clients that were previously throttled. */
void processUnthrottledClients() {
```
I'm struggling to understand the modularity issue that Jim brought up. To me these just seem like a type of blocked clients, similar to paused clients. This code is nearly identical to processUnblockedClients(), and I don't understand the semantic difference between these two types.
If I understand correctly, the idea is that when a client's output buffer exceeds the soft limit, we delay executing the pipelined commands so the client output buffer doesn't grow. It makes sense from the memory POV, but in some other scenarios users are very sensitive to latency (most redis users, I think). I'm very worried about that: in our production environment, latency is what users are mostly concerned about. Moreover, does this PR implement a monitoring method? I didn't look deep into it. If we really want this feature, we need a monitoring method to tell users how much time was delayed by the soft limit, because it's not easy to figure out what impacts latency in a complex production environment. BTW, it seems the old value of
@soloestoy this PR is currently a bit stale, and I doubt I'll find time to promote it anytime soon, so it's likely not to make it into 7.0 anyway. You have a good point about some need for introspection (monitoring) to understand what's causing latency. This PR doesn't handle it yet, but I'll add a todo for it at the top so that if we promote this some day it won't get forgotten. You are right that an old value of
Regarding your main concern (of causing latency): any thoughts?
Agree, since we already have client-eviction, we can wait for some users' feedback after 7.0 is released, and then decide if we need more work on this feature.
Till now, the soft limit was meaningless without its time limit,
and a time limit of 0 meant the next unix time second.
Now when a client reaches the soft limit, we'll stop executing its
pipelined commands and continue only when the obuf goes back
below the limit.
This puts a back-pressure on the client, and at the very least lets us
stop the pipeline loop and get a chance to write some of the response
to the socket instead of triggering eviction.
The other change in this PR is that client obuf limits can now take
negative numbers, which are treated as an automatic % of the maxmemory
setting (0, i.e. unlimited, if maxmemory isn't set).
This is now used by default so that normal clients have a soft limit (back-pressure)
of 10% of the maxmemory, with no time limit (they won't get disconnected by default).
todo:
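For illustration, the default described in this PR might be expressed in redis.conf like this. The exact negative-percentage encoding (-10 meaning 10%) is an assumption based on the discussion above, not confirmed syntax.

```conf
# Hypothetical default under this PR: normal clients get no hard limit,
# a soft limit of 10% of maxmemory (negative value = percentage of
# maxmemory), and no soft-limit timeout, i.e. back-pressure only,
# without ever disconnecting the client.
client-output-buffer-limit normal 0 -10 0
```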