Skip to content

resize query buffer more accurately#5013

Merged
oranagra merged 2 commits intoredis:unstablefrom
soloestoy:resize-query-buffer
Jul 5, 2021
Merged

resize query buffer more accurately#5013
oranagra merged 2 commits intoredis:unstablefrom
soloestoy:resize-query-buffer

Conversation

@soloestoy
Copy link
Contributor

@soloestoy soloestoy commented Jun 12, 2018

After commit cec404f there are still some problem we should fix I think:

  1. c->querybuf_peak has not been updated correctly in readQueryFromClient.

    void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
        qblen = sdslen(c->querybuf);
        if (c->querybuf_peak < qblen) c->querybuf_peak = qblen;
        c->querybuf = sdsMakeRoomFor(c->querybuf, readlen);
        nread = read(fd, c->querybuf+qblen, readlen);
        ...
        sdsIncrLen(c->querybuf,nread);

    As we can see, we update c->querybuf_peak before read and sdsIncrLen, it means that qblen here is the last c->querybuf length. But the length of c->querybuf after processInputBuffer is always 0, unless we didn't read the whole request at one time.

    So, we should update c->querybuf_peak after sdsIncrLen I think.

  2. We should use sdsalloc instead of sdsAllocSize.

    Because we wanna get c->querybuf alloc space except for header, the sdsAllocSize value is always 32 * 1024 + sdsHdrSize + 1 grate than 32 * 1024.

  3. Check if query buffer is > BIG_ARG and too big for latest peak only when c->querybuf_peak is not 0.
    Function clientsCronResizeQueryBuffer reset c->querybuf_peak to be 0 no matter resize the query buffer or not.
    If we don't get any requests after that, next time we will trigger the condition definitely.

  4. Query buffer shrinking improvements
    when tracking the peak, don't reset the peak to 0, reset it to the maximum of the current used, and the planned to be used by the current arg.
    when shrining, split the two separate conditions. the idle time shrinking will remove all free space. but the peak based shrinking will keep room for the current arg.
    when we resize due to a peak (rahter than idle time), don't trim all unused space, let the qbuf keep a size that's sufficient for the currently process bulklen, and the current peak.

@sundb
Copy link
Collaborator

sundb commented Jun 17, 2021

@oranagra @yoav-steinberg
Due to the change in #9003, a long-standing bug was raised under valgrind.
This bug can cause the master-slave sync to take a very long time, causing the pendingquerybuf.tcl test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:

dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
  1. Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
    c->querybuf = sdsMakeRoomForNonGreedy(c->querybuf, readlen);
  2. If this happens to trigger the clientsCronResizeQueryBuffer, querybuf will be resized to 0.
  3. Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only need 11 times to expand to 32M(16k*(2^11)), but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind tha exposed the problem.

@oranagra
Copy link
Member

regarding #9003, shouldn't this line be:

-        if (remaining > 0 && (size_t)remaining < readlen) readlen = remaining;
+        if (remaining > 0 && (size_t)remaining > readlen) readlen = remaining;

or even just:

-        if (remaining > 0 && (size_t)remaining < readlen) readlen = remaining;
+        readlen = remaining;

i.e. this readlen is what's triggering the (non greedy) allocation, we do want to make room for the entire big arg.

regardless, i have a new realization about the shrinking.
as i said, we have to completely separate triggers, idle, and peak.
the peak comes to shrink big query buffers that are a left over from previous commands, it should NOT shrink back a query buffer that's needed for the argument we're currently reading (the current c->bulklen).

then, for the case of an idle client that did initiate sending a big arg and then became idle, the idle time (2 seconds), can shrink the query buffer despite of c->bulklen.

i suppose that fixing either of these two problems will solve the failing pendingquerybuf.tcl test, but we wanna do both.

@yoav-steinberg
Copy link
Contributor

the peak comes to shrink big query buffers that are a left over from previous commands, it should NOT shrink back a query buffer that's needed for the argument we're currently reading (the current c->bulklen).

then, for the case of an idle client that did initiate sending a big arg and then became idle, the idle time (2 seconds), can shrink the query buffer despite of c->bulklen.

If I follow this logic through, then I'd say that peak based shrinking should only happen in case the query buf is empty sdslen(c->querybuf) == 0. In all other cases we're in the middle of a command and should never shrink (regardless of c->bulklen). If the client is idle it'll shrink regardless of sdslen(c->querybuf).

@sundb
Copy link
Collaborator

sundb commented Jun 17, 2021

@yoav-steinberg There is still a problem with sdslen(c->querybuf), we expand querybuf to the size of big arg in processMultibulkBuffer, at this point sdslen(c->querybuf) maybe 0, use c->bulklen would be safer.

@yoav-steinberg
Copy link
Contributor

yoav-steinberg commented Jun 17, 2021

@yoav-steinberg There is still a problem with sdslen(c->querybuf), we expand querybuf to the size of big arg in processMultibulkBuffer, at this point sdslen(c->querybuf) maybe 0, use c->bulklen would be safer.

You're right. How about sdslen(c->querybuf) == 0 && c->bulklen == -1? This way we won't shrink the qb in case of a partial inline protocol.

@sundb
Copy link
Collaborator

sundb commented Jun 17, 2021

@yoav-steinberg querybuf is also shrinked in case of inline protocol, because the default value of c->bulklen is -1.

Copy link
Contributor

@yoav-steinberg yoav-steinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safer casting..

oranagra pushed a commit that referenced this pull request Jun 17, 2021
…Followup for #9003) (#9100)

Due to the change in #9003, a long-standing bug was raised under `valgrind`.
This bug can cause the master-slave sync to take a very long time, causing the `pendingquerybuf.tcl` test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:
```sh
dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
```

1) Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
2) If this happens to trigger the `clientsCronResizeQueryBuffer`, querybuf will be resized to 0.
3) Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only needs 11 times to expand to 32M(16k*(2^11)),
    but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind that exposed the problem.

The fix for the excessive shrinking of the query buf to 0, will be handled in #5013 (that other change on it's own can fix failing test too), but the fix in this PR will also fix the failing test.

The fix in this PR will makes the reading in `readQueryFromClient` more aggressive when working on a big arg (so that it is in par with the same code in `processMultibulkBuffer` (i.e. the two calls to `sdsMakeRoomForNonGreedy` should both use the bulk size).
In the code before this fix the one in readQueryFromClient always has `readlen = PROTO_IOBUF_LEN`
@oranagra
Copy link
Member

@sundb @yoav-steinberg @soloestoy please have a look at the last change i pushed.

@oranagra oranagra requested a review from yossigo July 1, 2021 13:10
@oranagra oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 1, 2021
Copy link
Contributor

@yoav-steinberg yoav-steinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding my comments about doing the resize only when sdslen(c->querybuf)==0, I now see I was wrong. Even if we're not idle and we have excess qbuf size due to previous command we still might be in the middle of processing the current command with a non zero qbuf and want to resize.
See below my updated remarks.

@oranagra
Copy link
Member

oranagra commented Jul 1, 2021

@soloestoy can you take a look at my abuse of your PR?

Copy link
Contributor

@yoav-steinberg yoav-steinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix bug when growing to a larger sds type.

yossigo
yossigo previously approved these changes Jul 4, 2021
oranagra
oranagra previously approved these changes Jul 4, 2021
@oranagra
Copy link
Member

oranagra commented Jul 4, 2021

oranagra
oranagra previously approved these changes Jul 5, 2021
1. querybuf_peak has not been updated correctly in readQueryFromClient.
2. qbuf shrinking uses sdsalloc instead of sdsAllocSize

see more details in issue redis#4983
oranagra
oranagra previously approved these changes Jul 5, 2021
when tracking the peak, don't reset the peak to 0, reset it to the
maximum of the current used, and the planned to be used by the current
arg.

when shrining, split the two separate conditions.
the idle time shrinking will remove all free space.
but the peak based shrinking will keep room for the current arg.

when we resize due to a peak (rahter than idle time), don't trim all
unused space, let the qbuf keep a size that's sufficient for the
currently process bulklen, and the current peak.

Co-authored-by: sundb <[email protected]>
Co-authored-by: yoav-steinberg <[email protected]>
@oranagra oranagra merged commit ec582cc into redis:unstable Jul 5, 2021
@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes and removed release-notes indication that this issue needs to be mentioned in the release notes labels Jul 5, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
…Followup for redis#9003) (redis#9100)

Due to the change in redis#9003, a long-standing bug was raised under `valgrind`.
This bug can cause the master-slave sync to take a very long time, causing the `pendingquerybuf.tcl` test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:
```sh
dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
```

1) Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
2) If this happens to trigger the `clientsCronResizeQueryBuffer`, querybuf will be resized to 0.
3) Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only needs 11 times to expand to 32M(16k*(2^11)),
    but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind that exposed the problem.

The fix for the excessive shrinking of the query buf to 0, will be handled in redis#5013 (that other change on it's own can fix failing test too), but the fix in this PR will also fix the failing test.

The fix in this PR will makes the reading in `readQueryFromClient` more aggressive when working on a big arg (so that it is in par with the same code in `processMultibulkBuffer` (i.e. the two calls to `sdsMakeRoomForNonGreedy` should both use the bulk size).
In the code before this fix the one in readQueryFromClient always has `readlen = PROTO_IOBUF_LEN`
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
when tracking the peak, don't reset the peak to 0, reset it to the
maximum of the current used, and the planned to be used by the current
arg.

when shrining, split the two separate conditions.
the idle time shrinking will remove all free space.
but the peak based shrinking will keep room for the current arg.

when we resize due to a peak (rahter than idle time), don't trim all
unused space, let the qbuf keep a size that's sufficient for the
currently process bulklen, and the current peak.

Co-authored-by: sundb <[email protected]>
Co-authored-by: yoav-steinberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants