Skip to content

Make readQueryFromClient more aggressive when reading big arg again (Followup for #9003)#9100

Merged
oranagra merged 2 commits intoredis:unstablefrom
sundb:temp-fix-pendingquerybuf
Jun 17, 2021
Merged

Make readQueryFromClient more aggressive when reading big arg again (Followup for #9003)#9100
oranagra merged 2 commits intoredis:unstablefrom
sundb:temp-fix-pendingquerybuf

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Jun 17, 2021

@oranagra @yoav-steinberg
Due to the change in #9003, a long-standing bug was raised under valgrind.
This bug can cause the master-slave sync to take a very long time, causing the pendingquerybuf.tcl test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:

dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
  1. Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
    c->querybuf = sdsMakeRoomForNonGreedy(c->querybuf, readlen);
  2. If this happens to trigger the clientsCronResizeQueryBuffer, querybuf will be resized to 0.
  3. Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only needs 11 times to expand to 32M(16k*(2^11)), but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind tha exposed the problem.

The fix for the excessive shrinking of the query buf to 0, will be handled in #5013 (that other change on it's own can fix failing test too), but the fix in this PR will also fix the failing test.

The fix in this PR will make the reading in readQueryFromClient more aggressive when working on a big arg (so that it is in par with the same code in processMultibulkBuffer (i.e. the two calls to sdsMakeRoomForNonGreedy should both use the bulk size).
In the current code (before this fix) the one in readQueryFromClient always has readlen = PROTO_IOBUF_LEN

@sundb sundb changed the title temporarily fix pendingquerybuf test failure temporarily fix pendingquerybuf test failure under valgrind Jun 17, 2021
@oranagra oranagra changed the title temporarily fix pendingquerybuf test failure under valgrind Make readQueryFromClient more aggressive when reading big arg again (Followup for #9003) Jun 17, 2021
@oranagra oranagra merged commit 1a22445 into redis:unstable Jun 17, 2021
@sundb sundb deleted the temp-fix-pendingquerybuf branch June 30, 2021 01:59
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
…Followup for redis#9003) (redis#9100)

Due to the change in redis#9003, a long-standing bug was raised under `valgrind`.
This bug can cause the master-slave sync to take a very long time, causing the `pendingquerybuf.tcl` test to fail.
This problem does not only occur in master-slave sync, it is triggered when the big arg is greater than 32k.
step:
```sh
dd if=/dev/zero of=bigfile bs=1M count=32
./src/redis-cli -x hset a a < bigfile
```

1) Make room for querybuf in processMultibulkBuffer, now the alloc of querybuf will be more than 32k.
2) If this happens to trigger the `clientsCronResizeQueryBuffer`, querybuf will be resized to 0.
3) Finally, in readQueryFromClient, we expand the querybuf non-greedily, from 0 to 32k.
    Old code, make room for querybuf is greedy, so it only needs 11 times to expand to 32M(16k*(2^11)),
    but now we need 2048(32*1024/16) times to reach it, due to the slow allocation under valgrind that exposed the problem.

The fix for the excessive shrinking of the query buf to 0, will be handled in redis#5013 (that other change on it's own can fix failing test too), but the fix in this PR will also fix the failing test.

The fix in this PR will makes the reading in `readQueryFromClient` more aggressive when working on a big arg (so that it is in par with the same code in `processMultibulkBuffer` (i.e. the two calls to `sdsMakeRoomForNonGreedy` should both use the bulk size).
In the code before this fix the one in readQueryFromClient always has `readlen = PROTO_IOBUF_LEN`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants