Skip to content

Optimize deferred replies to use shared objects instead of sprintf#10334

Merged
oranagra merged 9 commits intoredis:unstablefrom
filipecosta90:deferred.opt
Feb 23, 2022
Merged

Optimize deferred replies to use shared objects instead of sprintf#10334
oranagra merged 9 commits intoredis:unstablefrom
filipecosta90:deferred.opt

Conversation

@filipecosta90
Copy link
Contributor

This was raised on #10310 (comment) in a discussion with @oranagra .
image

Given that sprintf is consuming 1.6% of CPU cycles of the process on pipeline 1 tests, trying to avoid it will benefit any command that uses deferred replies. Pipelining will make the difference even more evident.
ZREVRANGE results:

  • pipeline 1:

    • unstable: 488aecb : 138479 ops/sec. p50(ms)=1.43900
    • this PR: 143115 ops/sec. p50(ms)=1.39100. %change = 3.4%
  • pipeline 16 ( reduces the relative percentage of __GI___writev and makes more evident the command performance ):

    • unstable: 488aecb : 621587 ops/sec. p50(ms)=4.25500
    • this PR: 680124 ops/sec. p50(ms)=3.83900. %change = 9.4%

To reproduce:

rm dump.rdb ; src/redis-server --save "" &
redis-cli zadd zz 0 a 1 b 2 c 3 d 4 e 5 f
memtier_benchmark --pipeline 16 --command "zrange zz 0 -1" --hide-histogram

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Feb 23, 2022
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it makes sense to extend this optimization for RESP3 sets and maps.
here and also in the non deferred reply.
looking at the code, we have some 30 calls to setDeferredArrayLen and 16 for setDeferredMapLen (probably less commonly used, but still could make a big impact for someone)

Co-authored-by: Oran Agra <[email protected]>
@filipecosta90
Copy link
Contributor Author

WRT to:

i think it makes sense to extend this optimization for RESP3 sets and maps.

this would imply creating:

  • shared.sethdr: coming from sdscatprintf(sdsempty(),"%%%d\r\n",j));
  • shared.maphdr: coming from sdscatprintf(sdsempty(),"~%d\r\n",j));

agree?

@oranagra
Copy link
Member

yes

@filipecosta90
Copy link
Contributor Author

yes

@oranagra I've added the map and set precomputed headers. After that change ( 0e00bed ) I saw a drop in the best results we've got ( still an improvement from unstable but I believe we want to keep as much optimizations as possible ).

Results of 0e00bed pipeline 1

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    139113.33         1.43710         1.42300         1.90300         2.54300     11547.49 
Totals     139113.33         1.43710         1.42300         1.90300         2.54300     11547.49

Results of 0e00bed pipeline 16

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    655828.50         4.86750         3.96700         7.13500         7.67900     54438.89 
Totals     655828.50         4.86750         3.96700         7.13500         7.67900     54438.89 

As you can see it's still better then unstable: 620K to 655K but If we reduce the number of conditional branches (if/else if/else if) and precomputed the shared conditions used we can get back to the best results while adding maps and sets:

Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6 pipeline 1

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    142047.48         1.40747         1.39100         1.85500         2.55900     11791.05 
Totals     142047.48         1.40747         1.39100         1.85500         2.55900     11791.05

Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6 pipeline 16

LL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    666106.37         4.79225         3.90300         6.97500         7.42300     55292.03 
Totals     666106.37         4.79225         3.90300         6.97500         7.42300     55292.03 

@filipecosta90
Copy link
Contributor Author

@oranagra WRT to RESP3 maps I've experimented with redis-benchmark (using #10335) and STREAMs as follow:

rm dump.rdb ; src/redis-server --save "" &
redis-cli  XADD mystream 1526919030474-55 message "Hello,"
# resp2
redis-benchmark -n 10000000 xread COUNT 1 STREAMS mystream 0-0
# resp3
redis-benchmark -3 -n 10000000 xread COUNT 1 STREAMS mystream 0-0
COMMAND RESP unstable ops/sec (pipeline1) 488aecb this PR ops/sec (pipeline1) % change
XREAD COUNT 1 STREAMS mystream 0-0 2 151715 157968 4.12%
XREAD COUNT 1 STREAMS mystream 0-0 3 147260 159084 8.03%

@filipecosta90 filipecosta90 changed the title Avoid sprintf on setDeferredAggregateLen() when we can used shared objects Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects Feb 23, 2022
Co-authored-by: Oran Agra <[email protected]>
@oranagra oranagra changed the title Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects Optimize deferred replies to use shared objects instead of sprintf Feb 23, 2022
@oranagra oranagra merged commit b857928 into redis:unstable Feb 23, 2022
@filipecosta90 filipecosta90 deleted the deferred.opt branch February 23, 2022 22:39
oranagra added a commit that referenced this pull request Apr 27, 2022
…10334)

Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit b857928)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants