Optimize deferred replies to use shared objects instead of sprintf by filipecosta90 · Pull Request #10334 · redis/redis

filipecosta90 · 2022-02-23T12:42:35Z

This was raised in #10310 (comment) during a discussion with @oranagra.

Given that sprintf consumes 1.6% of the process's CPU cycles in pipeline 1 tests, avoiding it will benefit any command that uses deferred replies. Pipelining makes the difference even more evident.
ZREVRANGE results:

pipeline 1:
- unstable: 488aecb : 138479 ops/sec. p50(ms)=1.43900
- this PR: 143115 ops/sec. p50(ms)=1.39100. %change = 3.4%
pipeline 16 (reduces the relative share of __GI___writev and makes the command's own performance more evident):
- unstable: 488aecb : 621587 ops/sec. p50(ms)=4.25500
- this PR: 680124 ops/sec. p50(ms)=3.83900. %change = 9.4%

To reproduce:

rm dump.rdb ; src/redis-server --save "" &
redis-cli zadd zz 0 a 1 b 2 c 3 d 4 e 5 f
memtier_benchmark --pipeline 16 --command "zrange zz 0 -1" --hide-histogram
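
The idea described above, replacing a per-reply sprintf with a lookup into strings formatted once at startup, can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this PR: `SHARED_HDR_COUNT`, `shared_array_hdr`, `init_shared_headers()` and `array_header()` are hypothetical names, and the cap of 32 is an assumption made for the example.

```c
#include <stdio.h>

/* Hypothetical cap on precomputed headers; an assumption for this sketch. */
#define SHARED_HDR_COUNT 32

/* Each entry holds "*<n>\r\n" plus the terminating NUL. */
static char shared_array_hdr[SHARED_HDR_COUNT][16];

/* Format every small array header once, at startup. */
void init_shared_headers(void) {
    for (int j = 0; j < SHARED_HDR_COUNT; j++)
        snprintf(shared_array_hdr[j], sizeof shared_array_hdr[j], "*%d\r\n", j);
}

/* Return the header for an array reply with `len` elements.
 * Small lengths reuse a precomputed string; larger ones are formatted
 * into the caller's scratch buffer. */
const char *array_header(long len, char *scratch, size_t scratch_size) {
    if (len >= 0 && len < SHARED_HDR_COUNT)
        return shared_array_hdr[len];          /* fast path: no formatting */
    snprintf(scratch, scratch_size, "*%ld\r\n", len);
    return scratch;                            /* slow path: format on demand */
}
```

A caller would run `init_shared_headers()` once and then call `array_header()` per reply; only replies with 32 or more elements pay the formatting cost in this sketch.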

oranagra

i think it makes sense to extend this optimization for RESP3 sets and maps.
here and also in the non deferred reply.
looking at the code, we have some 30 calls to setDeferredArrayLen and 16 for setDeferredMapLen (probably less commonly used, but still could make a big impact for someone)

filipecosta90 · 2022-02-23T13:32:27Z

With regard to:

i think it makes sense to extend this optimization for RESP3 sets and maps.

this would imply creating:

shared.sethdr: coming from sdscatprintf(sdsempty(),"~%d\r\n",j);
shared.maphdr: coming from sdscatprintf(sdsempty(),"%%%d\r\n",j);

agree?

oranagra · 2022-02-23T13:40:24Z

yes
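
As a rough illustration of the map and set headers agreed on here: in RESP3 a map header starts with `%` and a set header starts with `~`. The sketch below precomputes both tables with plain `snprintf`; it is not the Redis implementation, and `map_hdr`, `set_hdr`, `init_resp3_headers()`, `map_header()` and `set_header()` are hypothetical names chosen for the example.

```c
#include <stdio.h>

#define SHARED_HDR_COUNT 32   /* hypothetical cap, for illustration only */

static char map_hdr[SHARED_HDR_COUNT][16];
static char set_hdr[SHARED_HDR_COUNT][16];

/* Build both header tables once. */
void init_resp3_headers(void) {
    for (int j = 0; j < SHARED_HDR_COUNT; j++) {
        /* "%%" emits one literal '%', the RESP3 map prefix. */
        snprintf(map_hdr[j], sizeof map_hdr[j], "%%%d\r\n", j);
        /* '~' is the RESP3 set prefix and needs no escaping. */
        snprintf(set_hdr[j], sizeof set_hdr[j], "~%d\r\n", j);
    }
}

/* Return the precomputed header, or NULL when `n` is out of range. */
const char *map_header(int n) {
    return (n >= 0 && n < SHARED_HDR_COUNT) ? map_hdr[n] : NULL;
}

const char *set_header(int n) {
    return (n >= 0 && n < SHARED_HDR_COUNT) ? set_hdr[n] : NULL;
}
```

The doubled `%%` is the only subtle part: in a printf-style format string a single `%` would start a conversion, so the map prefix has to be escaped.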

filipecosta90 · 2022-02-23T14:09:10Z

yes

@oranagra I've added the precomputed map and set headers. After that change (0e00bed) I saw a drop from the best results we've had (still an improvement over unstable, but I believe we want to keep as many optimizations as possible).

Results of 0e00bed pipeline 1

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    139113.33         1.43710         1.42300         1.90300         2.54300     11547.49 
Totals     139113.33         1.43710         1.42300         1.90300         2.54300     11547.49

Results of 0e00bed pipeline 16

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    655828.50         4.86750         3.96700         7.13500         7.67900     54438.89 
Totals     655828.50         4.86750         3.96700         7.13500         7.67900     54438.89

As you can see, it's still better than unstable (620K to 655K), but if we reduce the number of conditional branches (if/else if/else if) and precompute the shared conditions they use, we can get back to the best results while adding maps and sets:

Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6 pipeline 1

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    142047.48         1.40747         1.39100         1.85500         2.55900     11791.05 
Totals     142047.48         1.40747         1.39100         1.85500         2.55900     11791.05

Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6 pipeline 16

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Zranges    666106.37         4.79225         3.90300         6.97500         7.42300     55292.03 
Totals     666106.37         4.79225         3.90300         6.97500         7.42300     55292.03
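
The branch reduction described in this comment can be sketched as follows: the "is this length in the shared range" test and the header length are computed once, and the prefix only selects which table to read. This is a hedged standalone sketch, not the code merged in this PR; `hdr_table`, `init_hdr_tables()` and `aggregate_header()` are hypothetical names, and the cap of 32 is an assumption.

```c
#include <stdio.h>
#include <stddef.h>

#define SHARED_HDR_COUNT 32   /* hypothetical cap, for illustration only */

/* One table per prefix: 0 = array '*', 1 = map '%', 2 = set '~'. */
static char hdr_table[3][SHARED_HDR_COUNT][16];
static const char prefixes[3] = {'*', '%', '~'};

void init_hdr_tables(void) {
    for (int t = 0; t < 3; t++)
        for (int j = 0; j < SHARED_HDR_COUNT; j++)
            snprintf(hdr_table[t][j], sizeof hdr_table[t][j],
                     "%c%d\r\n", prefixes[t], j);
}

/* Store a pointer to the header in *out and return its length.
 * Assumes `scratch` is large enough for the formatted fallback. */
size_t aggregate_header(char prefix, long len,
                        char *scratch, size_t scratch_size,
                        const char **out) {
    /* Evaluate the shared-range test once instead of once per branch. */
    const int use_shared = (len >= 0 && len < SHARED_HDR_COUNT);
    if (use_shared) {
        /* For 0..31 the header is prefix + 1 or 2 digits + CRLF. */
        const size_t hdr_len = (len < 10) ? 4 : 5;
        switch (prefix) {
        case '*': *out = hdr_table[0][len]; return hdr_len;
        case '%': *out = hdr_table[1][len]; return hdr_len;
        case '~': *out = hdr_table[2][len]; return hdr_len;
        default: break;   /* unknown prefix: format it below */
        }
    }
    int n = snprintf(scratch, scratch_size, "%c%ld\r\n", prefix, len);
    *out = scratch;
    return (n < 0) ? 0 : (size_t)n;
}
```

Because every shared header has one or two digits, its length is known without calling `strlen`, which is one way the fast path can avoid extra work per reply.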

filipecosta90 · 2022-02-23T15:47:18Z

@oranagra With regard to RESP3 maps, I've experimented with redis-benchmark (using #10335) and STREAMs as follows:

rm dump.rdb ; src/redis-server --save "" &
redis-cli  XADD mystream 1526919030474-55 message "Hello,"
# resp2
redis-benchmark -n 10000000 xread COUNT 1 STREAMS mystream 0-0
# resp3
redis-benchmark -3 -n 10000000 xread COUNT 1 STREAMS mystream 0-0

| COMMAND | RESP | unstable ops/sec (pipeline 1) `488aecb` | this PR ops/sec (pipeline 1) | % change |
|---|---|---|---|---|
| XREAD COUNT 1 STREAMS mystream 0-0 | 2 | 151715 | 157968 | 4.12% |
| XREAD COUNT 1 STREAMS mystream 0-0 | 3 | 147260 | 159084 | 8.03% |

…10334) Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects. In some pipelined workloads this achieves about 10% improvement. Co-authored-by: Oran Agra <[email protected]> (cherry picked from commit b857928)

Avoid sprintf on setDeferredAggregateLen() when we can used shared ob…

d04aa27

…jects

filipecosta90 requested a review from oranagra February 23, 2022 12:42

filipecosta90 added the action:run-benchmark label (triggers the benchmark suite for this Pull Request) Feb 23, 2022

oranagra reviewed Feb 23, 2022

Update src/networking.c

b9e827f

Co-authored-by: Oran Agra <[email protected]>

filipecosta90 added 4 commits February 23, 2022 13:50

Included RESP3 Map type and Set type headers optimization into setDef…

ab2cd7a

…erredAggregateLen()

Merge branch 'deferred.opt' of github.com:filipecosta90/redis into de…

a9edac3

…ferred.opt

Included RESP3 Map type and Set type headers optimization into setDef…

0e00bed

…erredAggregateLen()

Reduce branching and conditional checks on optimized header usage in …

804d15f

…setDeferredAggregateLen()

filipecosta90 requested a review from oranagra February 23, 2022 14:09

Using precomputed headers for RESP3 Map and Set types in addReplyLong…

326e961

…LongWithPrefix()

oranagra reviewed Feb 23, 2022

Fixes per PR review: introduce macro for shared header lengths

3fed086

filipecosta90 requested a review from oranagra February 23, 2022 15:57

oranagra approved these changes Feb 23, 2022

filipecosta90 changed the title ~~Avoid sprintf on setDeferredAggregateLen() when we can used shared objects~~ Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects Feb 23, 2022

Update src/server.h

f7f7158

Co-authored-by: Oran Agra <[email protected]>

oranagra changed the title ~~Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects~~ Optimize deferred replies to use shared objects instead of sprintf Feb 23, 2022

oranagra merged commit b857928 into redis:unstable Feb 23, 2022

filipecosta90 deleted the deferred.opt branch February 23, 2022 22:39

This was referenced Feb 24, 2022

[BUG] ZREVRANGE 50% slower after upgrading from 5.0.7 to 6.2.6 #10310

Closed

Optimization: Avoid deferred array reply on ZRANGE commands BYRANK #10337

Merged

oranagra merged 9 commits into redis:unstable from filipecosta90:deferred.opt
