Optimize deferred replies to use shared objects instead of sprintf #10334
oranagra merged 9 commits into redis:unstable
Conversation
oranagra
left a comment
I think it makes sense to extend this optimization to RESP3 sets and maps, here and also in the non-deferred reply.
Looking at the code, we have some 30 calls to setDeferredArrayLen and 16 to setDeferredMapLen (probably less commonly used, but it could still make a big impact for someone).
Co-authored-by: Oran Agra <[email protected]>
WRT to:

this would imply creating:

agree?

yes
@oranagra I've added the map and set precomputed headers. After that change (0e00bed) I saw a drop from the best results we had gotten (still an improvement over unstable, but I believe we want to keep as many optimizations as possible).

Results of 0e00bed, pipeline 1
Results of 0e00bed, pipeline 16

As you can see, it's still better than unstable: 620K vs 655K. But if we reduce the number of conditional branches (if/else if/else if) and precompute the shared conditions used, we can get back to the best results while still adding maps and sets:

Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6, pipeline 1
Results of 0e00bed06cc708221f4b12ec4c0b9a84e36ca6c6, pipeline 16
@oranagra WRT to RESP3 maps, I've experimented with redis-benchmark (using #10335) and STREAMs as follows:
…10334) Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can use shared objects. In some pipelined workloads this achieves about a 10% improvement.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit b857928)
This was raised on #10310 (comment) in a discussion with @oranagra .

Given that sprintf is consuming 1.6% of the process's CPU cycles on pipeline 1 tests, avoiding it will benefit any command that uses deferred replies. Pipelining will make the difference even more evident.

ZREVRANGE results:

pipeline 1:

pipeline 16 (reduces the relative percentage of __GI___writev and makes the command's performance more evident):

To reproduce: