Optimize the HELLO command reply #13490

fcostaoliveira · 2024-08-26T15:16:19Z

Overall improvement

TBD ( current is approximately 6% on the achievable ops/sec), coming from:

In case of no module we can skip 1.3% CPU cycles on dict Iterator creation/deletion
Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runtime strlen overhead within HELLO reply on string constants.

Optimization 1: In case of no module we can skip 1.3% CPU cycles on dict Iterator creation/deletion.

Function Stack	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
addReplyLoadedModules	1.3%	0.064s	redis-server	addReplyLoadedModules	module.c	0x82e09
  dictNext	0.5%	0.032s	redis-server	dictNext	dict.c	0x94c50
  dictGetIterator	0.4%	0.008s	redis-server	dictGetIterator	dict.c	0x94920

Manual validation:

Unstable (2b88db9):

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file unstable.json --hide-histogram
Json file unstable.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7210047 ops,  126085 (avg:  120139) ops/sec, 25.36MB/sec (avg: 24.17MB/sec),  1.58 (avg:  1.66) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     120147.56         1.66119         1.50300         3.45500        10.36700     24747.84 
Totals     120147.56         1.66119         1.50300         3.45500        10.36700     49495.68 
Json file closed.

This PR (4bc619a) :

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file pr-3.1.json --hide-histogram
Json file pr-3.1.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7286407 ops,  129223 (avg:  121414) ops/sec, 25.92MB/sec (avg: 24.36MB/sec),  1.54 (avg:  1.64) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     121501.80         1.64361         1.51100         3.34300        10.43100     24960.86 
Totals     121501.80         1.64361         1.51100         3.34300        10.43100     49921.72 
Json file closed.

Optimization 2: Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runtime strlen overhead within HELLO reply on string constants.

This PR commit a88f45a

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file pr-3.2.json --hide-histogram
Json file pr-3.2.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7659586 ops,  129055 (avg:  127635) ops/sec, 25.89MB/sec (avg: 25.61MB/sec),  1.55 (avg:  1.56) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     127639.64         1.56376         1.43100         3.15100         9.27900     26221.85 
Totals     127639.64         1.56376         1.43100         3.15100         9.27900     52443.71 
Json file closed.

TODO:

Add HELLO command benchmark to the spec. https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-connection-hello.yml

…me strlen overhead within HELLO reply on string constants

sundb · 2024-08-27T03:52:35Z

HELLO command is not a hot path, it it worth to optimize it by reducing some readability.

fcostaoliveira · 2024-08-27T22:20:17Z

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

Using platform named: intel64-ubuntu22.04-redis-icx1 to do the comparison.

In summary:

Detected a total of 1 improvements above the improvement water line.

You can check a comparison in detail via the grafana link

Comparison between unstable and optimize.hello.

Time Period from 5 months ago. (environment used: oss-standalone)

Improvements Table

Test Case	Baseline unstable (median obs. +- std.dev)	Comparison optimize.hello (median obs. +- std.dev)	% change (higher-better)	Note
memtier_benchmark-connection-hello	169883	174951	3.0%	IMPROVEMENT

Improvements test regexp names: memtier_benchmark-connection-hello

Full Results table:

Test Case	Baseline unstable (median obs. +- std.dev)	Comparison optimize.hello (median obs. +- std.dev)	% change (higher-better)	Note
memtier_benchmark-connection-hello	169883	174951	3.0%	IMPROVEMENT

oranagra

LGTM. not sure if it's an important command to optimize.
p.s. maybe the helloCommand code can look a bit nicer with a macro (not repeating the string twice)

…ng string twice

fcostaoliveira · 2024-09-02T13:54:54Z

LGTM. not sure if it's an important command to optimize. p.s. maybe the helloCommand code can look a bit nicer with a macro (not repeating the string twice)

@oranagra I've added it in c8b3817. Please check if you agree.

fcostaoliveira · 2024-09-02T13:57:27Z

CE Performance Automation : step 1 of 2 (build) STARTING...

This comment was automatically generated given a benchmark was triggered.
Started building at 2024-09-02 13:57:27.152553
You can check each build/benchmark progress in grafana:

git hash: c8b3817
git branch: filipecosta90:optimize.hello
commit date and time: n/a
commit summary: n/a
test filters:
- command priority lower limit: 0
- command priority upper limit: 10000
- test name regex: .*
- command group regex: .*

fcostaoliveira · 2024-09-02T13:58:56Z

CE Performance Automation : step 2 of 2 (benchmark) FINISHED.

This comment was automatically generated given a benchmark was triggered.

Started benchmark suite at 2024-10-20 12:34:02.211680 and took 5009.071449 seconds to finish.
Status: [################################################################################] 100.0% completed.

In total will run 135 benchmarks.
- 0 pending.
- 135 completed:
- 0 successful.
- 135 failed.
You can check a the status in detail via the grafana link

oranagra

LGTM

### New Features in binary distributions - 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter, Count-min sketch, Top-k, t-digest - Redis scalable query engine (including vector search) ### Potentially breaking changes - #12272 `GETRANGE` returns an empty bulk when the negative end index is out of range - #12395 Optimize `SCAN` command when matching data type ### Bug fixes - #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed - #13489 `ACL CAT` - return module commands - #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx` - #13473 Fix incorrect lag due to trimming stream via `XTRIM` command - #13338 Fix incorrect lag field in `XINFO` when tombstone is after the `last_id` of the consume group - #13470 On `HDEL` of last field - update the global hash field expiration data structure - #13465 Cluster: Pass extensions to node if extension processing is handled by it - #13443 Cluster: Ensure validity of myself when loading cluster config - #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array ### Modules API - #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and `RM_RegisterDefragCallbacks` - defrag API to allocate and free raw memory ### Performance and resource utilization improvements - #13503 Avoid overhead of comparison function pointer calls in listpack `lpFind` - #13505 Optimize `STRING` datatype write commands - #13499 Optimize `SMEMBERS` command - #13494 Optimize `GEO*` commands reply - #13490 Optimize `HELLO` command - #13488 Optimize client query buffer - #12395 Optimize `SCAN` command when matching data type - #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands - #13516 Optimize `LRANGE` and other commands that perform several writes to client buffers per call - #13431 Avoid `used_memory` contention when updating from multiple threads ### Other general improvements - #13495 Reply `-LOADING` on replica while flushing the db ### CLI tools - #13411 redis-cli: Fix wrong `dbnum` showed after the client reconnected ### Notes - No backward compatibility for replication or persistence. - Additional distributions, upgrade paths, features, and improvements will be introduced in upcoming pre-releases. - With the GA release of 8.0 we will deprecate Redis Stack.

fcostaoliveira added 2 commits August 26, 2024 16:13

Optimize the HELLO command in case of no module loaded

4bc619a

Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runti…

a88f45a

…me strlen overhead within HELLO reply on string constants

fcostaoliveira changed the title ~~Optimize the HELLO command in case of no module loaded~~ Optimize the HELLO command reply Aug 26, 2024

fcostaoliveira added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Aug 27, 2024

fcostaoliveira requested a review from sundb August 27, 2024 17:05

sundb previously approved these changes Aug 28, 2024

View reviewed changes

fcostaoliveira requested a review from oranagra September 2, 2024 12:43

oranagra previously approved these changes Sep 2, 2024

View reviewed changes

Using ADD_REPLY_BULK_CBUFFER_STRING_CONSTANT macro instead of repeati…

c8b3817

…ng string twice

filipecosta90 dismissed stale reviews from oranagra and sundb via c8b3817 September 2, 2024 13:54

fcostaoliveira requested review from sundb and oranagra September 2, 2024 13:55

oranagra approved these changes Sep 2, 2024

View reviewed changes

sundb approved these changes Sep 3, 2024

View reviewed changes

sundb merged commit a31b516 into redis:unstable Sep 3, 2024
14 checks passed

YaacovHazan mentioned this pull request Sep 11, 2024

Redis 8.0 M01 #13538

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the HELLO command reply #13490

Optimize the HELLO command reply #13490

fcostaoliveira commented Aug 26, 2024 •

edited

Loading

sundb commented Aug 27, 2024

fcostaoliveira commented Aug 27, 2024

oranagra left a comment

fcostaoliveira commented Sep 2, 2024

fcostaoliveira commented Sep 2, 2024

fcostaoliveira commented Sep 2, 2024 •

edited

Loading

oranagra left a comment

Optimize the HELLO command reply #13490

Optimize the HELLO command reply #13490

Conversation

fcostaoliveira commented Aug 26, 2024 • edited Loading

Overall improvement

Optimization 1: In case of no module we can skip 1.3% CPU cycles on dict Iterator creation/deletion.

Optimization 2: Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runtime strlen overhead within HELLO reply on string constants.

sundb commented Aug 27, 2024

fcostaoliveira commented Aug 27, 2024

Automated performance analysis summary

Comparison between unstable and optimize.hello.

Improvements Table

oranagra left a comment

Choose a reason for hiding this comment

fcostaoliveira commented Sep 2, 2024

fcostaoliveira commented Sep 2, 2024

CE Performance Automation : step 1 of 2 (build) STARTING...

fcostaoliveira commented Sep 2, 2024 • edited Loading

CE Performance Automation : step 2 of 2 (benchmark) FINISHED.

oranagra left a comment

Choose a reason for hiding this comment

fcostaoliveira commented Aug 26, 2024 •

edited

Loading

fcostaoliveira commented Sep 2, 2024 •

edited

Loading