Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the HELLO command reply #13490

Merged
merged 3 commits into from
Sep 3, 2024
Merged

Conversation

fcostaoliveira
Copy link
Collaborator

@fcostaoliveira fcostaoliveira commented Aug 26, 2024

Overall improvement

TBD ( current is approximately 6% on the achievable ops/sec), coming from:

  • In case of no module we can skip 1.3% CPU cycles on dict Iterator creation/deletion
  • Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runtime strlen overhead within HELLO reply on string constants.

Optimization 1: In case of no module we can skip 1.3% CPU cycles on dict Iterator creation/deletion.

Function Stack	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
addReplyLoadedModules	1.3%	0.064s	redis-server	addReplyLoadedModules	module.c	0x82e09
  dictNext	0.5%	0.032s	redis-server	dictNext	dict.c	0x94c50
  dictGetIterator	0.4%	0.008s	redis-server	dictGetIterator	dict.c	0x94920

Manual validation:

Unstable (2b88db9):

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file unstable.json --hide-histogram
Json file unstable.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7210047 ops,  126085 (avg:  120139) ops/sec, 25.36MB/sec (avg: 24.17MB/sec),  1.58 (avg:  1.66) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     120147.56         1.66119         1.50300         3.45500        10.36700     24747.84 
Totals     120147.56         1.66119         1.50300         3.45500        10.36700     49495.68 
Json file closed.

This PR (4bc619a) :

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file pr-3.1.json --hide-histogram
Json file pr-3.1.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7286407 ops,  129223 (avg:  121414) ops/sec, 25.92MB/sec (avg: 24.36MB/sec),  1.54 (avg:  1.64) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     121501.80         1.64361         1.51100         3.34300        10.43100     24960.86 
Totals     121501.80         1.64361         1.51100         3.34300        10.43100     49921.72 
Json file closed.

Optimization 2: Use addReplyBulkCBuffer instead of addReplyBulkCString to avoid runtime strlen overhead within HELLO reply on string constants.

image

This PR commit a88f45a

$ memtier_benchmark --command "HELLO 2 SETNAME __key__" --test-time 60 --json-out-file pr-3.2.json --hide-histogram
Json file pr-3.2.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:     7659586 ops,  129055 (avg:  127635) ops/sec, 25.89MB/sec (avg: 25.61MB/sec),  1.55 (avg:  1.56) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Hellos     127639.64         1.56376         1.43100         3.15100         9.27900     26221.85 
Totals     127639.64         1.56376         1.43100         3.15100         9.27900     52443.71 
Json file closed.

TODO:

@fcostaoliveira fcostaoliveira changed the title Optimize the HELLO command in case of no module loaded Optimize the HELLO command reply Aug 26, 2024
@sundb
Copy link
Collaborator

sundb commented Aug 27, 2024

HELLO command is not a hot path, it it worth to optimize it by reducing some readability.

@fcostaoliveira fcostaoliveira added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Aug 27, 2024
@fcostaoliveira fcostaoliveira requested a review from sundb August 27, 2024 17:05
@fcostaoliveira
Copy link
Collaborator Author

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

Using platform named: intel64-ubuntu22.04-redis-icx1 to do the comparison.

In summary:

  • Detected a total of 1 improvements above the improvement water line.

You can check a comparison in detail via the grafana link

Comparison between unstable and optimize.hello.

Time Period from 5 months ago. (environment used: oss-standalone)

Improvements Table

Test Case Baseline unstable (median obs. +- std.dev) Comparison optimize.hello (median obs. +- std.dev) % change (higher-better) Note
memtier_benchmark-connection-hello 169883 174951 3.0% IMPROVEMENT

Improvements test regexp names: memtier_benchmark-connection-hello

Full Results table:
Test Case Baseline unstable (median obs. +- std.dev) Comparison optimize.hello (median obs. +- std.dev) % change (higher-better) Note
memtier_benchmark-connection-hello 169883 174951 3.0% IMPROVEMENT

sundb
sundb previously approved these changes Aug 28, 2024
oranagra
oranagra previously approved these changes Sep 2, 2024
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. not sure if it's an important command to optimize.
p.s. maybe the helloCommand code can look a bit nicer with a macro (not repeating the string twice)

@filipecosta90 filipecosta90 dismissed stale reviews from oranagra and sundb via c8b3817 September 2, 2024 13:54
@fcostaoliveira
Copy link
Collaborator Author

LGTM. not sure if it's an important command to optimize. p.s. maybe the helloCommand code can look a bit nicer with a macro (not repeating the string twice)

@oranagra I've added it in c8b3817. Please check if you agree.

@fcostaoliveira
Copy link
Collaborator Author

CE Performance Automation : step 1 of 2 (build) STARTING...

This comment was automatically generated given a benchmark was triggered.
Started building at 2024-09-02 13:57:27.152553
You can check each build/benchmark progress in grafana:

  • git hash: c8b3817
  • git branch: filipecosta90:optimize.hello
  • commit date and time: n/a
  • commit summary: n/a
  • test filters:
    • command priority lower limit: 0
    • command priority upper limit: 10000
    • test name regex: .*
    • command group regex: .*

@fcostaoliveira
Copy link
Collaborator Author

fcostaoliveira commented Sep 2, 2024

CE Performance Automation : step 2 of 2 (benchmark) FINISHED.

This comment was automatically generated given a benchmark was triggered.

Started benchmark suite at 2024-10-20 12:34:02.211680 and took 5009.071449 seconds to finish.
Status: [################################################################################] 100.0% completed.

In total will run 135 benchmarks.
- 0 pending.
- 135 completed:
- 0 successful.
- 135 failed.
You can check a the status in detail via the grafana link

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sundb sundb merged commit a31b516 into redis:unstable Sep 3, 2024
14 checks passed
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action:run-benchmark Triggers the benchmark suite for this Pull Request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants