Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize SET/INCR*/DECR*/SETRANGE/APPEND by reducing duplicate computation #13505

Merged
merged 15 commits into from
Sep 4, 2024

Conversation

fcostaoliveira
Copy link
Collaborator

@fcostaoliveira fcostaoliveira commented Aug 29, 2024

  • Avoid addReplyLongLong (which converts back to string) the value we already have as a robj, by using addReplyProto + addReply
  • Avoid doing dbFind Twice for the same dictEntry on INCR*/DECR*/SETRANGE/APPEND commands.
  • Avoid multiple sdslen calls with the same input on setrangeCommand and appendCommand
  • Introduce setKeyWithDictEntry, which is like setKey(), but accepts an optional dictEntry input: Avoids the second dictFind in SET command

TODO:


Hotspot data that triggered this effort

Hotspots sample on incrDecr* commands (we can save 3.3% on the 2nd dictFind)

image

Hotspots on set command (we can save 2.1% on the 2nd dictFind)
image

Measuring the improvements

pip3 install redis-benchmarks-specification==0.1.222
redis-benchmarks-spec-client-runner --tests-regexp "memtier_benchmark-1Mkeys-string-append-1-100B|memtier_benchmark-1Mkeys-string-incrby|memtier_benchmark-1Mkeys-string-incrbyfloat|memtier_benchmark-1Mkeys-string-decr|memtier_benchmark-1Mkeys-string-setrange-100B|memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10|memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10" --flushall_on_every_test_start --flushall_on_every_test_end --override-memtier-test-time 60 --cpuset_start_pos 1

Improvements on STRING SET command

Throughput

Test Name baseline unstable ( 3fcddfb ) comparison( 93fae76 ) % change
----------------------------------------------------------------- -----------: -----------: -----------:
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 578505 592740 2.46%
memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 644028 663759 3.06%

Avg. Latency

Test Name baseline unstable ( 3fcddfb ) comparison( 93fae76 ) % change
----------------------------------------------------------------- -----------: -----------: -----------:
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 3.455 3.372 2.40%
memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 3.103 3.011 2.96%

Improvements on INCRBY command

Baseline

root@hpe10:~/redis# taskset -c 1-5 memtier_benchmark --pipeline 10 --test-time 60  --hide-histogram --command='INCRBY __key__ 1'
(...)
[RUN #1 100%,  60 secs]  0 threads:    45616940 ops,  750371 (avg:  760212) ops/sec, 36.42MB/sec (avg: 36.91MB/sec),  2.66 (avg:  2.63) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Incrbys    760212.36         2.62864         2.55900         5.15100         9.98300     37795.06 
Totals     760212.36         2.62864         2.55900         5.15100         9.98300     37795.06 

Comparison

root@hpe10:~/redis# taskset -c 1-5 memtier_benchmark --pipeline 10 --test-time 60  --hide-histogram --command='INCRBY __key__ 1'
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:    47423740 ops,  789698 (avg:  790317) ops/sec, 37.92MB/sec (avg: 37.95MB/sec),  2.53 (avg:  2.53) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Incrbys    790316.92         2.52839         2.44700         4.92700         9.91900     38860.38 
Totals     790316.92         2.52839         2.44700         4.92700         9.91900     38860.38 

@fcostaoliveira fcostaoliveira changed the title Optimize INCRBY/DECRBY by reducing duplicate computation Optimize INCR*/DECR*/SETRANGE/APPEND by reducing duplicate computation Aug 30, 2024
…void multiple sdslen calls with the same input on setrangeCommand and appendCommand.
… optional dictEntry input: Avoids the second dictFind in SET command
@fcostaoliveira fcostaoliveira changed the title Optimize INCR*/DECR*/SETRANGE/APPEND by reducing duplicate computation Optimize SET/INCR*/DECR*/SETRANGE/APPEND by reducing duplicate computation Aug 30, 2024
@fcostaoliveira
Copy link
Collaborator Author

fcostaoliveira commented Aug 30, 2024

Automated performance analysis summary

This comment was automatically generated given there is performance data available.

Using platform named: intel64-ubuntu22.04-redis-icx1 to do the comparison.

In summary:

  • Detected a total of 7 improvements above the improvement water line.

You can check a comparison in detail via the grafana link

Comparison between unstable and perf.incrby.

Time Period from 5 months ago. (environment used: oss-standalone)

Improvements Table

Test Case Baseline unstable (median obs. +- std.dev) Comparison perf.incrby (median obs. +- std.dev) % change (higher-better) Note
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 636952 +- 1.4% (7 datapoints) 658638 3.4% IMPROVEMENT
memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 753946 +- 2.0% (7 datapoints) 786432 4.3% waterline=2.0%. IMPROVEMENT
memtier_benchmark-1Mkeys-string-append-1-100B-pipeline-10 767185 792867 3.3% IMPROVEMENT
memtier_benchmark-1Mkeys-string-incrby-pipeline-10 891079 906416 1.7% IMPROVEMENT
memtier_benchmark-1Mkeys-string-incrbyfloat-pipeline-10 449176 457083 1.8% IMPROVEMENT
memtier_benchmark-1Mkeys-string-setex-100B-pipeline-10 614291 686197 11.7% IMPROVEMENT
memtier_benchmark-1Mkeys-string-setrange-100B-pipeline-10 812508 826357 1.7% IMPROVEMENT

Improvements test regexp names: memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10|memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10|memtier_benchmark-1Mkeys-string-append-1-100B-pipeline-10|memtier_benchmark-1Mkeys-string-incrby-pipeline-10|memtier_benchmark-1Mkeys-string-incrbyfloat-pipeline-10|memtier_benchmark-1Mkeys-string-setex-100B-pipeline-10|memtier_benchmark-1Mkeys-string-setrange-100B-pipeline-10

Full Results table:
Test Case Baseline unstable (median obs. +- std.dev) Comparison perf.incrby (median obs. +- std.dev) % change (higher-better) Note
memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10 636952 +- 1.4% (7 datapoints) 658638 3.4% IMPROVEMENT
memtier_benchmark-1Mkeys-load-string-with-10B-values-pipeline-10 753946 +- 2.0% (7 datapoints) 786432 4.3% waterline=2.0%. IMPROVEMENT
memtier_benchmark-1Mkeys-string-append-1-100B-pipeline-10 767185 792867 3.3% IMPROVEMENT
memtier_benchmark-1Mkeys-string-incrby-pipeline-10 891079 906416 1.7% IMPROVEMENT
memtier_benchmark-1Mkeys-string-incrbyfloat-pipeline-10 449176 457083 1.8% IMPROVEMENT
memtier_benchmark-1Mkeys-string-setex-100B-pipeline-10 614291 686197 11.7% IMPROVEMENT
memtier_benchmark-1Mkeys-string-setrange-100B-pipeline-10 812508 826357 1.7% IMPROVEMENT

…ts an optional dictEntry input: Avoids the second kvstoreDictFind in SETEX command
oranagra
oranagra previously approved these changes Sep 2, 2024
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. (assuming all comments are addressed)

@fcostaoliveira fcostaoliveira requested a review from sundb September 2, 2024 13:43
@fcostaoliveira fcostaoliveira requested a review from sundb September 2, 2024 13:59
sundb
sundb previously approved these changes Sep 2, 2024
@sundb sundb merged commit 05aed4c into redis:unstable Sep 4, 2024
14 checks passed
@YaacovHazan YaacovHazan added the release-notes indication that this issue needs to be mentioned in the release notes label Sep 5, 2024
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
} else {
dbAdd(c->db,c->argv[1],new);
}
}
signalModifiedKey(c,c->db,c->argv[1]);
notifyKeyspaceEvent(NOTIFY_STRING,"incrby",c->argv[1],c->db->id);
server.dirty++;
addReplyLongLong(c, value);
addReplyLongLongFromStr(c,new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change can cause use-after-free in the tests/unit/moduleapi/keyspace_events.tcl test, when the test uses RM_StringDMA to modify the value from within the KSN, and this command uses it after it was released.
the reason it doesn't happen is because the eviction policy allows for shared integers, so despite "releasing" this object from within the KSN. it doesn't free the memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be solved by moving the reply a few lines upwards.
we can also argue that the module is violating redis, but it used to work so far (specifically for INCR), and the optimization broke it, so shall we fix that?
WDYT?

oranagra added a commit to oranagra/redis that referenced this pull request Mar 24, 2025
in redis#13505, we changed the code to use the string value of the key rather
than the integer value on the stack, but we have a test in
unit/moduleapi/keyspace_events that uses keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before used. the reason it didn't happen so far is because we
were using shared integers, so releasing the object doesn't free it.
oranagra added a commit that referenced this pull request Mar 24, 2025
in #13505, we changed the code to use the string value of the key rather
than the integer value on the stack, but we have a test in
unit/moduleapi/keyspace_events that uses keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before used. the reason it didn't happen so far is because we
were using shared integers, so releasing the object doesn't free it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes indication that this issue needs to be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants