Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve performance for scan command when matching pattern or data type #12209

Merged
merged 38 commits into from
Jun 27, 2023

Conversation

judeng
Copy link
Contributor

@judeng judeng commented May 22, 2023

Optimized the performance of the SCAN command in a few ways:

  1. Move the key filtering (by MATCH pattern) in the scan callback, so as to avoid collecting them for later filtering.
  2. Reduce a many memory allocations and copying (use a reference to the original sds, instead of creating an robj, an excessive 2 mallocs and one string duplication)
  3. Compare TYPE filter directly (as integers), instead of inefficient string compare per key.
  4. fixed a small bug: when scan zset and hash types, maxiterations uses a more accurate number to avoid wrong double maxiterations.

Changes postponed for a later version (8.0):

  1. Prepare to move the TYPE filtering to the scan callback as well. this was put on hold since it has side effects that can be considered a breaking change, which is that we will not attempt to do lazy expire (delete) a key that was filtered by not matching the TYPE (changing it would mean TYPE filter starts behaving the same as MATCH filter already does in that respect).
  2. when the specified key TYPE filter is an unknown type, server will reply a error immediately instead of doing a full scan that comes back empty handed.

Benchmark result:
For different scenarios, we obtained about 30% more performance improvement:

  • scene 1: scan the key space with different filter
  no filter scan pattern and 100% unmatched scan pattern and 50% unmatched scan type and 100% matched scan type and 100% unmatched scan type and 50% matched
unstable 31173 50279 36174 22936 32931 27603
pr 40264 110661 53689 32028 50437 37736
improvement 29.16% 120.09% 48.42% 39.64% 53.16% 36.71%
  • scene 2: hscan key which use ht encoding
hscan(hashtable encoding) 100% matched hscan pattern and 100% unmatched
unstable 17740 32457
this pr 22734 108945
improvement 28.15% 235.66%
  • scene 3: sscan key which use intset encoding
sscan intset 100% matched sscan pattern and 20% matched
unstable 7529 14567
pr 7418 21763
imporovement -1.47% 49.40%
  • scene 4: zscan key which use listpack encoding
zscan listpack 100% matched zscan pattern and 10% matched
unstable 18422 28857
this pr 18826 83565
imporovement 2.19% 189.58%

@judeng
Copy link
Contributor Author

judeng commented May 22, 2023

CI failed again and seem that nothing to with this pr...
but I didn't find out why, @enjoy-binbin could you help me to have a look? thanks! refer https://github.com/redis/redis/actions/runs/5044212812/jobs/9046965683?pr=12209#step:6:1253

@enjoy-binbin
Copy link
Collaborator

don't worry, it looks like the failure has nothing to do with this PR.
the test framework occasionally does this.

@judeng judeng changed the title avoid double checking expire time for scan command when command specifying a definite type improve performance for scan command when command specifying a definite data type May 23, 2023
@vitarb
Copy link
Contributor

vitarb commented May 23, 2023

Do we have any confirmation from benchmarks or profiling which show that this provides any performance benefits?
Also it looks like module code is already using something similar https://github.com/redis/redis/blob/unstable/src/module.c#L10895 should we try to unify our approach?

@judeng
Copy link
Contributor Author

judeng commented May 29, 2023

Do we have any confirmation from benchmarks or profiling which show that this provides any performance benefits?
Also it looks like module code is already using something similar https://github.com/redis/redis/blob/unstable/src/module.c#L10895 should we try to unify our approach?

@vitarb thank you and sorry for the late reply, we indeed need a benchmark and I will test it when all advises are fixed.
I'm not good at aboult module, IMO the code is reusable enough, both scan command and module use the dictScan function, but have a different callback function.

@judeng judeng changed the title improve performance for scan command when command specifying a definite data type improve performance for scan command when matching pattern or data type May 30, 2023
@judeng
Copy link
Contributor Author

judeng commented May 31, 2023

@vitarb @oranagra hi, here is the perfomance result:

  1. scan keys which have expirtion time
scan  match all scan pattern and 100% unmatched scan pattern and 50% unmatched scan type and 100% matched scan type and 100% unmatched scan type and 50% matched
unstable 24785 60916 33435 16693 26946 21249
this pr 24925 169997 40288 22913 130979 42820
  1. hscan key which use dict encoding
hscan 100% matched hscan pattern and 100% unmatched
unstable 17949 31693
this pr 18088 143322
  1. sscan key wich use intset encoding
sscan intset 100% matched sscan pattern and 20% matched
unstable 7529 10255
this pr 7819 22122
  1. zscan key which use listpack encoding
zscan listpack 100% matched zscan pattern and 10% matched
unstable 18794 28605
this pr 20004 81147

Co-authored-by: sundb <[email protected]>
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. mostly LGTM>

@judeng
Copy link
Contributor Author

judeng commented Jun 16, 2023

@oranagra I just finished it according to your last review and compare the benchmark performance bettwen before and after the lastest commit.

benchmark command: scan 0 type string count 1000

scan 100% type matched 100% type unmatched
before 2432 7811
after 2491 8537

benchmark command:sscan set1 0 count 1000, set1 is a key with intset encoding.
benchmark command:sscan hash1 0 count 1000, hash1 is a key with hash encoding, and every field is bigger the 44bytes(avoid
use emb str object).

command sscan hscan
before 18959 114058
after 19099 121752

result analysis:

  1. when type string convert to interger, if the key 100% match the type, the performance bottleneck seems to be on the reply, and the performance increase is negligible. But when all key can't match the type, that is, reply is always empty, we gain a 9% increase.
  2. Using the sds instread of robj, when filed is small that always used the emb string object, there is nothing different in performance. But when the robj is raw string at before, we gain a 7% increase.

@oranagra
Copy link
Member

@judeng thanks for following all my requests (both the tips to incrementally further improve SCAN compared to what this PR initially meant to do), and also the rollback process, tests and comments.
i'm quite happy with the final result.

one last comment to handle IMHO is #12209 (comment)
i.e. next to the currently disabled SCAN unknown type test, add a temporary one that does the opposite (prove that unknown types don't filter anything).

after that, please refresh the top comment, which i'll use as a squash-merge commit comment. if you can also include some raw benchmark numbers to give an idea of the impact of this PR on performance. and then i'll merge it.

@judeng
Copy link
Contributor Author

judeng commented Jun 26, 2023

Looks all set, I'm going to retest the performance impact tomorrow.

@oranagra
Copy link
Member

@judeng I updated the top comment. let me know if you see any problem.
i.e. the current version doesn't have any behavior changes (we moved them all to 8.0)

@oranagra
Copy link
Member

i checked out this PR, to do two things:

  1. run coverage report, in which i found that the listDelNode after expireIfNeeded was uncovered (apparently all the tests for expiring keys used TYPE filter).
  2. run all the new tests against the old version, to make sure we didn't have any unnoticed behavior change.

@oranagra
Copy link
Member

oranagra commented Jun 26, 2023

@oranagra
Copy link
Member

  • run coverage report, in which i found that the listDelNode after expireIfNeeded was uncovered (apparently all the tests for expiring keys used TYPE filter).

i now realize that this was already covered by expire.tcl (i only tested coverage by running scan.tcl).
but that test seems to be focused on MULTI-EXEC and replication, so i suppose we better keep the one i added.

Co-authored-by: sundb <[email protected]>
@judeng
Copy link
Contributor Author

judeng commented Jun 27, 2023

@oranagra I have updated the performance test results in the top comment, it seems that except for the type filter, the other results have not changed much

@oranagra
Copy link
Member

Thanks.
is the first column, the one who says "match all" completely unfiltered?
i understand that none of these tests use volatile keys, right?

@judeng
Copy link
Contributor Author

judeng commented Jun 27, 2023

is the first column, the one who says "match all" completely unfiltered?

yes, I used the match *, actually it is eqaul with no any filters.

i understand that none of these tests use volatile keys, right?

yes, I used the no-volatile keys to better display the perfermance improvement :-)

@oranagra oranagra merged commit 07ed0ea into redis:unstable Jun 27, 2023
@oranagra
Copy link
Member

merged.
@judeng thank you for your dedicated work.
feel free to issue another PR to delete / uncomment lines, which i'll merge to unstable once 7.2 is out. (just so that we don't forget).
feel free to re-do some benchmark to show what additional improvements it brigs.

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes and removed state:major-decision Requires core team consensus labels Jun 27, 2023
@oranagra
Copy link
Member

oranagra commented Jul 2, 2023

@judeng can you make a PR? i'm afraid to forget...

@judeng
Copy link
Contributor Author

judeng commented Jul 3, 2023

@oranagra Sorry, I didn't forget about it, just had a couple of nasty accidents last week so I didn't have time, I'll to work on it this week.

@oranagra oranagra mentioned this pull request Jul 10, 2023
sundb pushed a commit that referenced this pull request Aug 20, 2024
Move the TYPE filtering to the scan callback so that avoided the
`lookupKey` operation. This is the follow-up to #12209 . In this thread
we introduced two breaking changes:
1. we will not attempt to do lazy expire (delete) a key that was
filtered by not matching the TYPE (like we already do for MATCH
pattern).
2. when the specified key TYPE filter is an unknown type, server will
reply a error immediately instead of doing a full scan that comes back
empty handed.
sundb pushed a commit to sundb/redis that referenced this pull request Aug 29, 2024
…2395)

Move the TYPE filtering to the scan callback so that avoided the
`lookupKey` operation. This is the follow-up to redis#12209 . In this thread
we introduced two breaking changes:
1. we will not attempt to do lazy expire (delete) a key that was
filtered by not matching the TYPE (like we already do for MATCH
pattern).
2. when the specified key TYPE filter is an unknown type, server will
reply a error immediately instead of doing a full scan that comes back
empty handed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes indication that this issue needs to be mentioned in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants