Optimize SCAN with MATCH when pattern implies cluster slot by zuiderkwast · Pull Request #12536 · redis/redis

zuiderkwast · 2023-08-31T22:23:23Z

This is the optimization part of #2702. No API changes, just optimization.

Release notes
Optimize the performance of the SCAN command when a match pattern can only contain keys from a single slot in cluster mode.

hpatro

Nice optimization.

There has been some new changes with dbScan method being introduced. For this optimization, we need to invoke addSlotIdToCursor prior to the dbScan call in scanGenericCommand.

#11695 is greenlit for merge, we can do a rebase/merge after it's in. 👍

src/cluster.c

zuiderkwast · 2023-10-13T19:14:55Z

There has been some new changes with dbScan method being introduced. For this optimization, we need to invoke addSlotIdToCursor prior to the dbScan call in scanGenericCommand.

Thanks for the advice! It makes it easier to update this.

#11695 is greenlit for merge, we can do a rebase/merge after it's in. 👍

Yes, I saw that! Finally! Yes, I'll update this PR when it is merged change the target to unstable.

madolson · 2023-10-16T21:08:42Z

@zuiderkwast Yeah, just coming here to say it's merged now so please update.

zuiderkwast · 2023-10-17T13:05:39Z

@zuiderkwast Yeah, just coming here to say it's merged now so please update.

Done. Consider this ready for review. (I won't force push again.)

src/db.c

tests/unit/scan.tcl

hpatro · 2023-10-30T18:52:08Z

@madolson Could you please help close this out?

madolson · 2023-10-31T04:04:08Z

Sure, will take a look tomorrow.

madolson

LGTM

madolson · 2023-11-01T07:07:49Z

@zuiderkwast Do you think we should document this optimization in the SCAN command? Part of me thinks we shouldn't, I'm not sure how much I want people to be relying on it.

soloestoy · 2023-11-01T09:56:47Z

I'm not sure if this optimization is the right direction. It implicitly narrows down the scan range from the entire database to a specific slot, which seems to introduce a special syntax: using a hashtag in the pattern to specify a slot. However, we have never officially declared or documented this usage, which can lead to confusion and abuse.

Additionally, we have encountered some interesting cases that are similar to the optimization in this PR. Inappropriate optimizations can sometimes backfire and have unintended consequences.

There was a user who used KEYS abc to check if the key "abc" exists, which is, of course, a very incorrect usage. However, in order to better accommodate their business needs, we optimized this specific scenario. When the argument of KEYS is not a pattern but an exact key, we directly return the existence of that key without iterating through the entire database. The time complexity is reduced from O(N) to O(1). Everything seemed to be working fine, but later when they migrated their business to another service that didn't have this optimization, it caused a failure.

Now looking at the optimization for SCAN in this PR, it is very similar to the optimization for KEYS mentioned earlier. I believe there is a relatively high potential risk (the performance difference between cluster mode and standalone mode results in different iterations required to scan the same pattern). If a user really wants to specify a slot for scanning, it would be better to provide a dedicated command for slot scanning rather than using such a implicit approach.

I think it might be appropriate to revert this commit until we reach a consensus on the resolution for #2702. Also ping @oranagra @yossigo

zuiderkwast · 2023-11-01T10:51:40Z

@madolson It's a good question, but yes, I think we should document it.

Hash tags is the abstraction we want the end users to rely on, when they need to use multi-key commands or transactions with multiple related keys. We don't want them to be aware of the slot number of the key (except possibly advanced users and client developers), that seems to be the conclusion of #2702. So we provide a way to iterate efficiently over these related keys too using the same abstraction.

In the docs, we already tell them use keys like user:{123}:profile and user:{123}:account, so I think we can add a line like "In Redis 8 and later, you can iterate efficiently over the keys related to this user using SCAN with a pattern like user:{123}:*" in the section about hash tags in the cluster docs.

The cluster docs about hash tags

Redis Cluster supports multiple key operations as long as all of the keys involved in a single command execution (or whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a feature called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the string is hashed. For example, the keys user:{123}:profile and user:{123}:account are guaranteed to be in the same hash slot because they share the same hash tag. As a result, you can operate on these two keys in the same multi-key operation.

(From https://redis.io/docs/management/scaling/#redis-cluster-data-sharding)

@soloestoy That's an interesting point about KEYS abc, that even an optimization like that can cause trouble. Of course it is not the right usage for KEYS, but maybe it is not the same in this case of SCAN. On the other hand, if code which relies on SCAN {tag}* runs on an older Redis version (or a hosted Redis fork which doesn't have this optimization) there will be a problem, so probably the docs need to be clear that this is efficient only on Redis 8 and later.

soloestoy · 2023-11-02T02:42:25Z

@zuiderkwast This is not just a performance issue, my main concern is with the syntax: if a pattern contains a hashtag, it will be parsed and mapped to a slot.

However, this is not a good practice, as the pattern is not a key and it does not trigger a "-MOVED" redirection in the cluster protocol. This special usage in SCAN makes it disconnected from other uses, such as the KEYS command which can also include a pattern. And if patterns can be used for slot mapping, should we also allow BY and GET parameters in the SORT command in cluster mode? We have never supported such usage and have not officially defined syntax for pattern-to-slot mapping. Until this syntax is determined, I do not recommend only implementing this in SCAN. It is just the tip of the iceberg, defining a new syntax requires careful consideration.

oranagra · 2023-11-02T05:05:25Z

First, I don't think the KEYS example is related. There, it was a complete (and silly) misuse of KEYS. And here it is a valid use of SCAN.
Secondly, there, it was O(n) to O(1), but here it remains a similar complexity.

I think this optimization will also work quite nicely with cluster routing (even if MOVED isn't handled), i.e. the client will SCAN all the nodes, and all but one will return very quickly. Of course, it doesn't support keys being moved during the scan, but that isn't/ wasn't supported anyway.

I think it is ok to document this.

soloestoy · 2023-11-02T05:42:04Z

Then how should these commands and usages be handled:

KEYS {tag}*
SORT {tag}uid BY {tag}user_level_*
SORT {tag}uid GET {tag}user_name_*

oranagra · 2023-11-02T07:08:51Z

for KEYS, we can implement a similar optimization (again reducing the complexity from O(n) to a slightly smaller O(n).)
for SORT (putting aside the fact that it's a horrible command that should be deprecated), i think these (GET and BY) should be allowed in cluster mode, when the pattern is tagged (to the same slot the sorted key belongs).

soloestoy · 2023-11-02T08:00:46Z

for SORT (putting aside the fact that it's a horrible command that should be deprecated), i think these (GET and BY) should be allowed in cluster mode, when the pattern is tagged (to the same slot the sorted key belongs).

Some of our users have requested this feature, and if we consider using a hashtag in patterns as standard syntax, we would be happy to implement it. Please go ahead @CharlesChen888

soloestoy · 2023-11-02T08:03:17Z

for KEYS, we can implement a similar optimization (again reducing the complexity from O(n) to a slightly smaller O(n).)

If KEYS also supports pattern-to-slot mapping, it is highly likely to trigger the issues I mentioned earlier. The time complexity of KEYS remains O(n), but the meaning of "n" has changed:

If the pattern does not use a hashtag, then N is the total number of keys.
If the pattern uses a hashtag, then N is the number of keys in the slot corresponding to that hashtag.

However, compared to the total number of keys, it is difficult for users to have control over the number of keys in a slot. Different slots in the same Redis instance may have varying numbers of keys, and if the difference is significant, it can lead to significant differences in performance.

There is a case that can easily cause a significant performance difference: KEYS {tag}*. If the slot is on the current node, it will traverse that slot. If the slot is not on the current node, there is no need to traverse it. This means that the time complexity changes from O(n) to O(1). And it is also not possible to perform the MOVE reply, this introduces a great deal of instability to the smooth operation of the business.

src/cluster.c

oranagra · 2023-11-02T08:50:56Z

However, compared to the total number of keys, it is difficult for users to have control over the number of keys in a slot. Different slots in the same Redis instance may have varying numbers of keys, and if the difference is significant, it can lead to significant differences in performance.

i think that argument is about KEYS being unpredictable is more about KEYS * and not about KEYS {tag}*.
i.e. KEYS * can have complexity fluctuations due to resharding, while the one with tag is more constant.

also, i'd argue that when a client executes KEYS * or KEYS {tag}* on an entire cluster (node by node), one using tag would still have rather consistent complexity. i.e. most nodes will return with O(1), and one will return with O(n), so overall (putting round trip latency aside), it's still O(n).

soloestoy · 2023-11-02T09:09:44Z

If that's the case, using hashtags in patterns would need to be defined as a standard practice in the cluster protocol. We would need detailed documentation to explain its correct usage (it would be especially challenging to explain to users in cases like the one I encountered with KEYS), the documentation should aim to cover all affected commands and provide guidance on constructing patterns correctly, such as ensuring that the hashtag is placed as a prefix before the wildcard.

Additionally, since we allow the use of hashtags in patterns, I think we should also confirm whether slot redirection (-MOVED) is necessary before executing the command. This can help avoid ineffective execution of commands on nodes that do not hold the specified slot.

oranagra · 2023-11-02T10:02:07Z

@yossigo please comment

madolson · 2023-11-02T21:33:39Z

Well, I wasn't expecting this to be so contentious 🐙.

I'm inclined to say we should just document it as an optimization, but you really shouldn't be using SCAN all that much in cluster mode anyways. It is for infrequent O(N) operations that can be orchestrated externally, and this is just an optimization to speed it up.

Additionally, since we allow the use of hashtags in patterns, I think we should also confirm whether slot redirection (-MOVED) is necessary before executing the command. This can help avoid ineffective execution of commands on nodes that do not hold the specified slot.

I don't think there is any way to do this without asking clients to do work and introducing a new concept for hashable pattern. I'm not materially against it, we discussed something similar for SORT in cluster mode, and decided it wasn't worth it. We haven't really gotten materially feedback on SORT in a long time, so I think we should ignore it.

zuiderkwast · 2023-11-13T07:41:14Z

In retrospect, I don't know how useful this actually is. If you have a bunch of keys linked to a user, user:{123}:profile, user:{123}:score, etc. it's typically just a handful known keys, not something you need to discover using SCAN.

The more relevant use case for cluster SCAN is rather the one with SCAN guarantees over the whole cluster in the precence of migrations, discussed in #2702.

#12754) in #12536 we made a similar optimization for SCAN, now that hashtags in patterns. When we can make sure all keys matching the pettern will be in the same slot, we can limit the iteration to run only one one.

madolson · 2023-12-08T22:29:14Z

In retrospect, I don't know how useful this actually is. If you have a bunch of keys linked to a user, user:{123}:profile, user:{123}:score, etc. it's typically just a handful known keys, not something you need to discover using SCAN.

In many cases, the data stored under a user might be more denormalized than this though. They might do something user:{123}:data_1, where data_1 is just one of the many pieces of data they have. I know of at least one user that runs a nightly job scanning through Redis to delete data for GDPR compliance, maybe this would also them to better scope down their accesses to just the users they know they need to delete data from.

…lies slot. (#12728) The by/get options of sort/sort_ro command used to be forbidden in cluster mode, since we are not sure which slot the pattern may be in. As the optimization done in #12536, patterns now can be mapped to slots, we should allow by/get options in cluster mode when the pattern maps to the same slot as the key.

Adding to the cluster spec documentation of the following Redis features: redis/redis#12754 (SCAN) redis/redis#12536 (KEYS) redis/redis#12728 (SORT, SORT_RO)

hpatro reviewed Oct 13, 2023

View reviewed changes

src/cluster.c Outdated Show resolved Hide resolved

zuiderkwast commented Oct 13, 2023

View reviewed changes

src/cluster.c Outdated Show resolved Hide resolved

zuiderkwast changed the base branch from dict-split-by-slot to unstable October 17, 2023 08:09

Optimize SCAN with MATCH when pattern implies cluster slot

e11f487

zuiderkwast force-pushed the scan-with-slot-pattern branch from ef85b6e to e11f487 Compare October 17, 2023 11:30

Add test case

a9b364d

hpatro reviewed Oct 19, 2023

View reviewed changes

src/db.c Show resolved Hide resolved

hpatro reviewed Oct 27, 2023

View reviewed changes

tests/unit/scan.tcl Outdated Show resolved Hide resolved

Test case: Add some keys in a slot after foo's slot

9453543

hpatro approved these changes Oct 30, 2023

View reviewed changes

madolson approved these changes Nov 1, 2023

View reviewed changes

madolson added the release-notes indication that this issue needs to be mentioned in the release notes label Nov 1, 2023

madolson merged commit 8878817 into redis:unstable Nov 1, 2023

CharlesChen888 reviewed Nov 2, 2023

View reviewed changes

src/cluster.c Show resolved Hide resolved

CharlesChen888 mentioned this pull request Nov 6, 2023

Support by/get options for sort(_ro) in cluster mode when pattern implies slot. #12728

Merged

CharlesChen888 mentioned this pull request Nov 13, 2023

Optimize KEYS when pattern includes hashtag and implies a single slot. #12754

Merged

zuiderkwast deleted the scan-with-slot-pattern branch December 1, 2023 07:56

zuiderkwast mentioned this pull request Jan 17, 2024

Explain patterns with hash tags in cluster spec redis/redis-doc#2646

Merged

soloestoy mentioned this pull request Aug 28, 2024

Make KEYS to be an exact match if there is no pattern valkey-io/valkey#792

Merged

Conversation

zuiderkwast commented Aug 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hpatro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Oct 13, 2023

Uh oh!

madolson commented Oct 16, 2023

Uh oh!

zuiderkwast commented Oct 17, 2023

Uh oh!

Uh oh!

Uh oh!

hpatro commented Oct 30, 2023

Uh oh!

madolson commented Oct 31, 2023

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

madolson commented Nov 1, 2023

Uh oh!

soloestoy commented Nov 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Nov 1, 2023

Uh oh!

soloestoy commented Nov 2, 2023

Uh oh!

oranagra commented Nov 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

soloestoy commented Nov 2, 2023

Uh oh!

oranagra commented Nov 2, 2023

Uh oh!

soloestoy commented Nov 2, 2023

Uh oh!

soloestoy commented Nov 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

oranagra commented Nov 2, 2023

Uh oh!

soloestoy commented Nov 2, 2023

Uh oh!

oranagra commented Nov 2, 2023

Uh oh!

madolson commented Nov 2, 2023

Uh oh!

zuiderkwast commented Nov 13, 2023

Uh oh!

madolson commented Dec 8, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

zuiderkwast commented Aug 31, 2023 •

edited

Loading

soloestoy commented Nov 1, 2023 •

edited

Loading

oranagra commented Nov 2, 2023 •

edited

Loading

soloestoy commented Nov 2, 2023 •

edited

Loading