Skip to content

Optimize SCAN with MATCH when pattern implies cluster slot#12536

Merged
madolson merged 3 commits intoredis:unstablefrom
zuiderkwast:scan-with-slot-pattern
Nov 1, 2023
Merged

Optimize SCAN with MATCH when pattern implies cluster slot#12536
madolson merged 3 commits intoredis:unstablefrom
zuiderkwast:scan-with-slot-pattern

Conversation

@zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented Aug 31, 2023

This is the optimization part of #2702. No API changes, just optimization.

Release notes
Optimize the performance of the SCAN command when a match pattern can only contain keys from a single slot in cluster mode.

Copy link
Contributor

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice optimization.

There has been some new changes with dbScan method being introduced. For this optimization, we need to invoke addSlotIdToCursor prior to the dbScan call in scanGenericCommand.

#11695 is greenlit for merge, we can do a rebase/merge after it's in. 👍

@zuiderkwast
Copy link
Contributor Author

There has been some new changes with dbScan method being introduced. For this optimization, we need to invoke addSlotIdToCursor prior to the dbScan call in scanGenericCommand.

Thanks for the advice! It makes it easier to update this.

#11695 is greenlit for merge, we can do a rebase/merge after it's in. 👍

Yes, I saw that! Finally! Yes, I'll update this PR when it is merged change the target to unstable.

@madolson
Copy link
Contributor

@zuiderkwast Yeah, just coming here to say it's merged now so please update.

@zuiderkwast zuiderkwast changed the base branch from dict-split-by-slot to unstable October 17, 2023 08:09
@zuiderkwast zuiderkwast force-pushed the scan-with-slot-pattern branch from ef85b6e to e11f487 Compare October 17, 2023 11:30
@zuiderkwast
Copy link
Contributor Author

@zuiderkwast Yeah, just coming here to say it's merged now so please update.

Done. Consider this ready for review. (I won't force push again.)

@hpatro
Copy link
Contributor

hpatro commented Oct 30, 2023

@madolson Could you please help close this out?

@madolson
Copy link
Contributor

Sure, will take a look tomorrow.

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@madolson madolson added the release-notes indication that this issue needs to be mentioned in the release notes label Nov 1, 2023
@madolson madolson merged commit 8878817 into redis:unstable Nov 1, 2023
@madolson
Copy link
Contributor

madolson commented Nov 1, 2023

@zuiderkwast Do you think we should document this optimization in the SCAN command? Part of me thinks we shouldn't, I'm not sure how much I want people to be relying on it.

@soloestoy
Copy link
Contributor

soloestoy commented Nov 1, 2023

I'm not sure if this optimization is the right direction. It implicitly narrows down the scan range from the entire database to a specific slot, which seems to introduce a special syntax: using a hashtag in the pattern to specify a slot. However, we have never officially declared or documented this usage, which can lead to confusion and abuse.

Additionally, we have encountered some interesting cases that are similar to the optimization in this PR. Inappropriate optimizations can sometimes backfire and have unintended consequences.

There was a user who used KEYS abc to check if the key "abc" exists, which is, of course, a very incorrect usage. However, in order to better accommodate their business needs, we optimized this specific scenario. When the argument of KEYS is not a pattern but an exact key, we directly return the existence of that key without iterating through the entire database. The time complexity is reduced from O(N) to O(1). Everything seemed to be working fine, but later when they migrated their business to another service that didn't have this optimization, it caused a failure.

Now looking at the optimization for SCAN in this PR, it is very similar to the optimization for KEYS mentioned earlier. I believe there is a relatively high potential risk (the performance difference between cluster mode and standalone mode results in different iterations required to scan the same pattern). If a user really wants to specify a slot for scanning, it would be better to provide a dedicated command for slot scanning rather than using such a implicit approach.

I think it might be appropriate to revert this commit until we reach a consensus on the resolution for #2702. Also ping @oranagra @yossigo

@zuiderkwast
Copy link
Contributor Author

@madolson It's a good question, but yes, I think we should document it.

Hash tags is the abstraction we want the end users to rely on, when they need to use multi-key commands or transactions with multiple related keys. We don't want them to be aware of the slot number of the key (except possibly advanced users and client developers), that seems to be the conclusion of #2702. So we provide a way to iterate efficiently over these related keys too using the same abstraction.

In the docs, we already tell them use keys like user:{123}:profile and user:{123}:account, so I think we can add a line like "In Redis 8 and later, you can iterate efficiently over the keys related to this user using SCAN with a pattern like user:{123}:*" in the section about hash tags in the cluster docs.

The cluster docs about hash tags

Redis Cluster supports multiple key operations as long as all of the keys involved in a single command execution (or whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a feature called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the string is hashed. For example, the keys user:{123}:profile and user:{123}:account are guaranteed to be in the same hash slot because they share the same hash tag. As a result, you can operate on these two keys in the same multi-key operation.

(From https://redis.io/docs/management/scaling/#redis-cluster-data-sharding)

@soloestoy That's an interesting point about KEYS abc, that even an optimization like that can cause trouble. Of course it is not the right usage for KEYS, but maybe it is not the same in this case of SCAN. On the other hand, if code which relies on SCAN {tag}* runs on an older Redis version (or a hosted Redis fork which doesn't have this optimization) there will be a problem, so probably the docs need to be clear that this is efficient only on Redis 8 and later.

@soloestoy
Copy link
Contributor

@zuiderkwast This is not just a performance issue, my main concern is with the syntax: if a pattern contains a hashtag, it will be parsed and mapped to a slot.

However, this is not a good practice, as the pattern is not a key and it does not trigger a "-MOVED" redirection in the cluster protocol. This special usage in SCAN makes it disconnected from other uses, such as the KEYS command which can also include a pattern. And if patterns can be used for slot mapping, should we also allow BY and GET parameters in the SORT command in cluster mode? We have never supported such usage and have not officially defined syntax for pattern-to-slot mapping. Until this syntax is determined, I do not recommend only implementing this in SCAN. It is just the tip of the iceberg, defining a new syntax requires careful consideration.

@oranagra
Copy link
Member

oranagra commented Nov 2, 2023

First, I don't think the KEYS example is related. There, it was a complete (and silly) misuse of KEYS. And here it is a valid use of SCAN.
Secondly, there, it was O(n) to O(1), but here it remains a similar complexity.

I think this optimization will also work quite nicely with cluster routing (even if MOVED isn't handled), i.e. the client will SCAN all the nodes, and all but one will return very quickly. Of course, it doesn't support keys being moved during the scan, but that isn't/ wasn't supported anyway.

I think it is ok to document this.

@soloestoy
Copy link
Contributor

Then how should these commands and usages be handled:

  1. KEYS {tag}*
  2. SORT {tag}uid BY {tag}user_level_*
  3. SORT {tag}uid GET {tag}user_name_*

@oranagra
Copy link
Member

oranagra commented Nov 2, 2023

for KEYS, we can implement a similar optimization (again reducing the complexity from O(n) to a slightly smaller O(n).)
for SORT (putting aside the fact that it's a horrible command that should be deprecated), i think these (GET and BY) should be allowed in cluster mode, when the pattern is tagged (to the same slot the sorted key belongs).

@soloestoy
Copy link
Contributor

for SORT (putting aside the fact that it's a horrible command that should be deprecated), i think these (GET and BY) should be allowed in cluster mode, when the pattern is tagged (to the same slot the sorted key belongs).

Some of our users have requested this feature, and if we consider using a hashtag in patterns as standard syntax, we would be happy to implement it. Please go ahead @CharlesChen888

@soloestoy
Copy link
Contributor

soloestoy commented Nov 2, 2023

for KEYS, we can implement a similar optimization (again reducing the complexity from O(n) to a slightly smaller O(n).)

If KEYS also supports pattern-to-slot mapping, it is highly likely to trigger the issues I mentioned earlier. The time complexity of KEYS remains O(n), but the meaning of "n" has changed:

  1. If the pattern does not use a hashtag, then N is the total number of keys.
  2. If the pattern uses a hashtag, then N is the number of keys in the slot corresponding to that hashtag.

However, compared to the total number of keys, it is difficult for users to have control over the number of keys in a slot. Different slots in the same Redis instance may have varying numbers of keys, and if the difference is significant, it can lead to significant differences in performance.

There is a case that can easily cause a significant performance difference: KEYS {tag}*. If the slot is on the current node, it will traverse that slot. If the slot is not on the current node, there is no need to traverse it. This means that the time complexity changes from O(n) to O(1). And it is also not possible to perform the MOVE reply, this introduces a great deal of instability to the smooth operation of the business.

@oranagra
Copy link
Member

oranagra commented Nov 2, 2023

However, compared to the total number of keys, it is difficult for users to have control over the number of keys in a slot. Different slots in the same Redis instance may have varying numbers of keys, and if the difference is significant, it can lead to significant differences in performance.

i think that argument is about KEYS being unpredictable is more about KEYS * and not about KEYS {tag}*.
i.e. KEYS * can have complexity fluctuations due to resharding, while the one with tag is more constant.

also, i'd argue that when a client executes KEYS * or KEYS {tag}* on an entire cluster (node by node), one using tag would still have rather consistent complexity. i.e. most nodes will return with O(1), and one will return with O(n), so overall (putting round trip latency aside), it's still O(n).

@soloestoy
Copy link
Contributor

If that's the case, using hashtags in patterns would need to be defined as a standard practice in the cluster protocol. We would need detailed documentation to explain its correct usage (it would be especially challenging to explain to users in cases like the one I encountered with KEYS), the documentation should aim to cover all affected commands and provide guidance on constructing patterns correctly, such as ensuring that the hashtag is placed as a prefix before the wildcard.

Additionally, since we allow the use of hashtags in patterns, I think we should also confirm whether slot redirection (-MOVED) is necessary before executing the command. This can help avoid ineffective execution of commands on nodes that do not hold the specified slot.

@oranagra
Copy link
Member

oranagra commented Nov 2, 2023

@yossigo please comment

@madolson
Copy link
Contributor

madolson commented Nov 2, 2023

Well, I wasn't expecting this to be so contentious 🐙.

I'm inclined to say we should just document it as an optimization, but you really shouldn't be using SCAN all that much in cluster mode anyways. It is for infrequent O(N) operations that can be orchestrated externally, and this is just an optimization to speed it up.

Additionally, since we allow the use of hashtags in patterns, I think we should also confirm whether slot redirection (-MOVED) is necessary before executing the command. This can help avoid ineffective execution of commands on nodes that do not hold the specified slot.

I don't think there is any way to do this without asking clients to do work and introducing a new concept for hashable pattern. I'm not materially against it, we discussed something similar for SORT in cluster mode, and decided it wasn't worth it. We haven't really gotten materially feedback on SORT in a long time, so I think we should ignore it.

@zuiderkwast
Copy link
Contributor Author

In retrospect, I don't know how useful this actually is. If you have a bunch of keys linked to a user, user:{123}:profile, user:{123}:score, etc. it's typically just a handful known keys, not something you need to discover using SCAN.

The more relevant use case for cluster SCAN is rather the one with SCAN guarantees over the whole cluster in the precence of migrations, discussed in #2702.

@zuiderkwast zuiderkwast deleted the scan-with-slot-pattern branch December 1, 2023 07:56
oranagra pushed a commit that referenced this pull request Dec 5, 2023
#12754)

in #12536 we made a similar optimization for SCAN, now that hashtags in
patterns. When we can make sure all keys matching the pettern will be in
the same slot, we can limit the iteration to run only one one.
@madolson
Copy link
Contributor

madolson commented Dec 8, 2023

In retrospect, I don't know how useful this actually is. If you have a bunch of keys linked to a user, user:{123}:profile, user:{123}:score, etc. it's typically just a handful known keys, not something you need to discover using SCAN.

In many cases, the data stored under a user might be more denormalized than this though. They might do something user:{123}:data_1, where data_1 is just one of the many pieces of data they have. I know of at least one user that runs a nightly job scanning through Redis to delete data for GDPR compliance, maybe this would also them to better scope down their accesses to just the users they know they need to delete data from.

oranagra pushed a commit that referenced this pull request Dec 13, 2023
…lies slot. (#12728)

The by/get options of sort/sort_ro command used to be forbidden in
cluster mode, since we are not sure which slot the pattern may be in.

As the optimization done in #12536, patterns now can be mapped to slots,
we should allow by/get options in cluster mode when the pattern maps to
the same slot as the key.
zuiderkwast added a commit to redis/redis-doc that referenced this pull request Jan 18, 2024
Adding to the cluster spec documentation of the following Redis features:

redis/redis#12754 (SCAN)
redis/redis#12536 (KEYS)
redis/redis#12728 (SORT, SORT_RO)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants