SCAN/RANDOMKEY and lazy-expire#11788
Conversation
Starting from Redis 7.0 we wrap everything a command propagates with MULTI/EXEC the problem is that both SCAN and RANDOMKEY can lazy-expire random keys (similar behavior to active-expire) 1. When these commands are called without a parent exec-unit we avoid a transaction (for the same reasons active-expire avoids a transaction) 2. When these commands are called with a parent exec-unit we can't avoid a transaction. In order to prevent CROSSSLOT in cluster mode, we just avoid lazy-expiry in thi scenario.
|
It is introduced in #9890. I don't think this PR is a right way, as I know lots of users are using What we should do is decouple the expire delete propagation and the command propagation, just as what eviction does. A way to fix it is call |
the change here doesn't prevent SCAN from reclaiming expired keys, it just avoids the transaction (similar outcome to what eviction does). i updated the top comment, please have a look and fix. i think that the outcome we're aiming for here is ok, but i'm not certain about the implementation. |
|
comment regarding my last commit (f71801c): |
|
@oranagra @soloestoy there's also the matter of |
src/module.c
Outdated
| else if (!strcasecmp(t,"no-cluster")) flags |= CMD_MODULE_NO_CLUSTER; | ||
| else if (!strcasecmp(t,"no-mandatory-keys")) flags |= CMD_NO_MANDATORY_KEYS; | ||
| else if (!strcasecmp(t,"allow-busy")) flags |= CMD_ALLOW_BUSY; | ||
| else if (!strcasecmp(t,"access-arbitrary-keys")) flags |= CMD_TOUCHES_ARBITRARY_KEYS; |
There was a problem hiding this comment.
this needs to get documented.
also, this means the PR became a "major decision" (due to interface change)
|
not sure we can afford mark SORT with the ARBITRARY_KEYS flag (it does access two declared keys). |
I don't think so, we should guarantee all propagations are valid, I can give some use cases:
|
|
TBH, I don't like introduce the if we touch a expired key in a write command, like but the right replication data should be the lazy expire deletion should be stripped from command also from transaction IMO, we need find a way to finish it. |
| * to avoid using a transaction (much like active-expire) */ | ||
| if (server.current_client && | ||
| server.current_client->cmd && | ||
| server.current_client->cmd->flags & CMD_TOUCHES_ARBITRARY_KEYS) |
There was a problem hiding this comment.
seems if call SCAN inside MULTI/EXEC, we can still get a CROSSSLOT transaction?
|
we also have a tool that replicates data from the replication stream into masters, that was the trigger for this PR. I'm not certain that the case of lazy expire inside a SET command is right to be propagated without a transaction. there may be complex cases where touching multiple keys and the DELs are interleaved with other propagations, and taking them out means sending thing out of the order in which they happened. i fear there might be a case where this can break something. |
yes, you are right, but #9890 expanded impact, I mean a single SCAN outside scripts or transaction is a very common case, I'm afraid this may cause many new problems to other project. As you said SCAN calls from scripts or transaction can lead to the problem too, seems this usage is not a normal case. But I still wanna fix it completely, especially this PR points out the problem now.
I understand your worry, the order is clear, should propagate DELs before the read commands, but it's not easy to guarantee, since we have too many special cases... but from another hand, I think this is a good chance to figure it out and sort out the mess all cases about propagation, what do you think? |
|
maybe we can take it in steps then... as we realized that the change in redis 7.0 for scan can cause issues (not for redis, but for the ecosystem), maybe we should merge this fix as is now and even backport that to the next release of 7.0. |
makes sense, I opened an issue #11792 to track the problem |
|
@redis/core-team please approve. |
|
uh... sorry I forgot module flag, if we agree this is just a temporary method, I think we don't need expose the flag to module |
|
In theory, the flag can be kept, and even exposed to users even if we solve the lazy expiry issue. I.e. that flag really describes the command behavior. |
|
Top comment seems OK to me, but I noticed the major decision was removed so not going to spend more time looking at it. |
Starting from Redis 7.0 (#9890) we started wrapping everything a command propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction. Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire and eviction avoids a transaction) This PR adds a per-command flag that indicates that the command may touch arbitrary keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC. For now, this flag is internal, since we're considering other solutions for the future. Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF do not perform slot checks. The problem with the above is mainly for 3rd party ecosystem tools that propagate commands from master to master, or feed an AOF file with redis-cli into a master. This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the bigger problem with lazy expire better for another release. (cherry picked from commit fd82bcc)
Starting from Redis 7.0 (redis#9890) we started wrapping everything a command propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction. Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire and eviction avoids a transaction) This PR adds a per-command flag that indicates that the command may touch arbitrary keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC. For now, this flag is internal, since we're considering other solutions for the future. Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF do not perform slot checks. The problem with the above is mainly for 3rd party ecosystem tools that propagate commands from master to master, or feed an AOF file with redis-cli into a master. This PR aims to fix the regression in redis 7.0, and we opened redis#11792 to try to handle the bigger problem with lazy expire better for another release.
Starting from Redis 7.0 (#9890) we started wrapping everything a command propagates with MULTI/EXEC.
The problem is that both SCAN and RANDOMKEY can lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction.
Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire and eviction avoids a transaction)
This PR adds a per-command flag that indicates that the command may touch arbitrary keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC.
For now, this flag is internal, since we're considering other solutions for the future.
Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF do not perform slot checks.
The problem with the above is mainly for 3rd party ecosystem tools that propagate commands from master to master, or feed an AOF file with redis-cli into a master.
This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the bigger problem with lazy expire better for another release.