Support delay trimming slots after finishing migrating slots#14567
Support delay trimming slots after finishing migrating slots#14567ShooterIT merged 22 commits intoredis:unstablefrom
Conversation
src/cluster_asm.c
Outdated
| /* On migrating side, the replica is unaware of the slot migration, so it | ||
| * doesn't know when the migration is completed (start having unowned keys), | ||
| * We delay to fire the event until the TRIMSLOTS is propagated to the replica. | ||
| * Maybe optimize it in the future. */ |
There was a problem hiding this comment.
Just thinking out loud, let's say migrating side replica becomes master before receiving TRIMSLOTS.
- If it happens after replica receives config update via bus, then the new master will be unaware of unowned keys.
- If it happens before replica receives config update via bus, then, on the config update, the new master will do a blocking trim as there is no asm task to associate.
I assume we don't have these issues on importing side. Also, I wonder what we can do to improve these two.
|
Hi @oranagra This PR introduce two APIs:
At the moment, the only supported capability is CLUSTER_MODULE_POLICY_NO_TRIM, which prevents key trimming after atomic slot migration. These APIs are intentionally generic so that additional server capabilities can be introduced in the future, instead of providing dedicated functions such as Do you think these APIs make sense? |
|
LGTM except for the terminology. both the "acquire" and more importantly the "capability" terms don't seem good to me. |
|
Thank you @oranagra
yes, I had a long discussion with chatpgt, and did not get satisfactory conclusion, the reason why I use maybe a dedicated API is not a bad idea, and I even prefer it, The word “reshard” may be confused with our internal resharding approach, how about |
|
I was the one mainly insisted on generic API as I thought we could need something similar in future. Though, now I feel like it causes more confusion and each one of us has different taste :) So, I vote for dedicated API as well. I think all of the cluster functions include |
So I would use whatever, but I agree that this generic API name can feel a little confusing and an overkill trying to generalize for now. So dedicated API seems best to me |
|
Thank you all, let's use dedicated APIs, |
This PR introduces a mechanism that allows a module to temporarily
disable trimming after an ASM migration operation so it can safely
finish ongoing asynchronous jobs that depend on keys in migrating (and
about to be trimmed) slots.
1. **ClusterDisableTrim/ClusterEnableTrim**
We introduce `ClusterDisableTrim/ClusterEnableTrim` Module APIs to allow
module to disable/enable slot migration
```
/* Disable automatic slot trimming. */
int RM_ClusterDisableTrim(RedisModuleCtx *ctx)
/* Enable automatic slot trimming */
int RM_ClusterEnableTrim(RedisModuleCtx *ctx)
```
**Please notice**: Redis will not start any subsequent import or migrate
ASM operations while slot trimming is disabled, so modules must
re-enable trimming immediately after completing their pending work.
The only valid and meaningful time for a module to disable trimming
appears to be after the MIGRATE_COMPLETED event.
2. **REDISMODULE_OPEN_KEY_ACCESS_TRIMMED**
Added REDISMODULE_OPEN_KEY_ACCESS_TRIMMED to RM_OpenKey() so that module
can operate with these keys in the unowned slots after trim is paused.
And now we don't delete the key if it is in trim job when we access it.
And `expireIfNeeded` returns `KEY_VALID` if
`EXPIRE_ALLOW_ACCESS_TRIMMED` is set, otherwise, returns `KEY_TRIMMED`
without deleting key.
3. **REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS**
We also extend RM_GetContextFlags() to include a flag
REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job
is pending (due to trim pause) or in progress. Modules could
periodically poll this flag to synchronize their internal state, e.g.,
if a trim job was delayed or if the module incorrectly assumed trimming
was still active.
Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first.
---------
Co-authored-by: Ozan Tezcan <[email protected]>
This PR introduces a mechanism that allows a module to temporarily disable trimming after an ASM migration operation so it can safely finish ongoing asynchronous jobs that depend on keys in migrating (and about to be trimmed) slots.
ClusterDisableTrim/ClusterEnableTrim
We introduce
ClusterDisableTrim/ClusterEnableTrimModule APIs to allow module to disable/enable slot migrationPlease notice: Redis will not start any subsequent import or migrate ASM operations while slot trimming is disabled, so modules must re-enable trimming immediately after completing their pending work.
The only valid and meaningful time for a module to disable trimming appears to be after the MIGRATE_COMPLETED event.
REDISMODULE_OPEN_KEY_ACCESS_TRIMMED
Added REDISMODULE_OPEN_KEY_ACCESS_TRIMMED to RM_OpenKey() so that module can operate with these keys in the unowned slots after trim is paused.
And now we don't delete the key if it is in trim job when we access it. And
expireIfNeededreturnsKEY_VALIDifEXPIRE_ALLOW_ACCESS_TRIMMEDis set, otherwise, returnsKEY_TRIMMEDwithout deleting key.REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS
We also extend RM_GetContextFlags() to include a flag REDISMODULE_CTX_FLAGS_TRIM_IN_PROGRESS indicating whether a trimming job is pending (due to trim pause) or in progress. Modules could periodically poll this flag to synchronize their internal state, e.g., if a trim job was delayed or if the module incorrectly assumed trimming was still active.
Bugfix: RM_SetClusterFlags could not clear a flag after enabling it first