Module API to allow writes after key space notification hooks by MeirShpilraien · Pull Request #11199 · redis/redis

MeirShpilraien · 2022-08-28T16:00:39Z

Summary of API additions

RedisModule_AddPostNotificationJob - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above.
New module option, REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS, allows to disable Redis protection of nested key-space notifications.
RedisModule_GetModuleOptionsAll - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance.

Background

The following PR is a proposal of handling write operations inside module key space notifications.

After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification.

Some examples of issues that such write operation can cause are describe on the following links:

Bad replication oreder - Fix replication inconsistency on modules that uses key space notifications #10969
Used after free - Fix replication inconsistency on modules that uses key space notifications #10969 (comment)
Used after free - Add a special notification unlink available only for modules #9406 (comment)

There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic
(commands logic, eviction, expire). Redis do not assume that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour.

One solution is to go over all those places (that assumes no data changes) and fix them such that they will be tolerable to data changes. This approach will probably require a lot of changes all around the code and it will be very hard to know whether or not we got all those issues. Moreover, it will be very hard to maintain such code and be sure that we are not adding more places that assume no data changes.

Another approach (which presented in this PR) is to state that modules should not perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , RedisModule_AddPostNotificationJob, that allows to register a callback that will be called by Redis when the following conditions hold:

It is safe to perform any write operation.
The job will be called atomically along side the operation that triggers it (in our case, key space notification).

Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write.

Notice that another approach that was suggested was to collect all the key space notifications and fire them at the end (where it is safe to write). This solution might be simpler to the module but it has some limitations:

It can be considered a breaking change because the notifications will be fired at the end and not synchronously when they happened
If the key space notification will be fire at the end, the module might miss data that it has before, one example is this PR that adds unlink notification that allows the module to read the data just before it is deleted.

We believe that the suggested API gives a good abstraction (module do not need to know the internal for Redis and decide when it safe to write, Redis calls the module when such conditions hold) with good flexibility (module can still perform logic synchronously with the notification and read the state of the database as it was at the time of the notification).

Though currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example).

Technical Details

Whenever a module uses RedisModule_AddPostNotificationJob the callback is added to a list of callbacks (called modulePostExecUnitJobs) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on postExecutionUnitOperations (which was propagatePendingCommands before this PR). The new function fires the post jobs and then calls propagatePendingCommands.

If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of modulePostExecUnitJobs list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so Redis will make no attempt to protect the module from infinite loops

In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module.

Redis infrastructure

This PR promotes the existing propagatePendingCommands to an "Execution Unit" concept, which is called after each atomic unit of execution,

Cherry pick considerations and followup PR

One of the goal of this PR was to make it as safe as possible to cherry-pick to 7.0. While working on the PR we realize the code has some state duplication when it comes to command propagation to replica and AOF. We realize that server.core_propagation is probably redundant and we probably can also unify server.module_ctx_nesting and server.in_nested_call. Such changes are probably much more risky so we avoid doing them on this PR. A followup PR will cover those topics and will try to rearrange this part of the code.

TODO

Decide whether the approach taken on this PR makes sense and if there are more edge cases we might have missed
Decide whether or not we want to force modules not to write inside key space notifications - no such validation is require. Document this limitation is good enough.
Decide whether or not we want to force modules to use the new API only on key space notifications - no
Handle RM_StringDMA which is an example of read operation that might change the data (see issue describe in this comment Add a special notification unlink available only for modules #9406 (comment)) - after checking the issue we see that there is no really a problem using RM_StringDMA and its better not to change it to avoid breaking change. Test was added to verify it.
Consider whether or not propagatePendingCommands is the right place to fire the post notifications callbacks.
Move max action limitation to be a parameter that can be set by the module itself - we decided we will not have any limits
Improve testing to the new API

Related PRs: redis#10969, redis#11178 The following PR is a proposal of handling write operations inside key space notifications. The PR is not yet finished (see todo list at the end) but I would like to start getting opinions whether or not the approach make sense and whether there are any issues that we might have missed. After a lot of discussions we came to a conclussion that module should not perform any write operations on key space notifcation. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviure. One solution is to go over all those places (that assumes no data changes) and fix them such that they will be tollerable to data changes. This apporach will probably require a lot of changes all around the code and it will be very hard to know whether or not we got all those issues. Moreover, it will be very hard to maintain such code and be sure that we are not adding more places that assume no data changes. Another approach (which presented in this PR) is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API that, `RedisModule_AddPostNotificationJob`, allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the notification (the notification effect and the job effect will be replicated to the replication and the aof wrapped with multi exec). Module can use this new API to safely perform any write operation and still achieve atomicity betweent the notification and the write. ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostKeyspaceJobs`) that need to be invoke after the current logic ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `propagatePendingCommands` just before we actually propagating the effects. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostKeyspaceJobs` list and will be invoke atomically after the current callback ends. In order to avoid infinite loop we limit the number of jobs that can be registered in a single logic unit (command, eviction, active expire). This limitation is configurable using `maxpostnotificationsjobs` configuration value.

oranagra

commenting on the todo comment on the top

Decide whether the approach taken on this PR makes sense and if there are more edge cases we might have missed

i'm in favor of this approach

Decide whether or not we want to force modules not to write inside key space notifications

I'd like that, but it'll be a breaking change, so i think we should skip it.

Decide whether or not we want to force modules to use the new API only on key space notifications

I don't think we should bother.

Decide how to handle RM_StringDMA which is an example of read operation that might change the data

I personally think that we should allow String DMC reads (useful for huge string), but we should error (rather than corrupt memory), in case the string is not sds encoded, and document that in this case the module should resort to a non DMA read.

Consider whether or not propagatePendingCommands is the right place to fire the post notifications callbacks.

I think it's right, until the day we want to allow propagating from a non-write keyspace notification (aka, key-miss), but i think that's probably an invalid use.

src/config.c

src/module.c

tests/modules/postnotifications.c

tests/unit/moduleapi/postnotifications.tcl

Co-authored-by: Oran Agra <[email protected]>

MeirShpilraien · 2022-09-06T11:56:13Z

Thanks @oranagra, I have fixed your comments / replied to them. Waiting for more comments from the community.

yossigo

@MeirShpilraien I think this is a good direction to address the problem. I have a feeling there could be more edge cases we did not consider yet though.

I don't think we can/should prevent modules from writing inside a notification hook, clearly documenting it is enough IMHO.

src/server.c

tests/unit/moduleapi/postnotifications.tcl

tests/modules/postnotifications.c

tests/unit/moduleapi/postnotifications.tcl

src/config.c

…w nested keyspace notifications.

src/module.c

src/server.c

src/cluster.c

src/redismodule.h

src/module.c

src/server.c

oranagra · 2022-10-18T16:47:41Z

conceptually approved in a core-team meeting. @yossigo please take a quick look to be on the safe side.

yossigo · 2022-10-31T13:46:57Z

src/redismodule.h


+ /* Declare that the module want to get nested key space notifications.
+  * If enabled, the module is responsible to break endless loop. */
+ #define REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS    (1<<3)


Not directly related to this PR, but I think we're missing a way for a module to determine if this option is supported or not, like RM_Get**FlagsAll.

i'd argue that it is related to this PR since it adds a new option and a a compatibility issue.
it's real quick to add it, i don't see a need for another PR or issue.

@oranagra added to this PR and updated the top comment.

src/redismodule.h

src/module.c

oranagra · 2022-11-09T08:26:15Z

PR was conceptually approved in a core-team meeting, with the exception of #11199 (comment).
it was also approved to be eventually backported to 7.0 in due time.

Co-authored-by: Yossi Gottlieb <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>

src/module.c

sundb · 2022-11-26T14:26:38Z

src/cluster.c

+        postExecutionUnitOperations();
        decrRefCount(key);
-        propagatePendingCommands();
+        postExecutionUnitOperations();


@MeirShpilraien isn't redundant?

yes, certainly redundant, i wonder how it slipped in....
admittedly, i only did incremental reviews, and didn't do a top to bottom one before merging.
please make a PR to fix.

Yes, totally redudent, bad merge from unstable on my side (funny, I believe I also did the PR that caused the conflict and I still missed it)
Will make a PR to remove it.

@MeirShpilraien i see sundb already made the PR: #11547

Ohh right, greate, thanks.

Accidentally introduced when merging unstable in #11199

…11199) ### Summary of API additions * `RedisModule_AddPostNotificationJob` - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above. * New module option, `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`, allows to disable Redis protection of nested key-space notifications. * `RedisModule_GetModuleOptionsAll` - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance. ### Background The following PR is a proposal of handling write operations inside module key space notifications. After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour. The solution is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , `RedisModule_AddPostNotificationJob`, that allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the operation that triggers it (in our case, key space notification). Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write. Although currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example). ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostExecUnitJobs`) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `postExecutionUnitOperations` (which was `propagatePendingCommands` before this PR). The new function fires the post jobs and then calls `propagatePendingCommands`. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostExecUnitJobs` list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so **Redis will make no attempt to protect the module from infinite loops** In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module. ### Redis infrastructure This PR promotes the existing `propagatePendingCommands` to an "Execution Unit" concept, which is called after each atomic unit of execution, Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Yossi Gottlieb <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>

Accidentally introduced when merging unstable in redis#11199

…11199) ### Summary of API additions * `RedisModule_AddPostNotificationJob` - new API to call inside a key space notification (and on more locations in the future) and allow to add a post job as describe above. * New module option, `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`, allows to disable Redis protection of nested key-space notifications. * `RedisModule_GetModuleOptionsAll` - gets the mask of all supported module options so a module will be able to check if a given option is supported by the current running Redis instance. ### Background The following PR is a proposal of handling write operations inside module key space notifications. After a lot of discussions we came to a conclusion that module should not perform any write operations on key space notification. Some examples of issues that such write operation can cause are describe on the following links: * Bad replication oreder - redis#10969 * Used after free - redis#10969 (comment) * Used after free - redis#9406 (comment) There are probably more issues that are yet to be discovered. The underline problem with writing inside key space notification is that the notification runs synchronously, this means that the notification code will be executed in the middle on Redis logic (commands logic, eviction, expire). Redis **do not assume** that the data might change while running the logic and such changes can crash Redis or cause unexpected behaviour. The solution is to state that modules **should not** perform any write command inside key space notification (we can chose whether or not we want to force it). To still cover the use-case where module wants to perform a write operation as a reaction to key space notifications, we introduce a new API , `RedisModule_AddPostNotificationJob`, that allows to register a callback that will be called by Redis when the following conditions hold: * It is safe to perform any write operation. * The job will be called atomically along side the operation that triggers it (in our case, key space notification). Module can use this new API to safely perform any write operation and still achieve atomicity between the notification and the write. Although currently the API is supported on key space notifications, the API is written in a generic way so that in the future we will be able to use it on other places (server events for example). ### Technical Details Whenever a module uses `RedisModule_AddPostNotificationJob` the callback is added to a list of callbacks (called `modulePostExecUnitJobs`) that need to be invoke after the current execution unit ends (whether its a command, eviction, or active expire). In order to trigger those callback atomically with the notification effect, we call those callbacks on `postExecutionUnitOperations` (which was `propagatePendingCommands` before this PR). The new function fires the post jobs and then calls `propagatePendingCommands`. If the callback perform more operations that triggers more key space notifications. Those keys space notifications might register more callbacks. Those callbacks will be added to the end of `modulePostExecUnitJobs` list and will be invoke atomically after the current callback ends. This raises a concerns of entering an infinite loops, we consider infinite loops as a logical bug that need to be fixed in the module, an attempt to protect against infinite loops by halting the execution could result in violation of the feature correctness and so **Redis will make no attempt to protect the module from infinite loops** In addition, currently key space notifications are not nested. Some modules might want to allow nesting key-space notifications. To allow that and keep backward compatibility, we introduce a new module option called `REDISMODULE_OPTIONS_ALLOW_NESTED_KEYSPACE_NOTIFICATIONS`. Setting this option will disable the Redis key-space notifications nesting protection and will pass this responsibility to the module. ### Redis infrastructure This PR promotes the existing `propagatePendingCommands` to an "Execution Unit" concept, which is called after each atomic unit of execution, Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Yossi Gottlieb <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>

Accidentally introduced when merging unstable in redis#11199

MeirShpilraien requested review from enjoy-binbin, guybe7 and oranagra August 28, 2022 16:01

oranagra reviewed Aug 31, 2022

View reviewed changes

Meir Shpilraien (Spielrein) and others added 2 commits September 6, 2022 14:23

Apply suggestions from code review

089ff80

Co-authored-by: Oran Agra <[email protected]>

Review fixes

511699a

yossigo reviewed Sep 12, 2022

View reviewed changes

meir added 2 commits September 18, 2022 14:17

Added test to check RM_StringDMA from within key space notifications.

4f4269b

Change max_post_notifications_jobs to be set using a module API.

224cfa6

MeirShpilraien requested a review from oranagra September 18, 2022 12:03