
Replica keep serving data during repl-diskless-load=swapdb for better availability#9323

Merged
oranagra merged 40 commits into redis:unstable from eduardobr:feature/draft-use-tempdb-swapdb
Nov 4, 2021

Conversation

@eduardobr
Contributor

@eduardobr eduardobr commented Aug 5, 2021

For diskless replication in swapdb mode, since we already spend replica memory keeping a backup of the current db to restore in case of failure, we can get the following benefits by instead swapping in the database only when we succeeded in transferring it from the master:

  • Avoid the -LOADING response, during both failed and successful synchronization, for cases where the replica is already up and running with data.
  • Faster total diskless replication time: we move from Transfer + Flush + Load to Transfer + Load only. Flushing the tempDb is done asynchronously after the swap.
  • This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage.
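
The swap-only-on-success flow above can be sketched in isolation. This is a standalone C simulation; `maindb`, `tempdb` and the helper names are simplified stand-ins for the real server structures, not actual Redis code:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for a Redis database. */
typedef struct db { char data[32]; } db;

static db maindb = { "old-dataset" };   /* keeps serving clients */
static db tempdb;                       /* the master's RDB loads here */

/* Load the master's payload into tempdb. Returns 0 on success,
 * -1 when the transfer broke mid-way. */
static int load_into_tempdb(const char *rdb, int transfer_ok) {
    if (!transfer_ok) return -1;
    strcpy(tempdb.data, rdb);
    return 0;
}

static void sync_with_master(const char *rdb, int transfer_ok) {
    if (load_into_tempdb(rdb, transfer_ok) == 0) {
        /* Swap only after the whole data set arrived; the old db
         * (now sitting in tempdb) can be flushed asynchronously. */
        db old = maindb;
        maindb = tempdb;
        tempdb = old;
    }
    /* On failure the main db was never touched: no flush happened
     * and the replica kept serving the old data set. */
}
```

On a broken transfer nothing is touched, so the replica keeps answering from the old data set instead of returning -LOADING.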

General notes:

  • The concept of backupDb becomes tempDb for clarity.
  • Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id as the one it had before, i.e. the data it's getting belongs to a different time of the same timeline.
  • New property in INFO: async_loading, to differentiate it from the blocking loading.
  • Slot-to-key mapping is now a field of redisDb, as it's more natural to access it from both server.db and the tempDb that is passed around.
  • Because this affects replicas only, we assume that if they are not read-only and receive write commands during replication, those writes are lost after SYNC the same way as before; but we still deny CONFIG SET here anyway to avoid complications.
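
The repl-id condition in the notes above boils down to a small check, roughly like this (standalone sketch; the function and argument names are illustrative, not the actual server code):

```c
#include <assert.h>
#include <string.h>

/* Async (non-blocking) loading is only safe when the incoming data set
 * belongs to the same replication history the replica was already on:
 * an equal repl-id means a different point in time of the same timeline. */
static int use_async_loading(const char *cached_replid, const char *new_replid) {
    if (cached_replid == NULL || cached_replid[0] == '\0')
        return 0;   /* never synced before: blocking load, reply -LOADING */
    return strcmp(cached_replid, new_replid) == 0;
}
```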

Considerations for review:

  • We have many cases where the server.loading flag is used, and even though I tried my best, there may be cases where async_loading should be checked as well, and cases where it shouldn't (this would require a very good understanding of the whole code base).
  • Several places that had different behavior depending on the loading flag were actually meant to just handle commands coming from the AOF client differently from ones coming from real clients; these were changed to check CLIENT_ID_AOF instead.

Additional for Release Notes

  • Bugfix - server.dirty was not incremented for any kind of diskless replication, so it did not contribute to triggering the next database SAVE
  • New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING
  • Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event.
    Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED.
  • New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading).
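
The fallback in the last bullet can be illustrated with a plain bit-flag check (standalone sketch; only the option name comes from the module API described above, and its numeric value here is made up):

```c
#include <assert.h>

/* Illustrative value; the real constant lives in redismodule.h. */
#define REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD (1 << 3)

typedef struct module { int options; } module;

/* Diskless replication with async loading is only possible when every
 * loaded module declared (via RedisModule_SetModuleOptions) that it
 * can handle it; otherwise we fall back to disk-based loading. */
static int all_modules_handle_async_load(const module *mods, int n) {
    for (int i = 0; i < n; i++)
        if (!(mods[i].options & REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD))
            return 0;
    return 1;
}
```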

@eduardobr eduardobr force-pushed the feature/draft-use-tempdb-swapdb branch from 9d16f89 to ecf3f43 on August 5, 2021 16:17
@eduardobr
Contributor Author

eduardobr commented Aug 5, 2021

Relates to #4624, resolving it in some setups

@eduardobr eduardobr force-pushed the feature/draft-use-tempdb-swapdb branch 3 times, most recently from 47b7d2e to 409c72a on August 6, 2021 13:10
Member

@oranagra oranagra left a comment


@eduardobr thank you, i think this PR shows the potential of that concept quite well, and it's nice to see that the code was nearly prepared for this concept and you were able to apply it nicely without too many changes.

I did a quick review and gave a few comments, the main issue to be resolved is probably around the module API and complex modules with out of keyspace data.

Few other things i must note:

  1. During loading, redis is not highly responsive: it only processes events for a short period once every ~2MB of RDB data, so clients working with the replica may get degraded performance, even beyond the fact that the CPU is heavily invested in loading.
  2. I think we wanna have that mode enabled only when a replica was previously in sync with a master (not depending on whether the database was empty or not, though). When a new replica comes up, we'll want to avoid using swapdb, so that we respond with -LOADING.

@oranagra oranagra added the release-notes (indication that this issue needs to be mentioned in the release notes) and state:major-decision (requires core team consensus) labels Aug 12, 2021
Member

@oranagra oranagra left a comment


LGTM with minor fix.

what's left is to handle the modules issue.

Member

@oranagra oranagra left a comment


@eduardobr thank you.. your changes seem great!
i gave a few minor comments mostly about renaming "repl async" to "repl async load".
and raised a few questions about ordering.
i think you can proceed with adjusting the tests.

@oranagra
Member

oranagra commented Sep 9, 2021

@eduardobr are you gonna finish this? any major issues? IIRC it's only renaming / doc and some minor changes.
Let me know if you want me to pick it up instead.

@oranagra oranagra added the state:to-be-merged label (the PR should be merged soon, even if not yet ready; this is used so that it won't be forgotten) Sep 9, 2021
@eduardobr
Contributor Author

@eduardobr are you gonna finish this? any major issues? IIRC it's only renaming / doc and some minor changes.
Let me know if you want me to pick it up instead.

@oranagra I can move on, I was just waiting for some answers to the questions you raised, so I can finish in one sitting.
I’ll apply the suggestions soon.
Thanks

@eduardobr eduardobr force-pushed the feature/draft-use-tempdb-swapdb branch from 3c6ad5a to 8ee0c56 on September 12, 2021 09:16
@eduardobr eduardobr force-pushed the feature/draft-use-tempdb-swapdb branch from 8ee0c56 to ca922aa on September 12, 2021 12:34
@eduardobr
Contributor Author

eduardobr commented Sep 12, 2021

@oranagra Things got quite complicated due to many conflicting changes, especially related to #9356 and #9398,
so I decided to squash everything and then rebase, to reduce my cognitive load. The rebase was committed in 2 steps due to the huge amount of changes (5676f0c and ca922aa).

In relation to "Delay to discard cached master when full synchronization #9398", at replication.c readSyncBulkPayload(), I repositioned this new block so it runs in one specific situation for SWAPDB mode and in another specific situation otherwise (please review). It could be wrapped in a method for reuse, but what should a routine like this be called?:

        /* Replica starts to apply data from new master, we must discard the cached
         * master structure. */
        serverAssert(server.master == NULL);
        replicationDiscardCachedMaster();

        /* We want our slaves to resync with us as well, if we have any sub-slaves.
         * The master already transferred us an entirely different data set and we
         * have no way to incrementally feed our slaves after that. */
        disconnectSlaves(); /* Force our slaves to resync with us as well. */
        freeReplicationBacklog(); /* Don't allow our chained slaves to PSYNC. */

In relation to "Slot-to-keys using dict entry metadata #9356", it hit this PR in some points and even required moving this to server.h:

/* State for the Slot to Key API, for a single slot. The keys in the same slot
 * are linked together using dictEntry metadata. See also "Slot to Key API" in
 * cluster.c. */
struct clusterSlotToKeys {
    uint64_t count;             /* Number of keys in the slot. */
    dictEntry *head;            /* The first key-value entry in the slot. */
};
typedef struct clusterSlotToKeys clusterSlotsToKeysData[CLUSTER_SLOTS];

Additionally, raised question about a missing bit:
Naming for Pre and Post versions of some module events

@ShooterIT
Member

For #9356, if you have uncertain things, we can ask @zuiderkwast for suggestions, he is a friendly man!

@zuiderkwast
Contributor

In relation to "Slot-to-keys using dict entry metadata #9356", it hit this PR in some points and even required moving this to server.h
struct clusterSlotToKeys [...]

I'm here to help. 😄 I tried to move everything cluster-related to cluster.{c,h}.

Isn't it enough to add #include "cluster.h" in some files? I can't see exactly why you had to move them. Maybe create some new .h file (replication.h) which includes cluster.h and defines some new structs..?

@eduardobr
Contributor Author

In relation to "Slot-to-keys using dict entry metadata #9356", it hit this PR in some points and even required moving this to server.h
struct clusterSlotToKeys [...]

I'm here to help. 😄 I tried to move everything cluster-related to cluster.{c,h}.

Isn't it enough to add #include "cluster.h" in some files? I can't see exactly why you had to move them. Maybe create some new .h file (replication.h) which includes cluster.h and defines some new structs..?

@zuiderkwast Thank you ;)
The reason it was moved is that the tempDb struct needs clusterSlotsToKeysData (tempDb itself is not a cluster-specific concept).
tempDb is a broad concept and is used by

  • db.c (depends on cluster)
  • replication.c (depends on cluster)

If it lived in one of those two, the other wouldn't be able to access it,
so keeping it in server.h was the better choice.

I'm afraid I'm still not familiar enough with the code base as a whole, or with C best practices, to know whether that was a decent solution though. I believe that could also be a separate commit after the storm of changes and rebases done here, if necessary; what do you think?

But most importantly, I believe it would be great to have your review on the correctness of this change after I integrated your PR. For example, I noticed your change includes this cleanup in db.c restoreDbBackup:

    /* Restore slots to keys map backup if enable cluster. */
    if (server.cluster_enabled) {
        serverAssert(server.cluster->slots_to_keys->numele == 0);
        raxFree(server.cluster->slots_to_keys);
        server.cluster->slots_to_keys = backup->slots_to_keys;
        memcpy(server.cluster->slots_keys_count, backup->slots_keys_count,
                sizeof(server.cluster->slots_keys_count));
    }

into

    /* Restore slots to keys map backup if enable cluster. */
    if (server.cluster_enabled) slotToKeyRestoreBackup(&backup->slots_to_keys);

Which, inlined, is:

    memcpy(server.cluster->slots_to_keys, backup, sizeof(server.cluster->slots_to_keys));

And that's what I follow here after swapping tempDb with main db:
ca922aa

Does it make sense? Is the removed part now unnecessary?
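
For context, the equivalence in question (per-field restore versus one memcpy over the whole per-slot array) can be checked in isolation. This standalone sketch reuses the struct quoted earlier, with a tiny slot count standing in for CLUSTER_SLOTS:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 4                       /* stands in for CLUSTER_SLOTS */

typedef struct dictEntry dictEntry;   /* opaque here, pointers only */

struct clusterSlotToKeys {
    uint64_t count;                   /* Number of keys in the slot. */
    dictEntry *head;                  /* First key-value entry in the slot. */
};
typedef struct clusterSlotToKeys slotsToKeysData[SLOTS];

/* A single memcpy restores both the counts and the head pointers for
 * every slot at once, which is why the per-field copy becomes unnecessary. */
static void slots_restore_backup(slotsToKeysData dst, const slotsToKeysData src) {
    memcpy(dst, src, sizeof(slotsToKeysData));
}
```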

Member

@oranagra oranagra left a comment


I re-reviewed the whole thing from scratch, here are my new comments:

Contributor

@madolson madolson left a comment


Approving based on documentation, didn't read the implementation details.

@oranagra oranagra changed the title Swap db only at end of diskless replication for better availability Replica keep serving data during repl-diskless-sync=swapdb for better availability Nov 4, 2021
@oranagra oranagra changed the title Replica keep serving data during repl-diskless-sync=swapdb for better availability Replica keep serving data during repl-diskless-load=swapdb for better availability Nov 4, 2021
@oranagra oranagra merged commit 91d0c75 into redis:unstable Nov 4, 2021
@oranagra
Member

oranagra commented Nov 4, 2021

@eduardobr merged 🎉
Thank you for your patience.

@eduardobr
Contributor Author

@eduardobr merged 🎉
Thank you for your patience.

Thank YOU for the patience 😁

@oranagra
Member

oranagra commented Nov 7, 2021

@eduardobr i noticed some sporadic timeout errors in the valgrind runs
https://github.com/redis/redis/runs/4123007165?check_suite_focus=true

Diskless load swapdb (different replid): replica enter loading

and

Diskless load swapdb (async_loading): old database is exposed while async replication is in progress

they did not reproduce the day before / after, so it might be nothing.
maybe you wanna look into that, or we can wait to gather more evidence.

oranagra pushed a commit that referenced this pull request Nov 9, 2021
During diskless replication, the check for a broken EOF mark was misplaced
and should happen earlier. Now we do not swap db, we do proper cleanup and
correctly raise module events on this kind of failure.

This issue existed prior to #9323, but back then the side effect was not restoring
the backup and not raising the correct module events on this failure.
@oranagra
Member

another timeout in the test under valgrind:
https://github.com/redis/redis/runs/4159338778?check_suite_focus=true

[TIMEOUT]: clients state report follows.
sock55c4ba1cc3c0 => (OK) Diskless load swapdb (async_loading): old database is exposed while async replication is in progress

@oranagra
Member

i think the timeout is not the fault of the tests, hope it'll get solved by #9767

oranagra added a commit that referenced this pull request Nov 10, 2021
We saw some tests sporadically time out on valgrind (namely the ones
from #9323).

Increasing valgrind timeout from 20 mins to 40 mins in CI.
And fixing an outdated help message.
    "module_fork_in_progress:%d\r\n"
    "module_fork_last_cow_size:%zu\r\n",
    (int)server.loading,
    (int)(server.loading && !server.async_loading),
Contributor


somehow i think !server.async_loading should not be added here? (a bit confused)
btw, i think async_loading also needs documentation, which may be missing; i added it in redis/redis-doc#1686

Member


It is indeed a bit confusing that in Redis both flags are set in this mode, while in INFO it's either one or the other.
But we concluded that for each audience (users or developers) it makes sense to look at it this way.
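
In other words, the mapping from the internal flags to the two INFO fields can be read like this (standalone sketch mirroring the snippet under review; these helper functions don't exist in the code base):

```c
#include <assert.h>

/* Internally an async load sets both server.loading and
 * server.async_loading; INFO reports them as mutually exclusive:
 * "loading" is the blocking kind, "async_loading" the new kind. */
static int info_loading(int loading, int async_loading) {
    return loading && !async_loading;
}

static int info_async_loading(int loading, int async_loading) {
    (void)loading;   /* loading is implied whenever async_loading is set */
    return async_loading;
}
```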

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Nov 24, 2021
In redis#9323, when `repl-diskless-load` is enabled and set to `swapdb`,
if the master replication ID hasn't changed, we can load the data set
asynchronously and serve read commands during the full resync.

In the `diskless loading short read` test, after loading successfully,
we wait for the loading to stop and continue the for loop.

After the introduction of `async_loading`, we also need to check it.
Otherwise the next loop will start too soon, which may trigger a timing issue.
oranagra pushed a commit that referenced this pull request Nov 24, 2021
hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
oranagra added a commit that referenced this pull request Dec 22, 2021
…ading (#9878)

## background
Till now, CONFIG SET was blocked during loading.
(In the not so distant past, GET was disallowed too.)

We recently (not released yet) added an async-loading mode, see #9323,
during which the server will serve CONFIG SET and any other command.
And now we realized (#9770) that some configs and commands are dangerous
during async-loading.

## changes
* Allow most CONFIG SET during loading (both on async-loading and normal loading)
* Allow CONFIG REWRITE and CONFIG RESETSTAT during loading
* Block a few config during loading (`appendonly`, `repl-diskless-load`, and `dir`)
* Block a few commands during loading (list below)

## the blocked commands:
* SAVE - obviously we don't wanna start a foreground save during loading 8-)
* BGSAVE - we don't mind to schedule one, but we don't wanna fork now
* BGREWRITEAOF - we don't mind to schedule one, but we don't wanna fork now
* MODULE - we obviously don't wanna unload a module during replication / rdb loading
  (MODULE HELP and MODULE LIST are not blocked)
* SYNC / PSYNC - we're in the middle of RDB loading from master, must not allow sync
  requests now.
* REPLICAOF / SLAVEOF - we're in the middle of replicating; maybe it makes sense to let
  the user abort it, but they couldn't do that so far, and i don't wanna take any risk of bugs due to odd state.
* CLUSTER - only allow [HELP, SLOTS, NODES, INFO, MYID, LINKS, KEYSLOT, COUNTKEYSINSLOT,
  GETKEYSINSLOT, RESET, REPLICAS, COUNT_FAILURE_REPORTS], for others, preserve the status quo

## other fixes
* processEventsWhileBlocked had an issue when being nested. This could happen with a busy script
  during async loading (new), but also with a busy script during AOF loading (old); this led to a crash in
  the scenario described in #6988
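
The block list above amounts to a lookup along these lines (standalone sketch; the real implementation relies on command flags rather than a literal name table, and the subcommand exceptions are omitted here):

```c
#include <assert.h>
#include <string.h>

/* Commands that must not run while an RDB is being loaded, per the
 * list above (MODULE HELP/LIST and several CLUSTER subcommands stay
 * allowed in the real policy). */
static const char *blocked_during_loading[] = {
    "save", "bgsave", "bgrewriteaof", "module",
    "sync", "psync", "replicaof", "slaveof", "cluster",
};

static int blocked_while_loading(const char *cmd) {
    size_t n = sizeof(blocked_during_loading) / sizeof(blocked_during_loading[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(cmd, blocked_during_loading[i]) == 0) return 1;
    return 0;
}
```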
oranagra added a commit that referenced this pull request Feb 11, 2022
The bug was introduced by #9323 (released in 7.0 RC1).
The defines of `REDISMODULE_OPTIONS_HANDLE_IO_ERRORS` and `REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED` have the same value.

This will result in skipping `signalModifiedKey()` after `RM_CloseKey()` if the module has set the
`REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD` option.
The implication is missing WATCH and client-side tracking invalidations.

Other changes:
- add `no-implicit-signal-modified` to the options in INFO modules

Co-authored-by: Oran Agra <[email protected]>