-
Notifications
You must be signed in to change notification settings - Fork 23.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reply LOADING on replica while flushing the db #13495
Conversation
* reply -LOADING. */ | ||
emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | ||
} | ||
loadingFireEvent(RDBFLAGS_REPLICATION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split startLoading() into two functions here. Before the PR, we fired module event after flushing the data. Not sure it matters but I tried to preserve same behavior.
loadingSetFlags() sets flags so emptyData() can reply -LOADING.
loadingFireEvent() fires event with an empty db same as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd like @sundb to take a look too
i don't think this is related to lazy flush. |
approved by accident |
Line 13073 in 3fcddfb
Also, I just realized this should have been Note: It needs retry logic as well. We can call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
#13495 introduced a change to reply -LOADING while flushing existing db on a replica. Some of our tests are sensitive to this change and do no expect -LOADING reply. Fixing a couple of tests that fail time to time.
### New Features in binary distributions - 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter, Count-min sketch, Top-k, t-digest - Redis scalable query engine (including vector search) ### Potentially breaking changes - #12272 `GETRANGE` returns an empty bulk when the negative end index is out of range - #12395 Optimize `SCAN` command when matching data type ### Bug fixes - #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed - #13489 `ACL CAT` - return module commands - #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx` - #13473 Fix incorrect lag due to trimming stream via `XTRIM` command - #13338 Fix incorrect lag field in `XINFO` when tombstone is after the `last_id` of the consume group - #13470 On `HDEL` of last field - update the global hash field expiration data structure - #13465 Cluster: Pass extensions to node if extension processing is handled by it - #13443 Cluster: Ensure validity of myself when loading cluster config - #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array ### Modules API - #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and `RM_RegisterDefragCallbacks` - defrag API to allocate and free raw memory ### Performance and resource utilization improvements - #13503 Avoid overhead of comparison function pointer calls in listpack `lpFind` - #13505 Optimize `STRING` datatype write commands - #13499 Optimize `SMEMBERS` command - #13494 Optimize `GEO*` commands reply - #13490 Optimize `HELLO` command - #13488 Optimize client query buffer - #12395 Optimize `SCAN` command when matching data type - #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands - #13516 Optimize `LRANGE` and other commands that perform several writes to client buffers per call - #13431 Avoid `used_memory` contention when updating from multiple threads ### Other general improvements - #13495 Reply `-LOADING` on replica while flushing the db ### CLI tools - #13411 redis-cli: Fix wrong `dbnum` showed after the client reconnected ### Notes - No backward compatibility for replication or persistence. - Additional distributions, upgrade paths, features, and improvements will be introduced in upcoming pre-releases. - With the GA release of 8.0 we will deprecate Redis Stack.
In #13495, we introduced a feature to reply -LOADING while flushing a large db on a replica. While `_dictClear()` is in progress, it calls a callback for every 65k items and we yield back to eventloop to reply -LOADING. This change has made some tests unstable as those tests don't expect new -LOADING reply. One observation, inside `_dictClear()`, we call the callback even if db has a few keys. Most tests run with small amount of keys. So, each replication and cluster test has to handle potential -LOADING reply now. This PR changes this behavior, skips calling callback when `i=0` to stabilize replication tests. Callback will be called after the first 65k items. Most tests use less than 65k keys and they won't get -LOADING reply.
In #13495, we introduced a feature to reply -LOADING while flushing a large db on a replica. While `_dictClear()` is in progress, it calls a callback for every 65k items and we yield back to eventloop to reply -LOADING. This change has made some tests unstable as those tests don't expect new -LOADING reply. One observation, inside `_dictClear()`, we call the callback even if db has a few keys. Most tests run with small amount of keys. So, each replication and cluster test has to handle potential -LOADING reply now. This PR changes this behavior, skips calling callback when `i=0` to stabilize replication tests. Callback will be called after the first 65k items. Most tests use less than 65k keys and they won't get -LOADING reply.
When the diskless load configuration is set to on-empty-db, we retain a pointer to the function library context. When emptyData() is called, it frees this function library context pointer, leading to a use-after-free situation. I refactored code to ensure that emptyData() is called first, followed by retrieving the valid pointer to the function library context. Refactored code should not introduce any runtime implications. Bug introduced by #13495 (Redis 8.0) Co-authored-by: Oran Agra <[email protected]>
On a full sync, replica starts discarding existing db. If the existing db is huge and
flush is happening synchronously, replica may become unresponsive.
Adding a change to yield back to event loop while flushing db on a replica.
Replica will reply -LOADING in this case. Note that while replica is loading the new rdb,
it may get an error and start flushing the partial db. This step may take a long time as well.
Similarly, replica will reply -LOADING in this case.
To call processEventsWhileBlocked() and reply -LOADING, we need to do a few things:
rdbload() already does these steps and calls processEventsWhileBlocked() while loading the rdb.
Added a new call rdbLoadWithEmptyCb() which accepts callback to flush db before loading rdb or
when an error happens while loading.
For diskless replication, doing something similar and calling emptyData() after setting required flags.
Additional changes:
Allow
appendonly
config change during loading.Config can be changed while loading data on startup or on replication when slave is loading RDB.
We allow config change command to update
server.aof_enabled
and then lazily apply config change after loading operation is completed.Added a test for
replica-lazy-flush
config