WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync#12622
Merged
oranagra merged 2 commits intoredis:unstablefrom Sep 28, 2023
Merged
WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync#12622oranagra merged 2 commits intoredis:unstablefrom
oranagra merged 2 commits intoredis:unstablefrom
Conversation
The problem is that WAITAOF used to hang in case commands were propagated only to replicas. In that case, master_repl_offset would increase, but there wold be nothing to fsync, so fsynced_reploff_pending would stay the same This commit updates fsynced_reploff_pending in case there's nothing to fsync Other changes: Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
oranagra
reviewed
Sep 28, 2023
oranagra
approved these changes
Sep 28, 2023
Member
|
since this case seems unlikely (requires a module that does funny things), i'm not gonna mark if for 7.2 backport. |
ShooterIT
added a commit
that referenced
this pull request
Feb 13, 2025
…cond (#13793) ``` if (server.aof_fsync == AOF_FSYNC_EVERYSEC && server.aof_last_incr_fsync_offset != server.aof_last_incr_size && server.mstime - server.aof_last_fsync >= 1000 && !(sync_in_progress = aofFsyncInProgress())) { goto try_fsync; ``` In #12622, when when appendfsync=everysecond, if redis has written some data to AOF but not `fsync`, and less than 1 second has passed since the last `fsync `, redis will won't fsync AOF, but we will update ` fsynced_reploff_pending`, so it cause the `WAITAOF` to return prematurely. this bug is introduced in #12622, from 7.4 The bug fix 1bd6688 is just as follows: ```diff diff --git a/src/aof.c b/src/aof.c index 8ccd8d8..521b304 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) { * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */ - if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO) + if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size && + !(sync_in_progress = aofFsyncInProgress())) + { atomicSet(server.fsynced_reploff_pending, server.master_repl_offset); + } return; ``` Additionally, we slightly refactored fsync AOF to make it simpler, as 584f008
YaacovHazan
pushed a commit
to YaacovHazan/redis
that referenced
this pull request
Apr 22, 2025
…cond (redis#13793) ``` if (server.aof_fsync == AOF_FSYNC_EVERYSEC && server.aof_last_incr_fsync_offset != server.aof_last_incr_size && server.mstime - server.aof_last_fsync >= 1000 && !(sync_in_progress = aofFsyncInProgress())) { goto try_fsync; ``` In redis#12622, when when appendfsync=everysecond, if redis has written some data to AOF but not `fsync`, and less than 1 second has passed since the last `fsync `, redis will won't fsync AOF, but we will update ` fsynced_reploff_pending`, so it cause the `WAITAOF` to return prematurely. this bug is introduced in redis#12622, from 7.4 The bug fix redis@1bd6688 is just as follows: ```diff diff --git a/src/aof.c b/src/aof.c index 8ccd8d8..521b304 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) { * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */ - if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO) + if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size && + !(sync_in_progress = aofFsyncInProgress())) + { atomicSet(server.fsynced_reploff_pending, server.master_repl_offset); + } return; ``` Additionally, we slightly refactored fsync AOF to make it simpler, as redis@584f008
funny-dog
pushed a commit
to funny-dog/redis
that referenced
this pull request
Sep 17, 2025
…cond (redis#13793) ``` if (server.aof_fsync == AOF_FSYNC_EVERYSEC && server.aof_last_incr_fsync_offset != server.aof_last_incr_size && server.mstime - server.aof_last_fsync >= 1000 && !(sync_in_progress = aofFsyncInProgress())) { goto try_fsync; ``` In redis#12622, when when appendfsync=everysecond, if redis has written some data to AOF but not `fsync`, and less than 1 second has passed since the last `fsync `, redis will won't fsync AOF, but we will update ` fsynced_reploff_pending`, so it cause the `WAITAOF` to return prematurely. this bug is introduced in redis#12622, from 7.4 The bug fix redis@1bd6688 is just as follows: ```diff diff --git a/src/aof.c b/src/aof.c index 8ccd8d8..521b304 100644 --- a/src/aof.c +++ b/src/aof.c @@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) { * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */ - if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO) + if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size && + !(sync_in_progress = aofFsyncInProgress())) + { atomicSet(server.fsynced_reploff_pending, server.master_repl_offset); + } return; ``` Additionally, we slightly refactored fsync AOF to make it simpler, as redis@584f008
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The problem is that WAITAOF could have hang in case commands were propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so in the absence of other traffic, fsynced_reploff_pending would stay the static, and WAITAOF can hang.
This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case there's nothing to fsync. i.e. in case it's behind because of the above mentions case it'll be refreshed and release the WAITAOF.
Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
Release notes: