Skip to content

WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync#12622

Merged
oranagra merged 2 commits intoredis:unstablefrom
guybe7:waitaof_replicas
Sep 28, 2023
Merged

WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync#12622
oranagra merged 2 commits intoredis:unstablefrom
guybe7:waitaof_replicas

Conversation

@guybe7
Copy link
Collaborator

@guybe7 guybe7 commented Sep 28, 2023

The problem is that WAITAOF could have hang in case commands were propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so in the absence of other traffic, fsynced_reploff_pending would stay the static, and WAITAOF can hang.

This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case there's nothing to fsync. i.e. in case it's behind because of the above mentions case it'll be refreshed and release the WAITAOF.

Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)

Release notes:

WAITAOF could timeout or hang if used after a module command that propagated effects only to replicas and not to AOF.

The problem is that WAITAOF used to hang in case commands were propagated only to replicas.
In that case, master_repl_offset would increase, but there wold be nothing to fsync, so fsynced_reploff_pending would stay the same

This commit updates fsynced_reploff_pending in case there's nothing to fsync

Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slavak PATL

@oranagra
Copy link
Member

since this case seems unlikely (requires a module that does funny things), i'm not gonna mark if for 7.2 backport.

@oranagra oranagra merged commit c2a4b78 into redis:unstable Sep 28, 2023
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Sep 28, 2023
ShooterIT added a commit that referenced this pull request Feb 13, 2025
…cond (#13793)

```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
    server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
    server.mstime - server.aof_last_fsync >= 1000 &&
    !(sync_in_progress = aofFsyncInProgress())) {
    goto try_fsync;
```
In #12622, when when
appendfsync=everysecond, if redis has written some data to AOF but not
`fsync`, and less than 1 second has passed since the last `fsync `,
redis will won't fsync AOF, but we will update `
fsynced_reploff_pending`, so it cause the `WAITAOF` to return
prematurely.

this bug is introduced in #12622,
from 7.4

The bug fix
1bd6688
is just as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8..521b304 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
              * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
              * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
              * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
-            if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+            if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+                !(sync_in_progress = aofFsyncInProgress()))
+            {
                 atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+            }
             return;
```
Additionally, we slightly refactored fsync AOF to make it simpler, as
584f008
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
…cond (redis#13793)

```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
    server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
    server.mstime - server.aof_last_fsync >= 1000 &&
    !(sync_in_progress = aofFsyncInProgress())) {
    goto try_fsync;
```
In redis#12622, when when
appendfsync=everysecond, if redis has written some data to AOF but not
`fsync`, and less than 1 second has passed since the last `fsync `,
redis will won't fsync AOF, but we will update `
fsynced_reploff_pending`, so it cause the `WAITAOF` to return
prematurely.

this bug is introduced in redis#12622,
from 7.4

The bug fix
redis@1bd6688
is just as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8..521b304 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
              * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
              * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
              * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
-            if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+            if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+                !(sync_in_progress = aofFsyncInProgress()))
+            {
                 atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+            }
             return;
```
Additionally, we slightly refactored fsync AOF to make it simpler, as
redis@584f008
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
…cond (redis#13793)

```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
    server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
    server.mstime - server.aof_last_fsync >= 1000 &&
    !(sync_in_progress = aofFsyncInProgress())) {
    goto try_fsync;
```
In redis#12622, when when
appendfsync=everysecond, if redis has written some data to AOF but not
`fsync`, and less than 1 second has passed since the last `fsync `,
redis will won't fsync AOF, but we will update `
fsynced_reploff_pending`, so it cause the `WAITAOF` to return
prematurely.

this bug is introduced in redis#12622,
from 7.4

The bug fix
redis@1bd6688
is just as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8..521b304 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
              * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
              * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
              * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
-            if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+            if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+                !(sync_in_progress = aofFsyncInProgress()))
+            {
                 atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+            }
             return;
```
Additionally, we slightly refactored fsync AOF to make it simpler, as
redis@584f008
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants