Skip to content

Fix race bug (between flushAppendOnly and fsync thread)#5985

Closed
guybe7 wants to merge 1 commit intoredis:unstablefrom
guybe7:fix_fsync_bug
Closed

Fix race bug (between flushAppendOnly and fsync thread)#5985
guybe7 wants to merge 1 commit intoredis:unstablefrom
guybe7:fix_fsync_bug

Conversation

@guybe7
Copy link
Collaborator

@guybe7 guybe7 commented Apr 3, 2019

Tail of the AOF may never be synced to disk (until shutdown)

Consider the following scenario:

  1. server executes INCR x - it is fed to aof_buf, beforeSleep is called,
    flushAppendOnly, aof_buf is written to file and cleared, background
    fsync thread starts (and ends), server.aof_last_fsync is set.
  2. server executes INCR y (less than a second after INCR x) - it is fed
    to aof_buf, beforeSleep is called, flushAppendOnly, aof_buf is written
    to file and cleared, background fsync thread does not start because
    it's been less than a second since server.aof_last_fsync was set.
  3. on the next beforeSleep flushAppendOnly returns immediately
    because aof_buf is empty, without starting a sync and so on..

In order to fix the above we remember the last synced offset and
call fsync in flushAppendOnly if it's not as expected (different
from aof_current_size) even if aof_buf is empty

We use atomic operation on aof_current_size and aof_fsync_offset
because now they are read/written from 2 parallel threads (may
not be atomic in 32bit systems)

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 3, 2019

before the fix (strace):

19882 19:32:46.259618 execve("./redis-server", ["./redis-server", "--appendonly", "yes"], 0x7ffedc2ad140 /* 67 vars */) = 0
19882 19:32:46.260454 brk(NULL)         = 0x557bb9781000
19882 19:32:46.260951 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
19882 19:32:46.261290 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
19882 19:32:46.261572 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
19882 19:32:46.261844 fstat(3, {st_mode=S_IFREG|0644, st_size=133506, ...}) = 0
19882 19:32:46.262103 mmap(NULL, 133506, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7430e28000
19882 19:32:46.262347 close(3)          = 0
...
19882 19:32:59.052615 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
19882 19:32:59.052711 write(9, ":14287\r\n", 8) = 8
19882 19:32:59.052821 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 2) = 1
19882 19:32:59.052918 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
19882 19:32:59.053069 read(3, 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19882 19:32:59.053165 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
19882 19:32:59.053271 write(9, ":14288\r\n", 8) = 8
19882 19:32:59.053386 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 1) = 1
19882 19:32:59.053483 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
19882 19:32:59.053586 read(3, 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19882 19:32:59.053683 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
19882 19:32:59.053779 write(9, ":14289\r\n", 8) = 8
...
19884 19:32:59.055191 fdatasync(8 <unfinished ...>
19882 19:32:59.055236 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
19882 19:32:59.055368 read(3, 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19882 19:32:59.055531 write(9, ":14291\r\n", 8) = 8
...
19884 19:32:59.062068 <... fdatasync resumed> ) = 0
19882 19:32:59.062097 <... read resumed> 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19884 19:32:59.062110 futex(0x557bb8698fb8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
19882 19:32:59.062145 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n*3"..., 390) = 390
19882 19:32:59.062240 write(9, ":14303\r\n", 8) = 8
...
19882 19:32:59.180340 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 74) = 1
19882 19:32:59.180547 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
19882 19:32:59.180778 read(3, 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19882 19:32:59.180888 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
19882 19:32:59.181000 write(9, ":14504\r\n", 8) = 8
...
19882 19:33:06.599001 write(1, "Received SIGINT scheduling shutd"..., 38) = 38
19882 19:33:06.599245 write(1, "\n", 1) = 1
19882 19:33:06.599465 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
19882 19:33:06.599704 read(3, 0x7ffc19f95927, 1) = -1 EAGAIN (Resource temporarily unavailable)
19882 19:33:06.599973 epoll_wait(5, [], 10128, 68) = 0
19882 19:33:06.668393 getpid()          = 19882
19882 19:33:06.668712 openat(AT_FDCWD, "/proc/19882/stat", O_RDONLY) = 10
19882 19:33:06.668993 read(10, "19882 (redis-server) R 19880 198"..., 4096) = 330
19882 19:33:06.669254 close(10)         = 0
19882 19:33:06.669524 getpid()          = 19882
19882 19:33:06.669722 getpid()          = 19882
19882 19:33:06.669868 write(1, "19882:M 03 Apr 2019 19:33:06.669"..., 62) = 62
19882 19:33:06.670084 getpid()          = 19882
19882 19:33:06.670224 getpid()          = 19882
19882 19:33:06.670427 write(1, "19882:M 03 Apr 2019 19:33:06.670"..., 68) = 68
19882 19:33:06.670669 fdatasync(8)      = 0
...

after traffic was stopped no fdatasync was called until shutdown, 7 (or whatever) seconds after traffic finished

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 3, 2019

after the fix:

21126 20:08:50.549241 execve("./src/redis-server", ["./src/redis-server", "--appendonly", "yes"], 0x7ffcbda4f100 /* 67 vars */) = 0
21126 20:08:50.550099 brk(NULL)         = 0x561b176ee000
21126 20:08:50.550532 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
21126 20:08:50.551002 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
21126 20:08:50.551335 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
21126 20:08:50.551651 fstat(3, {st_mode=S_IFREG|0644, st_size=133506, ...}) = 0
21126 20:08:50.551947 mmap(NULL, 133506, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd1c082a000
21126 20:08:50.552232 close(3)          = 0
...
21126 20:09:02.027309 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 2) = 1
21126 20:09:02.027412 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21126 20:09:02.027521 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.027621 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
21126 20:09:02.027729 write(9, ":7176\r\n", 7) = 7
21126 20:09:02.027846 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 2) = 1
21126 20:09:02.027949 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21126 20:09:02.028057 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.028158 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
21126 20:09:02.028259 write(9, ":7177\r\n", 7) = 7
...
21128 20:09:02.030577 fdatasync(8 <unfinished ...>
21126 20:09:02.030606 <... epoll_wait resumed> [{EPOLLIN, {u32=9, u64=9}}], 10128, 99) = 1
21126 20:09:02.030802 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21126 20:09:02.031044 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.031447 write(9, ":7180\r\n", 7) = 7
...
21128 20:09:02.037128 <... fdatasync resumed> ) = 0
21126 20:09:02.037151 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21128 20:09:02.037231 futex(0x561b15942fbc, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
21126 20:09:02.037322 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.037534 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n*3"..., 150) = 150
21126 20:09:02.037651 write(9, ":7184\r\n", 7) = 7
21126 20:09:02.037930 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 92) = 1
21126 20:09:02.038058 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21126 20:09:02.038168 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.038273 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
21126 20:09:02.038386 write(9, ":7185\r\n", 7) = 7
...
21126 20:09:02.142037 epoll_wait(5, [{EPOLLIN, {u32=9, u64=9}}], 10128, 88) = 1
21126 20:09:02.142145 read(9, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 16384) = 30
21126 20:09:02.142523 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:02.142850 write(8, "*3\r\n$6\r\nINCRBY\r\n$1\r\nx\r\n$1\r\n1\r\n", 30) = 30
21126 20:09:02.143091 write(9, ":7341\r\n", 7) = 7
21126 20:09:02.143229 epoll_wait(5, [], 10128, 86) = 0
...
21128 20:09:03.048505 fdatasync(8 <unfinished ...>
21126 20:09:03.048670 epoll_wait(5,  <unfinished ...>
21128 20:09:03.055693 <... fdatasync resumed> ) = 0
...
21126 20:09:08.476551 write(1, "Received SIGINT scheduling shutd"..., 38) = 38
21126 20:09:08.476804 write(1, "\n", 1) = 1
21126 20:09:08.477190 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
21126 20:09:08.477489 read(3, 0x7ffe46eec9c7, 1) = -1 EAGAIN (Resource temporarily unavailable)
21126 20:09:08.477754 epoll_wait(5, [], 10128, 3) = 0
21126 20:09:08.483762 getpid()          = 21126
21126 20:09:08.483985 openat(AT_FDCWD, "/proc/21126/stat", O_RDONLY) = 10
21126 20:09:08.484161 read(10, "21126 (redis-server) R 21124 211"..., 4096) = 329
21126 20:09:08.484407 close(10)         = 0
21126 20:09:08.484636 getpid()          = 21126
21126 20:09:08.484764 getpid()          = 21126
21126 20:09:08.485129 write(1, "21126:M 03 Apr 2019 20:09:08.484"..., 62) = 62
21126 20:09:08.485683 getpid()          = 21126
21126 20:09:08.486001 getpid()          = 21126
21126 20:09:08.486142 write(1, "21126:M 03 Apr 2019 20:09:08.485"..., 68) = 68
21126 20:09:08.486329 fdatasync(8)      = 0

fdatasync was called 1s after previous fdatasync (and after traffic is finished)

Tail of the AOF may never be synced to disk (until shutdown)

Consider the following scenario:
1. Some constant traffic on server
2. beforeSleep calls flushAppendOnly, schedules fsync, clears aof_buf
3. (fsync thread is running), traffic continues, beforeSleep, flushAppendOnly,
   this time no fsync is scheduled because it’s already in progress, clears aof_buf
4. Traffic stops, fsync thread finishes
5. Nothing will trigger another fsync because flushAppend returns immediately because
   aof_buf is empty

In order to fix the above we remember the last synced offset and
call fsync in flushAppendOnly if it's not as expected (different
from aof_current_size) even if aof_buf is empty
@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 10, 2019

side note:
in stopAppendOnly there's a race between the fsync thread (started by flushAppendOnly) and the immediate fsync (redis_fsync). AFAIU it doesn't have any impact (there is a situation in which the thread calls fsyncAppendOnly after the assignment of aof_sync_offset to zero, but it will be zeroed when AOF is turned back on) but, if the file descriptor is closed before the thread calls fsync, fsync will fail.
we can either:

  1. ignore it
  2. add a "do_fsync" flag to flushAppendOnly
  3. remove the fsync logic from flushAppendOnly (put it into fsyncAppendOnlyIfNeeded and call it only when we don't want blocking fsync)
  4. busy-wait for fsync thread to finish (if exists) in stopAppendOnly

the same race exists also in prepareForShutdown

@antirez
Copy link
Contributor

antirez commented Apr 10, 2019

Thank you @guybe7, I've to study this a bit in depth. Will check tomorrow afternoon.

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 10, 2019

@antirez thanks!
btw, a slight correction to my previous comment:
even if we remove the fsync logic from flushAppendOnly we could still have a race if there's already a fsync in progress while calling stopAppendOnly.. so if we want to fix this, (4) is the right way to go (or another strategy which i haven't thought about)

@antirez
Copy link
Contributor

antirez commented Apr 11, 2019

Great discovery @guybe7, that should be fixed ASAP indeed. I've a question however, what about making the fix much simpler, just adding to serverCron() something like this?

if (server.aof_flush_postponed_start != 0 && sdslen(server.aof_buf) != 0)
    flushAppendOnlyFile(0);

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 11, 2019

@antirez the problem starts when aof_buf is empty but the AOF still needs to be synced to disk (scenario is explained in top comment and commit comment).
in this situation (with the code you provided) sdslen(server.aof_buf) == 0 and flushAppendOnly will not be called and the AOF will not be synced

@antirez
Copy link
Contributor

antirez commented Apr 11, 2019

@guybe7 That looks like a simple refinement on top of what I suggested, we just create a function flushAppendOnlyPeriodically() that will do what I wrote above if the buffer is non empty, otherwise will just schedule the background fsync if the buffer is empty, no fsync is in progress, and aof_flush_postponed_start is non zero. And when it does, it just sets aof_flush_postponed_start to 0.

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 11, 2019

@antirez i think i have not provided enough details:

  1. the problem is not that we are missing calls to flushAppendOnly (write to AOF), but that we are missing calls to fsync
  2. it's not related to aof_flush_postponed_start and "stuck" fsync

i'll describe a simple scenario:

  1. server executes INCR x - it is fed to aof_buf, beforeSleep is called, flushAppendOnly, aof_buf is written to file and cleared, background fsync thread starts (and ends), server.aof_last_fsync is set.
  2. server executes INCR y (less than a second after INCR x) - it is fed to aof_buf, beforeSleep is called, flushAppendOnly, aof_buf is written to file and cleared, background fsync thread does not start because it's been less than a second since server.aof_last_fsync was set.
  3. on the next beforeSleep flushAppendOnly returns immediately because aof_buf is empty, without starting a sync and so on..

maybe it is possible to fix this problem in a simpler manner (with a boolean), but i'm relying on maintaining aof_sync_offset for a future PR (allowing modules to get notified when AOF is synced up to a certain point in time - may be also helpful for disque)

another side note: since flushAppendOnly is called unconditionally in beforeSleep (every 100ms) we don't really need to call it from cron as well.. but that's harmless i suppose

@antirez
Copy link
Contributor

antirez commented Apr 12, 2019

@guybe7 ok thanks for making it clearer. Yep I got "1" and I also understand is not a matter of calling flushAppendOnly() in serveCron(), I suggested to call a different function from serverCron() to handle just the missing fsync, but I was not clear enough. However now I get that this can happen even if the fsync was not in progress.

The reason I want to fix it in the simplest way to start is that I think before making other PRs around that area I should start merging the wait-aof branch, please give it a look if possible, because it touches very related things and I've the feeling it's better to add the API for the modules after such branch is fully merged into unstable. Since this AOF thing is a very critical issue I would like to start with the simplest fix possible, and then reason how to make the wait-aof and the module API well designed together.

A minor thing about the PR: are you sure it is needed to declare the variable volatile with atomic operations? I think this should not be needed. Second, we are going to merge soon the Redis threaded I/O, I think we can use _Atomic instead. I'll soon covert all Redis atomic vars to _Atomic from C11. The new Makefile in the threaded-io branch directly uses a C11 target. Cheers.

@soloestoy
Copy link
Contributor

Hi @antirez @guybe7 , just read this PR and wait-aof branch, very interesting.

I understand the delayed fsync, but this PR seems a little bit complex, trying a simpler way to fix it.

But some codes and comments in wait-aof branch makes me confused:

        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        aofStartBackgroundFsync();
        /* Note that we update this time regardless of the fact fsync
         * was actually started or nmot. server.aof_last_fsync is not
         * used to really know when the latest fsync succeeded, but just
         * as a timer to *try* to fsync once every second. */
        server.aof_last_fsync = server.unixtime;

It said that we don't care about the fact fsync, so it seems that we don't need to fix the delayed fsync in this PR? Or maybe I misunderstand something?

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 15, 2019

hi @antirez @soloestoy thanks for reviewing this PR.

TBH i can't really think of a simpler way to keep track of "how much" was fsynced except sampling the current aof size right before calling fsync... if you have a better/simpler way please let me know.

also, as i mentioned, having an indicator of the number of bytes guaranteed to be synced may be useful for modules (even tough it's possible to achieve the same result using aof_fsync_in_progress_epoch, but again, how can you tell if you're missing a call to fsync without sampling the current size before calling fsync?)

i read the wait-aof code, it seems to me like it could work together with this PR (of course there will be some conflicts but basically it's just calling fsyncAppendOnly instead of redis_fsync)

antirez, i think you're right about the volatile modifier, it's unnecessary and i'll drop it (if you think you'll merge this PR)

soloestoy, IIUC aof_last_fsync is just a rough indicator of when we did the last fsync... it's not an accurate metric and it's only used in order to trigger an fsync (roughly) every seconds.

@antirez antirez added the IMPORTANT This issue is important and should be checked often label Apr 26, 2019
@soloestoy
Copy link
Contributor

Hi @antirez @guybe7

Here is a simpler way IMHO, I think we don't need to check how many data synced in bio thread, the only thing we should know is the AOF file size when we create a bio job.

When we create a fsync bio job, we record the AOF file size as a fsync offset. Next time if aof buffer is empty, just check if the recorded fsync offset is same with current AOF size, if not, create a new bio job, so no race, no lock, just an offset.

See PR #6053 .

soloestoy added a commit to soloestoy/libredis-BSD that referenced this pull request Apr 29, 2019
@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 29, 2019

hi @soloestoy thanks for your time

i reviewed the fix, LGTM
the fix works according to the same idea, track how many bytes were synced, but offers a compromise over temporal accuracy (sampling aof_current_size prematurely, in the main thread) over usage of atomic variables.

both the fixes still comply with my work on modules "wait-aof" feature as they both keep track of the current amount of synced bytes.

@antirez it's up to you :)

@soloestoy
Copy link
Contributor

Yes @guybe7 , I use your idea and just simplify the implementation, hope you don't feel uncomfortable.

@guybe7
Copy link
Collaborator Author

guybe7 commented Apr 29, 2019

@soloestoy not at all :) TBH i don't think there's a better way to do it... my personal opinion is that your PR should be merged rather than this one

@itamarhaber
Copy link
Member

You two are such a cute pair :P

@antirez
Copy link
Contributor

antirez commented May 10, 2019

Thanks both, I merged @soloestoy PR. It's unfortunate we have to track this information (however @guybe7 said he has at least an usage for that), however looks like there is no simple way out. There is an alternative, but has also disadvantages, that is: just fsync every second even if the buffer is empty. However for read-heavy workloads it looks absurd and may cause totally unneeded delays. So... I guess we picked the best option. Thank you.

@antirez antirez closed this May 10, 2019
antirez added a commit that referenced this pull request May 10, 2019
aof: enhance AOF_FSYNC_EVERYSEC, more details in #5985
@antirez
Copy link
Contributor

antirez commented May 10, 2019

Note: this was backported to Redis 5 together with other changes. I think Monday is time to release a new Redis 5 patch level version.

@soloestoy
Copy link
Contributor

Note: this was backported to Redis 5 together with other changes. I think Monday is time to release a new Redis 5 patch level version.

Hi @antirez I think the next version should include #5956 and #5817, there two are obvious and serious bugs.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Jul 10, 2019
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Jul 10, 2019
aof: enhance AOF_FSYNC_EVERYSEC, more details in redis#5985
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 26, 2023
…YNC_ALWAYS

This PR fix two bugs:

1. About the MP-AOF one, that code assumes that we started a new AOF file just
now (when we have a new base into which we're gonna incrementally write), but the
fact is that with multipart-aof, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to re-issue
SELECT, and i suppose there's no reason to update any of the fsync metrics.
This should have been deleted with MP-AOF (introduced in redis#9788, 7.0).
The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile,
if we stop write commands in AOF_FSYNC_EVERYSEC

2. About the AOF_FSYNC_ALWAYS one, it is follow redis#6053 (the details is in redis#5985),
we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always.

We detected these problems with WAITAOF test (redis 7.2), introduced in redis#11713.
oranagra pushed a commit that referenced this pull request Mar 29, 2023
…SYNC_ALWAYS (#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang. 

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
oranagra pushed a commit that referenced this pull request Apr 17, 2023
…SYNC_ALWAYS (#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

(cherry picked from commit cb17178)
pulllock pushed a commit to pulllock/redis that referenced this pull request Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IMPORTANT This issue is important and should be checked often

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants