Fix fork done handler wrongly update fsync metrics and enhance AOF_FSYNC_ALWAYS#11973
Fix fork done handler wrongly update fsync metrics and enhance AOF_FSYNC_ALWAYS#11973oranagra merged 10 commits intoredis:unstablefrom
Conversation
…YNC_ALWAYS This PR fix two bugs: 1. About the MP-AOF one, that code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with multipart-aof, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue SELECT, and i suppose there's no reason to update any of the fsync metrics. This should have been deleted with MP-AOF (introduced in redis#9788, 7.0). The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile, if we stop write commands in AOF_FSYNC_EVERYSEC 2. About the AOF_FSYNC_ALWAYS one, it is follow redis#6053 (the details is in redis#5985), we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always. We detected these problems with WAITAOF test (redis 7.2), introduced in redis#11713.
oranagra
left a comment
There was a problem hiding this comment.
generally LGTM.
few comments about comment or test improvements though.
and one about changing the approach for the everysec->always config change.
|
@soloestoy please take a look. |
soloestoy
left a comment
There was a problem hiding this comment.
mostly LGTM, but aof_fsync_dirty is unnecessary, aof_fsync_offset can work well.
|
@soloestoy the reason i add |
|
I still don't like the "dirty" flag, |
|
@oranagra @soloestoy thanks for the review and suggestions, i changed to use aof_last_incr_size, please take a look |
|
so anything left here or can this be merged? |
|
i think it is ready to merge, and i also update the top comment, please check (or fix it) |
…SYNC_ALWAYS (#11973) This PR fix several unrelated bugs that were discovered by the same set of tests (WAITAOF tests in #11713), could make the `WAITAOF` test hang. The change in `backgroundRewriteDoneHandler` is about MP-AOF. That leftover / old code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with MP-AOF, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue `SELECT`, and no reason to update any of the fsync variables in that flow. This should have been deleted with MP-AOF (introduced in #9788, 7.0). The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC` while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever. Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset` and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared `aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced. The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053 (the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where appendfsync is changed from everysec to always while there is data that's written but not yet fsynced. (cherry picked from commit cb17178)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the
WAITAOFtest hang.The change in
backgroundRewriteDoneHandleris about MP-AOF.That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue
SELECT, and no reason to update any of the fsync variables in that flow.This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to
aof_fsync_offsetwill cause us to miss an fsyncin
flushAppendOnlyFile, that happens if we stop write commands inAOF_FSYNC_EVERYSECwhile an AOFRW is in progress. This caused a new
WAITAOFtest to sometime hang forever.Also because of MP-AOF, we needed to change
aof_fsync_offsettoaof_last_incr_fsync_offsetand match it to
aof_last_incr_sizeinflushAppendOnlyFile. This is because in the past we comparedaof_fsync_offsetandaof_current_size, but with MP-AOF it could be the total AOF file will besmaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in
flushAppendOnlyFile, about theAOF_FSYNC_ALWAYS, it is follow #6053(the details is in #5985), we also check
AOF_FSYNC_ALWAYSto handle a case whereappendfsync is changed from everysec to always while there is data that's written but not yet fsynced.