Skip to content

Fix fork done handler wrongly update fsync metrics and enhance AOF_FSYNC_ALWAYS#11973

Merged
oranagra merged 10 commits intoredis:unstablefrom
enjoy-binbin:fix_aof_fsync
Mar 29, 2023
Merged

Fix fork done handler wrongly update fsync metrics and enhance AOF_FSYNC_ALWAYS#11973
oranagra merged 10 commits intoredis:unstablefrom
enjoy-binbin:fix_aof_fsync

Conversation

@enjoy-binbin
Copy link
Contributor

@enjoy-binbin enjoy-binbin commented Mar 26, 2023

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the WAITAOF test hang.

The change in backgroundRewriteDoneHandler is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue SELECT, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to aof_fsync_offset will cause us to miss an fsync
in flushAppendOnlyFile, that happens if we stop write commands in AOF_FSYNC_EVERYSEC
while an AOFRW is in progress. This caused a new WAITAOF test to sometime hang forever.

Also because of MP-AOF, we needed to change aof_fsync_offset to aof_last_incr_fsync_offset
and match it to aof_last_incr_size in flushAppendOnlyFile. This is because in the past we compared
aof_fsync_offset and aof_current_size, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in flushAppendOnlyFile, about the AOF_FSYNC_ALWAYS, it is follow #6053
(the details is in #5985), we also check AOF_FSYNC_ALWAYS to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

…YNC_ALWAYS

This PR fix two bugs:

1. About the MP-AOF one, that code assumes that we started a new AOF file just
now (when we have a new base into which we're gonna incrementally write), but the
fact is that with multipart-aof, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to re-issue
SELECT, and i suppose there's no reason to update any of the fsync metrics.
This should have been deleted with MP-AOF (introduced in redis#9788, 7.0).
The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile,
if we stop write commands in AOF_FSYNC_EVERYSEC

2. About the AOF_FSYNC_ALWAYS one, it is follow redis#6053 (the details is in redis#5985),
we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always.

We detected these problems with WAITAOF test (redis 7.2), introduced in redis#11713.
Copy link
Contributor

@chenyang8094 chenyang8094 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM.
few comments about comment or test improvements though.
and one about changing the approach for the everysec->always config change.

@oranagra
Copy link
Member

@soloestoy please take a look.

Copy link
Contributor

@soloestoy soloestoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM, but aof_fsync_dirty is unnecessary, aof_fsync_offset can work well.

@enjoy-binbin
Copy link
Contributor Author

@soloestoy the reason i add aof_fsync_dirty is here: #11973 (review)

@soloestoy
Copy link
Contributor

I still don't like the "dirty" flag, aof_fsync_offset can work well if we compare it with aof_last_incr_size instead of aof_current_size.

@enjoy-binbin
Copy link
Contributor Author

@oranagra @soloestoy thanks for the review and suggestions, i changed to use aof_last_incr_size, please take a look
I will run Daily CI (with some loop) and make sure there will be no problem here

@oranagra
Copy link
Member

so anything left here or can this be merged?
anything new to mention in the top comment?

@enjoy-binbin
Copy link
Contributor Author

i think it is ready to merge, and i also update the top comment, please check (or fix it)
thanks for the patient reviews...

@oranagra oranagra merged commit cb17178 into redis:unstable Mar 29, 2023
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Mar 29, 2023
@enjoy-binbin enjoy-binbin deleted the fix_aof_fsync branch March 29, 2023 12:19
@enjoy-binbin enjoy-binbin changed the title Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS Fix fork done handler wrongly update fsync metrics and enhance AOF_FSYNC_ALWAYS Mar 31, 2023
@oranagra oranagra mentioned this pull request Apr 17, 2023
oranagra pushed a commit that referenced this pull request Apr 17, 2023
…SYNC_ALWAYS (#11973)

This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.

The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.

Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.

The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.

(cherry picked from commit cb17178)
@oranagra oranagra mentioned this pull request May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants