Implement Multi Part AOF mechanism to avoid AOFRW overheads.#9788
Implement Multi Part AOF mechanism to avoid AOFRW overheads.#9788oranagra merged 65 commits intoredis:unstablefrom chenyang8094:feature-multi-aof-use-meta
Conversation
|
@chenyang8094 Just out of curiosity, does this create a foundation to extend #8015 to AOF-only and not only RDB-only setups? Thank you |
oranagra
left a comment
There was a problem hiding this comment.
@chenyang8094 Thank you.
I started reviewing the PR, and also edited the PR top comment to make it clearer (please check if it looks right).
So far i reviewed just parts of it (namely all the files excluding aof.c and the tests).
I.d like to post my comments for discussion before i proceed with the other parts, to see if we agree on things, or maybe i have a misunderstanding.
Sure, you can see some followup PR plans mentioned by oranagra herer. |
|
I've just created a few new issues to track these ideas (don't like them sitting around in a comment in a closed PR) The other ones might not make it for this release, but we can at least start discussion and mark them for next release: |
|
@oranagra About upgrades from old redis versions, I think we can't directly rename the old AOF to the new name, because we still can't solve the atomic problem of rename and update manifest. We can write the old AOF name directly into the manifest, so that we can start loading normally. Even if the process crashes after writing the manifest, that is no problem, because we can find the AOF from the manifest. Once AOFRW is executed once, all files will have new file names. (current code is implemented according to this idea) |
|
ok, sounds good to me. so the AOF file (possibly with preamble content) will have it's old name, and will be listed in the manifest. |
oranagra
left a comment
There was a problem hiding this comment.
Done reviewing aof.c, didn't review the tests yet.
Change aofRewriteLimited and Update TCL test
|
🎉 more than 400 comments 2500 LOC, and some 3 months (counting form #9539) |
Thank you, now I am going to deal with the next PR. |
|
@chenyang8094 can you please look into this failure: |
|
i also saw it in my daily. so maybe change it to 1KB, or do a for loop and insert some keys (maybe at least 20000 keys) |
|
maybe.. maybe instead of using |
|
it is easy to reproduce if we delete all the like in this commit (7d4f50f), if we change the 20000 to be smaller, it will fail because not able to touch 1MB. (i can trigger a CI in FreeBSD and see the result, but i think this is the right fix, the but the time it should be similar in FreeBSD, because we need to generate that many keys all the time (successful use case too))
btw, i think it is not enought, the |
|
I think we can change |
In redis#9788, now we stores all persistent append-only files in a dedicated directory. The name of the directory is determined by the appenddirname configuration parameter in redis.conf Update create-cluster clean to clean this default directory. Fixes redis#10222
In #9788, now we stores all persistent append-only files in a dedicated directory. The name of the directory is determined by the appenddirname configuration parameter in redis.conf. Now each node have a separate folder. Update create-cluster clean to clean this default directory.
| misc/* | ||
| src/release.h | ||
| appendonly.aof | ||
| appendonly.aof.* |
There was a problem hiding this comment.
should be appendonly.aof* instead of appendonly.aof.*
now, appendonly.aof will not be ignored.
There was a problem hiding this comment.
Maybe you're right, but the current implementation doesn't create a file named appendonly.aof unless you manually copy an old-style AOF for upgrade testing. So I think we just need to ignore the file we're going to create. @oranagra WDYT?
There was a problem hiding this comment.
i think it's a good idea to ignore both.
i.e. both the ones we create, and the ones we created in the past, that are still left in the folder.
There was a problem hiding this comment.
Only developers pay attention to .gitignore, and when I updated the unstable brance code, appendonly.aof caught my attention, so I started this discussion.
There was a problem hiding this comment.
@oranagra @yangbodong22011 Sounds reasonable, I'll make a PR to add it, thanks.
…YNC_ALWAYS This PR fix two bugs: 1. About the MP-AOF one, that code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with multipart-aof, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue SELECT, and i suppose there's no reason to update any of the fsync metrics. This should have been deleted with MP-AOF (introduced in redis#9788, 7.0). The aof_fsync_offset it updates will cause us to miss an fsync in flushAppendOnlyFile, if we stop write commands in AOF_FSYNC_EVERYSEC 2. About the AOF_FSYNC_ALWAYS one, it is follow redis#6053 (the details is in redis#5985), we also check AOF_FSYNC_ALWAYS to prevent users from switching from everysec to always. We detected these problems with WAITAOF test (redis 7.2), introduced in redis#11713.
…SYNC_ALWAYS (#11973) This PR fix several unrelated bugs that were discovered by the same set of tests (WAITAOF tests in #11713), could make the `WAITAOF` test hang. The change in `backgroundRewriteDoneHandler` is about MP-AOF. That leftover / old code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with MP-AOF, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue `SELECT`, and no reason to update any of the fsync variables in that flow. This should have been deleted with MP-AOF (introduced in #9788, 7.0). The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC` while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever. Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset` and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared `aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced. The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053 (the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
…SYNC_ALWAYS (#11973) This PR fix several unrelated bugs that were discovered by the same set of tests (WAITAOF tests in #11713), could make the `WAITAOF` test hang. The change in `backgroundRewriteDoneHandler` is about MP-AOF. That leftover / old code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with MP-AOF, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue `SELECT`, and no reason to update any of the fsync variables in that flow. This should have been deleted with MP-AOF (introduced in #9788, 7.0). The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC` while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever. Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset` and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared `aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced. The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053 (the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where appendfsync is changed from everysec to always while there is data that's written but not yet fsynced. (cherry picked from commit cb17178)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
This PR implements the Meta-File Based Multi-Part AOF mechanism described in 9539.
The main issues with the the original AOFRW mechanism are:
The main modifications of this PR:
BASEtype, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only oneBASEfile at most. The second isINCRtype, may have more than one. They represent the incremental commands since the last AOFRW.appendfilenamewill be the base part of the new file name, for example:appendonly.aof.1.base.rdbandappendonly.aof.2.incr.aofappendfilenameaof_rewrite_buffer_lengthfield in info.aof-disable-auto-gcconfiguration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now.of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
appenddirnameconfiguration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory.aof-load-truncatedis enabled.To Do: