Adapt redis-check-aof tool for Multi Part Aof#10061
Adapt redis-check-aof tool for Multi Part Aof#10061oranagra merged 18 commits intoredis:unstablefrom chenyang8094:adapt-check-aof-for-multi-aof
Conversation
|
PAT @yoav-steinberg |
|
@chenyang8094 i didn't review the code yet, but i'd like to respond to the top comment.
|
|
Before I review here are some questions:
I don't think the tool needs to be atomic. If we have BASE+INCR1+INCR2 and the specified timestamp is in the middle of INCR1 then we need to delete to truncate INCR1, delete INCR2 and update the manifest. I think it's OK not to be atomic here. The user should always backup the directory before running this tool.
|
Ahe AOF file produced by rewrite won't have timestamps. And note that corruptions are only expected at the last INCR file (due to fsync or bad mount options) |
Unless we add special parameters, I can't think of any effective way to identify whether this file is an old AOF or a manifest. It is not reliable to rely on the file name alone.
I thought, in this case, the user will see the following message: |
|
regarding the detection of an old AOF file, or a manifest i'm not sure it's that bad to rely on the file name (we do control the name of the manifest file, and it's unlikely that users will name their old AOF files with a "manifest" suffix). regarding the truncation message, that's great.
|
We only recommend that users back up but it is not mandatory. Without the guarantee of atomicity, I don't want to destroy the original data. In addition, if it crashes due to some reason in the middle, and we have no chance to print out enough log, may cause the manifest and AOF to be incompatible, the user will not know whether the problem is caused by the
Of course we can (if BASE is the last file).
|
|
The manifest must start with So, we can do like this:
Yes, only modifying the manifest is enough, but a better suggestion is to delete the files together, otherwise these files will never be seen by redis and will not be recycled by the GC mechanism. |
|
ok, so i think we can agree on:
|
|
For the record I think a recovery tool shouldn't worry about atomicity. For two main reasons:
I'm also fine with not editing the manifest file at this point to keep the tool simple (in 99% of the cases we won't need to truncate a non last INCR file), but I don't think atomicity has anything to do with it. |
|
@yoav-steinberg Aside from atomicity, but it does bring a lot of complexity. Thinking from another perspective, if the user really wants to truncate the non-last INCR file, we print a friendly message and ask him to manually delete these files (this time he may realize that these files should be backed up first). I think this is a safe practice, and deleting files is a very easy thing for user, and the probability of this happening is very small. |
|
@oranagra 1,2,3,4 all done. |
oranagra
left a comment
There was a problem hiding this comment.
there's quite a lot of refactoring here, which is good since the old code was out of shape.
maybe though some of this refactoring could have been left aside for another chance (like the process function which i'm not sure why you had to refactor).
additionally, if we already do so much work here, let's try to further improve.
specifically i feel some doc comments are still missing.
|
@chenyang8094 please take a look at this failure |
|
another similar error: i don't understand how we could get "aborting...", if we didn't pass the |
|
@oranagra GOT, I have added more logs (printf the |
|
finally reappeared. It looks like we are getting a wrong path, is there a bug in the dirname function? |
|
I add this log: So it's Copy a glibc implementation: |
|
finally found the bug: Since |


issue link : #9788 (comment)
Modifications of this PR:
Support the verification of
Multi Part AOF, while still maintaining support for the old-styleAOF/RDB-preamble.redis-check-aofwill automatically choose which mode to use according to the incoming file format.Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>Refactor part of the code to make it easier to understand
Currently only supports truncate (
--fixor--truncate-to-timestamp) the last AOF file (may beBASEorINCR)The reasons for 3 above:
--fix: Only the last AOF may be truncated, this is guaranteed by redis--truncate-to-timestamp: Normally, we only haveBASE+INCRfiles at most, andBASEcannot be truncated(It only contains a timestamp annotation at the beginning of the file), so onlyINCRcan be truncated. If we have aBASE+INCR1+INCR2file (meaning we have an interrupted AOFRW), OnlyINCR2files can be truncated at this time. If we still insist on truncateINCR1, we need to manually deleteINCR2and update the manifest file, then re-runredis-check-aof