Closed
Description
borg 1.x approach
borg 1.x archived regular file items have:
- `.chunks`: a list of `(chunkid, plaintext_size)` tuples, referencing file content chunks.
- `.chunks_healthy`: the same thing, for an item whose chunks list got patched with all-zero replacement chunks because the correct chunk is missing in the repo. `.chunks_healthy` has the original, correct chunk ids. The all-zero replacement chunks are stored in the repo.
With this approach, borg code reading file content does not necessarily need to special-case missing chunks. If a chunk newly goes missing, that code might crash, but then users will run `borg check --repair`, which "patches" the chunks list with a new all-zero replacement chunk, and it won't crash anymore.
If a missing chunk reappears (e.g. because a new backup has created it again), `borg check --repair` will notice that and put the correct chunk id from the `.chunks_healthy` list back into the `.chunks` list.
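The patch/heal cycle described above can be sketched roughly like this. It is a simplified illustration, not borg's actual `check` code: the repo is assumed to be a plain dict of chunk id to plaintext, and `chunk_id`, `Item` and `repair_item` are hypothetical names (real borg uses a keyed MAC for chunk ids, not plain SHA-256).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ChunkRef = Tuple[bytes, int]  # (chunk id, plaintext size)


def chunk_id(data: bytes) -> bytes:
    # Hypothetical id function; real borg uses a keyed MAC instead.
    return hashlib.sha256(data).digest()


@dataclass
class Item:
    chunks: List[ChunkRef]
    chunks_healthy: Optional[List[ChunkRef]] = None


def repair_item(item: Item, repo: Dict[bytes, bytes]) -> None:
    """Sketch of borg 1.x style patching/healing of one item's chunk list."""
    # Once an item has been patched, .chunks_healthy holds the correct ids.
    healthy = item.chunks_healthy if item.chunks_healthy is not None else item.chunks
    new_chunks: List[ChunkRef] = []
    patched = False
    for cid, size in healthy:
        if cid in repo:
            # Present (or reappeared): use the correct chunk id.
            new_chunks.append((cid, size))
        else:
            # Missing: store an all-zero replacement chunk of the same size.
            zeros = bytes(size)
            zid = chunk_id(zeros)
            repo[zid] = zeros
            new_chunks.append((zid, size))
            patched = True
    # Keep the original ids only while some chunk is still patched.
    item.chunks_healthy = healthy if patched else None
    item.chunks = new_chunks
```

Note that healing only happens when `repair_item` runs again, which is exactly the extra repair pass the borg2 approach wants to avoid.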
That approach worked, but has some issues:
- some places need to deal with both lists, e.g. borg2 compact, check
- as long as `.chunks` is not patched, places not dealing with missing chunks might crash
- requires a `repair`, `create`, `repair` sequence to fix missing chunks (if `create` re-creates the missing chunks)
borg2 approach
New, better approach for borg2:
- make the places reading file content (chunks) deal with missing chunks. They can either fill in a dynamically created (not stored) all-zero bytestring (the length is known) or raise an `IOError`. IIRC, `borg mount` already has an option for that.
- do NOT have `.chunks_healthy` in the archived item (not needed) and never modify `.chunks` (it will always contain the correct `(chunkid, size)` tuples, even for missing chunks).
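A reader following this approach might look roughly like the sketch below. It assumes a dict-like repo mapping chunk id to plaintext; `read_file` and its `zero_fill` flag are hypothetical names, not borg's API. The item's `.chunks` list is only read, never rewritten.

```python
from typing import Dict, Iterator, List, Tuple

ChunkRef = Tuple[bytes, int]  # (chunk id, plaintext size)


def read_file(
    chunks: List[ChunkRef], repo: Dict[bytes, bytes], zero_fill: bool = True
) -> Iterator[bytes]:
    """Yield file content; missing chunks are handled without touching .chunks."""
    for cid, size in chunks:
        data = repo.get(cid)
        if data is not None:
            yield data
        elif zero_fill:
            # Created on the fly from the known size, never stored in the repo.
            yield bytes(size)
        else:
            raise IOError(f"missing chunk {cid.hex()} ({size} bytes)")
```

Because the lookup happens at read time, a chunk that reappears in the repo is used immediately by every item referencing it, with no repair run in between.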
Pros of this approach:
- if a previously missing chunk reappears, all items referencing it are immediately healed; no double run of `borg check --repair` is needed.
- if there is a new missing chunk, borg will have defined behaviour (`IOError` or zero bytes) instead of just crashing, without requiring `borg check --repair` to achieve that behaviour.
- code that needed to deal with `.chunks` and `.chunks_healthy` will get simpler.
- no need to store all-zero patch chunks in the repo
- it's simpler overall
Cons:
- we can't track "new missing" / "new reappeared" chunks ("new" since the last repair); we can only track the overall count of missing chunks, but I guess that is good enough.
- if an archive points to missing file content chunks and the archive is re-chunked (`borg recreate`), the target archive will have a run of all-zero bytes instead of a missing chunk. So it is better to first run a repo check and avoid re-chunking if the repo is missing chunks.
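The "overall count of missing chunks" and the "check before re-chunking" advice could be sketched like this. It is an illustration only, assuming the caller already has every item's chunk list and the set of chunk ids present in the repo; `missing_chunk_ids` and `safe_to_rechunk` are hypothetical helpers, not part of borg.

```python
from typing import Iterable, List, Set, Tuple

ChunkRef = Tuple[bytes, int]  # (chunk id, plaintext size)


def missing_chunk_ids(items: Iterable[List[ChunkRef]], repo_ids: Set[bytes]) -> Set[bytes]:
    """Collect ids referenced by any item but absent from the repo."""
    missing: Set[bytes] = set()
    for chunks in items:
        for cid, _size in chunks:
            if cid not in repo_ids:
                missing.add(cid)
    return missing


def safe_to_rechunk(items: Iterable[List[ChunkRef]], repo_ids: Set[bytes]) -> bool:
    # Re-chunking would turn zero-filled gaps into stored zero bytes,
    # so only re-chunk when nothing is missing.
    return not missing_chunk_ids(items, repo_ids)
```

Only the set (and hence its size) is available; without stored patch state there is no record of which chunks went missing or reappeared since the last check.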
Transfer from borg 1.x to 2:
- If there is a `.chunks_healthy` list for an archived borg 1.x item, this is the one the transfer would use for `.chunks` of the borg2 item, because it has all the correct chunk ids.
- Also, it would transfer chunks from the borg1 repo to the borg2 repo using the `.chunks_healthy` list, if present (and thus not transfer any all-zero replacement chunks). It might encounter missing chunks in the borg1 repo when doing that and would silently skip them.
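The transfer rule above might be sketched as follows, assuming both repos are dict-like stores of chunk id to plaintext. `Borg1Item` and `transfer_item` are hypothetical names for illustration, not the real transfer implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ChunkRef = Tuple[bytes, int]  # (chunk id, plaintext size)


@dataclass
class Borg1Item:
    chunks: List[ChunkRef]
    chunks_healthy: Optional[List[ChunkRef]] = None


def transfer_item(
    item: Borg1Item, src_repo: Dict[bytes, bytes], dst_repo: Dict[bytes, bytes]
) -> List[ChunkRef]:
    """Copy an item's content chunks and return the borg2 .chunks list."""
    # Prefer the original ids; a patched .chunks points at all-zero replacements.
    chunks = item.chunks_healthy if item.chunks_healthy is not None else item.chunks
    for cid, _size in chunks:
        if cid in dst_repo:
            continue  # already transferred (e.g. shared by another item)
        data = src_repo.get(cid)
        if data is None:
            continue  # missing in the borg1 repo: silently skip it
        dst_repo[cid] = data
    return list(chunks)
```

Skipped chunks stay referenced in the returned list, so borg2 readers treat them as missing and a later reappearance heals them without further action.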