borg2: reconsider chunks_healthy approach #8559

## borg 1.x approach

borg 1.x archived regular file items have:
- `.chunks`: a list of `(chunkid, plaintext_size)` tuples, referencing file content chunks
- `.chunks_healthy`: same format, only present for an item that got its `.chunks` list patched with all-zero replacement chunks because the correct chunks are missing in the repo. `.chunks_healthy` has the original, correct chunk ids. the all-zero replacement chunks are stored in the repo.

When doing it that way, borg code reading file content does not necessarily need special casing for missing chunks. If there is a NEW missing chunk, it might crash, but then users will run `borg check --repair`, which will "patch" the chunks list with a new all-zero replacement chunk, and it won't crash anymore.

If a missing chunk reappears (e.g. because a new backup has created it again), `borg check --repair` will notice that and put the correct chunk id from the `.chunks_healthy` list back into the `.chunks` list.
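
The patch-and-heal cycle described above can be sketched roughly like this. It is a toy model, not borg's actual code: the repo is a plain dict, `chunk_id` uses unkeyed SHA-256 as a stand-in for borg's id hash, and `repair_item` is a hypothetical name.

```python
# Toy model of the borg 1.x repair cycle (assumptions: dict-based repo,
# unkeyed SHA-256 ids, hypothetical function names).
import hashlib


def chunk_id(data: bytes) -> bytes:
    """Stand-in for borg's id hash."""
    return hashlib.sha256(data).digest()


def repair_item(item: dict, repo: dict) -> None:
    """Patch missing chunks with all-zero replacements, heal reappeared ones."""
    # The healthy list, if present, holds the original, correct chunk ids.
    healthy = list(item.get("chunks_healthy", item["chunks"]))
    patched = []
    for cid, size in healthy:
        if cid in repo:
            # Chunk is present (or has reappeared): use the correct id.
            patched.append((cid, size))
        else:
            # Chunk is missing: store an all-zero replacement in the repo
            # and reference that instead.
            zeros = bytes(size)
            zero_id = chunk_id(zeros)
            repo[zero_id] = zeros
            patched.append((zero_id, size))
    if patched == healthy:
        # Fully healed, the second list is not needed anymore.
        item.pop("chunks_healthy", None)
    else:
        item["chunks_healthy"] = healthy
    item["chunks"] = patched
```

Note that healing only happens when `repair_item` runs again after the chunk has reappeared, which is where the `repair`, `create`, `repair` sequence comes from.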

That approach worked, but has some issues:
- some places need to deal with both lists, e.g. borg2 compact, check
- as long as `.chunks` is not patched, places not dealing with missing chunks might crash
- requires a `repair`, `create`, `repair` sequence to fix missing chunks (if `create` re-creates the missing chunks)

## borg2 approach

New, better approach for borg2:
- make the places reading file content (chunks) deal with missing chunks. they can either fill in a dynamically created (not stored) all-zero bytestring (the length is known) or raise an `IOError`. IIRC, `borg mount` already has an option for that.
- do NOT have `.chunks_healthy` in the archived item (not needed) and also never modify `.chunks` (it will always contain the correct `(chunkid, size)` tuples, even for missing chunks).
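
A minimal sketch of such a reader, assuming a dict-like repo keyed by chunk id. The name `read_chunks` and the `zero_fill` switch are made up for illustration and are not borg's API:

```python
# Sketch of a content reader that handles missing chunks itself
# (assumptions: dict-based repo, hypothetical names).
from typing import Iterable, Iterator, Mapping, Tuple


def read_chunks(
    chunks: Iterable[Tuple[bytes, int]],
    repo: Mapping[bytes, bytes],
    zero_fill: bool = False,
) -> Iterator[bytes]:
    """Yield file content chunk by chunk; `.chunks` is never modified."""
    for cid, size in chunks:
        data = repo.get(cid)
        if data is not None:
            yield data
        elif zero_fill:
            # Size is known from the (chunkid, size) tuple, so the
            # replacement is created on the fly and never stored.
            yield bytes(size)
        else:
            raise IOError(f"missing chunk {cid.hex()}")
```

Because the item still references the correct chunk id, a chunk that reappears in the repo is picked up by the next read without any repair step.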

Pros of this approach:
- if a previously missing chunk reappears, all items referencing it will be immediately healed, no double-run of `borg check --repair` needed.
- if there is a new missing chunk, borg will have some defined behaviour (IOError or zero-bytes) and not just crash, without requiring `borg check --repair` to achieve that behaviour.
- code that needed to deal with `.chunks` and `.chunks_healthy` will get simpler.
- no need to store all-zero patch chunks in the repo
- it's simpler overall

Cons:
- we can't track "new missing" / "new reappeared" chunks ("new" since the last repair), we can only track the overall count of missing chunks. but I guess that is good enough.
- if an archive points to missing file content chunks and the archive is re-chunked (`borg recreate`), then the target archive will have a run of all-zero bytes instead of a missing chunk. so it is better to first do a repo check and avoid rechunking if the repo misses chunks.
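
The overall count of missing chunks mentioned above could be computed along these lines. This is only a sketch with a dict-like repo and a hypothetical `count_missing` helper, not what `borg check` actually does:

```python
# Sketch: count distinct chunk ids that are referenced but not in the repo
# (assumptions: dict-based repo, hypothetical helper name).
from typing import Iterable, Mapping


def count_missing(items: Iterable[dict], repo: Mapping[bytes, bytes]) -> int:
    """Return the number of distinct referenced chunk ids missing from repo."""
    missing = {
        cid
        for item in items
        for cid, _size in item["chunks"]
        if cid not in repo
    }
    return len(missing)
```

A non-zero result would be a reason to skip rechunking until the missing chunks have been re-created.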

## Transfer from borg 1.x to 2:

- If there is a `.chunks_healthy` list for an archived borg 1.x item, the transfer would use it for `.chunks` of the borg2 item, because it has all the correct chunk ids.
- Also, the transfer would copy chunks from the borg1 repo to the borg2 repo using the `.chunks_healthy` list, if present (and thus not transfer any all-zero replacement chunks). It might encounter missing chunks in the borg1 repo when doing that and would silently skip them.
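
As a rough sketch of these two rules, with both repos modelled as dicts and chunk ids assumed to carry over unchanged (a simplification), a hypothetical `transfer_item` could look like this:

```python
# Sketch of the borg 1.x to borg2 item transfer rules
# (assumptions: dict-based repos, unchanged chunk ids, hypothetical name).
from typing import Dict


def transfer_item(item1: dict, repo1: Dict[bytes, bytes],
                  repo2: Dict[bytes, bytes]) -> dict:
    """Copy an item's chunks to repo2 and return the borg2 item."""
    # Prefer the healthy list: it has the correct ids and does not
    # reference any all-zero replacement chunks.
    chunks = item1.get("chunks_healthy") or item1["chunks"]
    for cid, _size in chunks:
        data = repo1.get(cid)
        if data is None:
            # Missing in the borg1 repo: skip silently. The borg2 item
            # still references the correct id.
            continue
        repo2[cid] = data
    # The borg2 item has no chunks_healthy at all.
    return {"chunks": list(chunks)}
```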

