i thought a bit about how to optimize the chunks cache and just wanted to document one weird idea.
the issue with the chunks cache is that it needs to match the overall repository state (== have up-to-date information about all chunks in all archives, including refcount, size, csize). when backing up multiple machines into the same repo, creating an archive from one machine invalidates the chunks caches on all other machines, and they need to resync their chunks cache with the repo, which is expensive.
so, there is the idea to also store the chunk index in the repo, so all out-of-sync clients can just fetch the index from the repo.
But:
- the index can be large (way larger than the segment size)
- when using a raw hashtable, it could have up to 75% unused bucket space
- the index has additional information about chunks, so we should not store it unencrypted if the repo is encrypted
- the index should match the chunks in the repo, so storing it should not create its own chunks in the repo.
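To put rough numbers on the first two points, here is a back-of-the-envelope sketch. The 44-byte entry size (a 32-byte chunk id plus three 32-bit fields for refcount, size, csize) and the helper name `raw_index_size` are assumptions for illustration, not taken from the borg code.

```python
import math


def raw_index_size(num_chunks, load_factor, entry_size=44):
    """Estimate the size of a raw open-addressing hashtable dump.

    entry_size=44 is an assumed layout: 32-byte key + 3 x 4-byte values.
    Returns (total_bytes, unused_bytes).
    """
    # The table needs at least num_chunks / load_factor buckets.
    buckets = math.ceil(num_chunks / load_factor)
    total = buckets * entry_size
    unused = (buckets - num_chunks) * entry_size
    return total, unused


# Worst case from the list above: only 25% of the buckets are in use.
total, unused = raw_index_size(1_000_000, 0.25)
print(total, unused, unused / total)
```

Under these assumptions, one million chunks at 25% load give a 176 MB raw table, of which 132 MB are empty buckets.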
So we need:
- chunking of the index into smaller pieces
- compression (unused bucket space is mostly binary zeros AFAIK)
- encryption
This pretty much sounds like we should just back up the index of repo A into a related, but separate borg repository A'. :-)
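A minimal sketch of the "chunk it, then compress it" part of the requirements, just to illustrate why the empty buckets should be cheap. It is not how borg stores anything: the 44-byte entry layout, the fixed 64 KiB piece size and all function names are assumptions, and encryption is left out because the Python standard library has no authenticated cipher.

```python
import hashlib
import zlib

ENTRY_SIZE = 44          # assumed: 32-byte key + refcount, size, csize (4 bytes each)
PIECE_SIZE = 64 * 1024   # assumed fixed piece size, for illustration only


def build_sparse_table(num_buckets, num_used):
    """Build a fake raw hashtable dump: used buckets spread out, the rest zeros."""
    table = bytearray(num_buckets * ENTRY_SIZE)
    step = num_buckets // num_used
    for i in range(num_used):
        key = hashlib.sha256(str(i).encode()).digest()  # stand-in chunk id
        value = b"".join(n.to_bytes(4, "little") for n in (1, 4096, 2048))
        offset = i * step * ENTRY_SIZE
        table[offset:offset + ENTRY_SIZE] = key + value
    return bytes(table)


def pack(table):
    """Split the dump into fixed-size pieces and compress each one separately."""
    return [zlib.compress(table[i:i + PIECE_SIZE])
            for i in range(0, len(table), PIECE_SIZE)]


def unpack(pieces):
    """Reverse of pack(): decompress the pieces and join them again."""
    return b"".join(zlib.decompress(p) for p in pieces)


table = build_sparse_table(num_buckets=40_000, num_used=10_000)  # 75% unused
pieces = pack(table)
packed_size = sum(len(p) for p in pieces)
print(len(table), len(pieces), packed_size)
```

The chunk ids themselves are effectively random and do not compress, so the packed size stays above the size of the keys; the saving comes from the zeroed buckets and the repetitive values.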