borgception #474

@ThomasWaldmann

i thought a bit about how to optimize the chunks cache and just wanted to document one weird idea.

the issue with the chunks cache is that it needs to match the overall repository state (== have up-to-date information about all chunks in all archives, including refcount, size, csize). when backing up multiple machines into the same repo, creating an archive from one machine invalidates the chunks caches on all the other machines, and each of them needs to resync its chunks cache with the repo, which is expensive.
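to illustrate why a resync is expensive, here is a minimal sketch of rebuilding such an index by walking every archive. it is not borg's actual code: the `ChunkEntry` tuple, the `rebuild_chunk_index` function and the dict-based archive layout are hypothetical, only the fields (refcount, size, csize) come from the description above.

```python
from collections import namedtuple

# Hypothetical in-memory model of one index entry; the field names
# follow the issue text (refcount, size, csize).
ChunkEntry = namedtuple("ChunkEntry", "refcount size csize")


def rebuild_chunk_index(archives):
    """Rebuild a chunk index from scratch.

    `archives` is an assumed layout: a dict mapping an archive name to
    the list of (chunk_id, size, csize) references that archive contains.
    """
    index = {}
    # Every archive from every machine has to be visited, which is why
    # the cost grows with the total number of archives in the repo.
    for chunk_refs in archives.values():
        for chunk_id, size, csize in chunk_refs:
            old = index.get(chunk_id)
            refcount = old.refcount + 1 if old else 1
            index[chunk_id] = ChunkEntry(refcount, size, csize)
    return index


archives = {
    "host1-monday": [(b"aa", 100, 40), (b"bb", 200, 90)],
    "host2-monday": [(b"bb", 200, 90), (b"cc", 50, 30)],
}
index = rebuild_chunk_index(archives)
```

in this toy data, chunk `bb` is referenced by both archives, so a client that only knows about `host1-monday` would hold a stale refcount for it after `host2-monday` is created.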

so, there is the idea to also store the chunks index in the repo, so all out-of-sync clients can just fetch the index from the repo instead of rebuilding it.

But:

  • the index can be large (way larger than the segment size)
  • when stored as a raw hashtable, it could have up to 75% unused bucket space
  • the index has additional information about chunks, so we should not store it unencrypted if the repo is encrypted
  • the index should match the chunks in the repo, so storing it should not create its own chunks in the repo.

So we need:

  • chunking of the index into smaller pieces
  • compression (unused bucket space is mostly binary zeros AFAIK)
  • encryption
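the first two requirements can be sketched with the standard library alone. this is only an illustration under stated assumptions: the 44-byte bucket size, the 1-in-4 fill pattern, the fixed-size splitting and the function names are all made up for the example, and zlib stands in for whatever compressor would really be used. encryption is left out because the python stdlib has no suitable cipher; in a real setup each compressed piece would additionally be encrypted before it is stored.

```python
import os
import zlib


def split_and_compress(index_bytes, piece_size):
    """Cut the serialized index into fixed-size pieces and compress each one."""
    pieces = []
    for offset in range(0, len(index_bytes), piece_size):
        piece = index_bytes[offset:offset + piece_size]
        pieces.append(zlib.compress(piece))
    return pieces


def reassemble(pieces):
    """Inverse of split_and_compress: decompress and concatenate."""
    return b"".join(zlib.decompress(p) for p in pieces)


# Simulate a sparse hash table: 1000 buckets, only every 4th one used.
# BUCKET_SIZE is an arbitrary value chosen for the example.
BUCKET_SIZE = 44
buckets = [
    os.urandom(BUCKET_SIZE) if i % 4 == 0 else bytes(BUCKET_SIZE)
    for i in range(1000)
]
raw = b"".join(buckets)

pieces = split_and_compress(raw, piece_size=8192)
stored = sum(len(p) for p in pieces)
print(f"raw: {len(raw)} bytes, stored: {stored} bytes in {len(pieces)} pieces")
```

the unused buckets are runs of zero bytes, so they cost very little after compression, and each piece stays small regardless of how big the whole index is.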

This pretty much sounds like we should just back up the index of repo A into a related, but separate, borg repository A'. :-)


💰 there is a bounty for this
