the new (c)size ticket #2357

@ThomasWaldmann

scope of this ticket

let's concentrate here on the issue of csize (and also size) information in the items' chunks lists and in the chunks and files caches. no crypto or other discussion in here; let's stay focused.

csize

the main issue is that csize is not a direct function of the data: it also depends on the compression and encryption (and other overhead) applied to the data. as both might change (and thus csize might change) while the chunk still has the same (plaintext) content and the same id, it is an annoyance to have csize in the chunks lists of archived items.
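a small illustration of the point above: the same plaintext gets the same id, but its stored length changes with the compression setting. this is a sketch, assuming a plain SHA-256 of the plaintext as the chunk id (borg uses a keyed id hash in encrypted modes, but that also depends only on the plaintext) and using zlib levels as the "changed compression"; encryption overhead is left out.

```python
import hashlib
import zlib

# Assumption: plain SHA-256 stands in for borg's chunk id function.
# Either way, the id depends only on the plaintext.
data = b"borg chunk content " * 500

chunk_id = hashlib.sha256(data).hexdigest()
size = len(data)  # direct function of the data

# The same plaintext stored with two different compression settings.
csize_stored = len(zlib.compress(data, 0))  # level 0: stored, no compression
csize_best = len(zlib.compress(data, 9))    # level 9: best compression

print(chunk_id[:16], size, csize_stored, csize_best)
```

the id and size stay the same in both cases, only csize differs, so any chunks list entry that records csize goes stale when the chunk is recompressed.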

size

we must have chunk size information in the chunks lists of archived items for the case that we lose one or more chunks in the repo, so we can replace them with all-zero chunks of the same length. size is a direct function of the data, so changing compression/encryption/overhead is no problem here.
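a minimal sketch of why size must be in the chunks list: when reassembling an item, a missing chunk can only be replaced by zeros if its length is known. the in-memory `Repo` dict and `read_file_content` function are hypothetical, not borg code; chunks list entries are reduced to (id, size) pairs for clarity.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the repository: chunk id -> plaintext.
Repo = Dict[bytes, bytes]


def read_file_content(chunks: List[Tuple[bytes, int]], repo: Repo) -> bytes:
    """Reassemble an item from its chunks list of (id, size) entries.

    A chunk missing from the repo is replaced by an all-zero chunk of the
    recorded size, so the file keeps its length and the offsets of all
    following chunks stay correct.
    """
    parts = []
    for chunk_id, size in chunks:
        data = repo.get(chunk_id)
        if data is None:
            data = b"\0" * size  # replacement chunk, same length as the lost one
        parts.append(data)
    return b"".join(parts)


repo = {b"a": b"hello", b"c": b"!"}          # chunk b"b" has been lost
chunks = [(b"a", 5), (b"b", 3), (b"c", 1)]
print(read_file_content(chunks, repo))       # b'hello\x00\x00\x00!'
```

note that csize plays no role here; only the plaintext length is needed.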

timing of size / csize computation

  • size is computed early, after the chunker has cut the chunks: len(chunk)
  • csize is computed late, after compression and encryption/authentication. note: this can lead to a race (wait) condition in multithreaded processing, because csize is only known once those stages have finished.
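the timing difference can be sketched with a thread pool: size is known as soon as a chunk is cut, while csize only exists after a worker has compressed and "encrypted" it, so whoever needs csize has to wait. this is an illustration under assumptions: `compress_and_encrypt` and `ENCRYPTION_OVERHEAD` are made up, and the "encryption" is just a placeholder prefix, not real crypto.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: made-up constant standing in for per-chunk crypto overhead
# (IV, MAC, headers). No real encryption happens in this sketch.
ENCRYPTION_OVERHEAD = 41


def compress_and_encrypt(chunk: bytes) -> bytes:
    """Stand-in for the late pipeline stages; returns the bytes to be stored."""
    compressed = zlib.compress(chunk, 6)
    return b"\0" * ENCRYPTION_OVERHEAD + compressed  # placeholder, not crypto


chunks = [b"a" * 4096, bytes(range(256)) * 8]

with ThreadPoolExecutor(max_workers=2) as pool:
    pending = []
    for chunk in chunks:
        size = len(chunk)                                  # known immediately
        future = pool.submit(compress_and_encrypt, chunk)  # csize not known yet
        pending.append((size, future))
    # csize only becomes available when the worker is done: .result() blocks.
    sizes = [(size, len(future.result())) for size, future in pending]

print(sizes)  # list of (size, csize) pairs
```

anything that wants to record csize next to size (e.g. in a chunks list entry) has to block on the future, which is the wait the note above refers to.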

where is chunk size/csize (not) stored?

  • repo: the current PUT entry in the segment file contains csize in its length information; no size is available there! also, neither size nor csize is in the repo index.
  • archive: item.chunks = [(id, size, csize), ...]
  • chunks cache: id -> (refcount, size, csize)
  • files cache: no size/csize here: path_hash -> (file_size, ino, mtime, chunks=[id, id, ...])
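the shapes listed above can be sketched as Python types, together with a simplified version of what chunk_incref has to do for an unchanged file: the files cache only yields chunk ids, so size/csize for the new archive's chunks list must be looked up in the chunks cache. field names follow this ticket; the concrete types and the `chunk_incref_sketch` function are assumptions for illustration, not borg's actual signatures.

```python
from typing import Dict, List, NamedTuple


class ChunkListEntry(NamedTuple):    # element of item.chunks in an archive
    id: bytes
    size: int
    csize: int


class ChunksCacheEntry(NamedTuple):  # chunks cache value, keyed by chunk id
    refcount: int
    size: int
    csize: int


class FilesCacheEntry(NamedTuple):   # files cache value, keyed by path_hash
    file_size: int
    ino: int
    mtime: int
    chunks: List[bytes]              # ids only: no size/csize here


def chunk_incref_sketch(chunks_cache: Dict[bytes, ChunksCacheEntry],
                        id_: bytes) -> ChunkListEntry:
    """Hypothetical, simplified chunk_incref: bump the refcount and take
    size/csize from the chunks cache, since the files cache has only ids."""
    entry = chunks_cache[id_]
    chunks_cache[id_] = entry._replace(refcount=entry.refcount + 1)
    return ChunkListEntry(id_, entry.size, entry.csize)


# An unchanged file: the files cache hit gives ids, the chunks cache fills in sizes.
chunks_cache = {b"x": ChunksCacheEntry(refcount=1, size=100, csize=60)}
cached = FilesCacheEntry(file_size=100, ino=42, mtime=0, chunks=[b"x"])
item_chunks = [chunk_incref_sketch(chunks_cache, i) for i in cached.chunks]
```

this is why csize ends up copied from the chunks cache into every archive that references the chunk, including archives of files that were never re-read.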

where is size/csize used?

  • the size, dsize, csize and dcsize format placeholders
  • Statistics class + show_progress
  • chunk_incref (gets size/csize from chunks cache - important for archiving unchanged files)
  • csize: Archive.info -> limits -> max_archive_size and Archive.__str__
  • csize: Cache.__str__ .chunks_stored_size
  • size: do_diff sum_chunk_sizes (to show sum of lengths of added/removed chunks of a file)
  • size: borg check's size consistency check, item.size == sum(chunk sizes)
  • tests
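two of the size-only uses above, sketched with chunks lists reduced to (id, size) pairs: the borg check style consistency test and a diff style sum of added/removed chunk lengths. both functions are hypothetical simplifications, not borg's code; the diff uses a multiset (Counter) comparison of chunk ids so repeated chunks within a file are counted correctly. neither needs csize.

```python
from collections import Counter
from typing import List, Tuple

# Assumption: chunks list entries reduced to (id, size); csize is not needed.
ChunkList = List[Tuple[bytes, int]]


def size_consistent(item_size: int, chunks: ChunkList) -> bool:
    """borg check style test: item.size must equal the sum of chunk sizes."""
    return item_size == sum(size for _, size in chunks)


def diff_chunk_sizes(old: ChunkList, new: ChunkList) -> Tuple[int, int]:
    """Sum of lengths of added and removed chunks between two file versions."""
    sizes = {id_: size for id_, size in old + new}
    old_count = Counter(id_ for id_, _ in old)
    new_count = Counter(id_ for id_, _ in new)
    added = sum(sizes[i] * n for i, n in (new_count - old_count).items())
    removed = sum(sizes[i] * n for i, n in (old_count - new_count).items())
    return added, removed


old = [(b"a", 10), (b"b", 20)]
new = [(b"a", 10), (b"c", 5), (b"c", 5)]
print(size_consistent(30, old), diff_chunk_sizes(old, new))  # True (10, 20)
```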
