@dnnr dnnr commented Jan 22, 2015

This adds the construction of a metadata index during archive creation,
which can be used to narrow down the location of particular entries
within the items list. The FUSE mount uses this index to fetch only
those chunks that are relevant to the specific operation instead of
fetching the metadata of the entire archive.
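
A minimal sketch of the lookup idea, not the patch's actual code: assume the items list is stored as a sequence of chunks and the index records the number of the first item in each chunk. The name `chunks_for_items` and the `chunk_starts` layout are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def chunks_for_items(chunk_starts, first, last):
    """Return the positions of the chunks that hold items first..last (inclusive).

    chunk_starts[i] is the number of the first item stored in chunk i,
    in ascending order (a hypothetical index layout, not attic's format).
    """
    lo = bisect_right(chunk_starts, first) - 1
    hi = bisect_right(chunk_starts, last) - 1
    return list(range(lo, hi + 1))


# Four chunks holding items 0-99, 100-249, 250-399 and 400 onwards.
chunk_starts = [0, 100, 250, 400]
needed = chunks_for_items(chunk_starts, 120, 260)  # only chunks 1 and 2
```

With such an index, an operation that touches items 120 to 260 fetches two chunks instead of all four.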

As a result, using FUSE mounts of large archives is considerably more
responsive. More importantly, the performance is no longer inversely
proportional to the archive size. Any bulk operations that
require the full metadata tree anyway (such as running "find" on the
entire archive) are not negatively impacted.

For this to work, the filesystem traversal order had to be changed from
depth-first to breadth-first, which introduces the new metadata version
number 2. Any otherwise unrelated parts of the code and tests that
relied on the previous behavior are adjusted accordingly.
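
To illustrate why breadth-first order helps an index: in a queue-based breadth-first walk, the direct children of each directory end up next to each other in the item list, so a directory can be described by a single range. The sketch below works on a nested dict rather than a real filesystem, and `breadth_first_items` and its index layout are assumptions for illustration, not the code in this patch.

```python
from collections import deque


def breadth_first_items(tree, root="/"):
    """Flatten a nested dict (name -> subtree dict, or None for a file).

    Returns the item list in breadth-first order and an index mapping each
    directory path to the half-open (start, end) range of its direct
    children within that list.
    """
    items = [root]
    index = {}
    queue = deque([(root, tree)])
    while queue:
        path, node = queue.popleft()
        start = len(items)
        for name in sorted(node):
            child_path = path.rstrip("/") + "/" + name
            items.append(child_path)
            if node[name] is not None:  # a dict (even an empty one) is a directory
                queue.append((child_path, node[name]))
        index[path] = (start, len(items))
    return items, index


tree = {"etc": {"hosts": None}, "home": {"user": {"notes.txt": None}}, "README": None}
items, index = breadth_first_items(tree)
root_listing = items[slice(*index["/"])]  # direct children of "/"
```

A depth-first walk would interleave the contents of `/etc` between `/etc` and `/home`, so the listing of `/` would not be one contiguous range.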

dnnr commented Jan 22, 2015

This could be improved even further by adding a read cache to attic. As far as I could tell, the Cache class is currently used for write access only, right? As of yet, my patch still fetches chunks repeatedly if (and only if) multiple archives are loaded that have intersecting entries in their metadata['items'] lists, which isn't unlikely for real datasets.

I wasn't sure though if this should be just slapped into the Cache class and used in do_mount, so I left that out for now.
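
For illustration only, such a read cache could be a small LRU wrapper around whatever function fetches a chunk by id. `ChunkReadCache` and the `fetch` callable are hypothetical and not part of attic's Cache class or of this patch.

```python
from collections import OrderedDict


class ChunkReadCache:
    """Small LRU cache in front of a chunk-fetching callable (hypothetical)."""

    def __init__(self, fetch, capacity=128):
        self._fetch = fetch            # callable: chunk id -> bytes
        self._capacity = capacity
        self._chunks = OrderedDict()   # ordered from least to most recently used

    def get(self, chunk_id):
        if chunk_id in self._chunks:
            self._chunks.move_to_end(chunk_id)   # mark as most recently used
            return self._chunks[chunk_id]
        data = self._fetch(chunk_id)
        self._chunks[chunk_id] = data
        if len(self._chunks) > self._capacity:
            self._chunks.popitem(last=False)     # evict the least recently used
        return data


# Stand-in for a repository fetch that records every real request.
calls = []

def fetch(chunk_id):
    calls.append(chunk_id)
    return b"data-" + chunk_id

cache = ChunkReadCache(fetch, capacity=2)
cache.get(b"a")
cache.get(b"a")  # served from the cache, no second fetch
```

Two archives whose item lists share chunks would then trigger one fetch per shared chunk, as long as the chunk has not been evicted in between.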

@dnnr dnnr closed this by deleting the head repository Sep 27, 2025