Skip to content

Network Cache Utilities#34686

Open
blue42u wants to merge 3 commits intospack:developfrom
blue42u:unified-network-cache
Open

Network Cache Utilities#34686
blue42u wants to merge 3 commits intospack:developfrom
blue42u:unified-network-cache

Conversation

@blue42u
Copy link
Copy Markdown
Contributor

@blue42u blue42u commented Dec 23, 2022

Spack contains at least two caches for remote files, with a proposed third:

  • The BinaryIndexCache caches the index.json fetched from configured mirrors,
  • The source cache caches tarballs (etc.) fetched from either configured mirrors or external hosting sites, and
  • Local cache for buildcache files #32136 proposes a third cache for buildcache *.spack files for CI users.

All three are implemented separately, use very different management schemes and reinvent cache validity algorithms depending on the situation. This is a large maintenance burden for three otherwise similar problems. Fundamentally, all three caches cache remote files that change over time (with varying frequencies).

This PR implements a single NetworkCache class that abstracts away the details of maintaining a network cache, providing a convenient URL-based interface to fetch() remote files via a unified code path. The class is self-contained and robust for later planned improvements/extensions, and is both parallel-safe and space-friendly using content hashes. The added tests drive this new code with 99% coverage.

The end-goal is to (eventually) unify all network caches to primarily use NetworkCache. The first user is the BinaryIndexCache, which has been gutted to remove the network and caching code (much of which was rewritten recently by #34641). The BinaryIndexCache now primarily implements an in-memory cache of Spec objects from buildcache indices, not the index.json files themselves.

Do one thing and do it well. -- UNIX

@spackbot-app spackbot-app bot added binary-packages core PR affects Spack core functionality tests General test capability(ies) utilities labels Dec 23, 2022
@blue42u blue42u force-pushed the unified-network-cache branch 2 times, most recently from 77eafc3 to 59675c6 Compare December 23, 2022 21:20
@blue42u
Copy link
Copy Markdown
Contributor Author

blue42u commented Dec 23, 2022

@haampie I'd like to use this PR to supersede #34360. IMO this cuts at the root issue with the current BinaryIndexCache, and the abstraction makes this approach far more robust and extensible moving forward.

@haampie
Copy link
Copy Markdown
Member

haampie commented Dec 24, 2022

I'll have a look after the Christmas break. The tests look good from a quick scan. I wouldn't call this "unified" since it's the only place in spack where cache invalidation (for something remote) is needed. The only thing that could be unified is fetching sources & binaries the same way (by content hash), but that requires more steps

@blue42u blue42u changed the title (Unified) Network Cache Utilities Network Cache Utilities Dec 24, 2022
@blue42u
Copy link
Copy Markdown
Contributor Author

blue42u commented Dec 24, 2022

@haampie Thanks! No rush, enjoy your break. 🎄

Cache invalidation is not strictly necessary for current functionality but IMO there are benefits to having it:

  • Caching the *.spec.json becomes possible with cache invalidation. Without it, a client may forever install a previous binary package from the cache (which was why only *.spack was implemented in Local cache for buildcache files #32136).
  • Sites could (in theory) host a mirror containing "rolling" source tarballs for local/proprietary packages. Cache invalidation would ensure the most up-to-date source is used every time the package is built. (Obviously the version(...) directive would also need to be updated, presumably that would be done via a custom repo maintained at the site.)
  • In general, if the content hash for a remote file changes the cache may end up storing both the old and new versions of the file. Caching based on request URL allows the old version to be cleaned up in many cases, reducing space usage automatically without requiring a nuclear spack clean -a. (TBD interface to map mirror URLs to their "origin" URL.)

IMHO the end result will be more elegant and maintainable if all three use a unified cache implementation under the hood, keeping the complex urllib code out of already complicated algorithms. BinaryIndexCache seemed like the most complex of the set and a reasonable starting place.

@blue42u
Copy link
Copy Markdown
Contributor Author

blue42u commented Jan 27, 2023

@haampie Will you have time to review this in the coming weeks?

@blue42u blue42u force-pushed the unified-network-cache branch from 4540725 to 0aedba7 Compare February 17, 2023 05:06
@blue42u blue42u force-pushed the unified-network-cache branch from 0aedba7 to a039e98 Compare February 17, 2023 05:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binary-packages core PR affects Spack core functionality tests General test capability(ies) utilities

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants