Solver: Cache Concretization Results #48198
Conversation
Force-pushed from 33c066d to 193bd00
|
I've had to move significantly more code than planned. The |
Go for it |
|
I only quickly glanced at this, but what I've seen does not look process safe. You could do file locking per cache entry / dir, but probably easier is to do atomic adds (write to safe tempfile in target dir and os.rename atomically) and reads (just an open call), minimize tree traversal, and assume that any time you traverse most operations can raise. Also write in tell don't ask style, any |
You're right, Windows is single-spack only at the moment, so I often forget this, will address, thanks. |
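For reference, the atomic add/read pattern suggested above can be sketched as follows. This is an illustration of the reviewer's suggestion, not Spack's actual implementation; the function names are hypothetical.

```python
# Sketch of the suggested atomic add/read pattern: writes go to a temp file
# in the destination directory and are published with an atomic rename;
# reads are just an open() that must tolerate concurrent deletion.
import os
import tempfile
from typing import Optional


def atomic_add(cache_dir: str, key: str, data: bytes) -> None:
    os.makedirs(cache_dir, exist_ok=True)
    # The temp file must be on the same filesystem as the target for the
    # rename to be atomic, so create it inside cache_dir itself.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, prefix=f".{key}.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(cache_dir, key))  # atomic publish
    except BaseException:
        os.unlink(tmp)
        raise


def atomic_read(cache_dir: str, key: str) -> Optional[bytes]:
    # Any open can race with concurrent cleanup; treat a missing entry as a
    # cache miss rather than an error.
    try:
        with open(os.path.join(cache_dir, key), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```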
Force-pushed from ef537ee to e5003fc
Force-pushed from 9b4f1a8 to 1a77c8a
|
Looks like this still needs to handle splicing a bit better; will investigate further Monday |
|
GitLab failures are currently rogue pipeline failures. |
|
Did anybody try any performance benchmark on this PR? Otherwise I can try to compare with develop on a cold and hot cache. |
|
@johnwparent had shared some numbers previously. I agree it would be nice to have a record of that in the PR too. |
kwryankrattiger left a comment
Approving now, will merge after performance numbers are posted.
|
@kwryankrattiger @alalazo benchmarks have been added to the PR description |
|
Do we have a lockfile per cache entry?

$ ls -larth ~/.spack/cache/concretization/12
total 1,1M
-rwxrwxr-x 1 culpo culpo    0 Feb 28 18:36 .128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1.lock
drwxrwxr-x 3 culpo culpo 4,0K Feb 28 18:36 ..
-rw-rw-r-- 1 culpo culpo 1,1M Feb 28 18:36 128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1
drwxrwxr-x 2 culpo culpo 4,0K Feb 28 18:36 . |
Yes, each entry is locked individually as per the behavior of the underlying |
|
I guess we can probably optimize in a later PR to use a single lock file for the entire cache, like we do for the store and stages (and lock different bytes based on the entry, IIRC). |
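The single-lock-file scheme suggested above can be sketched with POSIX byte-range locks. This is a hypothetical illustration, not Spack's actual `Lock` class; the offset scheme and names are assumptions.

```python
# Hypothetical sketch of a single shared lock file with per-entry byte-range
# locks, as Spack does for the store and stages (NOT Spack's actual Lock
# class). POSIX-only: fcntl.lockf can lock arbitrary byte ranges.
import fcntl
import hashlib
import os
from contextlib import contextmanager


@contextmanager
def entry_lock(lock_file: str, entry_hash: str, exclusive: bool = True):
    # Derive a stable byte offset from the entry's hash; each entry locks a
    # single distinct byte, so locks on different entries never conflict.
    digest = hashlib.sha256(entry_hash.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % (2**30)
    fd = os.open(lock_file, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        kind = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        fcntl.lockf(fd, kind, 1, offset)  # lock 1 byte at `offset`
        yield
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset)
    finally:
        os.close(fd)  # closing the fd also drops any remaining locks
```

Because the locked ranges are disjoint, two processes touching different entries never block each other, while two touching the same entry serialize correctly.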
Concretizer caching for reusing solver results
|
@kwryankrattiger why did you merge this? Did you try it out? Please check with other reviewers before merging. I would suggest disabling this feature, because getting cache hits reliably for even the simplest spec is impossible: 2 of 7 hits, that's really bad. Also I don't understand why the config option was put under |
|
I had tried it out, and we had discussed this PR in another meeting with @tgamblin and @alalazo, and it was fine to merge. The issues causing non-determinism were found post-merge. The issues are known and being resolved by @johnwparent now. It isn't breaking workflows, so no one jumped to revert. |
|
The key is `url` because it is going to accept URLs in the near term for CI sharing of the cache. The location of the config was chosen to match the other misc-cache configs. We don't put mirror misc-cache configs in mirrors.yaml, for example.
Not the best example given that I still think it's better to put it in |
#48198 introduced the concretization cache and associated manifest, but only added testing for the cache itself, not the manifest or cleanup procedures. This PR:
- [x] Adds tests for the manifest and cleanup
- [x] Refactors the manifest to update correctly
- [x] Refactors the cache to store base32 hashes like the rest of Spack's hashes
- [x] gzips the cache entries for a smaller memory footprint

The `FileCache` class was refactored into an abstract base `Cache` class, which provides the general cache read/write transaction interface, with two child classes: `FileCache`, which should be functionally identical to the pre-refactor `FileCache` and deals with caches at the individual-file level, and `DirectoryCache`, which is meant to provide directory-level ("bucket") caching.
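A minimal sketch of the class hierarchy described above. The class names come from the PR; the method names and signatures here are illustrative assumptions, not Spack's actual API.

```python
# Sketch of the described refactor: an abstract Cache base with FileCache
# and DirectoryCache children. Method names/signatures are assumptions.
import os
from abc import ABC, abstractmethod


class Cache(ABC):
    """General cache read/write interface over a root directory."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    @abstractmethod
    def cache_path(self, key: str) -> str:
        """Where the entry for ``key`` lives on disk."""

    def write(self, key: str, data: bytes) -> None:
        path = self.cache_path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def read(self, key: str) -> bytes:
        with open(self.cache_path(key), "rb") as f:
            return f.read()


class FileCache(Cache):
    # One file per entry, directly under the root.
    def cache_path(self, key: str) -> str:
        return os.path.join(self.root, key)


class DirectoryCache(Cache):
    # "Bucket" caching: each entry is a per-key directory, so one logical
    # entry can hold multiple files.
    def cache_path(self, key: str) -> str:
        return os.path.join(self.root, key, "data")
```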
Third try to enable concretization caching in `develop`. Past attempts were reverted due to cache management issues and race conditions. See #48198 for the introduction of the concretization cache, manifest, etc. That PR only added testing for the cache itself, not the manifest or cleanup procedures. This PR:
- [x] Adds tests for cleanup
- [x] Refactors the cache to store base32 hashes like the rest of Spack's hashes
- [x] gzips cache entries for a smaller memory footprint
- [x] Removes redundant stat calls
- [x] Less aggressive/error-prone cleanup phase
- [x] Adds an entry limit to the cache, removes the size limit
- [x] Rewrites pruning to use LRU instead of FIFO
- [x] Canonicalizes and strips `clingo` input before hashing, for a greater hit rate

This PR simplifies the locking around the concretization cache. Entries are stored by the base32 hash of their inputs, and there is a single lock file (as we do in many places in Spack) for all such entries. We use byte-range locks to manage all the entries.

Cleanup has changed to be less expensive overall. There is no more size limit; there is only an entry limit, and specs in the cache are compressed. Specs are ~35k, and the initial limit is 1000 cache entries, or ~35 MB. This limit is customizable in configuration. To check whether the cache needs cleanup, we need only list the files in the cache directory. To clean up, we stat the files and clean up the least recently used concretizations. We remove half the cache at a time so that we do not pay for cleanup on every concretization when we're close to the cache limit.

The concretization cache is stored in the `misc_cache` by default, so `spack clean -m` is all that is needed to get rid of it. The location of the concretization cache can be specified independently (with a `url` config option), but we will document and expand on that use later as we integrate it into CI.
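The LRU pruning scheme above can be sketched as follows. This is a simplified illustration, not Spack's implementation; the function name and filtering rules are assumptions.

```python
# Simplified sketch of the described LRU pruning: checking the limit only
# requires listing the directory; only when over the limit do we stat the
# entries and remove the least recently used half, so we don't pay for
# cleanup on every solve. (Illustrative, not Spack's actual code.)
import os


def prune_lru(cache_dir: str, entry_limit: int = 1000) -> int:
    """Remove the least recently used half of the cache when over the limit.

    Returns the number of entries removed.
    """
    entries = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if not name.startswith(".")  # skip lock files and temp files
    ]
    if len(entries) <= entry_limit:
        return 0  # cheap check: no stat calls needed

    def atime(path):
        try:
            return os.stat(path).st_atime
        except OSError:
            return 0.0  # concurrently removed: treat as oldest

    entries.sort(key=atime)  # least recently used first
    removed = 0
    for path in entries[: len(entries) // 2]:
        try:
            os.unlink(path)
            removed += 1
        except OSError:
            pass  # another process got there first
    return removed
```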
--------- Signed-off-by: John Parent <[email protected]> Signed-off-by: Todd Gamblin <[email protected]> Co-authored-by: Todd Gamblin <[email protected]>
Spack Concretization Cache
This PR introduces an initial attempt to cache concretization results. I'm sure the design/function can be improved; this PR is intended to be an initial implementation to build from.
This PR:
- Adds two config options: one to provide a location for the concretization cache on disk (default is `~/.spack/concretization`), the other to assign a limit to how many cache entries we allow (default is 1k)
- Adds the concretization cache to `spack clean` as a standalone clean operation, as well as part of `spack clean -a`
- Prunes the cache during each solver run to ensure the cache does not grow unchecked
- Makes a number of changes to `SpackSolverSetup` and related classes to ensure the output of solver setup is deterministic

Caching approach:
Cache hashes are the sha256 hash of the string representation of:
- the `.lp` ASP files involved in a given concretization run
- the output of `SpackSolverSetup.setup`, split by newline, sorted, and rejoined into a single string

This hash determines whether a cache hit has occurred.

Cache entries are stored as a file named by the given hash, under a directory with a name based on the first two characters of the cache hash (similar to the source cache).
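The hashing and storage layout above can be sketched as follows; the helper names are hypothetical, not the PR's actual code.

```python
# Sketch of the described cache key and sharded storage layout (helper
# names are assumptions): the key is the sha256 of the sorted, rejoined
# solver input; the entry lives under a directory named by the first two
# characters of that hash, like the source cache.
import hashlib
import os


def cache_key(solver_input: str) -> str:
    # Sort lines so semantically identical but differently ordered solver
    # setup output hashes to the same key.
    canonical = "\n".join(sorted(solver_input.splitlines()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def entry_path(cache_root: str, key: str) -> str:
    # Shard entries by the first two hash characters.
    return os.path.join(cache_root, key[:2], key)
```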
The contents of a cache entry are `spack.solver.asp.Result` objects serialized to disk via JSON. When cache hits are encountered, these JSON representations are read from disk and recomposed into `spack.solver.asp.Result` objects, allowing the reproduction of a solver run exactly as if it had just occurred, with no loss of context or statistics.

Benchmarks
Three specs were selected for benchmarking: `readline`, `hdf5+mpi build_system=cmake`, and `paraview+mpi+python`. These packages were selected as they represent a spectrum of DAG sizes, ranging from simple to including virtuals and splicing.
Benchmarks were performed on an M1 Mac
`readline`

Without a cache:
With a cache:
`hdf5+mpi build_system=cmake`

Without a cache:
With a cache:
`paraview+mpi+python`

Without a cache:
With a cache: