Solver: Cache Concretization Results by johnwparent · Pull Request #48198 · spack/spack

johnwparent · 2024-12-18T21:40:54Z

Spack Concretization Cache

This PR introduces an initial attempt to cache concretization results. I'm sure the design/function can be improved, this is PR is intended to be an initial implementation to be built from.

This PR:

Adds two config options, one to provide a location for the concretization cache on disk (default is ~/.spack/concretization), the other assigns a limit to how many cache entries we allow (default is 1k)
Adds the concretization cache to spack clean as a standalone clean operation as well as part of spack clean -a
Will prune the cache during each solver run to ensure the cache does not grow unchecked
- Cache entries are pruned based on age of the entry, this is subject to change, input encouraged.
- Cache entries are allowed to accumulate to max concretizer cache entries, and once that limit is reached, the oldest 10% are pruned automatically.
Makes a number of changes to SpackSolverSetup and related classes to ensure output of solver setup is deterministic
- Adds a test to validate this

Caching approach:

Cache hashes are the sha256 hash of the string representation of:

All .lp asp files involved in a given concretization run
The output of SpackSolverSetup.setup split by newline, sorted, and rejoined into a single string

This hash determines if a cache hit has occurred.
Cache entries are stored as a file named by the given hash under a directory with a name based on the first two characters of the cache hash (similar to the source cache).

The contents of a cache entry are spack.solver.asp.Result objects serialized to disk via json, when cache hits are encountered, these result json representations are read from disk and recomposed into spack.solver.asp.Result objects, allowing the reproduction of a solver run exactly as if it had just occurred with no loss of context or statistics.

Benchmarks

3 specs were selected for bench-marking, readline, hdf5+mpi build_system=cmake and paraview+mpi+python.
These packages were selected as they represent a spectrum of DAG sizes, and range from simple to including virtuals, and splicing.
Benchmarks were performed on an M1 Mac

`readline`

Without a cache:

    setup          3.518s
    cache-check     0.041s
    ordering       0.034s
    load           0.648s
    ground         2.362s
    solve          1.174s
    construct_specs     0.329s
    total          8.093s

With a cache:

    setup          3.459s
    cache-check     0.041s
    ordering       0.034s
    total          3.510s

`hdf5+mpi build_system=cmake`

Without a cache:

    setup          3.523s
    cache-check     0.041s
    ordering       0.035s
    load           0.642s
    ground         2.401s
    solve          1.510s
    construct_specs     0.426s
    total          8.557s

With a cache:

    setup          3.523s
    cache-check     0.063s
    ordering       0.034s
    total          3.595s

`paraview+mpi+python`

Without a cache:

    setup          7.983s
    cache-check     0.091s
    ordering       0.076s
    load           1.473s
    ground         5.570s
    solve          5.335s
    construct_specs     1.104s
    total         21.566s

With a cache:

    setup         7.985ms
    cache-check  0.121s
    ordering     0.0898s
    total      8.196s

johnwparent · 2024-12-18T21:50:32Z

cc @tgamblin @becker33

lib/spack/spack/solver/asp.py

johnwparent · 2024-12-19T17:13:18Z

~~Addressing the import check failure will require moving around some additional blocks of code into new modules, pushing up a pass at that now, but very open to alternative suggestions~~

I've had to move significantly more code than planned. The spack.solver module either needs a substantive re-org or the concretization cache should be left out of the spack.caches module, I'll hold on general consensus on how we want to handle that.

haampie · 2024-12-19T17:43:48Z

concretization cache should be left out of the spack.caches module

Go for it

haampie · 2024-12-19T17:57:13Z

I only quickly glanced at this, but what I've seen does not look process safe.

You could do file locking per cache entry / dir, but probably easier is to do atomic adds (write to safe tempfile in target dir and os.rename atomically) and reads (just an open call), minimize tree traversal, and assume that any time you traverse most operations can raise.

Also write in tell don't ask style, any if is_dir test may not be the case anymore the next line.

lib/spack/spack/solver/cache.py

johnwparent · 2024-12-20T17:33:11Z

I only quickly glanced at this, but what I've seen does not look process safe.

You're right, Windows is single-spack only at the moment, so I often forget this, will address, thanks.

johnwparent · 2025-01-17T22:14:13Z

Looks like this still needs to handle splicing a bit better, will investigate further monday

lib/spack/spack/schema/config.py

johnwparent · 2025-02-01T00:23:08Z

Gitlab failures are currently rouge pipeline failures.

lib/spack/spack/schema/config.py

progress Major updates Overhaul

lib/spack/spack/schema/config.py

lib/spack/spack/solver/asp.py

lib/spack/docs/config_yaml.rst

alalazo · 2025-02-28T11:05:54Z

Did anybody try any performance benchmark on this PR? Otherwise I can try to compare with develop on a cold and hot cache.

kwryankrattiger · 2025-02-28T11:20:45Z

@johnwparent had shared some numbers previously.

I agree it would be nice to have a record of that in the PR too.

kwryankrattiger

Approving now, will merge after performance numbers are posted.

johnwparent · 2025-02-28T17:03:43Z

@kwryankrattiger @alalazo benchmarks have been added to the PR description

alalazo · 2025-02-28T17:44:39Z

Do we have a lockfile per cache entry:

$ ls -larth ~/.spack/cache/concretization/12
total 1,1M
-rwxrwxr-x 1 culpo culpo    0 feb 28 18:36 .128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1.lock
drwxrwxr-x 3 culpo culpo 4,0K feb 28 18:36 ..
-rw-rw-r-- 1 culpo culpo 1,1M feb 28 18:36 128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1
drwxrwxr-x 2 culpo culpo 4,0K feb 28 18:36 .

?

johnwparent · 2025-02-28T17:54:56Z

Do we have a lockfile per cache entry:

$ ls -larth ~/.spack/cache/concretization/12
total 1,1M
-rwxrwxr-x 1 culpo culpo    0 feb 28 18:36 .128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1.lock
drwxrwxr-x 3 culpo culpo 4,0K feb 28 18:36 ..
-rw-rw-r-- 1 culpo culpo 1,1M feb 28 18:36 128f2dc2834e4e295aca1f50be39c5003340cbcaaff2db8e94ee565d92c3bea1
drwxrwxr-x 2 culpo culpo 4,0K feb 28 18:36 .

?

Yes, each entry is locked individually as per the behavior of the underlying FileCache used to manage the cache entries.
We get two benefits with this approach, the entry locks are more available than if they were managed by a single, central lock, and using FileCache allows us to switch out the caching backend in the future to allow for remote caches.

alalazo · 2025-02-28T17:57:17Z

I guess we can probably optimize in a later PR to use a single lock file per the entire cache, like we do for store and stages (and lock different bytes based on the entry, iirc).

Concretizer caching for reusing solver results

haampie · 2025-03-11T10:58:01Z

@kwryankrattiger why did you merge this? Did you try it out? Please check with other reviewers before merging.

I would suggest to disable this feature, because getting cache hits reliably for even the simplest spec is impossible:

concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at 18/18607911a1944d36e1a49f3728783e0a957a0a21c390138f22044afc1715ba4e
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at 28/286c42d84b58dbbc59d6e9197fb31560b1fafe6d1044a69b04f546e4f6e02f6d
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache hit at 28/286c42d84b58dbbc59d6e9197fb31560b1fafe6d1044a69b04f546e4f6e02f6d
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache hit at 18/18607911a1944d36e1a49f3728783e0a957a0a21c390138f22044afc1715ba4e
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at de/de4b86777b02dbba18b370a56ce6d40213ab03d775432dd888909622b9687cff
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at dc/dcdd9c18c0480908ae5676b4ec026a091f5c999e4d1f1bb366c0dd4e990f3f79
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at 4f/4f998c4b067724eab7de84ebcc43ba2bb3450648819d75fe0a14b816e7d24a0e
concretization $ spack -d  spec zlib |& grep -E '(hit|miss)'
==> Concretization cache miss at 43/4301e1911df4c87eaab53d3991c22986c8d64948ce6129564aaed2cfad570028

2 of 7 hits, that's really bad.

Also don't understand why the config option was put under config:concretization_cache instead of concretizer:cache. Also why is config:concretization_cache:url the key if it's a path, it doesn't accept arbitrary urls and does not even accept file urls.

kwryankrattiger · 2025-03-11T11:21:21Z

I had tried it out and we had discussed this PR in another meeting with @tgamblin and @alalazo and it was fine to merge.

The issues causing non determinism were found post merge. The issues are known and being resolved by @johnwparent now. It isn't breaking workflows so no one jumped to revert.

kwryankrattiger · 2025-03-11T11:23:27Z

The key is URL is because it is going to accept URLs in the near term for CI sharing of the cache.

The location of the config was chosen to match the other misc cache configs. We don't put mirror misc cache configs in mirrors.yaml for example.

haampie · 2025-03-11T11:50:41Z

We don't put mirror misc cache configs in mirrors.yaml for example.

Not the best example given that misc_cache is a combination of repo metadata, compiler output and build cache indices.

I still think it's better to put it in concretizer given it has various sub-options.

This reverts commit d234df6.

#48198 Introduces the concretization cache, and associated manifest. However, it only added testing for the cache itself, not the manifest or cleanup procedures. This PR: - [x] Adds tests for the manifest and cleanup - [x] Refactors the manifest to update correctly - [x] Refactors the cache to store base32 hashes like the rest of Spack's hashes - [x] gzips the cache entries for a smaller memory footprint `FileCache` class was refactored into an abstract base `Cache` class which provides the general cache read/write transaction interface, and a `FileCache` and `DirectoryCache` child classes, `FileCache` which should be functionally identical to the `FileCache` class pre refactor, deals with caches at the individual file level, and `DirectoryCache`, which is mean to provide a directory level or "bucket" caching level.

Third try to enable concretization caching in `develop`. Past attempts were reverted due to cache management issues and race conditions. See #48198 for the introduction of the concretization cache, manifest, etc. That PR only added testing for the cache itself, not the manifest or cleanup procedures. This PR: - [x] Adds tests for cleanup - [x] Refactors the cache to store base32 hashes like the rest of Spack's hashes - [x] gzips cache entries for a smaller memory footprint - [x] Removes redundant stat calls - [x] Less aggressive/error prone cleanup phase - [x] Adds an entry limit to the cache, removes the size limit - [x] Rewrite pruning to use LRU instead of FIFO - [x] Canonicalizes and strips `clingo` input before hashing, for greater hit rate This PR simplifies the locking around the concretization cache. Entries are stored by the base32 hash of their inputs, and there is a single lock file (as we do in many places in spack) for all such entries. We use byte range locks to manage all the entries. Cleanup has changed to be less expensive overall. There is no more size limit; there is only an entry limit, and specs in the cache are compressed. Specs are ~35k, and the initial limit is 1000 cache entries, or ~35 MB. This limit is customizable in configuration. To check if the cache needs cleanup, we need only list files in the cache directory. To clean up, we stat the files and clean up the least recently used concretizations. We remove half the cache at a time so that we do not pay for cleanup on every concretization when we're close to the cache limit. The concretization cache is stored in the `misc_cache` by default, so `spack clean -m` is all that is needed to get rid of it. The location of the concretization cache can be specified independently (with a `url` config option) but we will document and expand on that use later as we integrate it in CI. --------- Signed-off-by: John Parent <[email protected]> Signed-off-by: Todd Gamblin <[email protected]> Co-authored-by: Todd Gamblin <[email protected]>

johnwparent requested review from alalazo, becker33, scheibelp and tgamblin December 18, 2024 21:40

spackbot-app bot added commands core PR affects Spack core functionality shell-support tests General test capability(ies) labels Dec 18, 2024

johnwparent force-pushed the concretization-caching branch from 33c066d to 193bd00 Compare December 18, 2024 21:49

alalazo reviewed Dec 19, 2024

View reviewed changes

lib/spack/spack/solver/asp.py Show resolved Hide resolved

haampie reviewed Dec 19, 2024

View reviewed changes

lib/spack/spack/solver/cache.py Outdated Show resolved Hide resolved

haampie reviewed Dec 20, 2024

View reviewed changes

lib/spack/spack/solver/cache.py Outdated Show resolved Hide resolved

haampie reviewed Dec 20, 2024

View reviewed changes

lib/spack/spack/solver/cache.py Outdated Show resolved Hide resolved

johnwparent force-pushed the concretization-caching branch from ef537ee to e5003fc Compare January 8, 2025 20:17

johnwparent requested review from alalazo and haampie January 10, 2025 16:37

johnwparent force-pushed the concretization-caching branch 2 times, most recently from 9b4f1a8 to 1a77c8a Compare January 16, 2025 23:53

alalazo reviewed Jan 28, 2025

View reviewed changes

lib/spack/spack/schema/config.py Outdated Show resolved Hide resolved

spackbot-app bot added the utilities label Jan 31, 2025

kwryankrattiger reviewed Feb 25, 2025

View reviewed changes

lib/spack/spack/schema/config.py Outdated Show resolved Hide resolved

johnwparent added 2 commits February 27, 2025 11:29

in-situ

52fe458

progress Major updates Overhaul

more deterministic (but its not enough)

14572ec

kwryankrattiger reviewed Feb 27, 2025

View reviewed changes

lib/spack/spack/schema/config.py Show resolved Hide resolved

lib/spack/spack/solver/asp.py Show resolved Hide resolved

lib/spack/spack/solver/asp.py Outdated Show resolved Hide resolved

lib/spack/spack/solver/asp.py Show resolved Hide resolved

johnwparent added 3 commits February 27, 2025 17:35

Add docs

835faa9

bug fix and improved limit handling

32e4e1c

Update schema for better limit handling

e3ac8a4

spackbot-app bot added the documentation Improvements or additions to documentation label Feb 27, 2025

LoL

a1e77f8

kwryankrattiger reviewed Feb 27, 2025

View reviewed changes

lib/spack/docs/config_yaml.rst Outdated Show resolved Hide resolved

lib/spack/docs/config_yaml.rst Outdated Show resolved Hide resolved

lib/spack/docs/config_yaml.rst Outdated Show resolved Hide resolved

lib/spack/docs/config_yaml.rst Outdated Show resolved Hide resolved

Refactor based on doc review

995689d

kwryankrattiger approved these changes Feb 28, 2025

View reviewed changes

kwryankrattiger merged commit d234df6 into spack:develop Feb 28, 2025
36 of 37 checks passed

mtaillefumier pushed a commit to mtaillefumier/spack that referenced this pull request Mar 2, 2025

Solver: Cache Concretization Results (spack#48198)

df5fdef

Concretizer caching for reusing solver results

white238 pushed a commit that referenced this pull request Mar 3, 2025

Solver: Cache Concretization Results (#48198)

6c98f1c

Concretizer caching for reusing solver results

greenc-FNAL mentioned this pull request Mar 4, 2025

recipes/jsonnet CMake FNALssi/spack#521

Closed

tmadlener mentioned this pull request Mar 6, 2025

Concretization error for environments with inconsistent variant settings #49340

Open

3 tasks

haampie added a commit that referenced this pull request Mar 19, 2025

Revert "Solver: Cache Concretization Results (#48198)"

8bc0b54

This reverts commit d234df6.

haampie added a commit that referenced this pull request Mar 19, 2025

Revert "Solver: Cache Concretization Results (#48198)"

a660d78

This reverts commit d234df6.

johnwparent mentioned this pull request Mar 19, 2025

Concretization Cache: Refactor + manifest testing #49589

Merged

4 tasks

johnwparent mentioned this pull request Jun 18, 2025

Enable concretization caching (redux) #50905

Merged

6 tasks

Conversation

johnwparent commented Dec 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Spack Concretization Cache

Caching approach:

Benchmarks

readline

hdf5+mpi build_system=cmake

paraview+mpi+python

Uh oh!

johnwparent commented Dec 18, 2024

Uh oh!

Uh oh!

johnwparent commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

haampie commented Dec 19, 2024

Uh oh!

haampie commented Dec 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

johnwparent commented Dec 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johnwparent commented Jan 17, 2025

Uh oh!

Uh oh!

johnwparent commented Feb 1, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alalazo commented Feb 28, 2025

Uh oh!

kwryankrattiger commented Feb 28, 2025

Uh oh!

kwryankrattiger left a comment

Choose a reason for hiding this comment

Uh oh!

johnwparent commented Feb 28, 2025

Uh oh!

alalazo commented Feb 28, 2025

Uh oh!

johnwparent commented Feb 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alalazo commented Feb 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

haampie commented Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kwryankrattiger commented Mar 11, 2025

Uh oh!

kwryankrattiger commented Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

haampie commented Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

johnwparent commented Dec 18, 2024 •

edited

Loading

`readline`

`hdf5+mpi build_system=cmake`

`paraview+mpi+python`

johnwparent commented Dec 19, 2024 •

edited

Loading

haampie commented Dec 19, 2024 •

edited

Loading

johnwparent commented Dec 20, 2024 •

edited

Loading

johnwparent commented Feb 28, 2025 •

edited

Loading

alalazo commented Feb 28, 2025 •

edited

Loading

haampie commented Mar 11, 2025 •

edited

Loading

kwryankrattiger commented Mar 11, 2025 •

edited

Loading

haampie commented Mar 11, 2025 •

edited

Loading