Skip to content

variants: Unify metadata dictionaries to index by when#44425

Merged
tgamblin merged 49 commits intodevelopfrom
when-indexed-variants
Sep 17, 2024
Merged

variants: Unify metadata dictionaries to index by when#44425
tgamblin merged 49 commits intodevelopfrom
when-indexed-variants

Conversation

@tgamblin
Copy link
Copy Markdown
Member

@tgamblin tgamblin commented May 28, 2024

Fixes #38302.
Enables #43961.

Continuing the work started in #40326, his changes the structure of Variant metadata on Packages from a single variant definition per name with a list of when specs:

name: (Variant, [when_spec, ...])

to a Variant definition per when_spec per name:

when_spec: { name: Variant }

With this change, everything on a package except versions is keyed by when spec. This:

  1. makes things consistent, in that conditional things are (nearly) all modeled in the same way; and
  2. fixes an issue where we would lose information about multiple variant definitions in a package (see Variant when=: Change of Default for Single/Multi-Variant Transition #38302). We can now have, e.g., different defaults for the same variant in different versions of a package.

Some notes:

  1. This required some pretty deep changes to the solver. Previously, the solver's job was to select value(s) for a single variant definition per name per package. Now, the solver needs to:

    a. Determine which variant definition should be used for a given node, which can depend on the node's version, compiler, target, other variants, etc.
    b. Select valid value(s) for variants for each node based on the selected variant definition.

    When multiple variant definitions are enabled via their when= clause, we will always prefer the last matching definition, by declaration order in packages. This is implemented by adding a precedence to each variant at definition time, and we ensure they are added to the solver in order of precedence.

    This has the effect that variant definitions from derived classes are preferred over definitions from superclasses, and the last definition within the same class sticks. This matches python semantics. Some examples:

    class ROCmPackage(PackageBase):
        variant("amdgpu_target", ..., when="+rocm")
    
    class Hipblas(ROCmPackage):
        variant("amdgpu_target", ...)

    The global variant in hipblas will always supersede the when="+rocm" variant in ROCmPackage. If hipblas's variant was also conditional on +rocm (as it probably should be), we would again filter out the definition from ROCmPackage because it could never be activated. If you instead have:

    class ROCmPackage(PackageBase):
        variant("amdgpu_target", ..., when="+rocm")
    
    class Hipblas(ROCmPackage):
        variant("amdgpu_target", ..., when="+rocm+foo")

    The variant on hipblas will win for +rocm+foo but the one on ROCmPackage will win with rocm~foo.

    So, if we can statically determine if a variant is overridden, we filter it out. This isn't strictly necessary, as the solver can handle many definitions fine, but this reduces the complexity of the problem instance presented to clingo, and simplifies output in spack info for derived packages. e.g., spack info hipblas now shows only one definition of amdgpu_target where before it showed two, one of which would never be used.

  2. Nearly all access to the variants dictionary on packages has been refactored to use the following class methods on PackageBase:

    • variant_names(cls) -> List[str]: get all variant names for a package
    • has_variant(cls, name) -> bool: whether a package has a variant with a given name
    • variant_definitions(cls, name: str) -> List[Tuple[Spec, Variant]]: all definitions of variant name that are possible, along with their when specs.
    • variant_items() -> : iterate over pkg.variants.items(), with impossible variants filtered out.

    Consolidating to these methods seems to simplify the code a lot.

  3. The solver does a lot more validation on variant values at setup time now. In particular, it checks whether a variant value on a spec is valid given the other constraints on that spec. This allowed us to remove the crufty logic in update_variant_validate, which was needed because we previously didn't know after a solve which variant definition had been used. Now, variant values from solves are constructed strictly based on which variant definition was selected -- no more heuristics.

  4. The same prevalidation can now be done in package audits, and you can run:

    spack audit packages --strict-variants
    

    This turns up around 18 different places where a variant specification isn't valid given the conditions on variant definitions in packages. I haven't fixed those here but will open a separate PR to iterate on them. I plan to make strict checking the defaults once all existing package issues are resolved. It's not clear to me that strict checking should be the default for the prevalidation done at solve time.

There are a few other changes here that might be of interest:

  1. The generator variant in CMakePackage is now only defined when build_system=cmake.
  2. spack info has been updated to support the new metadata layout.
  3. split out variant propagation into its own .lp file in the solver code.
  4. Add better typing and clean up code for variant types in variant.py.
  5. Add tests for new variant behavior.

@tgamblin
Copy link
Copy Markdown
Member Author

@ax3l FYI

@tgamblin tgamblin force-pushed the when-indexed-variants branch 3 times, most recently from fbea315 to f5552a9 Compare May 29, 2024 05:34
@tgamblin
Copy link
Copy Markdown
Member Author

@alalazo can you do a concretizer timing on this?

@alalazo
Copy link
Copy Markdown
Member

alalazo commented May 29, 2024

Timed on:

  • Spack: 0.23.0.dev0 (e0f1a5a)
  • Python: 3.11.5
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

radiuss.develop.csv
radiuss.pr.csv
radiuss.txt

Concretization is generally slower (15%-20%), mostly due to setup and solve:

radiuss

Note that hiptt lately has a high variability - it seems very sensible to the initial guess used by clingo (and I've seen that in other PRs too). I need to check why that is the case.

Details
Spec solved Total time [develop, secs.] Total time [PR, secs.] Speedup [%]
aluminum 17.686 19.5471 -10.5233
ascent 38.0098 46.2725 -21.7382
axom 36.5582 44.0027 -20.3634
blt 4.09433 4.55592 -11.274
caliper 17.6884 18.3533 -3.75878
care 15.7715 17.4838 -10.857
chai 14.9926 18.1896 -21.3234
conduit 20.1764 19.5516 3.09671
dihydrogen 22.1536 26.1041 -17.8322
flux-core 16.6073 14.5891 12.1524
flux-sched 14.3001 15.8548 -10.8719
flux-security 4.09363 4.24293 -3.6471
glvis 34.3579 42.1594 -22.7068
hiptt 34.511 14.905 56.8109
hydrogen 19.3025 19.1098 0.998145
hypre 21.0155 26.0365 -23.8918
lbann 40.9187 54.123 -32.2693
lvarray 19.0671 22.7016 -19.0612
mfem 36.4991 46.7736 -28.1499
py-hatchet 20.3522 19.9948 1.75594
py-maestrowf 5.1714 5.04867 2.37317
py-merlin 17.294 17.2565 0.217068
py-shroud 4.79966 4.56172 4.95735
raja 16.2235 14.0636 13.3138
samrai 16.4028 15.876 3.21123
scr 16.2863 17.4933 -7.41099
sundials 29.9733 38.0698 -27.0125
umpire 12.7429 15.4972 -21.6139
visit 37.9248 44.327 -16.8813
xbraid 15.8677 15.5064 2.27727
zfp 12.1468 13.1778 -8.48792

@tgamblin tgamblin force-pushed the when-indexed-variants branch from 2d2dd80 to f6ed5a8 Compare May 30, 2024 06:36
@tgamblin
Copy link
Copy Markdown
Member Author

@alalazo ok try it with the latest commit. I avoided inverting the variant map in many places and setup should be much faster.

@alalazo
Copy link
Copy Markdown
Member

alalazo commented May 30, 2024

ok try it with the latest commit. I avoided inverting the variant map in many places and setup should be much faster.

The setup time seems much better. Tried using:

  • Spack: 0.23.0.dev0 (a60ff27)
  • Python: 3.11.5
  • Platform: linux-ubuntu20.04-icelake
  • Concretizer: clingo

radiuss.pr.csv
radiuss.develop.csv
radiuss.txt

radiuss

It seems we have some slight speed-up and some slight slowdown at this point. Changes comes from solve.

Details
Spec solved Total time [develop, secs.] Total time [PR, secs.] Speedup [%]
aluminum 18.0329 18.1712 -0.767287
ascent 36.8934 39.6857 -7.56854
axom 34.0978 37.5483 -10.1194
blt 4.25347 4.26754 -0.330851
caliper 16.4508 16.6535 -1.23197
care 16.2562 16.1845 0.441572
chai 15.9483 17.5706 -10.1721
conduit 19.2016 19.3933 -0.998557
dihydrogen 20.4174 23.6815 -15.9873
flux-core 15.9937 14.2334 11.0064
flux-sched 14.4843 15.398 -6.30815
flux-security 4.27073 4.54498 -6.42163
glvis 33.646 34.2314 -1.7397
hiptt 13.203 15.3946 -16.5996
hydrogen 18.7665 18.8272 -0.323507
hypre 20.5423 22.0512 -7.34581
lbann 41.03 43.454 -5.90791
lvarray 19.379 22.1225 -14.157
mfem 36.1251 39.8627 -10.3463
py-hatchet 18.8887 18.2099 3.59373
py-maestrowf 5.21986 4.9354 5.44971
py-merlin 15.3739 16.1249 -4.88485
py-shroud 4.72149 4.37077 7.42825
raja 16.1667 14.2046 12.1366
samrai 15.2192 14.8014 2.74464
scr 15.1334 15.4539 -2.11825
sundials 28.4785 31.4437 -10.4119
umpire 12.7163 15.128 -18.9654
visit 35.9903 37.399 -3.91409
xbraid 15.3935 15.6943 -1.95362
zfp 12.8059 12.6897 0.907191

@tgamblin tgamblin force-pushed the when-indexed-variants branch from e37be0d to 9392a67 Compare May 31, 2024 03:35
@spackbot-app spackbot-app bot added the gitlab Issues related to gitlab integration label May 31, 2024
@tgamblin
Copy link
Copy Markdown
Member Author

ok @alalazo @becker33 this is passing and ready for review

@tgamblin tgamblin force-pushed the when-indexed-variants branch 4 times, most recently from 230b393 to 803c228 Compare June 5, 2024 19:54
@tgamblin tgamblin force-pushed the when-indexed-variants branch from bc91daf to b70227c Compare September 17, 2024 06:25
Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through all the recent changes. LGTM. I'm going to benchmark concretization again, and post the results.

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Sep 17, 2024

I timed this PR against develop at:

  • Spack: 0.23.0.dev0 (673565a)
  • Python: 3.11.5
  • Platform: linux-ubuntu20.04-icelake

radiuss.txt

for two times. First run is:

radiuss.develop.csv
radiuss.pr.csv

Second one is:

radiuss.pr.csv
radiuss.develop.csv

In both cases I see consistent results that this branch is slightly slower on solve. The plot below is from the second run:

radiuss

EDIT: Briefly looking at commits since last review, I don't see changes in this PR that modify concretize.lp. Wondering if this is just "sampling" with a slightly different repository.

@tgamblin
Copy link
Copy Markdown
Member Author

I have some ideas for a heuristic for this but think it's good enough to merge as-is. Thanks!

@tgamblin
Copy link
Copy Markdown
Member Author

Gitlab failures are unrelated to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build-systems commands core PR affects Spack core functionality dependencies directives gitlab Issues related to gitlab integration new-variant new-version refactoring shell-support stand-alone-tests Stand-alone (or smoke) tests for installed packages tests General test capability(ies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Variant when=: Change of Default for Single/Multi-Variant Transition

3 participants