
Experimental ASP-based concretizer #19501

Merged
tgamblin merged 105 commits into develop from features/solver-rebased
Nov 17, 2020

Conversation

@alalazo
Member

@alalazo alalazo commented Oct 23, 2020

This adds an experimental concretizer to Spack that makes use of Answer Set Programming. The new solver uses clingo, the combined grounder (gringo) and solver (clasp) from Potassco, instead of attempting to do the entire solve in Python as we did before.

Problem

The current concretizer is not exhaustive and is not guaranteed to find a solution if one exists. It is greedy, does not do backtracking, and as a result the errors it gives can be confusing. In many cases it can resolve a constraint too early -- it assigns final values to nodes as it builds the graph, and it may assign a version, variant, or other value to a node without considering the entire problem. This can result in the concretizer finding conflicts where solutions exist.

Extending the current concretizer is difficult. There are many moving pieces, and rewriting any of them can be a lot of work. Making major changes to concretization semantics is difficult, and increasingly we're unable to add features that users need.

The concretization problem that Spack solves is NP-complete (as is regular old dependency resolution with just versions), so we have decided to start using more powerful solvers to address it. Using Clingo is a step in this direction.

How it works

Answer Set Programming looks like Prolog but has the advantage that, rather than trying to be a full programming language, it reduces logic problems to CDCL SAT with optimization. ASP has the nice property that solves will always terminate (unlike Prolog programs), and it follows the stable model semantics, which makes it easier to model some of the default-driven behavior in Spack. The constraints in Spack are essentially SAT clauses, and our preferences (like packages.yaml) add optimization weights to them. We don't just want a solution -- we want the best one, with preferred variants, compilers, etc.

The prior concretizer iteratively expanded nodes in the graph and attempted to unify them greedily as they were expanded. It used a fixed-point algorithm, iterating and expanding optional dependencies, replacing virtuals with providers, setting variants, versions, and compilers, etc., until there were no more changes to the DAG.

The new concretizer is less dynamic. It works like this:

  1. Look at the input specs and read in all possible dependencies
  2. Translate constraints in Spack's package DSL to rules and facts in ASP. e.g., zlib has some facts like this:
    version_declared("zlib","1.2.11",0).
    version_declared("zlib","1.2.8",1).
    version_declared("zlib","1.2.3",2).
    
    variant("zlib","optimize").
    variant_single_value("zlib","optimize").
    variant_default_value_from_package_py("zlib","optimize","True").
    variant_possible_value("zlib","optimize","False").
    variant_possible_value("zlib","optimize","True").
    These come directly from version() and variant() directives in the zlib/package.py file.
  3. Combine these facts with the logic program in lib/spack/spack/solver/concretize.lp. This defines all the rules and optimization criteria used in concretization in Spack.
  4. Send the whole program off to Clingo
  5. Read back the optimal stable model found by the solver, and build a concrete Spec from it.
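
The fact generation in step 2 can be sketched as a straightforward mapping from directive data to ASP strings. This is an illustrative toy, not the actual API in lib/spack/spack/solver/asp.py:

```python
# Illustrative sketch only -- not the real asp.py fact generator.
def version_facts(pkg, versions):
    """Map a package's declared versions to version_declared facts.

    The index doubles as a preference weight: earlier (more
    preferred) versions get lower weights for minimization.
    """
    return [
        'version_declared("{0}","{1}",{2}).'.format(pkg, v, i)
        for i, v in enumerate(versions)
    ]

for fact in version_facts("zlib", ["1.2.11", "1.2.8", "1.2.3"]):
    print(fact)
```

Run on zlib's versions, this produces the same version_declared facts shown above.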

One nice thing about this is that it's declarative; we can add extra constraints and features more easily now, without having to reengineer the entire algorithm. Another is that we can solve much more complex problems than we could before. Finally, it will be easier going forward to implement many of the features we have planned (compilers as dependencies, optimizing for installed packages, etc.) using a real solver.

Error reporting

We are using Clingo's support for finding unsatisfiable cores to generate error messages. If something fails to concretize, the solver will tell you why by reporting a set of facts and rules that cannot be resolved together. This can be very useful, as often it points directly to conflicts and can be used, with a little staring, to debug issues. In other cases the core can still be quite large, and thus hard to interpret. Cores are not guaranteed to be minimal by Clingo. Improving these messages and converting them to understandable sentences is on our list of follow-on activities.

Alternatives

There are more details on alternatives in this FOSDEM talk. So far, Clingo works well, but we are also investigating Z3, an SMT solver. Z3 is more widely used and has a richer API, but its input language is in some sense more primitive, in that it doesn't accept or automatically ground first-order logic like Clingo. We are investigating which solver will be easiest to include (vendor) with Spack so that we can use the new concretizer everywhere.

Using the new concretizer

There is a new section in config.yaml:

  # The concretization algorithm to use in Spack. Options are:
  #
  #   'original': Spack's original greedy, fixed-point concretizer. This
  #       algorithm can make decisions too early and will not backtrack
  #       sufficiently for many specs.
  #
  #   'clingo': Uses a logic solver under the hood to solve DAGs with full
  #       backtracking and optimization for user preferences.
  #
  # 'clingo' currently requires the clingo ASP solver to be installed and
  # built with python bindings. 'original' is built in.
  concretizer: original

  1. You'll need to set concretizer: to clingo to use clingo everywhere.
  2. You will also need to spack install clingo@master and get clingo on your PATH to use the new concretizer.

Even if the new concretizer is not enabled, you can use the spack solve command as you would normally use spack spec -- it will try to use the new concretizer. spack solve will eventually go away but for now we have two commands.

Bugs fixed

fixes #267
fixes #303
fixes #1781
fixes #2310
fixes #2632
fixes #3628
fixes #4635
fixes #5651
fixes #7339
fixes #8082
fixes #9025
fixes #9414
fixes #9696
fixes #9744
fixes #9753
fixes #9911
fixes #12431
fixes #19764

@adamjstewart
Member

Will spack solve be merged into spack spec? I guess I don't see the need for both if there is an option in config.yaml to choose which algo to use.

@alalazo alalazo requested a review from tgamblin October 23, 2020 21:06
@tgamblin tgamblin changed the title Clingo based solver Experimental ASP-based concretizer Oct 23, 2020
@tgamblin
Member

tgamblin commented Oct 23, 2020

Will spack solve be merged into spack spec? I guess I don't see the need for both if there is an option in config.yaml to choose which algo to use.

Eventually we'll replace the old solver with the new one, but we at least need the old one to bootstrap an install of clingo at this point. The plan is to put this into 0.16.0 and look at other options like Z3 for a full switchover in 0.17.0.

@alalazo alalazo force-pushed the features/solver-rebased branch 3 times, most recently from 6a7da1c to 9585d74 on October 27, 2020 07:17
@alalazo alalazo marked this pull request as ready for review October 27, 2020 08:24

Comment on lines +836 to +847
for i, cspec in enumerate(reversed(matches)):
    self.gen.fact(fn.node_compiler_preference(
-       pkg.name, cspec.name, cspec.version, i))
+       pkg.name, cspec.name, cspec.version, -i * 100
+   ))
Member Author


@tgamblin This is a bit of magic I had to do to make the concretizer pass a test on compiler preferences. The gist of it is that with multiple compilers we need to ensure that the weight of a preference for a single package is "negative enough" to be selected for minimization. Not sure this is the best formula though. Maybe:

-(i + 1) * 100

would be better as a weight, or maybe we should just change the target of minimization.
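
The difference between the two formulas can be sketched with toy helpers (hypothetical names, not code from asp.py):

```python
# Toy comparison of the two compiler-preference weighting schemes.
def current_weight(i, scale=100):
    # Current formula: the entry at i == 0 gets weight 0, i.e. no
    # advantage over an unpreferred compiler in the minimization.
    return -i * scale

def proposed_weight(i, scale=100):
    # Proposed alternative: every preferred compiler gets a strictly
    # negative weight, so even the first entry beats "no preference".
    return -(i + 1) * scale

for i in range(3):
    print(i, current_weight(i), proposed_weight(i))
```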


@alalazo
Member Author

alalazo commented Oct 28, 2020

Dependencies that are conditional on another dependency should now be treated correctly. I added a simple test to check for that case as well.

@aweits
Contributor

aweits commented Oct 28, 2020

Dependencies that are conditional on another dependency should now be treated correctly. I added a simple test to check for that case as well.

Just repulled and am testing a few things. Have we considered something like this BTW?

diff --git a/lib/spack/spack/solver/asp.py b/lib/spack/spack/solver/asp.py
index 691b0ac..c2f8c91 100644
--- a/lib/spack/spack/solver/asp.py
+++ b/lib/spack/spack/solver/asp.py
@@ -602,6 +602,7 @@ def solve(
         # Initialize the control object for the solver
         self.control = clingo.Control()
         self.control.configuration.solve.models = nmodels
+        self.control.configuration.solve.parallel_mode = spack.config.get('config:build_jobs')
         self.control.configuration.configuration = 'tweety'
         self.control.configuration.solver.opt_strategy = "bb,dec"


@alalazo
Member Author

alalazo commented Oct 28, 2020

@aweits No, we didn't discuss any optimization yet. I think it's worth adding even though right now the bottleneck is setting up the problem:

$ spack solve --timers hdf5~mpi
Time:
    setup:         31.0345
    load:          0.0040
    ground:        4.4527
    solve:         14.9748
Total: 50.7482

With 4 cores at it with parallel_mode:

$ spack solve --timers hdf5~mpi
Time:
    setup:         29.2326
    load:          0.0039
    ground:        4.2661
    solve:         5.8168
Total: 39.5874

My guess is that #19539 and package inspection are still the biggest targets to improve speed of the ASP solver.

@aweits
Contributor

aweits commented Oct 28, 2020

@aweits No, we didn't discuss any optimization yet. I think it's worth adding even though right now the bottleneck is setting up the problem:

I see this:

single-thread:

[aweits@skl-a-00 spack]$ spack solve --timers py-tensorflow
Time:
    setup:         21.1258
    load:          0.0015
    ground:        3.5706
    solve:         339.2976
Total: 364.6276

16 threads:

[aweits@skl-a-00 spack]$ spack solve --timers py-tensorflow
Time:
    setup:         20.8383
    load:          0.0015
    ground:        3.6184
    solve:         28.3805
Total: 53.5030


@alalazo alalazo force-pushed the features/solver-rebased branch from 391122b to 7bf3a49 on October 28, 2020 19:52

@alalazo
Member Author

alalazo commented Oct 29, 2020

@bvanessen Thanks for the report. This seems to point to the following conflicts in the opencv package:

# TODO For Cuda >= 10, make sure 'dynlink_nvcuvid.h' or 'nvcuvid.h'                                                                           
# exists, otherwise build will fail                                                                                                           
# See https://github.com/opencv/opencv_contrib/issues/1786                                                                                    
conflicts('cuda@10:', when='+cudacodec')
conflicts('cuda', when='~contrib', msg='cuda support requires +contrib')

At a first glance both conflicts seem to be wrong, as they should start with a ^ sigil:

# TODO For Cuda >= 10, make sure 'dynlink_nvcuvid.h' or 'nvcuvid.h'                                                                           
# exists, otherwise build will fail                                                                                                           
# See https://github.com/opencv/opencv_contrib/issues/1786                                                                                    
conflicts('^cuda@10:', when='+cudacodec')
conflicts('^cuda', when='~contrib', msg='cuda support requires +contrib')

This poses the question of what to do in cases where we have a typo in one of the conflict directives. Should we error out as we do now, maybe with a better error message, or should we disregard the conflict and possibly print a warning to screen?

alalazo and others added 19 commits November 17, 2020 14:11
As reported, conflicts with compiler ranges were not treated
correctly. This commit adds tests to verify the expected behavior
for the new concretizer.

The new rules to enforce a correct behavior involve:
- Adding a rule to prefer the compiler selected for
  the root package, if no other preference is set
- Give a strong negative weight to compiler preferences
  expressed in packages.yaml
- Maximize on compiler AND compiler version match
TODO: Investigate the need to remove
memoization on Spec.patches (infinite
recursion when testing `__contains__`)
If the version list passed to one_of_iff is empty, it still generates a
rule like this:

    node_compiler_version_satisfies("fujitsu-mpi", "arm", ":") :- 1 {  } 1.
    1 {  } 1 :- node_compiler_version_satisfies("fujitsu-mpi", "arm", ":").

The cardinality rules on the right and left above are never
satisfiable, and these rules do nothing.

- [x] Skip generating any rules at all for empty version lists.
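
The guard described above can be sketched like this (a simplified toy, not the actual one_of_iff method):

```python
# Sketch of the fix: skip rule generation entirely for empty lists,
# since a "1 { } 1" cardinality body can never be satisfied.
def one_of_iff(head, choices):
    """Emit 'head iff exactly one of choices' cardinality rules."""
    if not choices:
        return []  # nothing to generate for an empty version list
    body = "; ".join(choices)
    return [
        "{0} :- 1 {{ {1} }} 1.".format(head, body),
        "1 {{ {1} }} 1 :- {0}.".format(head, body),
    ]

print(one_of_iff('node("zlib")', ['version("zlib","1.2.11")']))
print(one_of_iff('node("zlib")', []))
```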
…e bodies.

`node_compiler_hard()` means that something explicitly asked for a node's
compiler to be set -- i.e., it's not inherited, it's required. We're
generating this in spec_clauses even for specs in rule bodies, which
results in conditions like this for optional dependencies:

In py-torch/package.py:

    depends_on('llvm-openmp', when='%apple-clang +openmp')

In the generated ASP:

    declared_dependency("py-torch","llvm-openmp","build")
      :- node("py-torch"),
         variant_value("py-torch","openmp","True"),
         node_compiler("py-torch","apple-clang"),
         node_compiler_hard("py-torch","apple-clang"),
         node_compiler_version_satisfies("py-torch","apple-clang",":").

The `node_compiler_hard` there means we would have to *explicitly* set
py-torch's compiler to trigger the llvm-openmp dependency, rather than
just letting it be set by preferences. This is wrong; the dependency
should be there regardless of how the compiler was set.

- [x] remove fn.node_compiler_hard() call from spec_clauses when
      generating rule body clauses.
…ernal

This commit introduces a new rule:

real_node(Package) :- not external(Package), node(Package).

that permits distinguishing between an external node and a
real node whose dependencies shouldn't be trimmed. It solves
the case of concretizing ninja with an external Python.
…variant

If the default of a multi-valued variant is set to
multiple values, either in package.py or in packages.yaml,
we need to ensure that all the values are present in the
concretized spec.

Since each default value has a weight of 0 and the
variant value is set implicitly by the concretizer
we need to add a rule to maximize on the number of
default values that are used.
This commit addresses the case of concretizing a root spec with a
transitive conditional dependency on a virtual package, provided
by an external. Before these modifications, default variant values
for the dependency bringing in the virtual package were not
respected, and the external package providing the virtual was added
to the DAG.

The issue stems from two facts:
- Selecting a provider has higher precedence than selecting default variants
- To ensure that an external is preferred, we used a negative weight

To solve it we shift all the provider weights so that:
- External providers have a weight of 0
- Non-external providers have a weight of 10 or more

Using a weight of zero for external providers means that having
an external provider, if present, or having no provider at all
has the same effect on the higher-priority minimization.

Also fixed a few minor bugs in concretize.lp, that were causing
spurious entries in the final answer set.

Cleaned concretize.lp from leftover rules.
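
The weight shift described in that commit can be illustrated with a toy function (hypothetical names, not code from asp.py):

```python
# Sketch of the shifted provider weights: externals weigh 0, so
# using one costs the same as having no provider at all, while
# non-external providers start at 10 and grow with preference rank.
def provider_weight(is_external, preference_rank=0):
    return 0 if is_external else 10 + preference_rank

print(provider_weight(True))        # external provider
print(provider_weight(False))       # most-preferred real provider
print(provider_weight(False, 3))    # less-preferred real provider
```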
@alalazo alalazo force-pushed the features/solver-rebased branch from bb3782b to 623ed6a on November 17, 2020 13:13
Member

@tgamblin tgamblin left a comment


Ok, we've stomped out most of the lingering bugs here, tests are passing, and thanks to @aweits and everyone else for finding bugs and making some major speed improvements.
Thanks to @alalazo for pushing this over the finish line and stomping out all the corner cases!

This is ready to merge as experimental, and getting it in develop will enable more people to take a stab at using it.

We'll do docs in a follow-up PR so that we don't have to keep waiting on CI here.

Other things that really need doing before this is the default concretizer:

  • Better error reporting (we need to add error messages to facts and rules and print them in a more understandable way)
  • Figure out how best to vendor a solver. We may use Z3 in the end for this, or we may figure out how to vendor Clingo.
  • More in-depth performance analysis, and a performance comparison. @aweits has played with some tactics here, but there are a few other things we can do to make this better.

Notably, when I run r-condop through the Clingo Python API, it takes 24 seconds, and performance stats look like this:

Time:
    setup:         6.6440
    load:          0.0037
    ground:        7.8134
    solve:         8.2264
Total: 25.3642

If, however, I run:

$ spack solve --show asp r-condop > r-condop.lp
...
$ clingo ./r-condop.lp
...
SATISFIABLE
Models       : 1
Calls        : 1
Time         : 1.374s (Solving: 0.01s 1st Model: 0.00s Unsat: 0.01s)
CPU Time     : 1.362s

It spits out a model in 1.5s -- so there's a lot of room to improve here.

