Experimental ASP-based concretizer #19501
Conversation
Will: Eventually we'll replace the old solver with the new one, but we at least need the old one to bootstrap an install of `clingo`.
```diff
 for i, cspec in enumerate(reversed(matches)):
     self.gen.fact(fn.node_compiler_preference(
-        pkg.name, cspec.name, cspec.version, i))
+        pkg.name, cspec.name, cspec.version, -i * 100
+    ))
```
@tgamblin This is a bit of magic I had to do to make the concretizer pass a test on compiler preferences. The gist of it is that with multiple compilers we need to ensure that the weight of a preference for a single package is "negative enough" to be selected for minimization. I'm not sure this is the best formula, though. Maybe `-(i + 1) * 100` would be a better weight, or maybe we should just change the target of minimization.
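As an aside, a tiny sketch (made-up compiler names; not Spack's actual code) shows how the two candidate formulas differ for a list of three matched compilers:

```python
# Hypothetical sketch: compare the two candidate weight formulas. In the
# real code, `matches` is ordered from most to least preferred, so
# `reversed(matches)` gives increasing preference with increasing i.
matches = ["gcc@8", "gcc@9", "gcc@10"]  # made-up compiler specs

current = [-i * 100 for i, _ in enumerate(reversed(matches))]
proposed = [-(i + 1) * 100 for i, _ in enumerate(reversed(matches))]

print(current)   # [0, -100, -200]
print(proposed)  # [-100, -200, -300]
# With -i * 100 the lowest-ranked explicit preference gets weight 0, the
# same as a compiler with no preference at all; -(i + 1) * 100 keeps
# every explicit preference strictly negative.
```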
Dependencies that are conditional on another dependency should now be treated correctly. I added a simple test to check for that case as well.
Just repulled and am testing a few things. Have we considered something like this, BTW?
@aweits No, we didn't discuss any optimization yet. I think it's worth adding, even though right now the bottleneck is setting up the problem:

```console
$ spack solve --timers hdf5~mpi
Time:
    setup: 31.0345
    load: 0.0040
    ground: 4.4527
    solve: 14.9748
Total: 50.7482
```

With 4 cores at it:

```console
$ spack solve --timers hdf5~mpi
Time:
    setup: 29.2326
    load: 0.0039
    ground: 4.2661
    solve: 5.8168
Total: 39.5874
```

My guess is that #19539 and package inspection are still the biggest targets to improve the speed of the ASP solver.
I see this: single-thread: […] 16 threads: […]
@bvanessen Thanks for the report. This seems to point to the following conflicts in the […] package. At first glance both conflicts seem to be wrong, as they should start with a […]. This poses the question of what to do in cases where we have a typo in one of the conflict directives. Should we error out like now, maybe with a better error message, or should we disregard the conflict and eventually print a warning to screen?
As reported, conflicts with compiler ranges were not treated correctly. This commit adds tests to verify the expected behavior for the new concretizer. The new rules to enforce correct behavior involve:

- Adding a rule to prefer the compiler selected for the root package, if no other preference is set
- Giving a strong negative weight to compiler preferences expressed in packages.yaml
- Maximizing on compiler AND compiler version match
TODO: Investigate the need to remove memoization on Spec.patches (infinite recursion when testing `__contains__`)
If the version list passed to one_of_iff is empty, it still generates rules like this:

```
node_compiler_version_satisfies("fujitsu-mpi", "arm", ":") :- 1 { } 1.
1 { } 1 :- node_compiler_version_satisfies("fujitsu-mpi", "arm", ":").
```

The cardinality constraints on the right and left above can never be satisfied, so these rules do nothing.

- [x] Skip generating any rules at all for empty version lists.
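The fix can be sketched as an early return before any rule is emitted. This is a hypothetical illustration, not Spack's actual `one_of_iff` implementation:

```python
# Hypothetical sketch of the fix: bail out before emitting any rule when
# the version list is empty, since a cardinality constraint "1 { } 1"
# over an empty body can never be satisfied.
def one_of_iff(head, version_clauses, emit):
    if not version_clauses:
        return  # no versions: the iff rules would be vacuous
    body = ", ".join(version_clauses)
    emit("%s :- 1 { %s } 1." % (head, body))
    emit("1 { %s } 1 :- %s." % (body, head))

rules = []
one_of_iff('ok("pkg")', [], rules.append)
assert rules == []  # nothing generated for an empty list

one_of_iff('ok("pkg")', ['version("pkg", "1.0")'], rules.append)
print(rules)  # the two iff rules, only for a non-empty list
```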
…e bodies.
`node_compiler_hard()` means that something explicitly asked for a node's
compiler to be set -- i.e., it's not inherited, it's required. We're
generating this in spec_clauses even for specs in rule bodies, which
results in conditions like this for optional dependencies:
In `py-torch/package.py`:

```python
depends_on('llvm-openmp', when='%apple-clang +openmp')
```

In the generated ASP:

```
declared_dependency("py-torch", "llvm-openmp", "build")
    :- node("py-torch"),
       variant_value("py-torch", "openmp", "True"),
       node_compiler("py-torch", "apple-clang"),
       node_compiler_hard("py-torch", "apple-clang"),
       node_compiler_version_satisfies("py-torch", "apple-clang", ":").
```
The `node_compiler_hard` there means we would have to *explicitly* set
py-torch's compiler to trigger the llvm-openmp dependency, rather than
just letting it be set by preferences. This is wrong; the dependency
should be there regardless of how the compiler was set.
- [x] remove fn.node_compiler_hard() call from spec_clauses when
generating rule body clauses.
…ernal

This commit introduces a new rule:

```
real_node(Package) :- not external(Package), node(Package).
```

that makes it possible to distinguish between an external node and a real node whose dependencies should not be trimmed. It solves the case of concretizing ninja with an external Python.
…variant

If the default of a multi-valued variant is set to multiple values, either in package.py or in packages.yaml, we need to ensure that all the values are present in the concretized spec. Since each default value has a weight of 0 and the variant value is set implicitly by the concretizer, we need to add a rule to maximize the number of default values that are used.
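The problem above can be illustrated outside of ASP. This is a made-up Python sketch, not Spack code, showing why minimizing weight alone is not enough and the count of default values used must be maximized as a tie-breaker:

```python
# Illustrative sketch: with a per-value weight of 0, minimizing weight
# cannot distinguish a spec that keeps one default value of a
# multi-valued variant from one that keeps all of them, so the solver
# must also maximize the count of default values used.
defaults = {"a", "b", "c"}          # hypothetical multi-valued default

def score(assigned):
    # each default value used contributes weight 0, so weight is always 0
    weight = sum(0 for v in assigned if v in defaults)
    used = len(defaults & set(assigned))
    return (weight, -used)          # minimize weight, then maximize count

candidates = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
best = min(candidates, key=score)
print(sorted(best))  # ['a', 'b', 'c']
```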
This commit addresses the case of concretizing a root spec with a transitive conditional dependency on a virtual package provided by an external. Before these modifications, default variant values for the dependency bringing in the virtual package were not respected, and the external package providing the virtual was added to the DAG. The issue stems from two facts:

- Selecting a provider has higher precedence than selecting default variants
- To ensure that an external is preferred, we used a negative weight

To solve it we shift all the provider weights so that:

- External providers have a weight of 0
- Non-external providers have a weight of 10 or more

Using a weight of zero for external providers means that having an external provider, if present, or not having a provider at all has the same effect on the higher-priority minimization. Also fixed a few minor bugs in concretize.lp that were causing spurious entries in the final answer set, and cleaned concretize.lp of leftover rules.
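A small sketch (illustrative only, not Spack's actual weights) of why shifting the weights works: provider weight is minimized at a higher priority than variant defaults, so "no provider" must not score worse than "external provider".

```python
# Illustrative comparison of the old and new provider-weight schemes.
def old_weight(choice):
    # old scheme: the negative weight made the external strictly better
    # than having no provider at all (weight 0), so the higher-priority
    # minimization forced the external in, overriding variant defaults
    return {"none": 0, "external": -1, "real": 1}[choice]

def new_weight(choice):
    # new scheme: external and "no provider" tie at 0, so the
    # lower-priority objective (default variant values) breaks the tie
    return {"none": 0, "external": 0, "real": 10}[choice]

assert old_weight("external") < old_weight("none")   # external forced in
assert new_weight("external") == new_weight("none")  # defaults decide
```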
tgamblin left a comment:
Ok, we've stomped out most of the lingering bugs here, tests are passing, and thanks to @aweits and everyone else for finding bugs and making some major speed improvements.
Thanks to @alalazo for pushing this over the finish line and stomping out all the corner cases!
This is ready to merge as experimental, and getting it in develop will enable more people to take a stab at using it.
We'll do docs in a follow-up PR so that we don't have to keep waiting on CI here.
Other things that really need doing before this is the default concretizer:
- Better error reporting (we need to add error messages to facts and rules and print them in a more understandable way)
- Figure out how best to vendor a solver. We may use Z3 in the end for this, or we may figure out how to vendor Clingo.
- More in-depth performance analysis, and a performance comparison. @aweits has played with some tactics here, but there are a few other things we can do to make this better.
Notably, when I run r-condop through the Clingo Python API, it takes 24 seconds, and performance stats look like this:

```console
Time:
    setup: 6.6440
    load: 0.0037
    ground: 7.8134
    solve: 8.2264
Total: 25.3642
```
If, however, I run:

```console
$ spack solve --show asp r-condop > r-condop.lp
...
$ clingo ./r-condop.lp
...
SATISFIABLE

Models   : 1
Calls    : 1
Time     : 1.374s (Solving: 0.01s 1st Model: 0.00s Unsat: 0.01s)
CPU Time : 1.362s
```
It spits out a model in 1.5s -- so there's a lot of room to improve here.
This adds an experimental concretizer to Spack that makes use of Answer Set Programming. The new solver uses clingo, the combined grounder (gringo) and solver (clasp) from Potassco, instead of attempting to do the entire solve in Python as we did before.

Problem
The current concretizer is not exhaustive and is not guaranteed to find a solution if one exists. It is greedy, does not do backtracking, and as a result the errors it gives can be confusing. In many cases it can resolve a constraint too early -- it assigns final values to nodes as it builds the graph, and it may assign a version, variant, or other value to a node without considering the entire problem. This can result in the concretizer finding conflicts where solutions exist.
Extending the current concretizer is difficult. There are many moving pieces, and rewriting any of them can be a lot of work. Making major changes to concretization semantics is difficult, and increasingly we're unable to add features that users need.
The concretization problem that Spack solves is NP-complete (as is regular old dependency resolution with just versions), so we have decided to start using more powerful solvers to address it. Using Clingo is a step in this direction.
How it works
Answer Set Programming looks like Prolog but has the advantage that, rather than trying to be a full programming language, it reduces logic problems to CDCL SAT with optimization. ASP has the nice property that solves always terminate (unlike Prolog programs), and it follows the stable model semantics, which makes it easier to model some of the default-driven behavior in Spack. The constraints in Spack are essentially SAT clauses, and our preferences (like packages.yaml) add optimization weights to this. We don't just want a solution -- we want the best one, with preferred variants, compilers, etc.

The prior concretizer iteratively expanded nodes in the graph and attempted to unify them greedily as they were expanded. It used a fixed-point algorithm: iterating and expanding optional dependencies, replacing virtuals with providers, setting variants, versions, and compilers, etc., until there were no more changes to the DAG.
The new concretizer is less dynamic. It works like this:

1. Spack generates facts from its package repository; e.g., zlib has some facts generated from the version() and variant() directives in the zlib/package.py file.
2. The facts are sent to clingo along with the logic program in lib/spack/spack/solver/concretize.lp. This defines all the rules and optimization criteria used in concretization in Spack.
3. Spack reads the best answer set back and builds a concrete Spec from it.

One nice thing about this is that it's declarative; we can add extra constraints and features more easily now, without having to reengineer the entire algorithm. Another is that we can solve much more complex problems than we could before. Finally, it will be easier going forward to implement many of the features we have planned (compilers as dependencies, optimizing for installed packages, etc.) using a real solver.
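As an illustration of the fact-generation step, a toy sketch might turn package directives into ASP facts as below. The predicate names here are made up for illustration and are not Spack's actual fact schema:

```python
# Hypothetical sketch of fact generation (predicate names are
# illustrative, not Spack's real schema).
def package_facts(name, versions, variants):
    facts = ['node("%s").' % name]
    for weight, ver in enumerate(versions):  # lower weight = preferred
        facts.append('version_declared("%s", "%s", %d).'
                     % (name, ver, weight))
    for variant, default in variants.items():
        facts.append('variant_default_value("%s", "%s", "%s").'
                     % (name, variant, default))
    return facts

facts = package_facts("zlib", ["1.2.11", "1.2.8"], {"optimize": "True"})
print("\n".join(facts))
```

Facts like these, together with the rules and optimization criteria in concretize.lp, form the complete logic program that clingo grounds and solves.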
Error reporting
We are using Clingo's support for finding unsatisfiable cores to generate error messages. If something fails to concretize, the solver will tell you why by reporting a set of facts and rules that cannot be resolved together. This can be very useful, as often it points directly to conflicts and can be used, with a little staring, to debug issues. In other cases the core can still be quite large, and thus hard to interpret. Cores are not guaranteed to be minimal by Clingo. Improving these messages and converting them to understandable sentences is on our list of follow-on activities.
Alternatives
There are more details on alternatives in this FOSDEM talk. So far, Clingo works well, but we are also investigating Z3, an SMT solver. Z3 is more widely used and has a richer API, but its input language is in some sense more primitive, in that it doesn't accept or automatically ground first-order logic the way Clingo does. We are investigating which solver will be easiest to include (vendor) with Spack so that we can use the new concretizer everywhere.
Using the new concretizer
There is a new `concretizer` section in `config.yaml`: set `concretizer:` to `clingo` to use `clingo` everywhere. You'll need to `spack install clingo@master` and get `clingo` on your `PATH` to use the new concretizer.
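For concreteness, a sketch of what that section might look like in `config.yaml` (the exact key layout is an assumption based on the description above; check the follow-up docs PR for the authoritative format):

```yaml
# config.yaml -- hypothetical snippet enabling the new concretizer
config:
  concretizer: clingo   # default remains the original concretizer
```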
spack solvecommand as you would normally usespack spec-- it will try to use the new concretizer.spack solvewill eventually go away but for now we have two commands.Bugs fixed
fixes #267
fixes #303
fixes #1781
fixes #2310
fixes #2632
fixes #3628
fixes #4635
fixes #5651
fixes #7339
fixes #8082
fixes #9025
fixes #9414
fixes #9696
fixes #9744
fixes #9753
fixes #9911
fixes #12431
fixes #19764