
Speed-up the ASP based solver#21289

Closed
alalazo wants to merge 6 commits into spack:develop from alalazo:concretizer/optimize_singleshot

Conversation

@alalazo
Member

@alalazo alalazo commented Jan 26, 2021

depends on #21148 (it incorporates the same small change)

This PR speeds up the ASP-based solver most of the time by:

  1. Trying a first solve with a reduced set of facts (only preferred targets, compilers and providers)
  2. Falling back to a solve using all the facts we know if 1 is unsat
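The two-step strategy above can be sketched as follows. This is an illustrative Python sketch, not Spack's actual implementation: `Result`, `run_solver` and `solve_with_fallback` are hypothetical names, and the stand-in solver just checks membership of required atoms instead of calling clingo.

```python
from dataclasses import dataclass

@dataclass
class Result:
    satisfiable: bool
    facts_used: int

def run_solver(facts, required):
    # Stand-in for a clingo call (hypothetical): the problem is
    # satisfiable iff every required atom is among the given facts.
    return Result(all(r in facts for r in required), len(facts))

def solve_with_fallback(all_facts, preferred, required):
    # Step 1: fast solve on the heuristically reduced fact set
    reduced = [f for f in all_facts if f in preferred]
    result = run_solver(reduced, required)
    if result.satisfiable:
        return result
    # Step 2: the reduced problem was unsat, retry with every known fact
    return run_solver(all_facts, required)
```

The point of the structure is that the fast path and the fallback share the same solver entry point; only the fact set changes between the two attempts.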

I'll post data on the speed-up in the comments below, but most of the time it's roughly a 30%-50% cut in wall-time compared to develop.

Because it is based on a heuristic reduction of the search space, I found during development that step 1 may sometimes yield different results than step 2 when soft preferences are involved. One thing we should decide during review is therefore whether we want this strategy to be:

  • hard-wired like it is now (maybe skipped in potentially problematic cases)
  • opt-in (by default use all the facts)
  • opt-out (by default use the fast strategy, but skip it on demand)

@alalazo
Member Author

alalazo commented Jan 26, 2021

On this system:

  • Spack: 0.16.0-974-c647469a38
  • Python: 3.8.5
  • Platform: darwin-bigsur-cannonlake
  • Concretizer: clingo

using this PR I obtain:

% spack solve --timers -l hdf5
Time:
    setup:         2.2555
    facts [fast]:  0.2137
    load [fast]:   0.0030
    ground [fast]: 0.3860
    solve [fast]:  0.4136
Total: 3.3483

==> Best of 0 answers.
==> Optimization: [0, 0, 0, 30, -2, 1, 0, 0, -14, 0]
hkqq3qs  [email protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=darwin-bigsur-cannonlake
x7uoimm      ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=darwin-bigsur-cannonlake
rg2lvuz          ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared arch=darwin-bigsur-cannonlake
cn6al67              ^[email protected]%[email protected]~python arch=darwin-bigsur-cannonlake
sqfxpje                  ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
myhb4f4                  ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
klk4s54                  ^[email protected]%[email protected]~pic arch=darwin-bigsur-cannonlake
xj67msn                  ^[email protected]%[email protected]+optimize+pic+shared arch=darwin-bigsur-cannonlake
oaha5wg          ^[email protected]%[email protected]+openssl arch=darwin-bigsur-cannonlake
dkwz5wf              ^[email protected]%[email protected]+systemcerts arch=darwin-bigsur-cannonlake
dhggqbx                  ^[email protected]%[email protected]+cpanm+shared+threads patches=517afe8ca1cb12f798f20b21583d91463fe3f4fa9244c6c8e054a92c8135da8f arch=darwin-bigsur-cannonlake
fyidgzf                      ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
czqittu                      ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
yemqcvc                          ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
6aulg5u                              ^[email protected]%[email protected]~symlinks+termlib arch=darwin-bigsur-cannonlake

while using develop on the same spec:

% spack solve --timers -l hdf5
Time:
    setup:         2.6316
    load:          0.0032
    ground:        0.9854
    solve:         2.6441
Total: 6.3340

==> Best of 0 answers.
==> Optimization: [0, 0, 0, 0, 30, -2, 1, 0, 0, -14, 0]
hkqq3qs  [email protected]%[email protected]~cxx~debug~fortran~hl~java+mpi+pic+shared~szip~threadsafe api=none arch=darwin-bigsur-cannonlake
x7uoimm      ^[email protected]%[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none arch=darwin-bigsur-cannonlake
rg2lvuz          ^[email protected]%[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml~pci+shared arch=darwin-bigsur-cannonlake
cn6al67              ^[email protected]%[email protected]~python arch=darwin-bigsur-cannonlake
sqfxpje                  ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
myhb4f4                  ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
klk4s54                  ^[email protected]%[email protected]~pic arch=darwin-bigsur-cannonlake
xj67msn                  ^[email protected]%[email protected]+optimize+pic+shared arch=darwin-bigsur-cannonlake
oaha5wg          ^[email protected]%[email protected]+openssl arch=darwin-bigsur-cannonlake
dkwz5wf              ^[email protected]%[email protected]+systemcerts arch=darwin-bigsur-cannonlake
dhggqbx                  ^[email protected]%[email protected]+cpanm+shared+threads patches=517afe8ca1cb12f798f20b21583d91463fe3f4fa9244c6c8e054a92c8135da8f arch=darwin-bigsur-cannonlake
fyidgzf                      ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
czqittu                      ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
yemqcvc                          ^[email protected]%[email protected] arch=darwin-bigsur-cannonlake
6aulg5u                              ^[email protected]%[email protected]~symlinks+termlib arch=darwin-bigsur-cannonlake

@alalazo alalazo requested a review from tgamblin January 26, 2021 15:23
@adamjstewart
Member

This is exciting! The only reasons I'm not using the new concretizer at the moment are because a) it's slower than the old concretizer, and b) it still doesn't take into account already-installed packages. Once I have a good reason to use the new concretizer, I can start submitting bug reports for it 😄

@alalazo alalazo marked this pull request as ready for review February 4, 2021 12:18
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 27f7cd1 to 69eaf36 Compare February 8, 2021 15:12
@alalazo alalazo closed this Feb 9, 2021
@alalazo alalazo reopened this Feb 9, 2021
@alalazo alalazo mentioned this pull request Feb 20, 2021
@alalazo alalazo requested a review from trws February 23, 2021 20:40
@trws
Contributor

trws commented Feb 24, 2021

I like this idea, a lot. Out of curiosity, why as a post-pass rather than checking before creating the fact objects? It probably doesn't matter much, since the solve is so much of the cost, but it made me wonder.

@alalazo
Member Author

alalazo commented Feb 25, 2021

Out of curiosity, why as a post-pass rather than checking before creating the fact objects?

That's because I want to have the full set of facts available in case the fast solve is unsat. Filtering is done with heuristics that work most of the time, but we are effectively reducing the search space and thus solving a different problem. If that problem is unsat, the current implementation falls back to the full set of facts and solves the original problem.
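The "record everything, filter as a post-pass" idea can be sketched like this. `FactCache` is a hypothetical stand-in for the ad-hoc recording object, not Spack's real class; the key property is that filtering never mutates the full list, so the fallback solve still has everything.

```python
class FactCache:
    # Hypothetical stand-in for the object that records all facts.
    def __init__(self):
        self.facts = []            # the full set, always preserved

    def fact(self, atom):
        self.facts.append(atom)

    def reduced(self, keep):
        # Post-pass filter: self.facts is untouched, so an unsat fast
        # solve can still fall back to the complete fact list.
        return [a for a in self.facts if keep(a)]
```

Filtering before fact creation would save a little setup time, but then the fallback would have to regenerate the facts from scratch.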

@alalazo
Member Author

alalazo commented Feb 25, 2021

Another interesting takeaway from the experiments I have done here is that, after this change, most of the time is spent setting up the problem. I think we have a lot of room for improvement there, so I am quite confident we can beat the original concretizer in terms of speed once we do a better job of retrieving package metadata to set up the ASP solve.

@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 69eaf36 to 986a80a Compare February 25, 2021 10:14
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 986a80a to a957f48 Compare March 4, 2021 11:30
@tgamblin
Member

tgamblin commented Mar 12, 2021

Trying a first solve with a reduced number of facts (only preferred targets, compilers and providers)

@alalazo can you clarify this? Is it just the preferred targets/compilers/providers, or the ones that would be optimal if there were no other constraints (or are those always the same thing -- they might be).

I'm asking b/c this is concerning:

I found during development that sometimes 1. may yield different results than 2

If all the criteria we limited things to in (1) are optimal with no conflicting constraints, isn't it provable that the result of (2) should be the same as (1)?

@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from a957f48 to 8dff55a Compare March 17, 2021 17:03
@alalazo
Member Author

alalazo commented Mar 17, 2021

can you clarify this? Is it just the preferred targets/compilers/providers, or the ones that would be optimal if there were no other constraints (or are those always the same thing -- they might be).

In general I try to throw into the solve the facts that would be optimal if there were no other constraints, but I use heuristics, not some provably optimal reduced set of facts. Take this as an example:

# Record only the most preferred target declared for this package
preferred = preferred_targets[0]
target_str = str(preferred.architecture.target)
self.user_targets.append(target_str)
self.gen.fact(fn.package_target_weight(target_str, pkg_name, -30))

In the code above I scan packages.yaml, and if a package is a possible dependency and has preferred targets declared, I record the most preferred one for the fast solve. I do the same for each target=uarch that appears in the root spec. In the end I solve a problem with a reduced set of facts, for instance allowing only skylake, haswell and x86_64 because they appear somewhere as preferences or hard settings.
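The collection heuristic described above can be sketched as follows. This is an illustrative sketch, not the PR's code: the config layout mimics packages.yaml, but `candidate_targets` and its parameters are hypothetical names.

```python
def candidate_targets(packages_yaml, root_targets, generic="x86_64"):
    # Collect the targets allowed in the fast solve: the most preferred
    # target per package, every target= from the root spec, and the
    # generic family as a safety net.
    allowed = set(root_targets)
    for prefs in packages_yaml.values():
        targets = prefs.get("target", [])
        if targets:
            allowed.add(targets[0])   # keep only the most preferred one
    allowed.add(generic)
    return allowed
```

Everything outside the returned set is simply never turned into target facts for the fast solve.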

I think with some effort I could craft examples where a set of weird conflicts causes the fast solution to differ from the one obtained by allowing all targets in the solve, but this case should be extremely rare in practice. For what it's worth, using this PR in its current state I have never seen a fast solution differ from the full solve, but it may happen. It would be a valid but sub-optimal solution.

Two ways to deal with this come to mind:

  1. Make the fast solve an opt-in or opt-out user choice, and document that it may give sub-optimal results
  2. Don't reduce a dimension (compilers/targets) if, scanning the facts, there is more than one obvious candidate for that dimension

The first approach would sacrifice some "correctness" for performance (in quotes since the optimization rules and weights are partially arbitrary); the second should guarantee that the two solutions are equal, but wouldn't kick in in cases where 1 would.
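Option 2 above amounts to a small guard before reducing each dimension. A minimal sketch, with a hypothetical `reduce_dimension` helper (not from the PR):

```python
def reduce_dimension(candidates):
    # Option 2: only reduce a dimension (compilers, targets, ...) when
    # the scanned facts agree on a single obvious candidate; otherwise
    # return None to signal "keep all facts for this dimension".
    unique = set(candidates)
    if len(unique) == 1:
        return list(unique)   # safe: fast and full solve must agree here
    return None               # ambiguous: don't reduce this dimension
```

With this guard the fast solve can never drop a fact that might matter for that dimension, at the price of reducing nothing when preferences disagree.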

@bvanessen
Contributor

@alalazo @tgamblin This fixes problems that I am seeing with concretizing multiple packages together in environments with the new concretizer.

@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 8dff55a to 26c6a59 Compare April 15, 2021 16:55
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 26c6a59 to 90a2e21 Compare April 27, 2021 12:46
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch 2 times, most recently from d5a61ef to 3cddb45 Compare June 4, 2021 08:49
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 3cddb45 to 70cca8c Compare July 28, 2021 14:35
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 70cca8c to 13c8069 Compare August 11, 2021 09:25
@alalazo alalazo force-pushed the concretizer/optimize_singleshot branch from 13c8069 to 513f51a Compare August 31, 2021 18:25
All the facts are recorded by an ad-hoc object. During a
first solve we discard information from targets that are
unlikely to be chosen. If the problem is unsat, we revert
back to using all the facts at our disposal.
Similarly to what was done for targets in the previous commit,
to speed up the concretization process we try to use only the
default compiler in a first solve.
Finally, try to reduce the number of providers considered
during the first solve. This requires introducing some
heuristics to deal with edge cases, mainly those involving
soft preferences (a soft preference that is not met due to
the search-space reduction will produce a sub-optimal
solution instead of being unsat).
…sages

This would happen in specs that don't have any possible virtual
in the DAG.
@alalazo
Member Author

alalazo commented Sep 26, 2023

Outdated by changes to the concretizer and other optimizations in #38447

@alalazo alalazo closed this Sep 26, 2023
@alalazo alalazo deleted the concretizer/optimize_singleshot branch September 26, 2023 04:58
