Commit f2aca86
Distributed builds (#13100)
Fixes #9394
Closes #13217.
## Background
Spack provides the ability to enable/disable parallel builds through two options: the package `parallel` property and the `build_jobs` configuration option. This PR changes the algorithm to allow multiple, simultaneous processes to coordinate the installation of the same spec (and specs with overlapping dependencies).
The `parallel` (boolean) property sets the default for its package, though the value can be overridden in the `install` method.
Spack's current parallel builds are limited to build tools supporting `jobs` arguments (e.g., `Makefiles`). The number of jobs actually used is calculated as `min(config:build_jobs, # cores, 16)`, which can be overridden in the package or on the command line (i.e., `spack install -j <# jobs>`).
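As a rough illustration of the calculation above, here is a minimal sketch. The function name `effective_jobs` and its parameters are hypothetical and are not part of Spack's API; the sketch also assumes that an explicit `-j` value simply replaces the computed default.

```python
import multiprocessing


def effective_jobs(config_build_jobs, command_line_jobs=None, cap=16):
    """Hypothetical helper: number of jobs handed to the build tool.

    Mirrors min(config:build_jobs, # cores, 16) from the description.
    """
    if command_line_jobs is not None:
        # Assumption: an explicit `-j <# jobs>` overrides the computed value.
        return command_line_jobs
    return min(config_build_jobs, multiprocessing.cpu_count(), cap)
```

For example, `effective_jobs(64)` never returns more than 16, while `effective_jobs(8, command_line_jobs=32)` returns 32.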
This PR adds support for distributed (single- and multi-node) parallel builds. The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.
## Approach
### File System Locks
Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks. The runs can be a combination of interactive and batch processes affecting the same file system. An exclusive prefix lock is required to install a package, while a shared prefix lock is required to check whether the package is installed.
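The difference between shared and exclusive locks can be sketched with plain advisory file locks. This is an illustration only, not Spack's lock implementation: the helper names `acquire` and `release` are hypothetical, and `fcntl.flock` is used here purely for simplicity (it is available on Unix-like systems only).

```python
import fcntl
import os
import tempfile


def acquire(path, exclusive):
    """Hypothetical helper: take a non-blocking advisory lock on `path`.

    Returns the open file descriptor, or None if a conflicting lock is held.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except OSError:
        # Another holder has an incompatible lock on this file.
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Several readers checking whether a prefix is installed can hold the shared lock at the same time, whereas an installer asking for the exclusive lock gets `None` back until every reader has released.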
Failures are communicated through a separate exclusive prefix failure lock for concurrent processes, combined with a persistent store for separate, related build processes. The resulting file contains the failing spec to facilitate manual debugging.
### Priority Queue
Management of dependency builds changed from reliance on recursion to use of a priority queue, where the priority of a spec is based on the number of its remaining uninstalled dependencies.
Using a queue required a change to dependency build exception handling, with the most visible issue being that the `install` method *must* install something in the prefix. Consequently, packages can no longer get away with an install method consisting of `pass`, for example.
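The queue-based ordering can be sketched with `heapq`. This is a simplified, single-process illustration under stated assumptions, not the PR's installer: `install_order` is a hypothetical name, the DAG is given as a dict mapping each spec name to the set of its direct dependencies, and every dependency must itself appear as a key.

```python
import heapq


def install_order(dependencies):
    """Hypothetical sketch: order specs by remaining uninstalled dependencies."""
    remaining = {spec: set(deps) for spec, deps in dependencies.items()}
    dependents = {spec: [] for spec in remaining}
    for spec, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(spec)

    # Priority = number of dependencies that are not yet installed.
    heap = [(len(deps), spec) for spec, deps in remaining.items()]
    heapq.heapify(heap)

    installed, order = set(), []
    while heap:
        priority, spec = heapq.heappop(heap)
        if spec in installed or priority != len(remaining[spec]):
            continue  # stale entry; a fresher one was pushed later
        if priority != 0:
            raise ValueError("dependency cycle detected")
        installed.add(spec)
        order.append(spec)
        # Each dependent now has one fewer uninstalled dependency.
        for parent in dependents[spec]:
            remaining[parent].discard(spec)
            heapq.heappush(heap, (len(remaining[parent]), parent))
    return order
```

For a diamond-shaped DAG such as `{"top": {"left", "right"}, "left": {"bottom"}, "right": {"bottom"}, "bottom": set()}`, the bottom spec is processed first and the root last. Re-pushing an entry when a priority changes, and skipping stale entries on pop, avoids needing a decrease-key operation.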
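To illustrate why an install method consisting of `pass` no longer works, here is a minimal sketch of a post-install check on the prefix. All names below (`NothingInstalledError`, `check_prefix_not_empty`, `install_pass`, `install_touch`, `dummy_file`) are hypothetical stand-ins and do not reproduce Spack's `sanity_check_prefix`.

```python
import os
import tempfile


class NothingInstalledError(Exception):
    """Hypothetical error: the install left the prefix empty."""


def check_prefix_not_empty(prefix):
    """Hypothetical post-install check: fail if nothing was installed."""
    if not os.path.isdir(prefix) or not os.listdir(prefix):
        raise NothingInstalledError("Nothing was installed in %s" % prefix)


def install_pass(prefix):
    """An install step that does nothing; the check above rejects it."""
    pass


def install_touch(prefix):
    """An install step that creates one file, which satisfies the check."""
    os.makedirs(prefix, exist_ok=True)
    with open(os.path.join(prefix, "dummy_file"), "w"):
        pass
```

Calling `check_prefix_not_empty` after `install_pass` raises the error, while calling it after `install_touch` succeeds, which is the reason the no-op install methods in this PR were replaced with file creation.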
## Caveats
- This still only parallelizes a single-rooted build. Multi-rooted installs (e.g., for environments) are TBD in a future PR.
Tasks:
- [x] Adjust package lock timeout to correspond to value used in the demo
- [x] Adjust database lock timeout to reduce contention on startup of concurrent `spack install <spec>` calls
- [x] Replace (test) package's `install: pass` methods with file creation since post-install `sanity_check_prefix` will otherwise error out with `Install failed .. Nothing was installed!`
- [x] Resolve remaining existing test failures
- [x] Respond to alalazo's initial feedback
- [x] Remove `bin/demo-locks.py`
- [x] Add new tests to address new coverage issues
- [x] Replace built-in packages' `def install(..): pass` methods so that they "install" something (i.e., only `apple-libunwind`)
- [x] Increase code coverage

1 parent 2f4881d commit f2aca86
100 files changed, +2954 -963 lines changed

File tree
- etc/spack/defaults
- lib/spack
- llnl/util
- tty
- spack
- cmd
- test
- cmd
- llnl/util
- var/spack/repos
- builtin.mock/packages
- a
- boost
- b
- conflicting-dependent
- c
- dep-diamond-patch-mid1
- dep-diamond-patch-mid2
- dep-diamond-patch-top
- develop-test2
- develop-test
- direct-mpich
- dt-diamond-bottom
- dt-diamond-left
- dt-diamond-right
- dt-diamond
- dtbuild1
- dtbuild2
- dtbuild3
- dtlink1
- dtlink2
- dtlink3
- dtlink4
- dtlink5
- dtrun1
- dtrun2
- dtrun3
- dttop
- dtuse
- externalmodule
- externalprereq
- externaltool
- externalvirtual
- e
- fake
- flake8
- git-svn-top-level
- git-test
- git-top-level
- git-url-svn-top-level
- git-url-top-level
- hash-test1
- hash-test2
- hash-test3
- hg-test
- hg-top-level
- hypre
- indirect-mpich
- maintainers-1
- maintainers-2
- mixedversions
- module-path-separator
- multi-provider-mpi
- multimodule-inheritance
- multivalue_variant
- netlib-blas
- netlib-lapack
- nosource-install
- openblas-with-lapack
- openblas
- optional-dep-test-2
- optional-dep-test-3
- optional-dep-test
- othervirtual
- override-context-templates
- override-module-templates
- patch-a-dependency
- patch-several-dependencies
- patch
- perl
- preferred-test
- python
- simple-inheritance
- singlevalue-variant-dependent
- svn-test
- svn-top-level
- url-list-test
- url-test
- when-directives-false
- when-directives-true
- zmpi
- builtin/packages/apple-libunwind