Distributed builds #13100

Merged
tgamblin merged 164 commits into spack:develop from tldahlgren:feature/distributed-builds
Feb 19, 2020

@tldahlgren tldahlgren commented Oct 9, 2019

Fixes #9394
Closes #13217.

Background

Spack provides the ability to enable/disable parallel builds through two options: the package's parallel property and the build_jobs configuration setting. This PR changes the algorithm to allow multiple, simultaneous processes to coordinate the installation of the same spec (and of specs with overlapping dependencies).

The parallel (boolean) property sets the default for its package, though the value can be overridden in the install method.

Spack's current parallel builds are limited to build tools supporting jobs arguments (e.g., Makefiles). The number of jobs actually used is calculated as min(config:build_jobs, # cores, 16), which can be overridden in the package or on the command line (i.e., spack install -j <# jobs>).
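The calculation above can be sketched in a few lines of Python. The function name `effective_build_jobs` and its parameters are hypothetical and are not taken from Spack's code; only the min(config:build_jobs, # cores, 16) rule and the fact that an explicit value overrides it come from the description.

```python
import multiprocessing


def effective_build_jobs(config_build_jobs, override=None, cores=None, cap=16):
    """Return the job count under the rule described above (illustrative only).

    `override` stands in for a value set in the package or with `-j`.
    `cores` defaults to the number of cores reported by the machine.
    """
    if override is not None:
        # An explicit value replaces the computed default.
        return override
    if cores is None:
        cores = multiprocessing.cpu_count()
    return min(config_build_jobs, cores, cap)
```

For example, with build_jobs set to 64 on a 32-core node the sketch returns 16, while an explicit override of 24 is returned unchanged.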

This PR adds support for distributed (single- and multi-node) parallel builds. The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.

Approach

File System Locks

Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks. The runs can be a combination of interactive and batch processes affecting the same file system. Exclusive prefix locks are required to install a package while shared prefix locks are required to check if the package is installed.
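A minimal sketch of the shared/exclusive locking pattern described above, using POSIX advisory locks from Python's standard `fcntl` module. This is not Spack's lock implementation: the one-lock-file-per-prefix layout and the names `prefix_lock`, `is_installed`, and `install` are assumptions made for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def prefix_lock(lock_path, exclusive):
    """Hold a shared (read) or exclusive (write) advisory lock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the lock can be granted.
        fcntl.lockf(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def is_installed(prefix, lock_path):
    """Check the prefix under a shared lock, so several readers can check at once."""
    with prefix_lock(lock_path, exclusive=False):
        return os.path.isdir(prefix) and bool(os.listdir(prefix))


def install(prefix, lock_path, build):
    """Install under an exclusive lock; return False if the prefix is already populated."""
    with prefix_lock(lock_path, exclusive=True):
        if os.path.isdir(prefix) and os.listdir(prefix):
            return False  # another process finished the install first
        os.makedirs(prefix, exist_ok=True)
        build(prefix)
        return True
```

The installed check is repeated inside the exclusive section rather than by calling `is_installed`, because POSIX locks belong to the process: closing a second descriptor on the same lock file would release the lock that is still needed.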

Failures are communicated through a separate exclusive prefix failure lock, for concurrent processes, combined with a persistent store, for separate, related build processes. The resulting file contains the failing spec to facilitate manual debugging.
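The persistent part of the failure tracking can be pictured as one small file per failed spec. The sketch below is an assumption about the general shape, not the code in this PR: the directory layout and the names `mark_failed` and `has_failed` are hypothetical, and the failure lock itself is left out.

```python
import os


def mark_failed(failures_dir, spec_id, spec_text):
    """Record a failed spec so later, separate processes can see it.

    The file body holds the failing spec to help with manual debugging.
    """
    os.makedirs(failures_dir, exist_ok=True)
    path = os.path.join(failures_dir, spec_id)
    with open(path, "w") as f:
        f.write(spec_text)
    return path


def has_failed(failures_dir, spec_id):
    """Return True if an earlier process recorded a failure for spec_id."""
    return os.path.isfile(os.path.join(failures_dir, spec_id))
```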

Priority Queue

Management of dependency builds changed from reliance on recursion to use of a priority queue where the priority of a spec is based on the number of its remaining uninstalled dependencies.
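The idea of prioritizing by remaining uninstalled dependencies can be sketched with Python's `heapq`. This is an illustration under simplifying assumptions (a single process, package names as plain strings, every package present as a key); the function name `install_order` is hypothetical and the PR's actual queue handling is more involved.

```python
import heapq


def install_order(deps):
    """Return package names in an order that installs dependencies first.

    `deps` maps each package name to the set of names it depends on.
    Every package, including leaf dependencies, must appear as a key.
    """
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    dependents = {pkg: set() for pkg in deps}
    for pkg, ds in deps.items():
        for dep in ds:
            dependents[dep].add(pkg)

    # Priority is the number of dependencies that are not yet installed.
    heap = [(len(ds), pkg) for pkg, ds in remaining.items()]
    heapq.heapify(heap)

    order, done = [], set()
    while heap:
        count, pkg = heapq.heappop(heap)
        if pkg in done or count != len(remaining[pkg]):
            continue  # stale entry, superseded by a later push
        if count != 0:
            raise ValueError("dependency cycle involving %s" % pkg)
        done.add(pkg)
        order.append(pkg)
        # Installing pkg lowers the priority value of each dependent.
        for parent in dependents[pkg]:
            remaining[parent].discard(pkg)
            heapq.heappush(heap, (len(remaining[parent]), parent))
    return order
```

Re-pushing an entry with the new count and skipping stale ones on pop avoids having to update priorities in place, which `heapq` does not support directly.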

Using a queue required a change to dependency build exception handling with the most visible issue being that the install method must install something in the prefix. Consequently, packages can no longer get away with an install method consisting of pass, for example.
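To make the new requirement concrete, here is a small sketch of a post-install check that rejects an empty prefix, plus a stand-in install step that satisfies it. The names `InstallError`, `check_something_installed`, and `minimal_install`, and the wording of the error message, are hypothetical; Spack's own check may differ in what it inspects.

```python
import os


class InstallError(Exception):
    """Hypothetical error type used only in this sketch."""


def check_something_installed(prefix):
    """Raise if the install step left the prefix missing or empty."""
    if not os.path.isdir(prefix) or not os.listdir(prefix):
        raise InstallError("nothing was installed in %s" % prefix)


def minimal_install(prefix):
    """Stand-in for an install method that previously consisted of `pass`."""
    os.makedirs(prefix, exist_ok=True)
    with open(os.path.join(prefix, "README"), "w") as f:
        f.write("placeholder\n")
```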

Caveats

Built-in Package's Provider Cache

Update: The fix has been merged into this PR.

This PR does not address contention related to the initial build of the built-in packages' provider cache, which is performed on an as-needed basis as part of spack install. Until that issue is addressed (in a separate PR), it is recommended that you ensure the cache, which can be found in $HOME/.spack/cache, exists before attempting spack install from more than one process. The cache can be created by running spack spec <spec> from a single process before starting the installs.

Concretizing Environments

Update: The fix, from #14621, has been merged into this PR as of commit 032745b.

Distributed builds in environments encounter No such file or directory: '/<env>/.spack.lock.tmp'\n==> Installing environment /<env> errors when attempting to run spack install [-j<#>] & in parallel. This has been tracked down to Environment.write() not being thread-safe.

Requested Spec Failures

There appears to be a failure notification timing issue between processes should the requested spec fail to install. This results in each process taking a turn at attempting to install the final spec.

TODO

  • Adjust package lock timeout to correspond to value used in the demo
  • Adjust database lock timeout to reduce contention on startup of concurrent spack install <spec> calls
  • Replace (test) package's install: pass methods with file creation since post-install sanity_check_prefix will otherwise error out with Install failed .. Nothing was installed!
  • Resolve remaining existing test failures
  • Respond to alalazo's initial feedback
  • Remove bin/demo-locks.py
  • Add new tests to address new coverage issues
  • Replace built-in package's def install(..): pass to "install" something (i.e., only apple-libunwind)
  • Address tgamblin's feedback
  • Increase code coverage
    - [ ] Update install docs

@tldahlgren tldahlgren force-pushed the feature/distributed-builds branch from 3727e23 to 8d24870 October 9, 2019 01:17
@tldahlgren tldahlgren changed the title from "WIP: Support distributed builds" to "[WIP] Support distributed builds" Oct 22, 2019
tgamblin pushed a commit that referenced this pull request Feb 27, 2020
The new build process, introduced in #13100, relies on a spec's dependents in addition to its dependencies. Loading a spec from a yaml file was not initializing the dependents.

- [x] populate dependents when loading from yaml
tgamblin pushed a commit that referenced this pull request Mar 2, 2020
…15197)

The distributed build PR (#13100) did not check the install status of dependencies when using the `--only package` option, so it would refuse to install a package, claiming that it had uninstalled dependencies whether or not that was the case.

- [x] add install status checks for the `--only package` case.
- [x] add initial set of tests
tgamblin added a commit that referenced this pull request Mar 7, 2020
This is a minor permission fix on the new installer.py introduced in #13100.
tgamblin added a commit that referenced this pull request Mar 28, 2020
With the addition of #13100 (parallel builds), people are going to try to
run Spack in the background more often.  This makes Spack handle that
situation gracefully, the way a good POSIX program should.

Specifically:

1. When `spack install` is running, we disable echo and canonical input
   so that users can type `v` to toggle build output.  We do that in a
   safe way now, so that it does not generate `SIGTTOU` in the background
   (#14682 did this).

2. We properly disable keyboard input mode when Spack is placed in the
   background, and re-enable it when Spack is in the foreground.  This
   means that if you Ctrl-Z a spack install, your terminal won't be left
   in a weird state.

3. We'll continue writing verbose output when Spack is in the background.
   If you have `stty +tostop` on, it'll end up stopping the build when
   you try to run in the background, unless you redirect output. This is
   normal behavior and it lets you easily do things like `spack install &>
   log.txt &`

4. Spack works fine when stopped in the background or when running in the
   background.

(2) is handled mostly with signal handlers (the way things like `vi` and
`emacs` do it) -- see the code for how that's done -- it's a bit tricky
in Python, as Python did not support blocking signals until 3.8.  It
turns out we can still make it work.
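As a rough illustration of the terminal handling described in this commit message, the sketch below uses Python's standard `os` and `termios` modules. It is not the code from the commit: the helper names are hypothetical, and the signal-handler wiring for stop and continue events is omitted.

```python
import os
import termios


def in_foreground(fd):
    """Return True if this process group owns the terminal attached to fd."""
    try:
        return os.getpgrp() == os.tcgetpgrp(fd)
    except OSError:
        # fd is not a terminal (for example, input is a pipe or a file).
        return False


def disable_echo_and_canonical(fd):
    """Turn off echo and canonical input on fd; return the previous settings."""
    old = termios.tcgetattr(fd)
    new = termios.tcgetattr(fd)
    new[3] &= ~(termios.ECHO | termios.ICANON)  # index 3 holds the local flags
    termios.tcsetattr(fd, termios.TCSANOW, new)
    return old


def restore_terminal(fd, old):
    """Put back the settings saved by disable_echo_and_canonical."""
    termios.tcsetattr(fd, termios.TCSANOW, old)
```

A caller would change the terminal settings only when `in_foreground` returns True, since a background process that modifies terminal settings is normally sent `SIGTTOU`.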
tgamblin added a commit that referenced this pull request Mar 28, 2020
With the addition of #13100 (parallel builds), people are going to try to
run Spack in the background more often.  This makes Spack handle that
situation gracefully, the way a good POSIX program should.

Specifically:

1. When `spack install` is running, we disable echo and canonical input
   so that users can type `v` to toggle build output.  We do that in a
   safe way now, so that it does not generate `SIGTTOU` in the background
   (#14682 did this).

2. We properly disable keyboard input mode when Spack is placed in the
   background, and re-enable it when Spack is in the foreground.  This
   means that if you Ctrl-Z a spack install, your terminal won't be left
   in a weird state.

3. We'll continue writing verbose output when Spack is in the background.
   If you have `stty +tostop` on, it'll end up stopping the build when
   you try to run in the background, unless you redirect output. This is
   normal behavior and it lets you easily do things like this:

       spack install -v &> log.txt &

4. Spack works fine when stopped in the background or when running in the
   background.

(2) is handled mostly with signal handlers (the way things like `vi` and
`emacs` do it) -- see the code for how that's done -- it's a bit tricky
in Python, as Python did not support blocking signals until 3.8.  It
turns out we can still make it work.
tgamblin added a commit that referenced this pull request Jul 27, 2020
A bug was introduced in #13100 where ChildErrors would be redundantly
printed when raised during a build. We should eventually revisit error
handling in builds and figure out what the right separation of
responsibilities is for distributed builds, but for now just skip
printing.

- [x] SpackErrors were designed to be printed by the forked process, not
      by the parent, so check if they've already been printed.
tgamblin added a commit that referenced this pull request Jul 27, 2020
A bug was introduced in #13100 where ChildErrors would be redundantly
printed when raised during a build. We should eventually revisit error
handling in builds and figure out what the right separation of
responsibilities is for distributed builds, but for now just skip
printing.

- [x] SpackErrors were designed to be printed by the forked process, not
      by the parent, so check if they've already been printed.
- [x] update tests
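The "already printed" check mentioned in this commit message can be sketched as a flag carried on the exception. The class and function names here are hypothetical and the real error classes are more elaborate; the `pickle` round trip stands in for an error travelling from a forked build process back to its parent.

```python
import pickle


class BuildError(Exception):
    """Hypothetical error type that remembers whether it has been reported."""

    def __init__(self, message):
        super().__init__(message)
        self.printed = False


def report_once(error, sink):
    """Append the error text to sink unless it has already been reported."""
    if error.printed:
        return False
    sink.append(str(error))
    error.printed = True
    return True


def send_to_parent(error):
    """Simulate passing the error between processes by pickling it."""
    return pickle.loads(pickle.dumps(error))
```

Because the flag lives in the exception's instance dictionary, it survives pickling, so the parent can see that the child already reported the error and skip printing it again.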
tgamblin pushed a commit that referenced this pull request Nov 17, 2020
As of #13100, Spack installs the dependencies of a _single_ spec in parallel.
Environments, when installed, can only get parallelism from each individual
spec, as they're installed in order.  This PR makes entire environments build
in parallel by extending Spack's package installer to accept multiple root
specs.  The install command and Environment class have been updated to use
the new parallel install method.

The specs and kwargs for each *uninstalled* package (when not force-replacing
installations) of an environment are collected, passed to the `PackageInstaller`,
and processed using a single build queue.

This introduces a `BuildRequest` class to track install arguments, and it
significantly cleans up the code used to track package ids during installation.
Package ids in the build queue are now just DAG hashes as you would expect.

Other tasks:

- [x] Finish updating the unit tests based on `PackageInstaller`'s use of
      `BuildRequest` and the associated changes
- [x] Change `environment.py`'s `install_all` to use the `PackageInstaller` directly
- [x] Change the `install` command to leverage the new installation process for multiple specs
- [x] Change install output messages for external packages, e.g.:
       `[+] /usr` -> `[+] /usr (external bzip2-1.0.8-<dag-hash>)`
- [x] Fix incomplete environment install's view setup/update and not confirming all 
       packages are installed (?)
- [x] Ensure externally installed package dependencies are properly accounted for in 
       remaining build tasks
- [x] Add tests for coverage (if insufficient and can identify the appropriate, uncovered non-comment lines)
- [x] Add documentation
- [x] Resolve multi-compiler environment install issues
- [x] Fix issue with environment installation reporting (restore CDash/JUnit reports)
@tldahlgren tldahlgren deleted the feature/distributed-builds branch August 27, 2024 00:51
Development

Successfully merging this pull request may close these issues.

Corrupt installation directory during simultaneous spack installs
