Support parallel environment builds #18131
Conversation
While the unit tests still need to be updated, I was able to successfully build
bc78588 to 5d75aa4
…l specs installed
@becker33 I addressed your feedback and appear to have resolved matz-e's issues.
Good to know. Thanks.
This is great news. Thank you for testing this PR and keeping me posted on how it works for you.
Thanks! I've managed to fix most of our build issues and get another trial deployment going. With this PR folded in, we get a nice speed-up, and a full deployment takes probably around 1/3 of the time (it's a little hard to measure with the required restarts). I'm currently testing this with 12 concurrent processes (each in turn running with …).

Sporadically, I notice that a spec is built twice, which fails the whole installation (I suspect this may have something to do with locking and GPFS, since I'm using two nodes). Otherwise, I also noticed that the following environment: … fails with errors like: … I would probably solve this by factoring out LLVM and running it after a bootstrap stage to compile GCC.
Glad to hear of the speed up!
Good to know. We've seen things like this before with building the provider cache the first time. Do you have debug output from where the builds were hanging before you staggered the start time?
Interesting. Thanks for the example. I'll see if I can reproduce it on my machine.
They are not really hanging; building seems to go on without adverse effects (that I have noticed so far, still busy patching up build failures). I don't have any debug output right now, but I see messages like this in our Jenkins log, and they disappear with a ~1-2 second delay between the individual processes:
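The staggered launch described here could be sketched as follows. This is a hypothetical wrapper (the function name, the `spack -e ENV install` invocation pattern, and the delay values are assumptions, not part of this PR):

```python
import shlex

def staggered_commands(env_name, nprocs, delay=2):
    """Return (start_delay_seconds, argv) pairs for launching several
    `spack -e ENV install` processes with a small stagger, mimicking
    the ~1-2 second spacing that made the log messages disappear."""
    commands = []
    for i in range(nprocs):
        argv = shlex.split("spack -e {0} install".format(env_name))
        commands.append((i * delay, argv))
    return commands
```

Each pair could then be handed to a scheduler or a simple `sleep`-then-`exec` loop; the stagger only spreads out the initial contention, since the processes otherwise coordinate on their own.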
becker33 left a comment
I'll probably have to go over this again since some of the logic is pretty hard to reason about, but it mostly looks like a reasonable architecture and I've noted everything I noticed this time through.
lib/spack/spack/environment.py (Outdated)

    if spec.package.installed:
        self._install_log_links(spec)
Should this be in a try/except block (still inside the finally block) to make sure they all get added even if one fails?
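The pattern the reviewer is suggesting might look roughly like this. A minimal sketch, assuming a stand-in `install_log_links` callable in place of `Environment._install_log_links` (the helper name and return value are illustrative, not the PR's actual code):

```python
import logging

def link_all_logs(specs, install_log_links):
    """Wrap each per-spec call in its own try/except (still inside the
    enclosing finally) so one failure does not prevent the remaining
    log links from being created."""
    failed = []
    for spec in specs:
        try:
            install_log_links(spec)
        except OSError as exc:
            logging.warning("Could not create log link for %s: %s", spec, exc)
            failed.append(spec)
    return failed
```

The key point is that the loop keeps going after a failure, rather than letting one bad link abort the rest.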
Adjust exception wording Co-authored-by: Greg Becker <[email protected]>
lib/spack/spack/installer.py (Outdated)

    _print_installed_pkg(pkg.prefix)
    if not task.explicit:
        if _handle_external_and_upstream(pkg, False):
            self._flag_installed(pkg, get_dependent_ids(spec))
I had some issues when, e.g., an external cmake showed up several times in separately concretized specs, and it was not removed from all dependents' uninstalled dependencies. This fixed said scenario with my small reproducer (using our internal packages, unfortunately):
    - self._flag_installed(pkg, get_dependent_ids(spec))
    + self._flag_installed(pkg, task.dependents)
At least here, I think it's better to flag for the superset of dependents (which is also readily available), rather than just the ones from one of the specs in the tree.
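The effect of flagging the superset can be shown with a toy model. This is illustrative only; the data shapes and function signature are assumptions, not Spack's internals:

```python
def flag_installed(pkg_id, dependents, uninstalled_deps):
    """When pkg_id finishes installing, remove it from every
    dependent's remaining-dependency set. Passing the superset of
    dependents (as in task.dependents) clears an external that appears
    in several separately concretized specs from all of them, not just
    the ones reachable from one spec tree."""
    for dep_id in dependents:
        uninstalled_deps.get(dep_id, set()).discard(pkg_id)
    return uninstalled_deps
```

With only one spec's dependents, the other tree would be left waiting on a package that is already installed, which matches the failure mode described above.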
Good point, thanks!
Reviewing other calls to _flag_installed makes me realize there's a small, related refactor that should make the processing being done here a bit clearer.
…actor _flag_installed for clarity
        Args:
            args (Namespace): argparse namespace with command arguments
            install_args (dict): keyword install arguments
It's a bit confusing to have both args and install_args -- this should probably be refactored at some point.
        """
        # Ensure dealing with a package that has a concrete spec
        if not isinstance(pkg, spack.package.PackageBase):
            raise ValueError("{0} must be a package".format(str(pkg)))
Should this work out of the box? At least for me, on WSL2 within an environment, it does not install multiple packages in parallel, only the normal make-based parallelisation. I'm unsure whether this should work as is or whether I have to enable it manually in some config; I didn't manage to find out via the documentation, so I thought I'd just ask here.
How do you launch it? I can easily trigger build parallelism by launching the same command …
If by "out of the box" you mean it will launch multiple processes to run the build in parallel, then the answer is "no". The jobs option (…) controls the build parallelism within a single package build.

Which means you can launch multiple `spack install` processes on the same environment and they will coordinate.

The effective number of coordinating processes for a given environment or spec is the maximum number of packages in the dependency DAG(s) that have no uninstalled dependencies.
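The bound described in the last sentence can be sketched with a toy DAG. The representation (a dict mapping each package to its set of uninstalled dependencies) is an assumption for illustration, not Spack's internal data structure:

```python
def max_coordinating_processes(dag):
    """The number of install processes that can do useful work at once
    is bounded by the number of packages whose dependencies are all
    already installed (i.e. whose remaining-dependency set is empty)."""
    return sum(1 for deps in dag.values() if not deps)
```

For example, with `gcc` and `cmake` ready to build but `llvm` still waiting on `gcc`, at most two processes can make progress simultaneously.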
Ah, so I expected … to spawn as many processes (at least as many as there are top-level specs) in order to parallelise the installation. What I saw is that the typical utilisation is 1 core, and during normal … If I understand you correctly, then I should do …; correct, @matz-e? I was hoping for a more automated process (and actually understood the discussion here in that light, but I only skimmed over it).
@healther correct, as @tldahlgren points out. I have my own wrapper for our deployment, which also ensures that JUnit reports end up in unique files. It's certainly a bit more work than having everything built in, but I appreciate the flexibility to launch building across several nodes of our cluster.
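A helper in the spirit of that wrapper might look like this. Everything here (the function, the file-name scheme) is a hypothetical sketch, not the actual deployment code:

```python
import os
import socket

def junit_report_path(base_dir, env_name):
    """Give each concurrent install process a unique JUnit report file
    by embedding the hostname and process id, so reports from parallel
    launches across nodes never collide."""
    name = "junit-{0}-{1}-{2}.xml".format(
        env_name, socket.gethostname(), os.getpid())
    return os.path.join(base_dir, name)
```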
I see, yeah for cluster management I think I see the appeal of doing it explicitly, but for me just rebuilding my own stuff locally "batteries included" would be more convenient ;) thanks for the feedback |
    env.install_all(args)
    specs = env.all_specs()
    if not args.log_file and not reporter.filename:
        reporter.filename = default_log_file(specs[0])
@tldahlgren environments don't always have a nonzero number of specs, and then `spack install` fails. Could you potentially fix this? I'm not sure what `reporter` is.

Currently, on an env with `specs: []`:

    $ spack install -v
    ==> Error: list index out of range

which is due to `specs[0]` in this line ^
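A minimal sketch of guarding the empty case (helper names mirror the snippet but are stand-ins; see #28031 for the actual change):

```python
def default_report_filename(specs, default_log_file):
    """Only derive a default report name when the environment actually
    has specs; an empty environment simply gets no default, instead of
    raising IndexError on specs[0]."""
    if not specs:
        return None
    return default_log_file(specs[0])
```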
There's a quick change (or fix?) at #28031
Replaces #15415
Closes #16724.
As of #13100, Spack installs the dependencies of a single spec in parallel. This PR extends that support to multiple specifications as found in environment builds.
The specs and kwargs for each uninstalled package (when not force-replacing installations) of an environment are collected, passed to the PackageInstaller, and processed using a single build queue.
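The collection step might look roughly like this. A simplified sketch, not the PR's actual code; `is_installed` is a hypothetical predicate standing in for Spack's installed-check:

```python
def collect_build_requests(roots, install_kwargs, is_installed):
    """Gather a (spec, kwargs) pair for every root spec that is not
    already installed, so a single PackageInstaller can process all of
    them through one shared build queue."""
    return [(spec, dict(install_kwargs))
            for spec in roots
            if not is_installed(spec)]
```

Copying the kwargs per spec keeps each request independent, so per-package adjustments made later do not leak across specs.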
Note: A locking issue was detected prior to commit 58b1036 when starting an environment build from two separate processes at the same time. Restoration of the process of skipping already installed packages seemed to alleviate this problem.

TODO

- `PackageInstaller`'s use of `BuildRequest` and the associated changes
- `environment.py`'s `install_all` to use the `PackageInstaller` directly
- `install` command to leverage the new installation process for multiple specs
- `[+] /usr` to `[+] /usr (external bzip2-1.0.8-<dag-hash>)`
- [ ] Address coordination issues between multiple environment `spack install`s (matz-e)