Cap the maximum number of build jobs #11373
Conversation
Thanks. My guess is somewhat higher is safe. I routinely do `-j20`.
(Sent by email in reply to Massimiliano Culpo, Sun, May 5, 2019.)
IMO, 16 is sufficient for a default. You can always go higher.
I have some simplifying requests.
Basically I think having two config options that control build jobs is confusing, and we should try not to do that. See my comments below. I think we can do this with just `build_jobs` if we just say Spack will never run with more than ncpu build processes, even if you tell it to (which I think is reasonable).
What do people think?
lib/spack/spack/build_environment.py (outdated)

```python
    ' exceeds the maximum number of build jobs allowed ' \
    '[requested: {1}, maximum: {2}, used: {2}]'
tty.warn(msg.format(pkg.name, build_jobs, max_build_jobs))
build_jobs = max_build_jobs
```
I have some simplifying suggestions for this.
Consider:
- I think having both `build_jobs` and `max_build_jobs` is confusing and adds complexity. We should not have two options.
- I think we should just never run with more processes than `multiprocessing.cpu_count()`. Can you think of a case where that is a reasonable thing to do?
Given both of those things, how about we just make `multiprocessing.cpu_count()` the hard maximum, unless `build_jobs` is set to less than that.

That simplifies the logic in this function to:

```python
build_jobs = min(
    spack.config.get('config:build_jobs'),
    multiprocessing.cpu_count()
)
```

And it's much easier to explain -- there's no "request"; we let the config system handle all the places the setting could come from, as it should.
I think we can get rid of the warning above. I think it's just extra noise and it'll get old for users. Could we convert it to a debug message?
Fine with me. I assume this also means we'll change the default in config.yaml to be:
```yaml
config:
  build_jobs: 16
```

in order to fix #11072, is that right?
There may be a reason to run with more than `multiprocessing.cpu_count()` jobs: in certain cases (mostly high-latency but large-bandwidth disks) using more threads than CPUs (even hyperthreaded ones) can give a moderate speedup. However, I don't think we should be too worried about that, as the whole point of Spack is to automate software installations. If someone really wants it, we could still introduce an `ignore_multiprocessing_count` option (in some later PR).
OTOH the user should be warned if they requested more jobs (with `spack install -j`) than were delivered, i.e. we shouldn't silently cap explicit requests.
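A minimal sketch of that behavior (the helper name and warning text are mine, not Spack's actual implementation): cap at the core count, but warn only when the request was explicit.

```python
import multiprocessing
import sys

def capped_build_jobs(requested, explicit=False):
    """Cap the job count at the core count, warning only when the
    request was explicit (e.g. via `spack install -j`) so the cap is
    never silent for the user."""
    ncpus = multiprocessing.cpu_count()
    if requested > ncpus:
        if explicit:
            print('warning: requested {0} jobs, capped to {1} cores'
                  .format(requested, ncpus), file=sys.stderr)
        return ncpus
    return requested
```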
@alalazo: yes
@healther: If we really want to let the user override the cap, we could add something like `config.get_blame_scope('config:build_jobs')` to check whether it was set on the CLI, and allow manual overrides.

I think it's a minor use case though, especially since `multiprocessing.cpu_count()` usually reports the number of hyperthreads (not physical cores) anyway.
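To make the proposal concrete, here is a hedged sketch: `get_blame_scope` is a hypothetical helper suggested in this thread, not an existing Spack API, and `FakeConfig` is a tiny stand-in for the real config object.

```python
import multiprocessing

class FakeConfig:
    """Tiny stand-in for Spack's config object, just for illustration."""
    def __init__(self, jobs, scope):
        self._jobs, self._scope = jobs, scope
    def get(self, key):
        return self._jobs
    def get_blame_scope(self, key):
        # Hypothetical helper: report which scope set this option.
        return self._scope

def resolve_build_jobs(config):
    jobs = config.get('config:build_jobs')
    if config.get_blame_scope('config:build_jobs') == 'command_line':
        return jobs  # honor an explicit -j, even above the core count
    return min(jobs, multiprocessing.cpu_count())
```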
Agreed; I was only mentioning it because we are currently trying to push down our build times (to get reasonable turnaround for our software-environment deployment). We are far from wanting to optimise single-package build times at the level of oversubscribing cores, but at some point someone might want to chase down those last few percent.
For the moment I'd be more than happy with a warning if a user wants more than the system provides and maybe a comment on how to get around it (like a pointer to the code in ?config? where the parameter handling will be done).
@healther: the scope argument is for blame reporting, so that `spack config blame` can tell the user how the option was set. I'm not sure there's a good way to set it automatically... there isn't an "active" scope.
When scopes are merged, it should be possible (in theory) to keep track of
settings on a per-scope basis.
@citibeth: we do keep track (this is how blame reporting works). This is really a semantic issue: the value is being set by a Spack command based on a command-line argument, but the config system doesn't know who's calling it or why. So the command tells it. Different parts of Spack can set things differently; e.g., another part might want to add something to the …
This PR implements several refactors requested in #11373, specifically:

- Config scopes are used to handle builtin defaults, command line overrides and package overrides (`parallel=False`)
- The `Package.make_jobs` attribute has been removed; `make_jobs` remains as a module-scope variable in the build environment
- The use of the `-j` argument has been rationalized across commands
- Move the `-j`/`--jobs` argument into `spack.cmd.common.arguments`
- Add unit tests to check that setting parallel jobs works as expected
- Add a new test to ensure that the build job setting is isolated to each build
- Fix packages that used `Package.make_jobs` (i.e. `bazel`)
`config:build_jobs` controls the number of parallel jobs to spawn during builds, but can never exceed the number of cores on the machine. The default is set to 16 or the number of available cores, whichever is lower.
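The default described above can be sketched in one line (an illustration of the stated rule, not Spack's actual code):

```python
import multiprocessing

# Default shipped in config.yaml: 16, but never more than the core count.
default_build_jobs = min(16, multiprocessing.cpu_count())
```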
@tgamblin Ready for another review

Also, updated the description at the top.
tgamblin left a comment
LGTM! One minor nitpick, and needs docs here:
- https://spack.readthedocs.io/en/latest/config_yaml.html#build-jobs
- https://spack.readthedocs.io/en/latest/packaging_guide.html#parallel-builds
And maybe other places.
Apologies for the double commit, but it seems the soft link for …

Anyhow, let me know if you want something rephrased or expanded in the docs.
Awesome and merged! Thanks!
* `config:build_jobs` now controls the number of parallel jobs to spawn during builds, but can never exceed the number of cores on the machine.
* The default is set to 16 or the number of available cores, whichever is lower.
* Updated docs to reflect the changes done to limit parallel builds
fixes #11072
`config:build_jobs` is capped by the number of cores available