
Cap the maximum number of build jobs #11373

Merged
tgamblin merged 3 commits into spack:develop from
alalazo:features/cap_maximum_number_of_build_jobs
May 28, 2019

Conversation

@alalazo
Member

@alalazo alalazo commented May 5, 2019

fixes #11072

config:build_jobs is capped by the number of cores available

config:build_jobs controls the number of parallel jobs to spawn during builds, but can never exceed the number of cores on the machine. The default is set to 16 or the number of available cores, whichever is lower.
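The capping rule described above can be sketched with the standard library alone; the function name here is hypothetical, not Spack's actual implementation:

```python
import multiprocessing

def effective_build_jobs(configured_jobs=16):
    # Hypothetical sketch: the configured value (default 16) is capped
    # at the machine's core count, so the result is the lower of the two.
    return min(configured_jobs, multiprocessing.cpu_count())
```

On a 4-core machine the default of 16 would thus yield 4 jobs, while an explicit setting of 2 stays at 2.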

@alalazo
Member Author

alalazo commented May 5, 2019

@jjwilke

@citibeth
Member

citibeth commented May 5, 2019 via email

@mwkrentel
Member

IMO, 16 is sufficient for a default. You can always go higher,
you just have to be explicit about it.

Member

@tgamblin tgamblin left a comment


I have some simplifying requests.

Basically I think having two config options that control build jobs is confusing, and we should try not to do that. See my comments below. I think we can do this with just build_jobs if we just say Spack will never run with more than ncpu build processes, even if you tell it to (which I think is reasonable).

What do people think?

' exceeds the maximum number of build jobs allowed ' \
'[requested: {1}, maximum: {2}, used: {2}]'
tty.warn(msg.format(pkg.name, build_jobs, max_build_jobs))
build_jobs = max_build_jobs
Member

@tgamblin tgamblin May 6, 2019


I have some simplifying suggestions for this.

Consider:

  1. I think having both build_jobs and max_build_jobs is confusing and adds complexity. We should not have two options.
  2. I think we should just never run with more processes than multiprocessing.cpu_count(). Can you think of a case where that is a reasonable thing to do?

Given both of those things, how about we just make multiprocessing.cpu_count() the hard maximum, unless build_jobs is set to less than that.

That simplifies the logic in this function to:

build_jobs = min(
    spack.config.get('config:build_jobs'),
    multiprocessing.cpu_count()
)

And it's much easier to explain -- there's no "request"; we let the config system handle all the places the setting could come from, as it should.

I think we can get rid of the warning above. I think it's just extra noise and it'll get old for users. Could we convert it to a debug message?
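Putting both suggestions together — the hard cap and the warning demoted to a debug message — might look like the sketch below; the function name and logger are assumptions, not the code that landed in the PR:

```python
import logging
import multiprocessing

log = logging.getLogger("spack.build")

def determine_number_of_jobs(configured_jobs):
    # Hypothetical sketch of the simplified logic: hard-cap at cpu_count()
    # and emit only a debug message (not a user-facing warning) when capping.
    jobs = min(configured_jobs, multiprocessing.cpu_count())
    if jobs < configured_jobs:
        log.debug("capping build jobs from %d to %d", configured_jobs, jobs)
    return jobs
```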

Member Author


Fine with me. I assume this also means we'll change the default in config.yaml to be:

config:
  build_jobs: 16

in order to fix #11072, is that right?

Contributor


There may be a reason to run with more than multiprocessing.cpu_count() jobs: in certain cases (mostly high-latency but high-bandwidth disks) using more threads than CPUs (even hyperthreaded ones) can give a moderate speed-up. However, I don't think we should be too worried about that, as the whole point of Spack is to automate software installations. If someone really wants it, we could still introduce an ignore_multiprocessing_count option (in some later PR).

OTOH, the user should be warned if they requested more jobs (with spack install -j) than were delivered; i.e., we shouldn't silently cap explicit requests.
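The distinction drawn here — cap config-derived values silently, but warn on explicit -j requests — could be sketched as follows; the function name and parameters are hypothetical:

```python
import multiprocessing
import warnings

def resolve_build_jobs(cli_jobs=None, config_jobs=16):
    # Hypothetical sketch: a value coming from config.yaml is capped
    # silently, but an explicit `spack install -j N` request that cannot
    # be honored triggers a warning, so the cap is never silent for
    # explicit requests.
    requested = cli_jobs if cli_jobs is not None else config_jobs
    jobs = min(requested, multiprocessing.cpu_count())
    if cli_jobs is not None and jobs < cli_jobs:
        warnings.warn("requested %d jobs, capped to %d" % (cli_jobs, jobs))
    return jobs
```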

Member


@alalazo: yes

@healther: If we really want to let the user override the cap, we could add something like config.get_blame_scope('config:build_jobs') to check if it was set on the CLI, and allow manual overrides.

I think it’s a minor use case though, especially since multiprocessing.cpu_count() usually reports number of hyperthreads (not physical cores) anyway.

Contributor


Agreed, I was only mentioning it, because currently we are trying to push down our build time (to get some reasonable turnaround times for our software environment deployment). We are currently far away from wanting to optimise build times of single packages on the level of oversubscribing cores, but at some point someone might want to chase down these last percentages.

For the moment I'd be more than happy with a warning if a user wants more than the system provides and maybe a comment on how to get around it (like a pointer to the code in ?config? where the parameter handling will be done).

Member Author


Implemented in 1c5fb9b


@citibeth
Member

citibeth commented May 6, 2019 via email

@tgamblin
Member

tgamblin commented May 6, 2019

@healther: the scope argument is for blame reporting, so that spack config blame can tell the user how an option was set. I'm not sure there's a good way to set it automatically... there isn't an "active" scope.

> When scopes are merged, it should be possible (in theory) to keep track of
> settings on a per-scope basis.

@citibeth: we do keep track (this is how spack config blame works). We actually keep all the scopes in memory and look up values dynamically in precedence order.

This is really a semantic issue. The value is being set by a Spack command based on a command line argument, but the config system doesn't know who's calling it or why. So the command tells it.

Different parts of Spack can set things differently. e.g., another part might want to add something to the defaults scope, or to some custom scope for another purpose.
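The precedence-ordered lookup described above can be illustrated with a minimal sketch; the function, scope names, and data layout here are assumptions for illustration, not Spack's actual internals:

```python
def get_config(key, scopes):
    # Hypothetical sketch: scopes are kept in precedence order, and the
    # first (highest-precedence) scope that defines the key wins. Returning
    # the scope name alongside the value is what makes blame-style
    # reporting possible.
    for scope_name, data in scopes:
        if key in data:
            return data[key], scope_name
    return None, None

scopes = [
    ("command_line", {"config:build_jobs": 4}),   # e.g. set by `spack install -j 4`
    ("user", {}),                                  # ~/.spack/config.yaml
    ("defaults", {"config:build_jobs": 16}),       # built-in defaults
]
```

With these scopes, looking up "config:build_jobs" returns 4 from the command_line scope, shadowing the default of 16.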

tgamblin pushed a commit that referenced this pull request May 24, 2019
This PR implements several refactors requested in #11373, specifically:

- Config scopes are used to handle builtin defaults, command line overrides 
  and package overrides (`parallel=False`)
- `Package.make_jobs` attribute has been removed; `make_jobs` remains
  as a module-scope variable in the build environment.
- The use of the argument `-j` has been rationalized across commands
  - move '-j'/'--jobs' argument into `spack.cmd.common.arguments`
- Add unit tests to check that setting parallel jobs works as expected
  - add new test to ensure that build job setting is isolated to each build
- Fix packages that used `Package.make_jobs` (i.e. `bazel`)
config:build_jobs controls the number of parallel jobs to spawn during
builds, but can never exceed the number of cores on the machine.

The default is set to 16 or the number of available cores, whichever
is lower.
@alalazo alalazo force-pushed the features/cap_maximum_number_of_build_jobs branch from deaa474 to 1c5fb9b Compare May 27, 2019 13:21
@alalazo alalazo removed the blocked label May 27, 2019
@alalazo
Member Author

alalazo commented May 27, 2019

@tgamblin Ready for another review

@alalazo
Member Author

alalazo commented May 27, 2019

Also, updated the description at the top.

Member

@tgamblin tgamblin left a comment


@alalazo
Member Author

alalazo commented May 28, 2019

Apologies for the double commit, but it seems the soft link for _spack_root is causing some trouble for PyCharm's built-in git support 😄

@alalazo
Member Author

alalazo commented May 28, 2019

Anyhow, let me know if you want something rephrased or expanded in the docs.

@tgamblin tgamblin merged commit 01ece82 into spack:develop May 28, 2019
@tgamblin
Member

Awesome and merged! Thanks!

@alalazo alalazo deleted the features/cap_maximum_number_of_build_jobs branch May 28, 2019 13:42
dev-zero pushed a commit to dev-zero/spack that referenced this pull request Aug 13, 2019
dev-zero pushed a commit to dev-zero/spack that referenced this pull request Aug 13, 2019
* config:build_jobs now controls the number of parallel jobs to spawn during
builds, but can never exceed the number of cores on the machine.

* The default is set to 16 or the number of available cores, whichever
is lower.

* Updated docs to reflect the changes done to limit parallel builds

Successfully merging this pull request may close these issues.

Spack uses too many processes by default and runs out of memory
