Make -j flag less exceptional #22360
Conversation
```python
njobs = spack.config.get('config:build_jobs')
if not njobs:
    spack.config.set('config:build_jobs', 4, scope='user')
```
I thought this was redundant, because `config:build_jobs` is always populated: https://github.com/spack/spack/blob/develop/lib/spack/spack/config.py#L103-L117
Ok, this is no longer necessary since you can't expect
The `-j` flag in spack behaves differently from `make`, `ctest`, `ninja`, etc., because it caps the number of jobs at an arbitrary number, 16. It makes a lot more sense for `spack install` to use some reasonable default, and for `spack install -j <num>` to *override* that default; this is how it's done in all other build systems. In particular, I want to be able to write `spack install -j256` on some AMD EPYC system if I feel like it, and spack shouldn't silently stop me from doing that. Maybe spack was meant for HPC login nodes, but even login nodes get fat, and in some centers you are encouraged to compile on login nodes with many cores instead of on compute nodes.

Also, if I don't specify `-j` on the command line, I'm fine with spack limiting the number of build jobs to `min(number of cpus, 16)`; I can see that's a reasonable default, although the 16 is still quite peculiar and unlike other build systems. However, as it is right now, spack does a poor job of determining the number of CPUs on Linux, since it doesn't take cgroups into account. In particular this is problematic if you use distributed builds with Slurm. On an AMD EPYC machine with 256 threads, if I build a big spack environment with many tasks like `srun -c2 -n128 spack -e my_env install`, spack will happily start 128 processes, each running `make -j256`, instead of actually checking the process affinity, which reveals that only 2 threads can be used per task.

So this PR introduces `spack.util.cpus.cpus_available()`, which does the sensible thing on Linux. This should also improve the situation with Docker / Kubernetes, which also use cgroups.
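For reference, on Linux the affinity-aware CPU count can be obtained with `os.sched_getaffinity(0)`, which respects CPU masks set via cgroups, `taskset`, or Slurm, unlike `multiprocessing.cpu_count()`, which reports every CPU on the machine. A minimal sketch of what a `cpus_available()` helper could look like (not necessarily the exact implementation in this PR):

```python
import multiprocessing
import os


def cpus_available():
    """Number of CPUs usable by the current process.

    os.sched_getaffinity(0) returns the set of CPUs this process is
    allowed to run on (Linux only), so it reflects cgroup and Slurm
    limits.  Fall back to the total CPU count elsewhere.
    """
    try:
        return len(os.sched_getaffinity(0))
    except (AttributeError, NotImplementedError):
        return multiprocessing.cpu_count()
```

With this, `srun -c2 ... spack install` would see 2 CPUs per task rather than the machine-wide thread count.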
`lib/spack/spack/build_environment.py` (Outdated)
```python
config_default = config_default or spack.config.get('config:build_jobs')

if config_default is None:
    return max_cpus
```
The previous behavior was that 16 was both the config default and the code default if no config was specified. We chose this because someone ran a study that showed there is virtually no performance gains beyond 16 threads for most build jobs.
I like that this allows the user to override that default, but I think we should preserve the code default of 16 if no config specifies a value.
Suggested change:

```diff
-config_default = config_default or spack.config.get('config:build_jobs')
-if config_default is None:
-    return max_cpus
+config_default = config_default or spack.config.get('config:build_jobs', 16)
```
I didn't want to duplicate that, since it is already fixed in `spack.config.default_config`, which is used for the `_builtin` scope:
```console
$ spack python -c "spack.config.scopes()['_builtin'].get_section('config')['config']['build_jobs']"
16
```
This scope is used almost everywhere, except when `with spack.config.use_configuration(...):` is used and `spack.config.default_config` is not added as the "bottom" scope (which is the case in tests with `config_scope` fixtures; it was also the case during bootstrap, but that's a potential bug addressed in #22610).
Do you think the 16 should still be there?
Oh well, I can just add the `, 16)` default and drop the two lines below it. The only problem I see is that the `_builtin` scope is supposed to solve exactly this problem of providing a default whenever no config files are available... yet there is no guarantee that `_builtin` is an active scope (in tests it usually isn't), so what problem does it actually solve ¯\_(ツ)_/¯.
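Putting the pieces of this discussion together, the intended resolution order (explicit `-j` wins uncapped; otherwise the configured default, falling back to the code default 16, capped by the available CPUs) could be sketched like this. This is a sketch with hypothetical parameter names modeled on the diff above, not the actual spack code; `max_cpus` would in practice come from `cpus_available()`:

```python
def determine_number_of_jobs(parallel=True, command_line=None,
                             config_default=None, max_cpus=1):
    """Resolve the number of build jobs.

    command_line:   value of an explicit `-j <num>` (None if not given)
    config_default: value of config:build_jobs (None if unset)
    max_cpus:       CPUs available to this process
    """
    if not parallel:
        return 1
    if command_line is not None:
        # Explicit -j overrides everything and is never capped.
        return command_line
    if config_default is None:
        # Code default, mirroring the 16 in the _builtin scope.
        config_default = 16
    # Without -j, cap the configured default at the available CPUs.
    return min(max_cpus, config_default)
```

So `spack install -j256` gets 256 jobs even on a 2-CPU allocation, while a plain `spack install` there gets 2.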
Restarted CI :( a random job was "cancelled".
Closes #17598