stdenv: single make jobserver across multiple nix builds #143820
pennae wants to merge 175 commits into NixOS:staging from pennae:stdenv-jobserver
the token loss problem could be solved on linux with a little fuse filesystem that emulates pipes, tracks how many tokens were taken from each file description of a pipe, and returns missing tokens when a pipe file description is released. darwin also has a fuse implementation, so if this works on linux there's a chance it could eventually work reasonably well on darwin too.
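A minimal sketch of the bookkeeping proposed above, written as plain Python rather than as an actual FUSE/CUSE request handler. The ledger counts how many tokens each open file description has taken and hands them back when that description is released, e.g. because nix-daemon killed the build. The class and method names are hypothetical, and the sketch assumes each client returns tokens through the same file description it took them from.

```python
class TokenLedger:
    """Hypothetical model of a pipe that remembers who took which tokens.

    A plain pipe forgets who read a token. This ledger tracks the number of
    tokens held per open file description, so tokens held by a client that
    dies can be returned to the pool when its description is released.
    """

    def __init__(self, tokens: int) -> None:
        self.free = tokens
        self.held: dict[int, int] = {}

    def read(self, fd: int, want: int) -> int:
        """Take up to `want` tokens for `fd`; returns how many were granted."""
        granted = min(want, self.free)
        self.free -= granted
        self.held[fd] = self.held.get(fd, 0) + granted
        return granted

    def write(self, fd: int, count: int) -> None:
        """Return `count` tokens from `fd` to the pool."""
        # Assumption: a description only gives back what it took, so never
        # credit more than its current balance (this prevents token inflation).
        count = min(count, self.held.get(fd, 0))
        self.held[fd] = self.held.get(fd, 0) - count
        self.free += count

    def release(self, fd: int) -> int:
        """Last close of `fd` (clean exit or hard kill): reclaim leaked tokens."""
        leaked = self.held.pop(fd, 0)
        self.free += leaked
        return leaked
```

The key design point is that `release` is driven by the kernel telling the filesystem a file description is gone, which also happens when the process holding it is SIGKILLed. A plain pipe has no equivalent hook.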
updated with a CUSE-based jobserver fifo-alike. this variant survives nix-daemon killing builds hard without losing tokens, and in theory it could be implemented as a FUSE filesystem too (e.g. for darwin, or if the CUSE variant is deemed too exotic). at the moment it's CUSE mostly because FUSE doesn't seem to have a way to forbid mmapping of files, and while mmapping a token fd wouldn't break anything, it would consume 4k tokens in one fell swoop and not give them back for a good while.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
updated with support for cargo, reintroduced per-build core limits, and added a nixos module. getting there 🤞
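The comment above mentions per-build core limits alongside the shared jobserver. One plausible reading, assumed here rather than taken from the PR, is that each build still caps itself at its own core count (e.g. `NIX_BUILD_CORES`) while also needing a token from the shared pool. A minimal sketch using Python's `threading.Semaphore` as a stand-in for both limits. The class name `BuildSlots` is hypothetical.

```python
import threading


class BuildSlots:
    """Hypothetical per-build limiter layered on top of a shared token pool."""

    def __init__(self, pool: threading.Semaphore, cores: int) -> None:
        self.pool = pool  # shared by all builds; stands in for the jobserver
        self.local = threading.Semaphore(cores)  # this build's own core limit

    def try_start(self) -> bool:
        """Try to start one job without blocking; True if both limits allow it."""
        # Take the local slot first, so a build that is already at its core
        # limit never sits on a global token it cannot use.
        if not self.local.acquire(blocking=False):
            return False
        if not self.pool.acquire(blocking=False):
            self.local.release()  # undo the local reservation
            return False
        return True

    def finish(self) -> None:
        """A job finished: hand the global token back first, then the local slot."""
        self.pool.release()
        self.local.release()
```

With this layering, one build cannot grab the whole pool, and idle tokens still flow to whichever build has work and local headroom.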
throw "Unsupported architecture";

buildStdenv = if stdenv.isDarwin then
  overrideCC clangStdenv [ clang_9 llvmPackages_9.llvm llvmPackages_9.lld ]
Did this slip in by mistake?
nope. without this change this package fails to build when it tries to build stdenv.jscall. after this override stdenv.cc is a list, which is illegal: the jscall drv interpolates stdenv.cc directly into a bash string, so it fails here.
python312Packages.plugwise: 0.37.8 -> 0.37.9
zoom-us: 6.0.2.4680 -> 6.0.10.5325
kde-rounded-corners: 0.6.5 -> 0.6.6
jnv: 0.2.2 -> 0.2.3
flashmq: 1.13.0 -> 1.13.1
python311Packages.django-modeltranslation: 0.18.13 -> 0.19.0
firebase-tools: 13.10.0 -> 13.10.1
python312Packages.foolscap: fix build
openjdk17, openjfx17, corretto17: update
jetbrains-jdk: 17.0.11-b1000.8 -> 17.0.11-b1207.24
python311Packages.ytmusicapi: 1.7.1 -> 1.7.2
binutils: Add --undefined-version on lld 17+
libxml2: Test for pthread_create instead of pthread_join on FreeBSD
I have no further plans to review CppNix code, as I will dedicate myself to Lix development.
Signed-off-by: Raito Bezarius <[email protected]>
initially only make and cargo support using the jobserver. other build systems may follow suit later.
Signed-off-by: Raito Bezarius <[email protected]>
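The commit message above notes that initially only make and cargo use the jobserver. The following is a rough illustration of what "using the jobserver" means for a client under GNU make's documented protocol:

* The client finds the jobserver in `MAKEFLAGS`. Modern make uses `--jobserver-auth=R,W` (inherited fds) or `--jobserver-auth=fifo:PATH` (make 4.4+). Older make uses `--jobserver-fds=R,W`. Only the last occurrence counts.
* It reads one byte to take a token.
* It writes that byte back when the job is done.

This is a simplified sketch, not the PR's code, and the function names are hypothetical.

```python
import os
from typing import Optional


def parse_jobserver_auth(makeflags: str) -> Optional[tuple]:
    """Extract the jobserver handle from a MAKEFLAGS string.

    Returns ("fifo", path), ("fds", read_fd, write_fd), or None.
    """
    auth = None
    for word in makeflags.split():
        for prefix in ("--jobserver-auth=", "--jobserver-fds="):
            if word.startswith(prefix):
                auth = word[len(prefix):]  # keep scanning: the last one wins
    if auth is None:
        return None
    if auth.startswith("fifo:"):
        # fifo mode: the client opens this path itself (e.g. with O_RDWR)
        return ("fifo", auth[len("fifo:"):])
    read_fd, write_fd = (int(x) for x in auth.split(","))
    return ("fds", read_fd, write_fd)


def acquire(read_fd: int) -> bytes:
    """Block until a token is available and return it.

    Note: every client already owns one implicit job slot, so it only needs
    to call this for its second and later concurrent jobs.
    """
    token = os.read(read_fd, 1)
    if not token:
        raise EOFError("jobserver closed")
    return token


def release(write_fd: int, token: bytes) -> None:
    """Give the exact token byte back so other clients can use it."""
    os.write(write_fd, token)
```

A client that crashes between `acquire` and `release` loses the token for everyone, which is the token-loss problem the CUSE fifo is meant to solve.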
Motivation for this change
`make -jN -lN` in stdenv is a very blunt instrument. it works well when max-jobs=1, but as nix-level parallelism increases it becomes increasingly deficient. starting from a low-load situation we start max-jobs * N compilers, loadavg goes through the roof, and the `-lN` load limit kicks in and inhibits new compilers from starting until loadavg has fallen below N. at that point all make instances spawn a lot of new compilers and loadavg goes through the roof again. this oscillation leaves the system underutilized in low phases and overcommitted in high phases.
testing the current stdenv against a jobserver with 26 tokens on a 12C/24T machine shows that parallel builds of llvm_{8..11} run about 7% faster (35:52min for stdenv, 33:30min with jobserver), and a larger build of llvm_{5..13} is about 11% faster (1:27h for stdenv, 1:17h with jobserver). (removing the `-l` from stdenv also improves utilization but is less efficient. preliminary testing here shows that `-l${1.5 * N}` may be a good alternative to `-lN` as used currently; #141266 could be a good vector for that instead of this whole mess. [more testing says that `-l2N` would be a minimum to get better utilization, but so far every `-l` setting we've tried has produced some underutilization except excessively large numbers like 6N or higher])
nothing in here should be regarded as a final suggestion in any way, it's more of a "hey look, this might just work". as such it's extremely rough around the edges, e.g. to use the jobserver the experimenter currently has to bring a `/jobserver` fifo filled with tokens into the nix sandbox:
is this something worth pursuing? a 10% speedup for hydra does seem tempting
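For experimenting with the setup described above, where a `/jobserver` fifo filled with tokens has to be present in the nix sandbox, the fifo could be prepared roughly as follows. This is a hypothetical helper, not the PR's instructions. Exposing the path to builds (for example via `extra-sandbox-paths` in nix.conf) is likewise an assumption about the setup.

```python
import os


def make_token_fifo(path: str, tokens: int) -> int:
    """Create a FIFO at `path` and preload it with `tokens` single-byte job tokens.

    Returns a read-write fd that the caller must keep open: once no process
    holds the FIFO open, the kernel discards any unread tokens.
    """
    os.mkfifo(path, 0o666)  # the final mode is still subject to the umask
    # POSIX leaves O_RDWR on a FIFO undefined, but Linux allows it. It lets us
    # open without blocking until another process opens the other end.
    fd = os.open(path, os.O_RDWR)
    data = b"+" * tokens
    # Nothing is reading yet, so writing more than the pipe buffer holds
    # (64 KiB by default on Linux) would block forever; keep `tokens` small.
    while data:
        written = os.write(fd, data)
        data = data[written:]
    return fd
```

Calling this with `tokens=26` would mirror the 26-token benchmark setup. The process that called it then has to stay alive for as long as builds use the fifo. This fragility, together with tokens lost when a build is killed, is what the later CUSE-based fifo addresses.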
todos before this is more generally usable:
cc @vcunat @Artturin