Conversation
f5ecea6 to
0cc2922
Compare
|
|
lilyball
left a comment
There was a problem hiding this comment.
Just some minor comments on the bash scripting
0cc2922 to
383d569
Compare
383d569 to
6678847
Compare
|
I made some changes in my fork to run a big matrix of 4x4 jobs, each eval-system on each runner-type. Made two runs:
I guess, I'll need to turn the sandbox off for this test. Will continue later. |
|
Read the error in the later run, it tells you to set |
|
Here's the run with sandbox=relaxed, but still running: https://github.com/wolfgangwalther/nixpkgs/actions/runs/14980771666/job/42084126711 |
wolfgangwalther
left a comment
There was a problem hiding this comment.
According to the latest run above, this works on both x86_64-darwin and aarch64-darwin.
I'm thinking that we might want to have some way to confirm this in CI every now and then - eval should keep working on all platforms, so that all contributors can run it locally, too. I have a few ideas, but we don't need to do this in here.
6678847 to
3a6da35
Compare
|
Rebased. Not sure if we should wait till |
I'd like to do that soon anyway, as I'd like to get #405853 into the pin. |
3a6da35 to
a4c79d1
Compare
Before this change, the eval derivations would never actually complete on any platform other than sandboxed Linux because the mem stats job would never be terminated. For reasons I don't know, the Linux sandbox somehow results in Bash terminating the job on its own. On every other platform (and when the Linux sandbox is disabled), we have to do it correctly and kill it manually.
a4c79d1 to
8d49a2f
Compare
| # For some reason, `kill` (or `kill -9`) won't | ||
| # actually kill an entire process tree on macOS | ||
| # as it does on Linux. So instead, we'll just | ||
| # use xargs' native termination feature, which | ||
| # will at least guarantee that the iteration | ||
| # will be stopped. | ||
| # | ||
| # (Yes, technically we can just keep the `kill` | ||
| # command so that xargs dies and then the pending | ||
| # chunks will finish, but I'd rather just do this | ||
| # if it's going to have the same effect anyways.) | ||
| exit 255 |
There was a problem hiding this comment.
Maybe there is a way to get this to work if there's a way to get the parent's process group ID? Not sure.
There was a problem hiding this comment.
As discussed on Discord, I believe this should actually just be kill 0 for all platforms. AFAICT kill $PPID only works on linux because of the sandbox (that causes the command to fail with an error which set -e causes the builder to exit, and presumably that causes the pid namespace to get torn down), xargs does not proactively kill its spawned processes when it dies.
If you do use kill 0, that signals the entire process group, which is going to include the shell itself since it's a non-interactive shell (which means job control defaults to off). If you want just the xargs process tree to shut down you can enable job control first with set -m, that way the xargs pipeline will be its own process group and the kill 0 will only affect it, at which point bash's set -e behavior will take over. So doing this really just depends on whether there's any benefits to log output (or whether Nix cares at all about whether the builder exits with a code or a signal).
There was a problem hiding this comment.
AFAICT
kill $PPIDonly works on linux because of the sandbox
This would match the issue with the memory stats job that I observed and also bisected down to the Linux sandbox. Fun.
Thanks, I’ll give that a shot. I don’t think we want to set -m at least without somehow explicitly detecting whether the xargs pipeline was killed or not (which I’m not sure is possible?).
There was a problem hiding this comment.
set -m just enables job control monitor mode, which makes every pipeline get its own process group (and bash will print a line when background jobs exit). It shouldn't actually hurt anything, it just makes bash behave a bit more like interactive mode. and you can presumably turn it off again after the xargs finishes if you like.
I'm not sure what you mean about explicitly detecting whether the xargs pipeline was killed? If job control is enabled, the kill 0 will just kill xargs (and seq if it hasn't already finished) and its child processes, and bash will treat that as an error and exit (due to set -e). If you want to intercept that, you can wrap the xargs pipeline in an if or add a || after and test the status code (and a code over 128 indicates a signal, where the signal number is the exit status minus 128, e.g. SIGTERM is 143). But normal errexit behavior should be fine here? It's what you get with kill $PPID already.
| done | ||
| ) & | ||
|
|
||
| trap "kill %%" EXIT |
There was a problem hiding this comment.
This isn't going to work if anyone adds another background job later on, since the jobspec here is evaluated at EXIT time instead of right now.
| trap "kill %%" EXIT | |
| trap "kill $(jobs -p %%)" EXIT |
Also I just want to point out that stdenv installs a default exit handler that this overwrites. That may not matter at all, if nothing installs a failureHook or exitHook, but if you want to preserve that you could switch this over to adding both a failureHook and exitHook (or just a failureHook and adding an explicit kill after the xargs finishes).
There was a problem hiding this comment.
Dropped the x86_64-linux eval system change in favor of waiting for #406825, but everything else should be good to go now that I've updated the pin.
Do we still need to wait for #406825 or has the required change maybe already been merged (since I have been splitting off smaller parts repeatedly)?
Is there anything else, especially from the latest comments, that you'd want to address / change in here?
We just had #418153, too, so making eval work on darwin would be valuable to others, too.
Things done
nix.conf? (See Nix manual)sandbox = relaxedsandbox = truenix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/)Add a 👍 reaction to pull requests you find important.