Skip to content

NixOS: Systemd unit-linking script rewrite for 60x speedups#479442

Open
Profpatsch wants to merge 4 commits intoNixOS:staging-nixosfrom
Profpatsch:systemd-unit-linking-script-rewrite
Open

NixOS: Systemd unit-linking script rewrite for 60x speedups#479442
Profpatsch wants to merge 4 commits intoNixOS:staging-nixosfrom
Profpatsch:systemd-unit-linking-script-rewrite

Conversation

@Profpatsch
Copy link
Member

The systemd unit linking script has been a thorn in my side
for a while now. It has to be rebuilt every single time any
unit dependency in a system closure changes, and due to the
multiple for-loops, we’d see multi-second build times that
were essentially caused by spawning thousands of ln/cp/etc
processes one after the other.

An example from a server (small-ish deployment) configuration:

building '/nix/store/2lz2v7dfqvc6azih4ppriv0b40c6gwif-system-units-bash-legacy.drv'...
Built system-units with bash in 2788ms
building '/nix/store/r4xqf8g4aikg2ihr0psvnwa6kj1nzci4-user-units-bash-legacy.drv'...
Built user-units with bash in 193ms
building '/nix/store/qir3gc3la4fndpi45c226q9ysl5qs28v-system-units-go-ng.drv'...
Built system-units with Go in 79ms (consider enabling systemd.unitGenerator.useGoImpl)
building '/nix/store/fah3ipi5iwd44fj5q1ypm3r8zd94mcc9-user-units-go-ng.drv'...
Built user-units with Go in 18ms (consider enabling systemd.unitGenerator.useGoImpl)

As you can see the new scripts take 30–60 times less time to run
and are for all intents and purposes is instant, even without
and paralellization. This speedup is purely from replacing
exec with direct syscalls.

The default configuration after this change is optimized
for debugging the new implementation, it will run both
the bash and the go version and compare their outputs.
In case something is different, it displays a warning and
asks the user to open a new issue with the verbose output.
By default, the old bash script output is used.

There is a new option systemd.unitGenerator.useGoImpl
which, when enabled, does not run the old bash scripts
anymore and will give the speedup. This should be flipped
in future versions of nixos, and the bash implementation
dropped.

Further advantages of the new script is exhaustive error
handling, and a verbose debugging mode that can be enabled
with systemd.unitGenerator.debug = true;.

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

A simple binary writer for go scripts.
Follows the setup of the other writers.

No support for third-party dependencies yet, only go stdlib.
The wrapper would have the same name as the actual thing to be
built, which was confusing.
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-nixos-tests This PR causes rebuilds for all NixOS tests and should normally target the staging branches. 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Jan 12, 2026
@Profpatsch
Copy link
Member Author

Note that I used golang because I personally think it’s the ideal language for nixpkgs build scripts (good runtime, awesome stdlib, fast compilation & binaries, lots of systems supported), but if we don’t want to have it in the system closure I could also be convinced to rewrite it to rust std.

@Profpatsch
Copy link
Member Author

Just bear in mind that rust std by default does not even include a json parser, so we’d need cargo … crates … nix build dependencies … rustc bootstrap … linked libraries …

@qweered
Copy link
Contributor

qweered commented Jan 13, 2026

I would prefer anything over bash, but if rust requires all that tooling and we can't have it without all this download/compile closure i am fine with go. Thank you for your work

The systemd unit linking script has been a thorn in my side
for a while now. It has to be rebuilt every single time any
unit dependency in a system closure changes, and due to the
multiple for-loops, we’d see multi-second build times that
were essentially caused by spawning thousands of ln/cp/etc
processes one after the other.

An example from a server (small-ish deployment) configuration:

building '/nix/store/2lz2v7dfqvc6azih4ppriv0b40c6gwif-system-units-bash-legacy.drv'...
Built system-units with bash in 2788ms
building '/nix/store/r4xqf8g4aikg2ihr0psvnwa6kj1nzci4-user-units-bash-legacy.drv'...
Built user-units with bash in 193ms
building '/nix/store/qir3gc3la4fndpi45c226q9ysl5qs28v-system-units-go-ng.drv'...
Built system-units with Go in 79ms (consider enabling systemd.unitGenerator.useGoImpl)
building '/nix/store/fah3ipi5iwd44fj5q1ypm3r8zd94mcc9-user-units-go-ng.drv'...
Built user-units with Go in 18ms (consider enabling systemd.unitGenerator.useGoImpl)

As you can see the new scripts take 30–60 times less time to run
and are for all intents and purposes is instant, even without
and paralellization. This speedup is purely from replacing
`exec` with direct syscalls.

The default configuration after this change is optimized
for debugging the new implementation, it will run both
the bash and the go version and compare their outputs.
In case something is different, it displays a warning and
asks the user to open a new issue with the verbose output.
By default, the old bash script output is used.

There is a new option `systemd.unitGenerator.useGoImpl`
which, when enabled, does not run the old bash scripts
anymore and will give the speedup. This should be flipped
in future versions of nixos, and the bash implementation
dropped.

Further advantages of the new script is exhaustive error
handling, and a verbose debugging mode that can be enabled
with `systemd.unitGenerator.debug = true;`.
@Profpatsch Profpatsch force-pushed the systemd-unit-linking-script-rewrite branch from 500fa1c to c0e21b6 Compare January 13, 2026 15:13
@LordGrimmauld
Copy link
Contributor

Imo rust wouldn't be that big of a deal, the linux kernel already requires rust. I also believe the motivation for this change shouldn't be the speedup during build. Downloading go is probably more expensive than the few seconds of bash execution... That said, bash is painful to debug, the motivation should be maintainability here imo.

@Profpatsch
Copy link
Member Author

Profpatsch commented Jan 13, 2026

To be clear: this is a multi-second useSubstitutes = false derivation that has to be built locally nearly every time anyone nixos-rebuilds their system. It is also a root node, so it can never be parallelized with anything else. Basically it slows down every nixos-rebuild, especially on desktop systems (where you usually have the largest amount of systemd units).
The go package can be substituted and is usually already in the build closure or even in the nix store

@ElvishJerricco
Copy link
Contributor

ElvishJerricco commented Jan 13, 2026

So, personally I think this should be written in Rust. Rust seems to be the preferred language among the systemd team, and that's who's ultimately going to be maintaining this long term. It's also good to minimize the number of languages that team has to be good at. And it would help with the future potential of sharing more code among the various in-house Rust programs in NixOS these days.

The things mentioned with tooling and bootstrapping I think aren't an issue. Whether it's the go compiler or the Rust toolchain and a few Rust dependencies, these things are only needed when the program is built. The resulting program will be cached by Hydra, so users will not be compiling it themselves either way. As for bootstrapping, NixOS already requires Rust in the bootstrap path between switch-to-configuration-ng, make-initrd-ng (used in the shutdownRamfs), and I think spidermonkey is in there somewhere.

That said, I don't think we should block this PR on requiring you to rewrite it in rust. I would prefer that we have this in Go than in Bash. But given that I'd prefer Rust even more, I think I will try my hand at doing this in Rust myself, if you'll give me the evening to try that.

@Profpatsch
Copy link
Member Author

@ElvishJerricco Yeah sure, you can just rewrite the golang part and keep the interface, it should be reasonably straightforward. I marked all interesting bits with comments, maybe those should be preserved. Just make sure the semantics of e.g. the symlink function are the same, golang is a trivial wrapper around symlinkat(2).

@Profpatsch
Copy link
Member Author

@ElvishJerricco feel free to push a new commit onto this PR :)

A direct transliteration of the go implementation.
@Profpatsch
Copy link
Member Author

Here you go, I added a direct transliteration of the go implementation.

@ElvishJerricco
Copy link
Contributor

oh, heh, I had also written a rust version already and hadn't gotten around to integrating it here yet 😅 master...ElvishJerricco:nixpkgs:push-wwrosnwuvopx

(though, I am also looking into a couple other options regarding this that's more about getting rid of bash loops than getting rid of bash logic completely)

@Profpatsch
Copy link
Member Author

oh, heh, I had also written a rust version already and hadn't gotten around to integrating it here yet 😅 master...ElvishJerricco:nixpkgs:push-wwrosnwuvopx

(though, I am also looking into a couple other options regarding this that's more about getting rid of bash loops than getting rid of bash logic completely)

I’m very uncomfortable about the fact that this tries to abstract the bash code instead of translating it 1:1, including all weird differences in the loops. We can’t really know which parts of this are intentional and which parts are accidentally doing the right thing, and there’s a good chance we are going to break downstream users this way.

I’d rather take the speedup and not change any of the logic.

So sorry, but if we create a rust version, it should be a line-by-line translation of the bash code.

@ElvishJerricco
Copy link
Contributor

Huh? The code may not literally be a line for line translation of the original bash, but I assure you I took care to have it output the exact same results as the bash. e.g. My copy_or_create_link function was intended to replicate the weird symlink behavior the bash code had, even though that behavior didn't really make sense to me (until I tested it later for fun and found it is necessary).

Anyway, I think I'm going to withdraw from this PR and leave it to others.

@mdaniels5757 mdaniels5757 changed the base branch from master to staging-nixos January 18, 2026 03:27
@nixpkgs-ci nixpkgs-ci bot closed this Jan 18, 2026
@nixpkgs-ci nixpkgs-ci bot reopened this Jan 18, 2026
@Profpatsch
Copy link
Member Author

the weird symlink behavior the bash code had, even though that behavior didn't really make sense to me (until I tested it later for fun and found it is necessary).

Yeah, there’s a few instances of “copy the symlink file verbatim” to preserve the original (possibly relative) target and not introduce another indirection. Though I’m not sure this is what you are referencing.

But that’s why I mean we shouldn’t change behavior because there’s so many combinations of upstream unit files that can happen.

A counterpoint to this is that if you look at the git blame and ignore the formatting commit, most of the bash code was written 10 years ago by Eelco and has not been touched since, so there must be some improvements that can be made.
However, I’d prefer if that is done in a separate step from the speedup.
Because the important part is the speedup, refactoring basically only gives us the chance to break something. The code is already relatively straightforward, if a little verbose.

@Profpatsch
Copy link
Member Author

I’m not sure how I should proceed now, I don’t think anybody really wants to review it, half the people want to RiiR but nobody except @ElvishJerricco has stepped up.

So I’m tempted to just merge the go impl for now.

Though who are the people that currently effectively maintain systemd? Maybe I should ping them on matrix.

@LordGrimmauld
Copy link
Contributor

Mate. The systemd maintainers were already pinged by CI, and you did a great job pissing ElvishJerrico of, who does a lot of the maintainance work. Pinging on matrix won't fix that. I can understand the dislike for the bash implementation, but frankly saying "So sorry, but [...]" when presented with an alternative approach is a wildly demeaning attitude you have towards existing systemd maintainers.

Anyway, I think I'm going to withdraw from this PR and leave it to others.

This is honestly the sanest choice, and i think i will do the same. These systemd internals are intricate, and need proper thorough review. If all alternative approaches are met with such demeaning attitude, there is really not much motivation left to touch this PR at all. Nixpkgs is a volunteer-run project after all.

So I’m tempted to just merge the go impl for now.

Yeah, no, i don't think that will happen in this current form. The PR doesn't even pass the very basic CI checks...

@ElvishJerricco
Copy link
Contributor

Yeah, there’s a few instances of “copy the symlink file verbatim” to preserve the original (possibly relative) target and not introduce another indirection. Though I’m not sure this is what you are referencing.

That is exactly what I was talking about and it's exactly what my version took significant care not to change the behavior of.

So I’m tempted to just merge the go impl for now.

I would recommend against merging this as is at least on the grounds that:

  • There should only be one implementation. If we're switching to Go or Rust or whatever, we should not retain any others.
  • The timing metrics are completely unnecessary noise.
  • When I first looked at this PR, I saw numerous issues in the Go code that I was going to leave a review about until I decided to just try doing this in Rust myself. So if we're going to merge the Go version, I would like someone to review the Go code with a fine tooth comb. I suspect I would find similar issues in your Rust version if it was just a transliteration of the Go version.

Regardless, one of the reasons I decided to avoid this PR is because this whole generateUnits system is an area I've long thought needed to be rearchitected to begin with. For instance, I would like to dissolve the distinction between "upstream", systemd.units, and systemd.packages, and just have a common underlying mechanism that they all boil down to. I'd also like to switch the actual linking / enabling and such to be based on systemd presets. Each of these is its own significantly larger task though.

With all that in mind, I will leave it to others in the systemd team to decide if this should be iterated on and merged. To be completely clear: I'm not against it; there's just plenty that needs to be done before this is merge-worthy, and I have other, more substantial plans for this part of NixOS already.

@Profpatsch
Copy link
Member Author

  • There should only be one implementation. If we're switching to Go or Rust or whatever, we should not retain any others.

Okay, so you are against preventing downstream breakage by having a grace period in which people can report problems? I don’t think this is a good plan tbh.

fwiw this is one of my major gripes with NixOS as it stands, lots of “rearchitecting” that breaks downstream without any migration periods apart from “someone reviewed it very closely and it worked on their machine”.

  • When I first looked at this PR, I saw numerous issues in the Go code that I was going to leave a review about

Cool, please do. I’d be interested in them. Were they pointing to bugs, inconsistencies?

Each of these is its own significantly larger task though.

This is another major gripe I have with nixpkgs, obvious improvements are not done for long stretches of time because people have big plans for redesigns that might or might not happen, meanwhile every single nixos-rebuild takes 5 seconds more than it needs to because the straightforward fix was never reviewed and merged.

Sorry if I come off as a little salty, most people here do this in their free time after all, but I think the base criticism is valid.

@ElvishJerricco
Copy link
Contributor

ElvishJerricco commented Jan 24, 2026

Each of these is its own significantly larger task though.

This is another major gripe I have with nixpkgs, obvious improvements are not done for long stretches of time because people have big plans for redesigns that might or might not happen, meanwhile every single nixos-rebuild takes 5 seconds more than it needs to because the straightforward fix was never reviewed and merged.

I did not say that my other ideas on how to improve this stuff were reasons not to merge your PR. You are the reason I am not reviewing this PR. Someone else is free to.

@Profpatsch
Copy link
Member Author

You are the reason I am not reviewing this PR

sure

@nixpkgs-ci nixpkgs-ci bot requested a review from a team February 7, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 10.rebuild-nixos-tests This PR causes rebuilds for all NixOS tests and should normally target the staging branches.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants