
workflows/eval: compare eval consistency and performance between Lix / Nix versions #427724

Merged
wolfgangwalther merged 4 commits into NixOS:master from wolfgangwalther:ci-eval-perf
Aug 12, 2025

@wolfgangwalther
Contributor

@wolfgangwalther wolfgangwalther commented Jul 23, 2025

I extended the approach I used here to compare performance between Nix versions to run automatically in CI and also confirm Eval consistency as described in #428076 (comment).

The first two commits are from #432286 right now, to be discussed and merged there. The last commit is only temporary, to be dropped before merge: It triggers the full eval for all versions, to be able to review the results in this PR.

The results are available in the PR summary page.


Original PR body:

With the announcement of Nix 2.30 having considerably lower memory consumption during Eval, I wanted to see whether this would give us any performance improvement for Eval in CI. The idea would be that lower memory consumption could allow bigger chunkSizes, which would speed up Eval.
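To make the chunkSize trade-off concrete, here is a minimal Python sketch of splitting a list of attribute paths into chunks. It is only an illustration: the real Eval tooling in Nixpkgs is not written in Python, and the attribute names and the total of 25,000 are hypothetical.

```python
def chunk(attrpaths, chunk_size):
    """Split a list of attribute paths into consecutive chunks of at most chunk_size."""
    return [attrpaths[i:i + chunk_size] for i in range(0, len(attrpaths), chunk_size)]


# Hypothetical attribute paths; the real list comes from the attrpath generation step.
attrpaths = [f"pkg{i}" for i in range(25_000)]

for size in (6_000, 12_000):
    chunks = chunk(attrpaths, size)
    print(f"chunkSize={size}: {len(chunks)} chunks, last chunk has {len(chunks[-1])} attrs")
```

A bigger chunkSize means fewer, larger units of work, and each of them has to keep more in memory at once, which is why lower memory consumption might allow bigger chunks.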

Another observation I made recently in #427434, where Nix 2.28 / Nix 2.29 took 9x longer to build the tarball than Lix or Nix 2.30, made me curious whether there were any differences between the different versions / implementations as well.

Things done

I ran a matrix job of 70 eval jobs in pull_request context here. I only evaluated for --evalSystem x86_64-linux, because that's the longest of the 4 platform jobs. I ran with different chunk sizes and different versions of nix. Results, see below.

The reported numbers are the times the eval step takes. The jobs don't run to completion, but all fail, because they create conflicting artifacts.

Conclusion

Nix 2.30 is faster - we need to use it for Eval!

cc @NixOS/nixpkgs-ci

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: policy discussion Discuss policies to work in and around Nixpkgs backport release-25.05 labels Jul 23, 2025
@wolfgangwalther

This comment was marked as outdated.

@wolfgangwalther

This comment was marked as outdated.

@emilazy

This comment was marked as resolved.

@xokdvium
Contributor

re: memory usage?

I'm a bit confused. I might be wrong, but it doesn't seem that these versions are actually used? At least judging from the logs for lixPackageSets.lix_2_93.lix (https://github.com/NixOS/nixpkgs/actions/runs/16469453994/job/46554806261?pr=427724):

copying path '/nix/store/4gdwf5kxk9rlbdlk6b2nlh1frsniz5wf-nix-flake-c-2.29.1-dev' from 'https://cache.nixos.org/'...
copying path '/nix/store/q6l3w2xp5ayxab4imghhwb6xkx7yllkq-nix-2.29.1-dev' from 'https://cache.nixos.org/'...
building '/nix/store/v2yizp2h25a7dbh5nf19ynb6f2s721c2-attrpaths-superset.json.drv'...

I'm pretty sure this is just measuring noise of nix_2_29 and the --argstr nix is ignored? Am I missing something?

@wolfgangwalther
Contributor Author

I'm pretty sure this is just measuring noise of nix_2_29 and the --argstr nix is ignored? Am I missing something?

That would be a very good explanation. I did confirm that the attributes I had set up were working locally, it was downloading different lix versions from cache. Maybe I made a mistake when hooking things up in the workflow file.

In this case, we'd now have a good idea of how noisy this is :D

@wolfgangwalther
Contributor Author

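| Eval (chunkSize) | Lix 2.91 | Lix 2.92 | Lix 2.93 | Nix 2.24 | Nix 2.28 | Nix 2.29 | Nix 2.30 |
|---|---|---|---|---|---|---|---|
| 6000 | 4:32 | 3:31 | 3:40 | 3:58 | 3:58 | 3:58 | 2:58 |
| 8000 | 3:32 | 3:26 | 3:37 | 4:11 | 3:59 | 4:00 | 2:59 |
| 10000 | 4:26 | 3:56 | 5:10 | 4:06 | 3:44 | 4:06 | 3:14 |
| 12000 | 9:20 | 10+ | 10+ | 4:01 | 3:48 | 4:04 | 2:57 |
| 14000 | 10+ | 10+ | 10+ | 8:26 | 6:26 | 8:43 | 3:05 |
| 16000 | 10+ | 10+ | 10+ | 10+ | 5:35 | 5:45 | 3:01 |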

That looks much better. Nix 2.30 does indeed seem to use less memory and to allow a higher chunkSize. There doesn't seem to be an improvement in performance with higher chunk sizes, though. That said, I'd probably need to run with even higher chunkSizes for Nix 2.30 to see how that goes.

But, there is an improvement with Nix 2.30, so that's pretty cool.
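As a rough way to quantify that improvement, the following Python sketch converts a few of the `m:ss` values from the chunkSize 6000 row of the table above into seconds and relates them to Nix 2.30. The helper name `to_seconds` is made up for this illustration, and a single CI run per cell is noisy, so the ratios are only indicative.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'm:ss' duration such as '3:58' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Eval step times at chunkSize 6000, copied from the table above.
timings = {
    "Lix 2.93": "3:40",
    "Nix 2.29": "3:58",
    "Nix 2.30": "2:58",
}

baseline = to_seconds(timings["Nix 2.30"])
for version, stamp in timings.items():
    seconds = to_seconds(stamp)
    print(f"{version}: {seconds}s, {seconds / baseline:.2f}x the Nix 2.30 time")
```

For this row, Nix 2.29 needs about a third more time than Nix 2.30 (238 s versus 178 s).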

Also, if we want to run with Lix as well, then we need to lower the chunk size a bit, because it appears it started swapping with 10k once already. Lowering chunk size would not have a negative effect for Nix 2.30, though, so no problem.


Memory consumption can't be compared well from the logs, because with Nix 2.30 the later chunks start much sooner. The data clearly shows improvement with higher chunkSizes, though.

@lf-
Member

lf- commented Jul 25, 2025

It appears that what this is measuring is the GC falling over if you eval too much stuff in one process, right? So what you're actually measuring here is memory pressure (and boehmgc sucking), probably. Nevertheless this is deeply interesting data, thank you for measuring it!

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 7, 2025
@wolfgangwalther wolfgangwalther changed the title ci/eval: test performance (WIP) workflows/eval: compare eval consistency and performance between Lix / Nix versions Aug 9, 2025
@nixpkgs-ci nixpkgs-ci bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 9, 2025
@wolfgangwalther
Contributor Author

I updated the PR to run this performance comparison whenever we update ci/pinned.json. Thus we can continually monitor the performance of the different implementations. The results of these Eval runs are also used to check for Eval consistency as described in #428076 (comment).

The pull_request workflow doesn't run yet, and I have no idea why. It fails with an obscure error: https://github.com/NixOS/nixpkgs/actions/runs/16850915977. I had seen this in a different PR already, but haven't been able to fix it yet. All of this works fine in my fork.

@wolfgangwalther
Contributor Author

The pull_request workflow doesn't run yet, and I have no idea why. It fails with an obscure error: https://github.com/NixOS/nixpkgs/actions/runs/16850915977. I had seen this in a different PR already, but haven't been able to fix it yet. All of this works fine in my fork.

A rebase after merging #432286 helped: The workflow is running now. The results will be visible here: https://github.com/NixOS/nixpkgs/actions/runs/16852783378?pr=427724.

Copied the resulting table here:

Lix/Nix version comparison

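| Version | aarch64-linux | aarch64-darwin | x86_64-linux | x86_64-darwin |
|---|---|---|---|---|
| lixPackageSets.git.lix | ⚠️ | ⚠️ | ⚠️ | ⚠️ |
| lixPackageSets.lix_2_91.lix | 112 | 78 | 133 | 77 |
| lixPackageSets.lix_2_92.lix | 108 | 75 | 120 | 78 |
| lixPackageSets.lix_2_93.lix | 111 | 81 | 127 | 85 |
| nixVersions.git | 103 | 70 | 104 | 70 |
| nixVersions.nix_2_24 | 145 | 116 | 149 | 113 |
| nixVersions.nix_2_28 | 135 | 110 | 150 | 116 |
| nixVersions.nix_2_29 | 147 | 109 | 151 | 114 |
| nixVersions.nix_2_30 | 97 | 71 | 107 | 71 |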

Evaluation time in seconds without downloading dependencies.

⚠️ Job did not report a result.

❌ Job produced different outpaths than the target branch.


lixPackageSets.git.lix is currently still broken. It was fixed in #424775, but that's not available in the channel / pin, yet.

The reported times are lower than those in #427724 (comment), because we're not including the download of dependencies. We're also not including the attrpath generation step, only the outpath step (this could be improved on in the future).

@wolfgangwalther wolfgangwalther marked this pull request as ready for review August 9, 2025 19:23
@wolfgangwalther
Contributor Author

lixPackageSets.git.lix is currently still broken. It was fixed in #424775, but that's not available in the channel / pin, yet.

This is now fixed with the latest bump of pinned.json.

This is ready to go from my side.

Contributor

@MattSturgeon MattSturgeon left a comment


SGTM

@wolfgangwalther wolfgangwalther force-pushed the ci-eval-perf branch 2 times, most recently from 5b877fe to a3533c7 Compare August 11, 2025 16:34
@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Aug 11, 2025
The Dependabot update changed the hashes to the latest main branch commit
instead of the v5.0.0 tag; it also didn't adjust the tags in the
comments accordingly. Last but not least, one of the references used a
`@v5` reference instead of the commit hash. The latter is probably what
Dependabot tripped on.

Move "Packages" up, because it's much shorter and easier to scroll past.
This way both Packages and Performance are visible immediately.

With this change, we start running Eval on all available Lix and Nix
versions. Because this requires a lot of resources, this complete test
is only run when `ci/pinned.json` is updated.

The resulting outpaths are checked for consistency with the target
branch. A difference will cause the `report` job to fail, thus blocking
the merge, ensuring Eval consistency for Nixpkgs across different
versions.

This implements a kind of "ratchet style" check: Since we originally
confirmed that the versions currently in Nixpkgs at the time of this
commit match Eval behavior of Nix 2.3, we can ensure consistency with
Nix 2.3 down the road, even without testing for it explicitly.

There had been one regression in Eval consistency for Nix between 2.18
and 2.24 - two tests in `tests.devShellTools` produce different results
between Lix 2.91+ (which was forked from Nix 2.18) and Nix 2.24+. I
assume it's unlikely that such a change would be "fixed" by now, thus I
added an exception for these.

As a bonus, we also present the total time in seconds it takes for Eval
to complete for every tested version in a summary table. This allows us
to easily see performance improvements for Eval due to version updates.
At this stage, this time only includes the "outpaths" step of Eval, but
not the generation of attrpaths beforehand.
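
The consistency check described in this commit message can be pictured with a small Python sketch. This is an assumption-laden illustration, not the code used in Nixpkgs CI: the function name `diff_outpaths`, the dictionary layout, the attribute names and the prefix-based exception are all hypothetical (the commit only exempts two specific tests in `tests.devShellTools`).

```python
def diff_outpaths(target, candidate, allowed=()):
    """Return attrpaths whose outpath differs between two eval results.

    Both arguments map attrpath -> outpath. Attrpaths starting with one of
    the prefixes in `allowed` are skipped (the known exceptions).
    """
    differing = []
    for attr in sorted(set(target) | set(candidate)):
        if any(attr.startswith(prefix) for prefix in allowed):
            continue
        if target.get(attr) != candidate.get(attr):
            differing.append(attr)
    return differing


# Hypothetical eval results; real outpaths are full /nix/store paths.
target = {
    "hello": "/nix/store/aaa-hello",
    "tests.devShellTools.example": "/nix/store/bbb-example",
}
candidate = {
    "hello": "/nix/store/aaa-hello",
    "tests.devShellTools.example": "/nix/store/ccc-example",
}

print(diff_outpaths(target, candidate))
print(diff_outpaths(target, candidate, allowed=("tests.devShellTools.",)))
```

In this sketch a non-empty result would correspond to the ❌ marker and a failing `report` job, while an empty result means the version evaluates consistently with the target branch.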
@wolfgangwalther
Contributor Author

Just rebased and used the latest commit hashes for the newly used actions after the recent Dependabot updates.

Also added a commit to fix the actions/download-artifact hashes, where Dependabot likely tripped over a non-commit-hash reference and updated to the latest main commit instead of the v5.0.0 tag. The difference was only some READMEs, so I changed all of them to the v5.0.0 tag explicitly now.

@wolfgangwalther wolfgangwalther merged commit 161a4f0 into NixOS:master Aug 12, 2025
76 of 78 checks passed
@wolfgangwalther wolfgangwalther deleted the ci-eval-perf branch August 12, 2025 08:22
@nixpkgs-ci
Contributor

nixpkgs-ci bot commented Aug 12, 2025

@wolfgangwalther
Contributor Author

wolfgangwalther commented Aug 12, 2025

Seems like the Eval / compare job for regular PRs is cancelled / skipped after merging this. This causes the "request reviewers" job to time out.

eval:
  runs-on: ubuntu-24.04-arm
  needs: versions
  if: ${{ !cancelled() }}
Contributor Author

@wolfgangwalther wolfgangwalther Aug 12, 2025


Not sure what I did here. It seems to actually work, though, which is confusing.

I want the eval job to run when:

  • The versions job succeeds
  • The versions job is not run at all

I don't want eval to run when:

  • The versions job runs, but fails

!cancelled() seems like the wrong condition. I think I need !cancelled() && !failure()?

That's assuming "cancelled" is something different from "skipped" (was that the right term, if the if condition does not apply, @MattSturgeon?)
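
The intended behaviour can be written down as a small truth table. The Python sketch below is a deliberately simplified model, not GitHub Actions itself: it assumes that `cancelled()` is true only when the whole workflow run is cancelled, and that `failure()` is true exactly when the needed `versions` job failed. Both function names are invented for this illustration.

```python
def runs_with_current_condition(versions_result: str, workflow_cancelled: bool = False) -> bool:
    """Model of `if: ${{ !cancelled() }}` under the assumptions stated above."""
    return not workflow_cancelled


def runs_with_proposed_condition(versions_result: str, workflow_cancelled: bool = False) -> bool:
    """Model of `if: ${{ !cancelled() && !failure() }}` under the same assumptions."""
    return not workflow_cancelled and versions_result != "failure"


# Possible outcomes of the `versions` job that `eval` depends on.
for result in ("success", "skipped", "failure"):
    current = runs_with_current_condition(result)
    proposed = runs_with_proposed_condition(result)
    print(f"versions={result}: current runs eval={current}, proposed runs eval={proposed}")
```

Under this model only the combined condition matches the three requirements listed above: eval runs when `versions` succeeds or is skipped, and does not run when `versions` fails. Whether real workflows behave this way should be confirmed against the GitHub Actions documentation.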


As for the currently broken compare job - I think these conditions propagate, thus I need the same on compare.


Labels

6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: policy discussion Discuss policies to work in and around Nixpkgs 8.has: port to stable This PR already has a backport to the stable release. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. 12.approvals: 1 This PR was reviewed and approved by one person.
