workflows/eval: compare eval consistency and performance between Lix / Nix versions by wolfgangwalther · Pull Request #427724 · NixOS/nixpkgs

wolfgangwalther · 2025-07-23T11:30:04Z

I extended the approach I used here to compare performance between Nix versions to run automatically in CI and also confirm Eval consistency as described in #428076 (comment).

~~The first two commits are from #432286 right now, to be discussed and merged there.~~ The last commit is only temporary, to be dropped before merge: It triggers the full eval for all versions, to be able to review the results in this PR.

The results are available in the PR summary page.

Original PR body:

With the announcement of Nix 2.30 having considerably lower memory consumption during Eval, I wanted to see whether this would give us any performance improvement for Eval in CI. The idea would be that lower memory consumption could allow bigger chunkSizes, which would speed up Eval.
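The chunk size trade-off can be sketched roughly as follows. This is a minimal illustration, assuming (as the chunkSize name suggests) that the list of attribute paths is split into fixed-size chunks that are evaluated in separate processes. The helper name `split_into_chunks` and the attribute count are hypothetical and not taken from the CI code.

```python
import math


def split_into_chunks(attrpaths, chunk_size):
    """Split a list of attribute paths into consecutive chunks of at most chunk_size."""
    return [attrpaths[i:i + chunk_size] for i in range(0, len(attrpaths), chunk_size)]


# Hypothetical number of attribute paths, for illustration only.
attrpaths = [f"pkg{i}" for i in range(100_000)]

for chunk_size in (6000, 8000, 10000, 12000, 14000, 16000):
    chunks = split_into_chunks(attrpaths, chunk_size)
    # Fewer, larger chunks mean fewer processes, but each one holds more in memory.
    assert len(chunks) == math.ceil(len(attrpaths) / chunk_size)
    print(chunk_size, len(chunks))
```

Under this assumption, a lower memory footprint per evaluated attribute is what would make the larger chunk sizes feasible in the first place.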

Another observation I made recently in #427434, where Nix 2.28 / Nix 2.29 took 9x longer to build the tarball than Lix or Nix 2.30, made me curious whether there were any differences between the different versions / implementations as well.

Things done

I ran a matrix job of 70 eval jobs in pull_request context here. I only evaluated for --evalSystem x86_64-linux, because that's the longest of the 4 platform jobs. I ran with different chunk sizes and different versions of Nix. See the results below.

The reported numbers are the times the eval step takes. The jobs don't run to completion, but all fail, because they create conflicting artifacts.

Conclusion

Nix 2.30 is faster - we need to use it for Eval!

cc @NixOS/nixpkgs-ci

xokdvium · 2025-07-24T17:45:49Z

re: memory usage?

I'm a bit confused. I might be wrong, but it doesn't seem that these versions are actually used? At least judging from the logs for lixPackageSets.lix_2_93.lix (https://github.com/NixOS/nixpkgs/actions/runs/16469453994/job/46554806261?pr=427724):

copying path '/nix/store/4gdwf5kxk9rlbdlk6b2nlh1frsniz5wf-nix-flake-c-2.29.1-dev' from 'https://cache.nixos.org/'...
copying path '/nix/store/q6l3w2xp5ayxab4imghhwb6xkx7yllkq-nix-2.29.1-dev' from 'https://cache.nixos.org/'...
building '/nix/store/v2yizp2h25a7dbh5nf19ynb6f2s721c2-attrpaths-superset.json.drv'...

I'm pretty sure this is just measuring noise of nix_2_29 and the --argstr nix is ignored? Am I missing something?

wolfgangwalther · 2025-07-24T17:49:27Z

> I'm pretty sure this is just measuring noise of nix_2_29 and the --argstr nix is ignored? Am I missing something?

That would be a very good explanation. I did confirm locally that the attributes I had set up were working: it was downloading different Lix versions from the cache. Maybe I made a mistake when hooking things up in the workflow file.

In this case, we'd now have a good idea of how noisy this is :D

.github/workflows/eval.yml

wolfgangwalther · 2025-07-24T18:12:00Z

chunkSize	Lix 2.91	Lix 2.92	Lix 2.93	Nix 2.24	Nix 2.28	Nix 2.29	Nix 2.30
6000	4:32	3:31	3:40	3:58	3:58	3:58	2:58
8000	3:32	3:26	3:37	4:11	3:59	4:00	2:59
10000	4:26	3:56	5:10	4:06	3:44	4:06	3:14
12000	9:20	10+	10+	4:01	3:48	4:04	2:57
14000	10+	10+	10+	8:26	6:26	8:43	3:05
16000	10+	10+	10+	10+	5:35	5:45	3:01

That looks much better. Nix 2.30 does seem to use less memory and to allow a higher chunkSize. There doesn't seem to be a performance improvement with higher chunk sizes, though. I'd probably need to run Nix 2.30 with even higher chunkSizes to see how that goes.

But, there is an improvement with Nix 2.30, so that's pretty cool.

Also, if we want to run with Lix as well, then we need to lower the chunk size a bit, because it appears it started swapping with 10k once already. Lowering chunk size would not have a negative effect for Nix 2.30, though, so no problem.

Memory consumption from the logs can't be compared well: with Nix 2.30 the later chunks start much sooner, so the numbers are not comparable across versions.
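A quick way to read the table above is to convert each cell to seconds and look for the fastest chunk size per version. The sketch below does that with the values copied from the table. Treating "10+" as "no usable measurement" is an assumption, and the helper names are hypothetical.

```python
# Eval step durations copied from the table above (rows: chunk size, columns: version).
VERSIONS = ["Lix 2.91", "Lix 2.92", "Lix 2.93", "Nix 2.24", "Nix 2.28", "Nix 2.29", "Nix 2.30"]
TABLE = {
    6000:  ["4:32", "3:31", "3:40", "3:58", "3:58", "3:58", "2:58"],
    8000:  ["3:32", "3:26", "3:37", "4:11", "3:59", "4:00", "2:59"],
    10000: ["4:26", "3:56", "5:10", "4:06", "3:44", "4:06", "3:14"],
    12000: ["9:20", "10+", "10+", "4:01", "3:48", "4:04", "2:57"],
    14000: ["10+", "10+", "10+", "8:26", "6:26", "8:43", "3:05"],
    16000: ["10+", "10+", "10+", "10+", "5:35", "5:45", "3:01"],
}


def to_seconds(cell):
    """Convert "m:ss" to seconds; return None for "10+" (assumed: no usable measurement)."""
    if cell.endswith("+"):
        return None
    minutes, seconds = cell.split(":")
    return int(minutes) * 60 + int(seconds)


def fastest_chunk_size(version):
    """Return (chunk_size, seconds) of the fastest measured run for one version."""
    col = VERSIONS.index(version)
    timed = {size: to_seconds(row[col]) for size, row in TABLE.items()}
    timed = {size: t for size, t in timed.items() if t is not None}
    best = min(timed, key=timed.get)
    return best, timed[best]


for version in VERSIONS:
    size, secs = fastest_chunk_size(version)
    print(f"{version}: fastest at chunkSize {size} ({secs} s)")
```

With these numbers, Nix 2.30 stays between 177 s and 194 s across all six chunk sizes, which matches the observation that larger chunks did not make it faster.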

lf- · 2025-07-25T02:46:41Z

It appears that what this is measuring is the GC falling over if you eval too much stuff in one process, right? So what you're actually measuring here is memory pressure (and boehmgc sucking), probably. Nevertheless this is deeply interesting data, thank you for measuring it!

wolfgangwalther · 2025-08-09T15:36:40Z

I updated the PR to run this performance comparison whenever we update ci/pinned.json. Thus we can continually monitor the performance of the different implementations. The results of these Eval runs are also used to check for Eval consistency as described in #428076 (comment).

The pull_request workflow currently doesn't run, yet - and I have no idea why. It fails with an obscure error: https://github.com/NixOS/nixpkgs/actions/runs/16850915977. I had seen this in a different PR already, but wasn't able to fix it, yet. All of this works fine in my fork.

wolfgangwalther · 2025-08-09T19:23:15Z

> The pull_request workflow currently doesn't run, yet - and I have no idea why. It fails with an obscure error: https://github.com/NixOS/nixpkgs/actions/runs/16850915977. I had seen this in a different PR already, but wasn't able to fix it, yet. All of this works fine in my fork.

A rebase after merging #432286 helped: The workflow is running now. The results will be visible here: https://github.com/NixOS/nixpkgs/actions/runs/16852783378?pr=427724.

Copied the resulting table here:

Lix/Nix version comparison

Version	aarch64-linux	aarch64-darwin	x86_64-linux	x86_64-darwin
lixPackageSets.git.lix	⚠️	⚠️	⚠️	⚠️
lixPackageSets.lix_2_91.lix	112	78	133	77
lixPackageSets.lix_2_92.lix	108	75	120	78
lixPackageSets.lix_2_93.lix	111	81	127	85
nixVersions.git	103	70	104	70
nixVersions.nix_2_24	145	116	149	113
nixVersions.nix_2_28	135	110	150	116
nixVersions.nix_2_29	147	109	151	114
nixVersions.nix_2_30	97	71	107	71

Evaluation time in seconds without downloading dependencies.

⚠️ Job did not report a result.

❌ Job produced different outpaths than the target branch.

lixPackageSets.git.lix is currently still broken. It was fixed in #424775, but that's not available in the channel / pin, yet.

The reported times are lower than those in #427724 (comment), because we're not including the download of dependencies. We're also not including the attrpath generation step, but only the outpath step (this could be improved on in the future).
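To put the table above into relative terms, the following sketch computes the per-system reduction in evaluation time between two versions. The numbers are copied from the table. `reduction_percent` is a hypothetical helper, and the row without a result (lixPackageSets.git.lix) is left out.

```python
# Evaluation times in seconds, copied from the comparison table above.
SYSTEMS = ["aarch64-linux", "aarch64-darwin", "x86_64-linux", "x86_64-darwin"]
TIMES = {
    "lixPackageSets.lix_2_91.lix": [112, 78, 133, 77],
    "lixPackageSets.lix_2_92.lix": [108, 75, 120, 78],
    "lixPackageSets.lix_2_93.lix": [111, 81, 127, 85],
    "nixVersions.git": [103, 70, 104, 70],
    "nixVersions.nix_2_24": [145, 116, 149, 113],
    "nixVersions.nix_2_28": [135, 110, 150, 116],
    "nixVersions.nix_2_29": [147, 109, 151, 114],
    "nixVersions.nix_2_30": [97, 71, 107, 71],
}


def reduction_percent(baseline, candidate):
    """Per-system reduction in eval time of candidate relative to baseline, in percent."""
    return {
        system: round(100 * (1 - c / b), 1)
        for system, b, c in zip(SYSTEMS, TIMES[baseline], TIMES[candidate])
    }


for system, pct in reduction_percent("nixVersions.nix_2_29", "nixVersions.nix_2_30").items():
    print(f"{system}: {pct}% less eval time")
```

For these measurements, Nix 2.30 needs roughly 29% to 38% less time than Nix 2.29, depending on the system.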

wolfgangwalther · 2025-08-11T07:31:40Z

> lixPackageSets.git.lix is currently still broken. It was fixed in #424775, but that's not available in the channel / pin, yet.

This is now fixed with the latest bump of pinned.json.

This is ready to go from my side.

MattSturgeon

SGTM

.github/workflows/eval.yml

The Dependabot update changed the hashes to the latest main branch commit instead of the v5.0.0 tag - also it didn't adjust the tags in the comments accordingly. Last but not least, one of the references used a `@v5` reference instead of the commit hash. The latter is probably what Dependabot tripped on.

Move "Packages" up, because it's much shorter and easier to scroll past. This way both Packages and Performance are visible immediately.

With this change, we start running Eval on all available Lix and Nix versions. Because this requires a lot of resources, this complete test is only run when `ci/pinned.json` is updated.

The resulting outpaths are checked for consistency with the target branch. A difference will cause the `report` job to fail, thus blocking the merge, ensuring Eval consistency for Nixpkgs across different versions.

This implements a kind of "ratchet style" check: Since we originally confirmed that the versions currently in Nixpkgs at the time of this commit match Eval behavior of Nix 2.3, we can ensure consistency with Nix 2.3 down the road, even without testing for it explicitly.

There had been one regression in Eval consistency for Nix between 2.18 and 2.24 - two tests in `tests.devShellTools` produce different results between Lix 2.91+ (which was forked from Nix 2.18) and Nix 2.24+. I assume it's unlikely that such a change would be "fixed" by now, thus I added an exception for these.

As a bonus, we also present the total time in seconds it takes for Eval to complete for every tested version in a summary table. This allows us to easily see performance improvements for Eval due to version updates. At this stage, this time only includes the "outpaths" step of Eval, but not the generation of attrpaths beforehand.
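The consistency check described in this commit message can be pictured with a small sketch. It is not the code from the PR: `diff_outpaths`, the attribute names, and the store paths below are all made up for illustration, and the allow-list only stands in for the `tests.devShellTools` exception mentioned above.

```python
def diff_outpaths(target, candidate, allowed=frozenset()):
    """Return attribute paths whose outpaths differ between two evals.

    target and candidate map attribute paths to outpaths. Attributes listed in
    allowed are known exceptions and are ignored.
    """
    differing = set()
    for attr in target.keys() | candidate.keys():
        if attr in allowed:
            continue
        if target.get(attr) != candidate.get(attr):
            differing.add(attr)
    return sorted(differing)


# Made-up outpaths for the target branch and for one Lix/Nix version under test.
target = {
    "hello": "/nix/store/aaaa-hello",
    "tests.devShellTools.example": "/nix/store/bbbb-example",
}
candidate = {
    "hello": "/nix/store/aaaa-hello",
    "tests.devShellTools.example": "/nix/store/cccc-example",
}

# Hypothetical exception list; the real excepted tests are not named here.
allowed = frozenset({"tests.devShellTools.example"})

mismatches = diff_outpaths(target, candidate, allowed)
if mismatches:
    raise SystemExit(f"outpaths differ from target branch: {mismatches}")
print("outpaths are consistent")
```

Comparing every version against the target branch, rather than against a fixed reference version, is what gives the check its ratchet character: each merge only has to preserve the behavior that was already confirmed.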

This gives us a fixed `lixPackageSets.git`.

From the nixpkgs-unstable channel: https://hydra.nixos.org/build/304569381#tabs-buildinputs

Changes for treefmt-nix: numtide/treefmt-nix@58bd4da...7d81f6f

wolfgangwalther · 2025-08-12T08:16:41Z

Just rebased and used the latest commit hashes for the newly used actions after the recent Dependabot updates.

Also added a commit to fix the actions/download-artifact hashes, where Dependabot likely tripped over a non-commit-hash reference and updated to the latest main commit instead of the v5.0.0 tag. The difference was only some READMEs, so I changed all of them to the v5.0.0 tag explicitly now.

nixpkgs-ci · 2025-08-12T08:22:57Z

Successfully created backport PR for release-25.05:

[Backport release-25.05] workflows/eval: compare eval consistency and performance between Lix / Nix versions #433033

wolfgangwalther · 2025-08-12T08:47:08Z

Seems like the Eval / compare job for regular PRs is ~~cancelled~~ skipped after merging this. This causes the "request reviewers" job to time out.

wolfgangwalther · 2025-08-12T08:54:53Z

.github/workflows/eval.yml

  eval:
    runs-on: ubuntu-24.04-arm
+    needs: versions
+    if: ${{ !cancelled() }}


Not sure what I did here. It seems to actually work, though, which is confusing.

I want the eval job to run when:

- The versions job succeeds
- The versions job is not run at all

I don't want eval to run when:

- The versions job runs, but fails

!cancelled() seems like the wrong condition. I think I need !cancelled() && !failure()?

That's assuming "cancelled" is something different from "skipped" (was that the right term, if the if condition does not apply, @MattSturgeon?)

As for the currently broken compare job - I think these conditions propagate, thus I need the same on compare.
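The two candidate conditions can be compared with a small truth-table model. This is a simplified sketch, assuming that `cancelled()` is true only when the workflow run was cancelled and that `failure()` is true when a needed job failed. The function names and the three result strings are illustrative, not part of the workflow.

```python
def runs_with_not_cancelled(versions_result, workflow_cancelled=False):
    """Model of `if: ${{ !cancelled() }}` on a job with `needs: versions`."""
    return not workflow_cancelled


def runs_with_not_cancelled_and_not_failure(versions_result, workflow_cancelled=False):
    """Model of `if: ${{ !cancelled() && !failure() }}` on the same job."""
    failure = versions_result == "failure"
    return (not workflow_cancelled) and (not failure)


# versions_result is the outcome of the needed `versions` job in this model.
for result in ("success", "skipped", "failure"):
    print(
        result,
        runs_with_not_cancelled(result),
        runs_with_not_cancelled_and_not_failure(result),
    )
```

In this model, only the second condition matches the wish list above: eval runs when versions succeeded or was skipped, and does not run when versions failed.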

wolfgangwalther force-pushed the ci-eval-perf branch from ebc5116 to cd5caf6 Compare July 23, 2025 11:32

wolfgangwalther mentioned this pull request Jul 24, 2025

nix_2_3: drop and raise minver to 2.18 #428076

Merged

2 tasks

wolfgangwalther force-pushed the ci-eval-perf branch from cd5caf6 to 446cece Compare July 24, 2025 17:52

This was referenced Aug 4, 2025

Costs for GitHub Actions NixOS/org#147

Closed

workflows/eval: disable swap #431459

Merged

nixpkgs-ci bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 7, 2025

wolfgangwalther force-pushed the ci-eval-perf branch from 446cece to 64476c7 Compare August 9, 2025 15:28

wolfgangwalther changed the title ~~ci/eval: test performance (WIP)~~ workflows/eval: compare eval consistency and performance between Lix / Nix versions Aug 9, 2025

nixpkgs-ci bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Aug 9, 2025

wolfgangwalther force-pushed the ci-eval-perf branch from 64476c7 to 1c18426 Compare August 9, 2025 19:10

wolfgangwalther marked this pull request as ready for review August 9, 2025 19:23

nix-owners bot requested review from MattSturgeon, Mic92, philiptaron and zowoq August 9, 2025 19:25

wolfgangwalther force-pushed the ci-eval-perf branch from 1c18426 to 840bea3 Compare August 11, 2025 07:30

MattSturgeon reviewed Aug 11, 2025

wolfgangwalther force-pushed the ci-eval-perf branch 2 times, most recently from 5b877fe to a3533c7 Compare August 11, 2025 16:34

MattSturgeon approved these changes Aug 11, 2025

nixpkgs-ci bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Aug 11, 2025

wolfgangwalther added 4 commits August 12, 2025 10:13

ci/eval/compare: reorder step summary

f05895f


ci/pinned: update

14a6d9d


wolfgangwalther force-pushed the ci-eval-perf branch from a3533c7 to 14a6d9d Compare August 12, 2025 08:14

wolfgangwalther enabled auto-merge August 12, 2025 08:16

wolfgangwalther merged commit 161a4f0 into NixOS:master Aug 12, 2025
76 of 78 checks passed

wolfgangwalther deleted the ci-eval-perf branch August 12, 2025 08:22

nixpkgs-ci bot mentioned this pull request Aug 12, 2025

[Backport release-25.05] workflows/eval: compare eval consistency and performance between Lix / Nix versions #433033

Merged

1 task

github-actions bot added the 8.has: port to stable This PR already has a backport to the stable release. label Aug 12, 2025

wolfgangwalther mentioned this pull request Aug 21, 2025

actions/checkout: parallelize checkout of multiple commits on tmpfs #435526

Merged

2 tasks

MattSturgeon mentioned this pull request Jan 14, 2026

ci/pinned:update #480141

Merged

13 tasks
