workflows/eval: compare eval consistency and performance between Lix / Nix versions #427724
ebc5116 to cd5caf6
I'm a bit confused. I might be wrong, but it doesn't seem like these versions are actually used? At least judging from the logs for … I'm pretty sure this is just measuring the noise of nix_2_29 and the …
That would be a very good explanation. I did confirm locally that the attributes I had set up were working: it was downloading different Lix versions from the cache. Maybe I made a mistake when hooking things up in the workflow file. If so, we'd now have a good idea of how noisy this is :D
cd5caf6 to 446cece
That looks much better. Nix 2.30 does seem to use less memory and to allow higher … But there is an improvement with Nix 2.30, so that's pretty cool. Also, if we want to run with Lix as well, we need to lower the chunk size a bit, because it apparently already started swapping once with 10k. Lowering the chunk size would not have a negative effect on Nix 2.30, though, so no problem. Memory consumption can't be compared well from the logs, because with Nix 2.30 the later chunks start much sooner. The data clearly shows an improvement with higher chunkSizes, though.
It appears that what this is measuring is the GC falling over if you eval too much stuff in one process, right? So what you're actually measuring here is probably memory pressure (and boehmgc sucking). Nevertheless, this is deeply interesting data; thank you for measuring it!
446cece to 64476c7
I updated the PR to run these performance comparisons whenever we update … The …
64476c7 to 1c18426
A rebase after merging #432286 helped: the workflow is running now. The results will be visible here: https://github.com/NixOS/nixpkgs/actions/runs/16852783378?pr=427724. I copied the resulting table here:
Lix/Nix version comparison
Evaluation time in seconds, without downloading dependencies.
❌ Job produced different outpaths than the target branch.
The reported times are lower than those in #427724 (comment), because they don't include downloading dependencies. They also don't include the attrpath generation step, only the outpath step (this could be improved in the future).
1c18426 to 840bea3
This is now fixed by the latest bump of `pinned.json`. It's ready to go from my side.
5b877fe to a3533c7
The Dependabot update changed the hashes to the latest commit on the main branch instead of the v5.0.0 tag, and it didn't adjust the tags in the comments accordingly. Finally, one of the references used `@v5` instead of the commit hash; that is probably what Dependabot tripped on.
Move "Packages" up, because it's much shorter and easier to scroll past. This way both Packages and Performance are visible immediately.
With this change, we start running Eval on all available Lix and Nix versions. Because this requires a lot of resources, this complete test only runs when `ci/pinned.json` is updated.
The resulting outpaths are checked for consistency with the target branch. A difference causes the `report` job to fail, blocking the merge and thus ensuring Eval consistency for Nixpkgs across different versions.
This implements a kind of "ratchet style" check: since we originally confirmed that the versions in Nixpkgs at the time of this commit match the Eval behavior of Nix 2.3, we can ensure consistency with Nix 2.3 down the road, even without testing for it explicitly.
There has been one regression in Eval consistency for Nix between 2.18 and 2.24: two tests in `tests.devShellTools` produce different results between Lix 2.91+ (which was forked from Nix 2.18) and Nix 2.24+. I assume it's unlikely that such a change would be "fixed" by now, so I added an exception for these.
As a bonus, we also present the total time in seconds it takes for Eval to complete for every tested version in a summary table. This makes Eval performance improvements due to version updates easy to spot. At this stage, this time only includes the "outpaths" step of Eval, not the generation of attrpaths beforehand.
This gives us a fixed `lixPackageSets.git`. From the nixpkgs-unstable channel: https://hydra.nixos.org/build/304569381#tabs-buildinputs Changes for treefmt-nix: numtide/treefmt-nix@58bd4da...7d81f6f
a3533c7 to 14a6d9d
Just rebased and used the latest commit hashes for the newly used actions after the recent Dependabot updates. Also added a commit to fix the …
Successfully created backport PR for …
Seems like the …
```yaml
eval:
  runs-on: ubuntu-24.04-arm
  needs: versions
  if: ${{ !cancelled() }}
```
Not sure what I did here. It seems to actually work, though, which is confusing.
I want the `eval` job to run when:
- The `versions` job succeeds
- The `versions` job is not run at all

I don't want `eval` to run when:
- The `versions` job runs, but fails
`!cancelled()` seems like the wrong condition. I think I need `!cancelled() && !failure()`?
That's assuming "cancelled" is something different from "skipped" (is "skipped" the right term for when the `if` condition does not apply, @MattSturgeon?)
As for the currently broken `compare` job: I think these conditions propagate, so I need the same condition on `compare`.
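A minimal sketch of the condition proposed above, assuming the `eval` and `compare` jobs from this workflow; the `compare` job's `runs-on` and `needs` values are guesses, not taken from the PR. As far as I understand the GitHub Actions status check functions, `cancelled()` is only true when the whole workflow run was cancelled, while a job whose `if` evaluates to false gets the result `skipped`. At the job level, `failure()` is true if any ancestor job in the `needs` chain failed, and a job without an explicit `if` implicitly uses `success()`, which also skips it when an ancestor was skipped. That would explain why the condition has to be repeated on `compare`:

```yaml
# Sketch only: the `compare` job's `runs-on` and `needs` are assumptions.
jobs:
  eval:
    runs-on: ubuntu-24.04-arm
    needs: versions
    # Run when `versions` succeeded or was skipped,
    # but not when it failed or the run was cancelled.
    if: ${{ !cancelled() && !failure() }}
    # steps omitted

  compare:
    runs-on: ubuntu-24.04-arm
    needs: eval
    # As I understand it, the implicit `success()` would skip this job
    # whenever an ancestor (here `versions`) was skipped, so repeat the guard.
    if: ${{ !cancelled() && !failure() }}
    # steps omitted
```

A more targeted alternative would be to check `needs.versions.result` explicitly (its possible values are `success`, `failure`, `cancelled` and `skipped`), at the cost of the condition no longer covering other ancestors.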
I extended the approach I used here to compare performance between Nix versions to run automatically in CI and also confirm Eval consistency as described in #428076 (comment).
The first two commits are from #432286 right now, to be discussed and merged there. The last commit is only temporary and is to be dropped before merging: it triggers the full eval for all versions, so that the results can be reviewed in this PR. The results are available on the PR summary page.
Original PR body:
With the announcement that Nix 2.30 has considerably lower memory consumption during Eval, I wanted to see whether this would give us any performance improvement for Eval in CI. The idea is that lower memory consumption could allow bigger chunkSizes, which would speed up Eval.
Another observation I made recently in #427434, where Nix 2.28 / Nix 2.29 took 9x longer to build the tarball than Lix or Nix 2.30, made me curious whether there were any differences between the different versions / implementations as well.
Things done
I ran a matrix of 70 eval jobs in `pull_request` context here. I only evaluated for `--evalSystem x86_64-linux`, because that's the longest of the 4 platform jobs. I ran with different chunk sizes and different versions of Nix. Results, see below. The reported numbers are the times the eval step takes. The jobs don't run to completion; they all fail because they create conflicting artifacts.
Conclusion
Nix 2.30 is faster: we need to use it for Eval!
cc @NixOS/nixpkgs-ci