Skip to content

Parallel GH actions workflow for Nixpkgs eval#356023

Merged
Mic92 merged 2 commits intoNixOS:masterfrom
tweag:gha-eval
Nov 20, 2024
Merged

Parallel GH actions workflow for Nixpkgs eval#356023
Mic92 merged 2 commits intoNixOS:masterfrom
tweag:gha-eval

Conversation

@infinisil
Copy link
Member

@infinisil infinisil commented Nov 14, 2024

This PR primarily introduces a new GitHub Action workflow to effectively do a full Hydra evaluation for every PR, paving the way towards relieving ofborg of its evaluation duty. This is motivated by https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025, see also #355847.

To make this as efficient as possible, 4 workflow jobs are spawned in parallel (one for each supported system, as originally experimented by @Mic92 in #352808), while each job makes full use of its 4 cores and 16GB of memory (based off amjoseph's idea in #269403 and initial work in #269356).

I've also implemented it in a way that limits maximum memory usage as Nixpkgs grows. CI will only take longer to finish, instead of OOMing. Currently it takes about 5 minutes, check out this test run.

Note that ofborg does more than just evaluate Nixpkgs, but this PR can be used as a base to also implement output path comparisons to assign labels based on changed paths, to request appropriate maintainers, and do evaluation time comparisons.

Ping @philiptaron @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji

Things done


This work is funded by Tweag and Antithesis

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: policy discussion Discuss policies to work in and around Nixpkgs 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions labels Nov 14, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Nov 15, 2024
Comment on lines 43 to 49
- name: Enable swap
if: env.mergedSha
run: |
sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is unused, we can just drop it for now. Especially given that we have another swap already enabled that can act as a canary:

Suggested change
- name: Enable swap
if: env.mergedSha
run: |
sudo fallocate -l 10G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If allocating swap is not slowing down the gh action, I would recommend keeping it as a backup. You never know what happens.

Copy link
Member Author

@infinisil infinisil Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it such that we have a constant chunk size, so that we effectively have an upper limit to memory usage. Then we definitely don't need swap :)

Copy link
Member

@Mic92 Mic92 Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohnRTitor there is already a swap file. So we would see if it is starting to swap before it is a problem and we can add this back as needed.

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

I added several improvements to this script: tweag#100 notably more useful json output.

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

I like this approach already. I want to incorporate this code in one form or the other in nixpkgs-review as it will make evaluating on normal laptops possible again.

@JohnRTitor
Copy link
Member

Please do! Currently evaluating uses too much memory and time!

@Mic92
Copy link
Member

Mic92 commented Nov 16, 2024

@GaetanLepage wants to look into integrating it into nixpkgs-review.

@Mic92
Copy link
Member

Mic92 commented Nov 17, 2024

@infinisil I would suggest to just address all comments for this pull request for now and maybe merge my pull request but not to extend the scope for the first pull request too much.
It would be good to first test to see how this action scales to the whole org before we proceed with more features.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025/37

@JohnRTitor
Copy link
Member

JohnRTitor commented Nov 17, 2024

It would be good to first test to see how this action scales to the whole org before we proceed with more features.

Agreed, due to slowness of Ofborg recently a few commiters are merging PRs without waiting for a successful eval which can take up to 13-15 hours. (I am guilty of the act myself)

This fast but dangerous approach of review makes nixpkgs prone to a tree with broken eval.

Having this basic CI check would help to detect issues beforehand, as we plan to phase out Ofborg in less than two months.

@github-actions github-actions bot added 6.topic: haskell General-purpose, statically typed, purely functional programming language 6.topic: lib The Nixpkgs function library labels Nov 18, 2024
@infinisil infinisil force-pushed the gha-eval branch 8 times, most recently from 82e4957 to 1348471 Compare November 18, 2024 06:01
@infinisil
Copy link
Member Author

infinisil commented Nov 20, 2024

This is done now imo! I added some docs and some comments, I also updated the PR description. Pinging @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji @azuwis @TheGiraffe3 for review and approval

Motivated by ofborg struggling [1] and its evaluations taking too long,
inspired by Jörg's initial PR [2]
and Adam's previous attempt to parallelise Nixpkgs evaluation [3],
this PR contains initial work to relief ofborg from its evaluation duty
by using GitHub Actions to evaluate Nixpkgs.

For now this doesn't take care of all of what ofborg does, such as
requesting appropriate reviewers or labeling mass rebuilds, but this can
be follow-up work.

[1]: https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025?u=infinisil
[2]: NixOS#352808
[3]: NixOS#269403

Co-Authored-By: Jörg Thalheim <[email protected]>
Co-Authored-By: Adam Joseph <[email protected]>
@github-actions
Copy link
Contributor

Successfully created backport PR for release-24.11:

@infinisil
Copy link
Member Author

Follow-up fix: #357965

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/equinix-metal-cta-infra-short-update/56538/1

@paparodeo
Copy link
Contributor

I'm seeing ofborg eval failing on staging https://gist.github.com/GrahamcOfBorg/f50183ae86ce9f339b6b41cc33e14619 but the parallel evals are passing: #359106

@FliegendeWurst
Copy link
Member

staging will be fixed very soon. In this particular case the GH actions ran first, then staging broke, and finally ofborg got around to the PR.

@infinisil
Copy link
Member Author

WIP follow-up towards adding rebuild labels: #359704

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: haskell General-purpose, statically typed, purely functional programming language 6.topic: lib The Nixpkgs function library 6.topic: policy discussion Discuss policies to work in and around Nixpkgs 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants