actions/checkout: parallelize checkout of multiple commits on tmpfs #435526
wolfgangwalther merged 1 commit into NixOS:master
Just removed a left-over comment.
This seems like a good change (I'll review shortly). Just to have the answers on record, I'll ask these skeptical questions:
How much memory remains available when we have 3 worktrees checked out on a tmpfs volume? I assume this varies per runner, as they each have different memory capacities?
In the past, some jobs were running out of memory and using swap. Is this a non-issue now? Would these changes make that more likely to happen again?
Yeah, this is a valid concern. When considering it, we also need to take #435535 into account, which raises this to 4 checkouts of Nixpkgs in RAM for the Eval workflow.
One checkout of Nixpkgs is 300 MB for me locally, so we'd be at 1.2 GB. Most runners have 16 GB, so that's not a problem for almost all workflows. The macOS runners have only 7 GB, but we only run a single job there. See https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories.
The Eval workflow is generally the only workflow where this could be a problem. In that later PR, I didn't see any problem with Nix 2.30; in fact, it even became faster. No swapping observed in https://github.com/NixOS/nixpkgs/actions/runs/17124280377/job/48572236799?pr=435535. Even with the 4 checkouts, available memory bottomed out at 2374 MiB.
Now, that might be different when we update
Here's a run of that: Lix/Nix version comparison
Evaluation time in seconds without downloading dependencies. ❌ Job produced different outpaths than the target branch.
No problem at all: Eval is not slower. It's not faster either. Comparison here: #427724 (comment)
Since this measures the time the eval takes inside the sandbox, I assume the potential speedup of Eval on tmpfs mostly happens outside the sandbox, i.e. when Nix walks the whole repo to calculate the outpath for the eval result, before it checks cachix for it.
I did observe swapping for this run, but mostly around 1.3 GB (for the memory worst-case scenario, i.e. older Lix versions). That matches the 4 checkouts nicely. I think what happens is that the tmpfs is pushed to swap first, which is entirely fine, so Nix still has all the memory it needs for Eval.
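For the record, the memory arithmetic from this thread can be written down as a small sketch. The 300 MB per checkout is the local measurement quoted above; treating the runner sizes as GiB and the function name `tmpfsFootprint` are assumptions made here for illustration only.

```javascript
// Rough tmpfs budget, using the figures quoted in this thread.
// Assumption: runner memory is given in GiB; real checkout size varies.
const CHECKOUT_MB = 300 // one Nixpkgs checkout, measured locally

function tmpfsFootprint(checkouts, runnerGiB) {
  const usedMB = checkouts * CHECKOUT_MB
  const totalMB = runnerGiB * 1024
  // share: fraction of the runner's memory held by the checkouts
  return { usedMB, share: usedMB / totalMB }
}

// 4 checkouts on a 16 GiB runner and on a 7 GiB macOS runner
const linux = tmpfsFootprint(4, 16)
const macos = tmpfsFootprint(4, 7)
console.log(linux.usedMB, linux.share.toFixed(3), macos.share.toFixed(3))
```

Under these assumptions the four checkouts stay well below a fifth of memory even on the smallest runner, which fits the observation that swapping hovered around the size of the checkouts.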
Instead of fetching up to 3 times on each new checkout, we now fetch all the commits we're going to need at once. Afterwards, we check out the different worktrees in parallel. On its own this doesn't gain much yet, because checkout on disk is still IO-bound. Inconsistent IO performance on disk is also the biggest limitation for checkout right now, with checkout times ranging anywhere from 20s to 40s.
By checking out the worktrees on a tmpfs, the actual checkout only takes 1s and benefits from parallelization. The overall checkout time is now 8-11s, depending on the number of commits.
That's a reduction of 10-30s, and we get this speedup for almost every job in the PR workflow, which is huge.
This potentially has a nice side-effect for Eval, too: because the repo is in RAM, Eval seems to run slightly faster, by up to 10 seconds.
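The fetch-once, check-out-in-parallel flow described above could look roughly like the sketch below. It is not the code from this PR: `planCheckout`, `checkoutAll`, the `origin` remote name, the `--detach` flag, and the `{ name, sha }` input shape are all assumptions made for illustration.

```javascript
// Hypothetical sketch of "one fetch, then parallel worktrees".
const { execFile } = require('node:child_process')
const { promisify } = require('node:util')
const { join } = require('node:path')

const execFileAsync = promisify(execFile)

// Build the git argument lists: a single fetch covering every commit,
// then one `worktree add` per checkout below `root` (e.g. a tmpfs mount).
function planCheckout(commits, root) {
  return {
    fetch: ['fetch', 'origin', ...commits.map((c) => c.sha)],
    worktrees: commits.map((c) => [
      'worktree', 'add', '--detach', join(root, c.name), c.sha,
    ]),
  }
}

// `git` is injectable so the flow can be exercised without a repository.
async function checkoutAll(commits, root, git = (args) => execFileAsync('git', args)) {
  const plan = planCheckout(commits, root)
  await git(plan.fetch) // one network round trip for all commits
  await Promise.all(plan.worktrees.map((args) => git(args))) // parallel checkouts
  return plan
}
```

Separating the plan from its execution is only a convenience for this sketch; it makes it easy to see that the fetch happens exactly once regardless of how many worktrees follow.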
Successfully created backport PR for
case 'macOS':
  await run('sudo', 'mount_tmpfs', path)
  // macOS creates this hidden folder to log file system activity.
  // This trips up git when adding a worktree below, because the target folder is not empty.
  await run('sudo', 'rm', '-rf', join(path, '.fseventsd'))
  break
This part doesn't work too well on macOS yet.
https://github.com/NixOS/nixpkgs/actions/runs/17147656253/job/48646939225
https://github.com/NixOS/nixpkgs/actions/runs/17146824086/job/48644655746
https://github.com/NixOS/nixpkgs/actions/runs/17146214373/job/48643006331
All the same error:
fatal: 'untrusted' already exists
(alternatively "pinned")
The problem here is that Darwin either recreates the .fseventsd folder or creates other files/folders in there, which makes git unhappy.
A better fix might be to mount a single tmpfs and create subdirectories on it for the checkouts.
> A better fix might be to mount a single tmpfs and create subdirectories on it for the checkouts.
Agreed.
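One way to picture the suggested fix is the sketch below: mount a single tmpfs and point each worktree at a subdirectory of it, so that anything the OS drops into the mount root (such as `.fseventsd`) no longer sits inside a worktree target. The function name `planTmpfs`, the Linux `mount -t tmpfs` invocation, and the `'Linux'` label are assumptions for illustration; only the macOS `mount_tmpfs` call appears in the diff above, and permissions on the mount are left out.

```javascript
// Hypothetical sketch: one tmpfs mount, one subdirectory per checkout.
const { join } = require('node:path')

function planTmpfs(os, mountPoint, names) {
  // Assumption: anything that is not macOS uses the Linux mount syntax.
  const mount =
    os === 'macOS'
      ? ['sudo', 'mount_tmpfs', mountPoint]
      : ['sudo', 'mount', '-t', 'tmpfs', 'tmpfs', mountPoint]
  // Worktree targets live below the mount root, so files created in the
  // root itself do not make a target directory non-empty.
  const dirs = names.map((name) => join(mountPoint, name))
  return { mount, dirs }
}

const plan = planTmpfs('macOS', '/mnt/checkout', ['untrusted', 'pinned'])
console.log(plan.mount.join(' '), plan.dirs)
```

The directory names reuse `untrusted` and `pinned` from the error messages above; the mount point is made up.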
Here's a selection of things people have done in GitHub Actions to unload various daemons on macOS: