Skip to content

arrayUtilities: init#385960

Merged
ConnorBaker merged 2 commits intoNixOS:masterfrom
ConnorBaker:feat/arrayUtilities
Jun 12, 2025
Merged

arrayUtilities: init#385960
ConnorBaker merged 2 commits intoNixOS:masterfrom
ConnorBaker:feat/arrayUtilities

Conversation

@ConnorBaker
Copy link
Contributor

@ConnorBaker ConnorBaker commented Mar 1, 2025

There are patterns throughout the shell scripts in Nixpkgs (particularly those involving ELF files and CUDA) which can/should be componentized and made into a rigorous, tested library of functions. Given the push for __structuredAttrs, it makes sense to offer high(er)-level array utilities rather than rely on each individual's knowledge of Bash. Promoting code-reuse in this way avoids disparate re-implementations of common patterns which introducing bugs, reduces line count, and makes intent unambiguous.

This PR is the beginning of such re-usable library of Bash functions. Currently, it includes:

  • isDeclaredArray: a function which tests whether a nameref refers to a declared, indexed array
  • isDeclaredMap: a function which tests whether a nameref refers to a declared, associative array
  • sortArray: a function which lexicographically sorts the contents of an indexed array
  • getSortedMapKeys: a function which populates an indexed array with the keys of an associative array, sorted in lexicographic order

Functions in (and to be added to) this library are largely written in destination-passing style, where the final arguments are the outputs in which the result of calling the function will be stored. (This is enabled by heavy use of namerefs.) Constraints on whether arguments may alias each other are documented for each relevant function.

The bulk of this PR is dedicated to testing these bash functions (generated with cloc pkgs/build-support/setup-hooks/arrayUtilities --md):

Language files blank comment code
Nix 8 84 12 981
Bourne Again Shell 4 15 47 49
-------- -------- -------- -------- --------
SUM: 12 99 59 1030

The second commit of this PR shows where such functionality would have been useful in implementing testers.testEqualArrayOrMap.

isDeclaredArray and isDeclaredMap

These two functions underpin the library and allow testing the type of variable to which a nameref refers.

sortArray

This function performs a stable-sort of an indexed array passed by reference, storing the results to the second argument, which is also an indexed array passed by reference.

Two points of note in the implementation:

  1. When the number of directives in the format string does not match the number of arguments provided to printf, the format string is applied to each argument. In this way, we produce a NULL-delimited list of values for consumption by sort.
  2. Since the locale changes the behavior of sort, we specify it in a sub-shell to avoid polluting the enclosing environment.

getSortedMapKeys

This function accepts an associative array by reference and stores the lexicographically sorted keys in the second argument, an indexed array passed by reference. Since order of traversal of associative arrays in Bash is not guaranteed to be consistent and Nix pursues reproducibility, being able to iterate over the values of a map in a deterministic fashion is a boon -- especially for test scripts.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ConnorBaker ConnorBaker self-assigned this Mar 1, 2025
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: testing Tooling for automated testing of packages and modules 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. labels Mar 1, 2025
@doronbehar
Copy link
Contributor

There's common functionality throughout Nixpkgs setup hooks which can/should be componentized and made into a rigorous, tested library of functions.

With the push for __structuredAttrs, it makes sense to offer high(er)-level array utilities.

Could you give please a somewhat simple and self contained example of existing Nixpkgs code that would gain from the changes scattered in these different PRs?

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from 40ec963 to 0a22f98 Compare March 10, 2025 19:44
@github-actions github-actions bot removed 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: testing Tooling for automated testing of packages and modules labels Mar 10, 2025
@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from 0a22f98 to 654106a Compare March 10, 2025 23:40
@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch 3 times, most recently from 22aa3b5 to ef724c3 Compare March 24, 2025 04:12
@github-actions github-actions bot added the 6.topic: testing Tooling for automated testing of packages and modules label Mar 24, 2025
@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch 2 times, most recently from a50d370 to 022aadb Compare March 25, 2025 06:06
@SomeoneSerge SomeoneSerge self-requested a review April 8, 2025 15:26
Copy link
Contributor

@ambroisie ambroisie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool :-).


I'm kind of failing to see the value of occursOnlyOrAfterInArray compared to a two-step check of:

  1. elem1 in array
  2. elem1 strictly occurs after elem2 (fails if elem2 is not present).

Same for occursOnlyOrBeforeInArray.

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from 022aadb to 952a4e3 Compare April 8, 2025 18:38
@ConnorBaker
Copy link
Contributor Author

Rebased on master and force-pushed -- thank you for the review @ambroisie!

I removed occursOnlyOrAfterInArray and occursOnlyOrBeforeInArray because they are trivial and I'm no longer using them in cuda-packages.

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from 952a4e3 to 9587f6c Compare April 8, 2025 21:52
@github-actions github-actions bot added 10.rebuild-darwin: 5001+ This PR causes many rebuilds on Darwin and must target the staging branches. 10.rebuild-darwin: 501+ This PR causes many rebuilds on Darwin and should normally target the staging branches. 10.rebuild-linux: 5001+ This PR causes many rebuilds on Linux and must target the staging branches. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. and removed 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. labels Apr 8, 2025
@github-actions github-actions bot added 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. and removed 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. labels May 14, 2025
@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from d2fc06d to 1f5adc8 Compare May 14, 2025 23:21
@ConnorBaker
Copy link
Contributor Author

I rebased on master, greatly reduced the number of functions included in this PR (to just those required for the rewrite of testers.testEqualArrayOrMap), updated the PR description, and included a breakdown of the line count (most lines belong to tests).

Copy link
Contributor

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reducing scope, that helps a lot!

Comment on lines +10 to +14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do a bunch of checks for "is array" in stdenv/generic/setup.sh. Even though we use case statements there, I think abstracting away this into two calls to isDeclaredArray / isDeclaredMap makes the code much more readable.

If we wanted to use it there, we'd need to make this part of stdenv, though - not another setup hook. Thus.. I suggest to do just that, for those two functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have more utility than just for stdenv, so I want them to remain separate.

That said, I'm not opposed to this implementation find its way into stdenv -- I just don't want them to be available solely through stdenv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I thought that whatever was available in stdenv was available everywhere? What exactly is the benefit of having them available "separately"? How would that look like? I might be missing something, but I only see a negative effect here - aka that they are not available in stdenv right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example is creating scripts which run outside stdenv -- for example, the magma.passthru.testers.all script introduced here: #414612.

Is there an easy way to ensure setup.sh and all the goodies it has are sourced in such scripts, without polluting the shell with a bunch of other variables?

Copy link
Contributor

@wolfgangwalther wolfgangwalther Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds a bit like we'd maybe want a separation of something like stdenv and stdlib? (where stdenv can make use of stdlib!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to see something like that and am interested in helping make that happen --
in the short term, I'd like to have these merged so I can bring in fixes for CUDA setup hooks which require more complicated Bash functionality (and I hate writing common functionality in multiple places).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a use-case for sortArray in stdenv now: getAllOutputNames returns a different order of outputs depending on whether __structuredAttrs is turned on or not. This causes things like #425323 (comment). If we were to sort these outputs by name, they would match between both cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that these functions are separate so I can use them outside of things providing setup.sh and have an easy way to test them. I'm not opposed to the ol' copy/paste or including the files in mkDerivation's nativeBuildInputs... what are your thoughts? Although maybe copy-pasting would be better so any changes to these functions don't cause a rebuild of stdenv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to the ol' copy/paste or including the files in mkDerivation's nativeBuildInputs... what are your thoughts?

I don't think we can (or should) use something from mkDerivation.nativeBuildInputs in setup.sh, aka in core stdenv.

I don't like copy&paste either.

I think this was still a good idea:

This sounds a bit like we'd maybe want a separation of something like stdenv and stdlib? (where stdenv can make use of stdlib!)

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from 1f5adc8 to d0b738b Compare June 9, 2025 17:44
@nix-owners nix-owners bot requested a review from infinisil June 9, 2025 17:46
@ConnorBaker
Copy link
Contributor Author

Addressed some comments, rebased, and force-pushed.

Generally, I:

  • Want to be explicit about variable scoping
  • Want functions to check their arguments
  • Want functions to be restricted to operating on inputs which were declared with declare or local
  • Want to keep using output-passing style

@ConnorBaker
Copy link
Contributor Author

Sorry for the delay in getting back to your comments, I'll try to address those today.

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from d0b738b to e402b65 Compare June 11, 2025 15:57
@ConnorBaker
Copy link
Contributor Author

I've rebased, squashed, and force-pushed.

  • sortArray now replaces the contents of the its output argument instead of appending
  • sortArray includes better documentation (with references!) for the behavior of printf and sort
  • getMapKeys now replaces the contents of its output argument instead of appending

Copy link
Contributor

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not go through the extensive test-suite in detail, but they are a great base and give me confidence that the implementation is solid. I did test whether some sorting tests would fail without sorting behavior - they did, cool.

LGTM in principle.

@ConnorBaker ConnorBaker force-pushed the feat/arrayUtilities branch from e402b65 to 4b80e59 Compare June 11, 2025 17:08
@ConnorBaker
Copy link
Contributor Author

Force-pushed changes:

  • getMapKeys is now getSortedMapKeys and includes a note about why it should be use (to avoid non-determinism)
  • fixed "argument" typos
  • more consistent use of "indexed array" and "associative array" in comments and strings, but keeping "array" and "map" in function names

@ConnorBaker ConnorBaker merged commit 40f1bb6 into NixOS:master Jun 12, 2025
18 of 20 checks passed
@nixpkgs-ci
Copy link
Contributor

nixpkgs-ci bot commented Jun 12, 2025

Successfully created backport PR for release-25.05:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: testing Tooling for automated testing of packages and modules 8.has: documentation This PR adds or changes documentation 8.has: port to stable This PR already has a backport to the stable release. 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants