Some haskellPackages on x86_64-darwin have a broken package db due to store corruption on a Hydra builder #356741

@gador

Describe the bug

I looked at my zh.fail log and noticed a bunch of x86_64-darwin failures. Most of them were caused by git-annex failing to build with a cryptic error message such as:

Running phase: setupCompilerEnvironmentPhase
Build with /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6.
cp: missing destination file operand after '/private/tmp/nix-build-git-annex-10.20240927.drv-0/tmp.RkFiStSe6C/setup-package.conf.d/'
Try 'cp --help' for more information.

(https://hydra.nixos.org/build/278372545/nixlog/1)

I looked into it and added a NIX_DEBUG=7 flag to the generic-builder for ghc and saw the following:

git-annex-x86_64-darwin> ++ for p in "${pkgsBuildBuild[@]}" "${pkgsBuildHost[@]}" "${pkgsBuildTarget[@]}"
git-annex-x86_64-darwin> ++ '[' -d /nix/store/pxs3wmnps7z53w8ifrfys86gbdgvgkr3-hashable-1.4.4.0/lib/ghc-9.6.6/lib/package.conf.d ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/pxs3wmnps7z53w8ifrfys86gbdgvgkr3-hashable-1.4.4.0 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/pxs3wmnps7z53w8ifrfys86gbdgvgkr3-hashable-1.4.4.0 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ cp -f /nix/store/pxs3wmnps7z53w8ifrfys86gbdgvgkr3-hashable-1.4.4.0/lib/ghc-9.6.6/lib/package.conf.d/hashable-1.4.4.0-1cUwT1YA5NKDNzV11fLCsN.conf /private/tmp/nix-build-git-annex-10.20240927.drv-1/tmp.NrYRS2nTiV/setup-package.conf.d/
git-annex-x86_64-darwin> ++ continue
git-annex-x86_64-darwin> ++ for p in "${pkgsBuildBuild[@]}" "${pkgsBuildHost[@]}" "${pkgsBuildTarget[@]}"
git-annex-x86_64-darwin> ++ '[' -d /nix/store/lgdfy41s7v8zw3fgvxh2zvwcmjcx6fin-os-string-2.0.6/lib/ghc-9.6.6/lib/package.conf.d ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/lgdfy41s7v8zw3fgvxh2zvwcmjcx6fin-os-string-2.0.6 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/lgdfy41s7v8zw3fgvxh2zvwcmjcx6fin-os-string-2.0.6 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ cp -f /nix/store/lgdfy41s7v8zw3fgvxh2zvwcmjcx6fin-os-string-2.0.6/lib/ghc-9.6.6/lib/package.conf.d/os-string-2.0.6-KoVBRYToiZNKBGfpQU5BBD.conf /private/tmp/nix-build-git-annex-10.20240927.drv-1/tmp.NrYRS2nTiV/setup-package.conf.d/
git-annex-x86_64-darwin> ++ continue
git-annex-x86_64-darwin> ++ for p in "${pkgsBuildBuild[@]}" "${pkgsBuildHost[@]}" "${pkgsBuildTarget[@]}"
git-annex-x86_64-darwin> ++ '[' -d /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2/lib/ghc-9.6.6/lib/package.conf.d ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ '[' /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2 '!=' /nix/store/ajpi005d0kdk5mfn457wl742zy2kp6k4-ghc-9.6.6 ']'
git-annex-x86_64-darwin> ++ cp -f /private/tmp/nix-build-git-annex-10.20240927.drv-1/tmp.NrYRS2nTiV/setup-package.conf.d/
git-annex-x86_64-darwin> cp: missing destination file operand after '/private/tmp/nix-build-git-annex-10.20240927.drv-1/tmp.NrYRS2nTiV/setup-package.conf.d/'
git-annex-x86_64-darwin> Try 'cp --help' for more information.

I looked into the specified folder and found that /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2/lib/ghc-9.6.6/lib/package.conf.d contains a file named .conf.

This causes cp to fail: the *.conf glob does not match hidden files, so it expands to nothing, and cp receives only the destination directory as its single argument. See:

if [ -d "$p/${mkGhcLibdir thisGhc}/package.conf.d" ] && [ "$p" != "${ghc}" ] && [ "$p" != "${nativeGhc}" ]; then
  cp -f "$p/${mkGhcLibdir thisGhc}/package.conf.d/"*.conf ${packageConfDir}/
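
The glob behaviour can be reproduced outside Nix. The following is a minimal sketch, assuming the builder's shell has nullglob enabled (which would match the trace above, where cp receives only the destination). The helper count_conf_matches and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: unmatched globs expand to nothing, as the trace suggests.
shopt -s nullglob

# count_conf_matches DIR: print how many entries in DIR match *.conf.
# Bash globs skip names starting with a dot unless `dotglob` is set,
# so a file literally named ".conf" is never counted.
count_conf_matches() {
  local dir=$1
  local matches
  matches=("$dir"/*.conf)
  echo "${#matches[@]}"
}

confdir=$(mktemp -d)                      # stand-in for .../package.conf.d
touch "$confdir/.conf"                    # the broken, hidden file
count_conf_matches "$confdir"             # prints 0
touch "$confdir/example-1.0-abc.conf"     # a correctly named file
count_conf_matches "$confdir"             # prints 1
rm -rf "$confdir"
```

With zero matches, a command of the form cp -f "$dir"/*.conf "$dest"/ collapses to cp -f "$dest"/, which is the "missing destination file operand" error from the log.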

Digging deeper into the rabbit hole, I found the code that generates these files:

runHook preInstall
${if !isLibrary && buildTarget == "" then "${setupCommand} install"
  # ^^ if the project is not a library, and no build target is specified, we can just use "install".
  else if !isLibrary then "${setupCommand} copy ${buildTarget}"
  # ^^ if the project is not a library, and we have a build target, then use "copy" to install
  # just the target specified; "install" will error here, since not all targets have been built.
else ''
  ${setupCommand} copy ${buildTarget}
  local packageConfDir="$out/${ghcLibdir}/package.conf.d"
  local packageConfFile="$packageConfDir/${pname}-${version}.conf"
  mkdir -p "$packageConfDir"
  ${setupCommand} register --gen-pkg-config=$packageConfFile
  if [ -d "$packageConfFile" ]; then
    mv "$packageConfFile/"* "$packageConfDir"
    rmdir "$packageConfFile"
  fi
  for packageConfFile in "$packageConfDir/"*; do
    local pkgId=$(gawk -f ${unprettyConf} "$packageConfFile" \
      | grep '^id:' | cut -d' ' -f2)
    mv "$packageConfFile" "$packageConfDir/$pkgId.conf"
  done
  # delete confdir if there are no libraries
  find $packageConfDir -maxdepth 0 -empty -delete;
''}

This should generate a correctly named file, but for some reason it doesn't.

Looking through the log of the (successful!) build of the first broken dependency that git-annex needs, I see

/nix/store/czsf60xzd6xjc14zx51y7wh5b76shx03-stdenv-darwin/setup: line 1736: grep: command not found

in the build log (https://hydra.nixos.org/build/277962042/nixlog/1)

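This seems to be why the file ends up named .conf: with grep missing, pkgId is empty, so the mv in the loop above renames the file to "$packageConfDir/.conf". Any package that later tries to copy this dependency's package configuration then fails.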
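
To illustrate the mechanism, here is a small sketch of the rename step with the grep stage missing. It is not the builder code itself: no-such-grep is a deliberately nonexistent command standing in for the missing grep, cat stands in for the gawk call, and the package name example-1.0 and the helper rename_conf are hypothetical.

```shell
#!/usr/bin/env bash
# rename_conf FILE: mimic the builder's "rename to <pkgId>.conf" step,
# but with the grep stage replaced by a command that does not exist.
rename_conf() {
  local conf_file=$1
  local conf_dir
  conf_dir=$(dirname "$conf_file")
  # The missing command writes nothing to the pipe, so cut sees empty input
  # and pkg_id ends up as the empty string.
  local pkg_id=$(cat "$conf_file" | no-such-grep '^id:' | cut -d' ' -f2)
  mv "$conf_file" "$conf_dir/$pkg_id.conf"
}

conf_dir=$(mktemp -d)
printf 'name: example\nid: example-1.0-abc\n' > "$conf_dir/example-1.0.conf"
rename_conf "$conf_dir/example-1.0.conf"
ls -A "$conf_dir"    # the only entry is now ".conf"
rm -rf "$conf_dir"
```

The shell reports "command not found" on stderr, yet the function carries on and performs the mv with an empty id.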

Hydra (and therefore our binary cache) contains packages that were built successfully but have a broken lib/ghc-9.6.6/lib/package.conf.d/ directory.
Because of these broken builds, other packages fail (e.g. git-annex).
If I build locally without substituters, the package builds correctly, the correct file is generated, and the hash differs!

Note: I build on aarch64-darwin.

Steps To Reproduce

Steps to reproduce the behavior:

  1. go to nixpkgs on master (0fdc918)
  2. nix-build -A haskellPackages.filepath-bytestring --system "x86_64-darwin"
  3. path should be /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2
  4. ls -la result/lib/ghc-9.6.6/lib/package.conf.d shows .conf
  5. rm result
  6. nix-store --delete /nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2
  7. nix-build -A haskellPackages.filepath-bytestring --system "x86_64-darwin" --substituters ""
  8. Notice the hash difference (/nix/store/2l34mrv2p55rgh2x42m7vr4s65mnl1wd-filepath-bytestring-1.4.100.3.2). ls now also shows a file called filepath-bytestring-1.4.100.3.2-GJFDqHc1yFp4bne8zZKnAq.conf in the folder
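
To check other outputs for the same symptom as in step 4, something along these lines might help. It is only a sketch: find_broken_confs is a hypothetical helper, and it looks for nothing more than a file named exactly .conf directly inside a package.conf.d directory.

```shell
#!/usr/bin/env bash
# find_broken_confs ROOT...: print every regular file named exactly ".conf"
# that sits directly inside a package.conf.d directory below the given roots.
# -H makes find follow symlinks given on the command line (e.g. ./result).
find_broken_confs() {
  find -H "$@" -type f -path '*/package.conf.d/.conf'
}

# Small self-contained demonstration on a throwaway directory tree.
root=$(mktemp -d)
mkdir -p "$root/broken/lib/package.conf.d" "$root/good/lib/package.conf.d"
touch "$root/broken/lib/package.conf.d/.conf"
touch "$root/good/lib/package.conf.d/example-1.0-abc.conf"
find_broken_confs "$root"    # prints only the path below "broken"
rm -rf "$root"
```

After step 2 it could be called as find_broken_confs ./result; no output would mean the symptom is absent.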

Expected behavior

The hashes should be identical, and the Hydra builder should not produce broken packages.

Screenshots

Additional context

I don't know why the builder (or the darwin stdenv) does not have grep available, but it causes quite a few problems. Also, why doesn't a "command not found" error stop and fail the build?
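
One possible explanation for the last question is a general bash behaviour: in local var=$(...), the exit status of the statement is that of local, not of the command substitution, so errexit never sees the failure. I have not verified which options the builder's shell actually runs with; the sketch below enables errexit and pipefail to show that the masking would happen even then. no-such-command is deliberately nonexistent.

```shell
#!/usr/bin/env bash
# Variant 1: declaration and assignment combined, as in the builder snippet.
masked_script='
set -euo pipefail
f() {
  local value=$(no-such-command 2>/dev/null | cut -d" " -f2)
  echo "continued with value=[$value]"
}
f
'

# Variant 2: declaration split from assignment, so the failure is visible.
split_script='
set -euo pipefail
f() {
  local value
  value=$(no-such-command 2>/dev/null | cut -d" " -f2)
  echo "not reached"
}
f
'

# Each variant runs in its own bash process so that errexit applies normally.
masked_out=$(bash -c "$masked_script")
echo "$masked_out"                     # prints: continued with value=[]

split_status=0
bash -c "$split_script" || split_status=$?
echo "split variant exited with status $split_status"   # non-zero
```

Splitting the declaration from the assignment, as in the second variant, is the usual way to let such a failure abort the script.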

Metadata

  • system: "aarch64-darwin"
  • host os: Darwin 24.0.0, macOS 15.0.1
  • multi-user?: yes
  • sandbox: no
  • version: nix-env (Nix) 2.24.10
  • channels(root): "nixpkgs"
  • nixpkgs: /nix/store/5q79468yd486p049jk147djamjz9v0nv-source

Notify maintainers

@emilazy
@NixOS/darwin-maintainers
@NixOS/haskell
ZHF: #352882


Note for maintainers: Please tag this issue in your PR.


Add a 👍 reaction to issues you find important.

Labels

  • 0.kind: build failure (A package fails to build)
  • 2.status: stale (https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md)
  • 6.topic: darwin (Running or building packages on Darwin)
  • 6.topic: haskell (General-purpose, statically typed, purely functional programming language)
