Skip to content

fix: add temp roots for references before auto-GC to prevent race#15158

Closed
domenkozar wants to merge 1 commit intoNixOS:masterfrom
cachix:auto-gc-drv
Closed

fix: add temp roots for references before auto-GC to prevent race#15158
domenkozar wants to merge 1 commit intoNixOS:masterfrom
cachix:auto-gc-drv

Conversation

@domenkozar
Copy link
Member

We've seen this one happen for a while during a bunch of heavy-load nixos tests and I've finally had time to fix it.

I haven't seen errors since, and before they happened for at least one test at the time, so it's helping.

@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Feb 6, 2026
@edolstra
Copy link
Member

edolstra commented Feb 6, 2026

I don't think this is correct. It's the caller's responsibility to ensure that the references are registered as GC roots before addToStore() is called, otherwise there is a still a time window in which they can be GCed. For example, when the fetcher cache checks whether a store path already exists, it does something like

store.addTempRoot(storePath);
if (!store.isValidPath(storePath))
  ... create `storePath` ...

so after this point, it's safe to use storePath as a reference.

Do you have more information under what circumstances this issue triggers?

Store::writeDerivation did not call addTempRoot when the derivation
already existed (early return path at isValidPath check). This meant
the current daemon worker never held a temp root for paths written by
previous processes whose daemon workers have since exited.

The race:

  1. Process X writes hosts.drv — its daemon worker holds the temp root
  2. Process X finishes, daemon worker exits, temp roots file is
     cleaned up by findTempRoots
  3. Current evaluator calls writeDerivation(hosts.drv) — isValidPath
     returns true — early return with NO addTempRoot
  4. Current evaluator calls writeDerivation(dependent.drv) which
     references hosts.drv — addToStoreFromDump is called
  5. autoGC() inside addToStoreFromDump deletes hosts.drv (no temp root)
  6. registerValidPath fails: "path '…-hosts.drv' is not valid"

This manifests during heavy-load NixOS tests where many concurrent
evaluations share common derivations.

Fix: call addTempRoot for the derivation path and all its references
in Store::writeDerivation, before the isValidPath check. This follows
the established pattern (addTempRoot then check validity) and ensures
the current daemon worker protects all paths it depends on.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@domenkozar
Copy link
Member Author

domenkozar commented Feb 6, 2026

@edolstra thanks, I've moved the patch to the writeDerivation side so that derivation references are always under gc roots, I'm not sure if that function does any memoization since this could be expensive via the daemon.

Going to deploy to our CI builders and see if it still doesn't reproduce.

domenkozar added a commit to cachix/cachix-ci-agents that referenced this pull request Feb 6, 2026
domenkozar added a commit to cachix/cachix-ci-agents that referenced this pull request Feb 6, 2026
Comment on lines +146 to +148
references it. The early return below is especially important:
without addTempRoot the current daemon worker would never hold
a temp root for an already-valid path. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah, the early return is a rather recent thing and that does seem incorrect now. But from the looks of it we never added a temp root for derivations and previously the local store would create it in addToStoreFromDump, so that might be an argument to revert the early bailout completely. But considering that the issue is old it I wonder if there are more issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, seems like we need to add temp roots in fetchToStore2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be related to another gc related issue I'm seeing with paths not being added to temp gc roots:

error: path '/nix/store/p3l1a5y7nllfyrjn2krlwgcc3z0cd3fq-make-
  │ symlinks-relative.sh' is not valid

Comment on lines +150 to +151
for (auto & ref : references)
addTempRoot(ref);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this should be needed if all the references have already been added to temproots, right? We can keep the temp root for derivation and that should handle all the references that are inputDrvs. For input sources this should be enough I think? #15158 (comment).

@edolstra
Copy link
Member

This still doesn't fix the race, because the problem probably isn't in Nix itself but in how it's used. Since the comment mentions

a path written by a previous process

the issue here is that the previous process should have registered a GC root. For instance, something like

nix-store -r $(nix-instantiate foo.nix)

is racy because the output of nix-instantiate can be GC'ed before nix-store has a chance to build it. In this example, the fix is to do something like

nix-store -r $(nix-instantiate --add-root ./tmp-root foo.nix)

which removes the window in which the path isn't a GC root.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

store Issues and pull requests concerning the Nix store

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants