fix: add temp roots for references before auto-GC to prevent race#15158
fix: add temp roots for references before auto-GC to prevent race#15158domenkozar wants to merge 1 commit intoNixOS:masterfrom
Conversation
|
I don't think this is correct. It's the caller's responsibility to ensure that the references are registered as GC roots before so after this point, it's safe to use Do you have more information under what circumstances this issue triggers? |
Store::writeDerivation did not call addTempRoot when the derivation
already existed (early return path at isValidPath check). This meant
the current daemon worker never held a temp root for paths written by
previous processes whose daemon workers have since exited.
The race:
1. Process X writes hosts.drv — its daemon worker holds the temp root
2. Process X finishes, daemon worker exits, temp roots file is
cleaned up by findTempRoots
3. Current evaluator calls writeDerivation(hosts.drv) — isValidPath
returns true — early return with NO addTempRoot
4. Current evaluator calls writeDerivation(dependent.drv) which
references hosts.drv — addToStoreFromDump is called
5. autoGC() inside addToStoreFromDump deletes hosts.drv (no temp root)
6. registerValidPath fails: "path '…-hosts.drv' is not valid"
This manifests during heavy-load NixOS tests where many concurrent
evaluations share common derivations.
Fix: call addTempRoot for the derivation path and all its references
in Store::writeDerivation, before the isValidPath check. This follows
the established pattern (addTempRoot then check validity) and ensures
the current daemon worker protects all paths it depends on.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
43f8246 to
31f1cde
Compare
|
@edolstra thanks, I've moved the patch to the Going to deploy to our CI builders and see if it still doesn't reproduce. |
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
| references it. The early return below is especially important: | ||
| without addTempRoot the current daemon worker would never hold | ||
| a temp root for an already-valid path. */ |
There was a problem hiding this comment.
Hm yeah, the early return is a rather recent thing and that does seem incorrect now. But from the looks of it we never added a temp root for derivations and previously the local store would create it in addToStoreFromDump, so that might be an argument to revert the early bailout completely. But considering that the issue is old it I wonder if there are more issues?
There was a problem hiding this comment.
Additionally, seems like we need to add temp roots in fetchToStore2.
There was a problem hiding this comment.
That could be related to another gc related issue I'm seeing with paths not being added to temp gc roots:
error: path '/nix/store/p3l1a5y7nllfyrjn2krlwgcc3z0cd3fq-make-
│ symlinks-relative.sh' is not valid
| for (auto & ref : references) | ||
| addTempRoot(ref); |
There was a problem hiding this comment.
Hm, this should be needed if all the references have already been added to temproots, right? We can keep the temp root for derivation and that should handle all the references that are inputDrvs. For input sources this should be enough I think? #15158 (comment).
|
This still doesn't fix the race, because the problem probably isn't in Nix itself but in how it's used. Since the comment mentions
the issue here is that the previous process should have registered a GC root. For instance, something like is racy because the output of which removes the window in which the path isn't a GC root. |
We've seen this one happen for a while during a bunch of heavy-load nixos tests and I've finally had time to fix it.
I haven't seen errors since, and before they happened for at least one test at the time, so it's helping.