Shuffle `BuildResult` data definition, make state machine clearer, introduce `SingleDrvOutputs` by Ericson2314 · Pull Request #6312 · NixOS/nix

Ericson2314 · 2022-03-25T02:00:46Z

Motivation

There are two changes going on here: changing BuildResult and changing the goal state machine logic.

Introduce `Worker::makeGoal`

This takes a DerivedPath so the caller doesn't need to care about which sort of goal does what.

`BuildResult` changes

Make BuildResult be like it was before, and instead made KeyedBuildResult be a subclass wit the path. Only buildPathsWithResults returns KeyedBuildResults, everything else just becomes like it was before, where the "key" is unambiguous from context.

The idea here is that storing the path when it is already unambiguous from context denormalizes our data model, making two sources of truth that must be kept in sync. This isn't ideal.

This is especially notable for DerivationGoal, where two DerivedPath requests may correspond to the same DerivationGoal. Before, the BuildResult of that goal (stored, then returned) would just be the first one to create that goal. That is non-deterministic!

KeyedBuildResult is useful in the return type of KeyedBuildResult, but this is because it is conceptually returning an mapping of DerivedPath keys to BuildResult values. We could, for example, also use std::map<BuildResult, BuildResult> for this purpose (and probably also have the arguments be a std::set not std::vector for symmetry), in which case we would here also not need KeyedBuildResult.

State machine changes

The needRestart, retrySubstitution, and retriedSubstitution bools are replaced with enums. The enum cases and conditional logic are exhaustively documented to try to make the code easier to understand.

An especially thorny case with the old code was that needRestart was set to true if the outputs changed for a goal that had progressed to building, but needRestart would never be read again. This is an in fact a good thing operationally, as a build will produce all outputs and so restarting is not necessary. But this (a) being how it worked, and (b) being a good thing was quite obfuscated just reading the code!

The new code and comments call out this case explicitly.

Introduce `SingleDrvOutputs`

In many cases we are dealing with a collection of realisations, they are all outputs of the same derivation. In that case, we don't need "derivation hashes modulos" to be part of our map key, because the output names alone will be unique. Those hashes are still part of the realisation proper, so we aren't loosing any information, we're just "normalizing our schema" by narrowing the "primary key".

Besides making our data model a bit "tighter" this allows us to avoid a double for loop in DerivationGoal::waiteeDone. The inner for loop was previously just to select the output we cared about without knowing its hash. Now we can just select the output by name directly.

Context

Protocol non-changes

Note that neither protocol is changed as part of this: we are still transferring DrvOutputs over the wire for BuildResults. I would only consider revising this once #6223 is merged, and we can mention protocol versions inside factored-out serialization logic. Until then it is better not change anything because it would come a the cost of code reuse.

Initializers

Unfortunately, we need to avoid C++20 strictness on designated initializers.

(BTW https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html
this offers some new syntax for this use-case. Hopefully this will be
adopted and we can eventually use it.)

No having that yet, maybe it would be better to not make KeyedBuildResult a subclass to just avoid this.

History of this PR

In #6311 (comment), I thought that since derivation goals' wanted outputs can "grow" due to overlapping dependencies (See DerivationGoal::addWantedOutputs, called by Worker::makeDerivationGoalCommon), the previous bug fix had an unfortunate side effect of causing more pointless rebuilds.

In particular, I was worried about this situation:

Goal made from DerivedPath::Built { foo, OutputsSpec::Names { a } }.
Goal gives up on on substituting, starts building.
Goal made from DerivedPath::Built { foo, OutputsSpec::Names { b } }, in fact is just modified original goal.
Though the goal had gotten as far as building, so all outputs were going to be produced, addWantedOutputs no longer knows that and so the goal is flagged to be restarted.

This might sound far-fetched with input-addressed drvs, where we usually basically have all our goals "planned out" before we start doing anything, but with CA derivation goals and especially RFC 92, where drv resolution means goals are created after some building is completed, it would be more likely to happen.

@edolstra, on the other hand, thought this was not an issue because even though needsRestart = true would be set, it would never be read again. Eelco might be right, but nonetheless this is still very confusing storing "needs restart" but in fact neither needing nor wanting a restart!

So the first thing I did was restore the clearing of wantedOutputs we used to do, and then filter the outputs in buildPathsWithResults to only get the ones we care about.

The KeyedBuildResult change technically was already needed due to the "modified derivational goal with second request" issue described in the motivation section, but become more urgent with this modification of wantedOutputs.

Eelco however (if I recalled correctly) in real time meeting (so no paper trail) also didn't like modifying wantedOutputs upon falling back to doing a build (as we did before), because we should only modify it in response to new requests --- actual wants --- and not because we are "incidentally" building all the outptus beyond what may have been requested.

That's a fair point, and the alternative is to replace the boolean soup with proper enums: Instead of modifying wantedOuputs som more, we'll modify needsRestart to indicate we are passed the need. That lead me to the bool -> enum changes, and then I did the substitution ones in addition too for consistency.

"Keyed" data types in general.

I think separating the "primary key" field(s) from the other fields is good practical in general. (I would like to do the same thing for ValidPathInfo, for example.) Besides the example in the motivation, there are probably other cases where we would like to do std::map<Key, ThingWithoutKey> where the "without keys" is necessary to not contain duplicate keys and just precludes the possibility of those duplicate keys being out of sync.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

agreed on idea
agreed on implementation strategy
tests, as appropriate
- functional tests - tests/**.sh
- unit tests - src/*/tests
- integration tests - tests/nixos/*
documentation in the manual
code and comments are self-explanatory
commit message explains why the change was made
new feature or incompatible change: updated release notes

Ericson2314 · 2022-03-25T08:19:50Z

Failure is spurious.

Ericson2314 · 2022-03-25T16:03:39Z

@edolstra For reference, the wantedOutputs clearing is something you did in 8d8d47a. Maybe looking at the commit will jog your memory on whether you were thinking of the same issue I was when you wrote that.

edolstra · 2022-04-01T12:40:57Z

Though the goal had gotten as far as building, so all outputs were
going to be produced, addWantedOutputs no longer knows that and so
the goal is flagged to be restarted.

Why would it be restarted? The only place where we restart is in DerivationGoal::outputsSubstitutionTried(). Once the build has actually started, it builds all outputs and it won't be restarted, as far as I can tell.

Ericson2314 · 2022-04-01T13:49:25Z

Fair enough, but I think setting a needs-restart flag that is never read is at least as confusing as changing the wanted outputs.

I still prefer the way it was before, because the metadata on the accurately reflects what it is doing.

roberth · 2023-02-26T12:23:39Z

Data model improvement and worker protocol no-ops lgtm.
Just need to look into the src/libstore/build/entry-points.cc changes still.

roberth

Suggesting some improvements to buildPathsWithResults.
Another issue I had was the heavy abbreviation of local variables, making the code harder to understand. I'm already suggesting some things that can be factored out, so perhaps it won't be as much of a problem anymore.

req/reqs: this is fine
g will only span 3 lines, but why not goal?
reqs2 -> state as mentioned
gp: only used once, but goalPtr is still easier to read
pbp: might go away?
bp: contains multiple outputs. We've already established that that's an unnecessary grouping in most cases, so we'll probably revisit that anyway.
bos: move to goal method, change to results.

src/libstore/build/entry-points.cc

roberth · 2023-03-18T11:10:34Z

src/libstore/build/entry-points.cc

+                return worker.makePathSubstitutionGoal(bo.path, buildMode == bmRepair ? Repair : NoRepair);
            },
-        }, br.raw());
+        }, req.raw());


This visit could be moved into worker.
DerivedPath is a very fundamental thing, so I think this would be appropriate.

make is a lie, but that's out of scope for now.

Goal is also a lie. MutableWorkUnit? Not sure if that name helps though. Let's not move the Goal posts for this PR.

The first commit of #4628 is I think just what you want! Added it here.

roberth · 2023-03-18T11:27:05Z

src/libstore/build/entry-points.cc

+        });
+
+        auto pbp = std::get_if<DerivedPath::Built>(&req);
+        if (!pbp) continue;


It seems that we could return a BuildResult for opaque paths too. Substituted, AlreadyValid, NoSubstituters are appropriate statuses.

However, that might be somewhat of a breaking change. Maybe add a new method in a followup?

Oh we are, that continue is after making a KeyedBuildResult.

This is a sign to make the continue is too tricky, so I am just doing a regular if block on the rest.

roberth · 2023-03-18T11:29:48Z

src/libstore/build/entry-points.cc

+        /* Because goals are in general shared between derived paths
+           that share the same derivation, we need to filter their
+           results to get back just the results we care about.
+         */


Aforementioned lies make this non-obvious. Short of that, I think we can simplify this a bit

/* Goals are mutable shared state, that may contain more outputs than we requested. */

ie only the weird part. However, I think this code should be moved.

Part of the problem is the gp->buildResult is too easy. How about we isolate the nastiness of mutable state by encapsulating gp? Then we can provide a method that requests the right outputs and make this mistake impossible.

Ericson2314 · 2023-04-14T17:12:36Z

src/libstore/build/derivation-goal.cc

+    if (!useDerivation) return;
+    auto & fullDrv = *dynamic_cast<Derivation *>(drv.get());
+
+    auto * dg = dynamic_cast<DerivationGoal *>(&*waitee);
+    if (!dg) return;
+
+    auto outputs = fullDrv.inputDrvs.find(dg->drvPath);
+    if (outputs == fullDrv.inputDrvs.end()) return;
+
+    for (auto & outputName : outputs->second) {
+        auto buildResult = dg->getBuildResult(DerivedPath::Built {
+            .drvPath = dg->drvPath,
+            .outputs = OutputsSpec::Names { outputName },
+        });
+        if (buildResult.success()) {
+            for (auto & [output, realisation] : buildResult.builtOutputs) {
                inputDrvOutputs.insert_or_assign(
-                    { bfd->drvPath, output.outputName },
+                    { dg->drvPath, output.outputName },
                    realisation.outPath);
+            }
+        }
+    }
 }


This became more awkward after buildResult was encapsulated because one cannot easily index builtOutputs without knowing the hash modulo / fixed output hash.

I think what this means is that

DrvPaths builtOutputs;

is the wrong field for BuildResult; instead of that std::map<DrvOutput, Realisation> it should be

std::map<std::string, Realisation> builtOutputs;

since we are only returning realizations for a single top-level derivation.

Realisation.id.drvHash gets us that hash, that was removed from the map key.

I pushed one more commit which addresses this awkwardness, getting rid of the double for loop.

src/libstore/build/derivation-goal.cc

src/libstore/build/derivation-goal.hh

roberth · 2023-04-15T12:06:40Z

src/libstore/remote-store.cc

 void write(const Store & store, Sink & to, const BuildResult & res)
 {
-    worker_proto::write(store, to, res.path);
    to
        << res.status
        << res.errorMsg
        << res.timesBuilt
        << res.isNonDeterministic
        << res.startTime
        << res.stopTime;
-    worker_proto::write(store, to, res.builtOutputs);
+    DrvOutputs builtOutputs;
+    for (auto & [output, realisation] : res.builtOutputs)
+        builtOutputs.insert_or_assign(realisation.id, realisation);
+    worker_proto::write(store, to, builtOutputs);
 }


Isn't this a breaking change to the protocol?

Err my intent was to not change the protocol, converting it to the old way. (I should add that the commit message and PR description!)

The top line is changed in a previous commit; that is the difference between BuildResult and KeyedBuildResult. The latter is now used everywhere needed to preserve the protocol.

For what its worth, after #6223 is merged the factored out serialization logic can use protocol versions, and then I can deduplicate this stuff much more aggressively :)

I didn't really register the other changes to the write functions. This makes sense. Does the daemon compatibility test pass?

Good question. Yes it does!

This takes a `DerivedPath` so the caller doesn't need to care about which sort of goal does what.

…er way In NixOS#6311 (comment), I realized since derivation goals' wanted outputs can "grow" due to overlapping dependencies (See `DerivationGoal::addWantedOutputs`, called by `Worker::makeDerivationGoalCommon`), the previous bug fix had an unfortunate side effect of causing more pointless rebuilds. In paticular, we have this situation: 1. Goal made from `DerivedPath::Built { foo, {a} }`. 2. Goal gives on on substituting, starts building. 3. Goal made from `DerivedPath::Built { foo, {b} }`, in fact is just modified original goal. 4. Though the goal had gotten as far as building, so all outputs were going to be produced, `addWantedOutputs` no longer knows that and so the goal is flagged to be restarted. This might sound far-fetched with input-addressed drvs, where we usually basically have all our goals "planned out" before we start doing anything, but with CA derivation goals and especially RFC 92, where *drv resolution* means goals are created after some building is completed, it is more likely to happen. So the first thing to do was restore the clearing of `wantedOutputs` we used to do, and then filter the outputs in `buildPathsWithResults` to only get the ones we care about. But fix also has its own side effect in that the `DerivedPath` in the `BuildResult` in `DerivationGoal` cannot be trusted; it is merely the *first* `DerivedPath` for which this goal was originally created. To remedy this, I made `BuildResult` be like it was before, and instead made `KeyedBuildResult` be a subclass wit the path. Only `buildPathsWithResults` returns `KeyedBuildResult`s, everything else just becomes like it was before, where the "key" is unambiguous from context. I think separating the "primary key" field(s) from the other fields is good practical in general anyways. (I would like to do the same thing for `ValidPathInfo`.) Among other things, it allows constructions like `std::map<Key, ThingWithKey>` where doesn't contain duplicate keys and just precludes the possibility of those duplicate keys being out of sync. We might leverage the above someday to overload `buildPathsWithResults` to take a *set* of return a *map* per the above. ----- Unfortunately, we need to avoid C++20 strictness on designated initializers. (BTW https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2287r1.html this offers some new syntax for this use-case. Hopefully this will be adopted and we can eventually use it.) No having that yet, maybe it would be better to not make `KeyedBuildResult` a subclass to just avoid this. Co-authored-by: Robert Hensing <[email protected]>

@edolstra

If my memory is correct, @edolstra objected to modifying `wantedOutputs` upon falling back to doing a build (as we did before), because we should only modify it in response to new requests --- *actual* wants --- and not because we are "incidentally" building all the outptus beyond what may have been requested. That's a fair point, and the alternative is to replace the boolean soup with proper enums: Instead of modifying `wantedOuputs` som more, we'll modify `needsRestart` to indicate we are passed the need.

In many cases we are dealing with a collection of realisations, they are all outputs of the same derivation. In that case, we don't need "derivation hashes modulos" to be part of our map key, because the output names alone will be unique. Those hashes are still part of the realisation proper, so we aren't loosing any information, we're just "normalizing our schema" by narrowing the "primary key". Besides making our data model a bit "tighter" this allows us to avoid a double `for` loop in `DerivationGoal::waiteeDone`. The inner `for` loop was previously just to select the output we cared about without knowing its hash. Now we can just select the output by name directly. Note that neither protocol is changed as part of this: we are still transferring `DrvOutputs` over the wire for `BuildResult`s. I would only consider revising this once NixOS#6223 is merged, and we can mention protocol versions inside factored-out serialization logic. Until then it is better not change anything because it would come a the cost of code reuse.

roberth · 2024-02-28T01:52:15Z

src/libstore/build/derivation-goal.cc

+            break;
+        case RetrySubstitution::YesNeed:
+            // Should not be able to reach this state from here.
+            assert(false);


Unfortunately, it has been reached

Assertion failure in outputsSubstitutionTried #10092

Ericson2314 mentioned this pull request Mar 25, 2022

Give nix daemon and nix-store --serve protocols separate serializers with version info #6223

Merged

7 tasks

Ericson2314 force-pushed the keyed-build-result branch 2 times, most recently from 83f5782 to 513e904 Compare March 25, 2022 15:59

Ericson2314 requested a review from edolstra March 25, 2022 16:04

Ericson2314 mentioned this pull request Mar 25, 2022

Factor out serialization for BuildResult #4588

Closed

Ericson2314 changed the title ~~Make KeyedBuildResult, put back BuildResult the way it was before~~ Make KeyedBuildResult, BuildResult like before, and fix bug another way Mar 25, 2022

Ericson2314 changed the title ~~Make KeyedBuildResult, BuildResult like before, and fix bug another way~~ Fix race conditition causing other work, shuffle BuildResult data definition Mar 25, 2022

Ericson2314 changed the title ~~Fix race conditition causing other work, shuffle BuildResult data definition~~ Fix race condition causing other work, shuffle BuildResult data definition Mar 25, 2022

Ericson2314 changed the title ~~Fix race condition causing other work, shuffle BuildResult data definition~~ Fix race condition causing extra work, shuffle BuildResult data definition Mar 25, 2022

Ericson2314 mentioned this pull request Mar 31, 2022

Make buildPathsWithResults() only return info on wanted outputs #6311

Merged

fogti mentioned this pull request May 5, 2022

resolvedResult.builtOutputs doesn't contain expected output after successful multi-output build #6383

Closed

stale bot added the stale label Oct 1, 2022

Ericson2314 mentioned this pull request Feb 3, 2023

TCP support in the Nix daemon #5265

Draft

Ericson2314 requested a review from thufschmitt as a code owner February 14, 2023 18:41

Ericson2314 changed the title ~~Fix race condition causing extra work, shuffle BuildResult data definition~~ Shuffle BuildResult data definition, make state machine clearer Feb 14, 2023

stale bot removed the stale label Feb 14, 2023

roberth self-assigned this Feb 15, 2023

Ericson2314 mentioned this pull request Feb 17, 2023

Also pass unwanted outputs to post-build-hook #7721

Merged

7 tasks

Ericson2314 force-pushed the keyed-build-result branch from 059944f to 13aee06 Compare February 24, 2023 14:41

fricklerhandwerk added the store Issues and pull requests concerning the Nix store label Mar 3, 2023

roberth reviewed Mar 18, 2023

View reviewed changes

Ericson2314 force-pushed the keyed-build-result branch from 8fa1336 to 94fba53 Compare April 14, 2023 17:05

Ericson2314 commented Apr 14, 2023

View reviewed changes

Ericson2314 force-pushed the keyed-build-result branch from 94fba53 to eb18102 Compare April 14, 2023 20:27

Ericson2314 changed the title ~~Shuffle BuildResult data definition, make state machine clearer~~ Shuffle BuildResult data definition, make state machine clearer, introduce SingleDrvOutputs Apr 14, 2023

roberth reviewed Apr 15, 2023

View reviewed changes

Ericson2314 and others added 4 commits April 15, 2023 11:01

Introduce Worker::makeGoal

9df7f3f

This takes a `DerivedPath` so the caller doesn't need to care about which sort of goal does what.

Ericson2314 force-pushed the keyed-build-result branch from 11dd418 to 24866b7 Compare April 15, 2023 16:53

roberth approved these changes Apr 17, 2023

View reviewed changes

roberth merged commit 3f9589f into NixOS:master Apr 17, 2023

Ericson2314 deleted the keyed-build-result branch April 17, 2023 16:21

roberth reviewed Feb 28, 2024

View reviewed changes

tomberek mentioned this pull request Jun 4, 2025

Assertion failure in outputsSubstitutionTried #10092

Closed

Uh oh!

Conversation

Ericson2314 commented Mar 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Introduce Worker::makeGoal

BuildResult changes

State machine changes

Introduce SingleDrvOutputs

Context

Protocol non-changes

Initializers

History of this PR

"Keyed" data types in general.

Checklist for maintainers

Uh oh!

Ericson2314 commented Mar 25, 2022

Uh oh!

Ericson2314 commented Mar 25, 2022

Uh oh!

edolstra commented Apr 1, 2022

Uh oh!

Ericson2314 commented Apr 1, 2022

Uh oh!

roberth commented Feb 26, 2023

Uh oh!

roberth left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ericson2314 Apr 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Ericson2314 commented Mar 25, 2022 •

edited

Loading

Introduce `Worker::makeGoal`

`BuildResult` changes

Introduce `SingleDrvOutputs`

Ericson2314 Apr 14, 2023 •

edited

Loading