Fix out-of-order verification of cycle head dependencies#1061
Fix out-of-order verification of cycle head dependencies#1061MichaReiser merged 36 commits intosalsa-rs:masterfrom
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
Merging this PR will degrade performance by 13.81%
Performance Changes
Comparing Footnotes |
This comment was marked as outdated.
This comment was marked as outdated.
d947327 to
d7f5cf8
Compare
This comment was marked as resolved.
This comment was marked as resolved.
d7f5cf8 to
8121cec
Compare
122fb9d to
0e937e2
Compare
| stack.extend( | ||
| origin | ||
| .inputs() | ||
| .filter_map(|input| TryInto::<DatabaseKeyIndex>::try_into(input).ok()) |
There was a problem hiding this comment.
I don't know why we used TryInto here. input is a DatabaseKeyIndex, so this is a no-op conversion
| // TODO: Write a test that demonstrates that backdating queries participating in a cycle isn't safe | ||
| // OR write many tests showing that it is (and fixing the case where it didn't correctly account for today). | ||
| if !revisions.cycle_heads().is_empty() { | ||
| if !revisions.cycle_heads().is_empty() || old_memo.may_be_provisional() { |
There was a problem hiding this comment.
We can't backdate if old_memo is the memo from the last iteration (or a previous revision but never finalized). To support backdating cyclic queries, we'd have to back up some of the previous-revision memo's metadata before overriding the memo with the first iteration's result.
For now, simply skip this path. The downside is that this requires running a few extra queries, but the blast radius should still be contained, assuming those queries aren't cyclic (so that they can backdate).
3bf0aa9 to
7bbeef3
Compare
| db.assert_logs(expect![[r#" | ||
| [ | ||
| "salsa_event(DidValidateMemoizedValue { database_key: min_panic(Id(2)) })", | ||
| "salsa_event(WillExecute { database_key: min_panic(Id(3)) })", |
There was a problem hiding this comment.
d and c are both queries that don't participate in a cycle. They aren't flattened into a. a's flattened dependencies are:
input(a)input(b)cd
Salsa now walks as dependency edges in deep_verify_edges, starting with input(a), followed by input(b) and c that all haven't changed. d has a changed input. Salsa re-executes it to see if it can be backdated, but it can't, which is why we later re-execute ' a `.
This test does suggest to me that the old implementation is too eager when it comes to re-executing cyclic queries, because it immediately starts executing the outer most cyclic query, even if d is later able to backdate.
| [ | ||
| "salsa_event(WillExecute { database_key: min_panic(Id(2)) })", | ||
| "salsa_event(WillExecute { database_key: min_panic(Id(1)) })", | ||
| "salsa_event(WillExecute { database_key: min_iterate(Id(0)) })", |
There was a problem hiding this comment.
This now is also more correct. Before, we re-executed c first, even though it is a cycle participant. Now, we correctly detect that an input going into a's cycle changed, and immediately start re-executing the entire cycle.
| [ | ||
| "salsa_event(DidValidateInternedValue { key: Interned(Id(400)), revision: R2 })", | ||
| "salsa_event(WillExecute { database_key: head(Id(0)) })", | ||
| "salsa_event(DidValidateInternedValue { key: Interned(Id(400)), revision: R2 })", |
There was a problem hiding this comment.
We now avoid checking the same interned value multiple times, because we immediately see that the entire cycle needs to re-execute
| "salsa_event(DidSetCancellationFlag)", | ||
| "salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(400)) })", | ||
| "salsa_event(DidValidateInternedValue { key: query_d::interned_arguments(Id(800)), revision: R2 })", | ||
| "salsa_event(WillExecute { database_key: query_d(Id(800)) })", |
There was a problem hiding this comment.
This makes sense to me and is similar to another change we've seen above.
query_d doesn't participate in the cycle, so it doesn't get flattened out.
query_b's dependencies are:
Output(0)(tracked new)read_value(Output(0))query_d(has changed)ab.valueOutput(1)(next iteration)read_value(Output(1))Output(2)(next iteration)read_value(Output(2))Output(3)(next iteration)read_value(Output(3))
Salsa detects that query_d's input changed, it executes query_d but can't backdate it because the result is different. In turn, the query_b cycle changed, so that salsa re-executes it
a761359 to
c87488f
Compare
ibraheemdev
left a comment
There was a problem hiding this comment.
This makes a lot of sense to me, very nice.
…ions`" This reverts commit e686db9.
c87488f to
d029bee
Compare
* Start working on maybe_changed_after * Add `maybe_changed_after` for tracked structs * Remove old cycle handling in maybe_changed_after * Add tests * Don't collect deps if there's an outer cycle * Remove `validate_provisional` * Compute outer most cycle * Set edges in more branches * More * Improve logging * Flatten dependencies of all heads, use lazy validation * Properly skip `complete_iteration` in the no cycle head case * Revert `struct_database_key_index` changes * Revert more changes * Revert even more * Rename * Fix memory leak * Clippy * Advoid recursing cycle heads * Add missing inventory gating * Only track queries in seen * Reuse allocations * Avoid copying current-revision input-outputs into `QueryRevisions` * Revert "Avoid copying current-revision input-outputs into `QueryRevisions`" This reverts commit e686db9. * Move more out of hot path * Reuse allocations more * Revert some Ingredient::maybe_changed_after changes * More cleanup * Back out disambiguator changes * Docs * Small reordering * Panic when using accumulator in fixpoint query * Revert unnecessary test changes * Simplify more * Code review feedback * Format
* Start working on maybe_changed_after * Add `maybe_changed_after` for tracked structs * Remove old cycle handling in maybe_changed_after * Add tests * Don't collect deps if there's an outer cycle * Remove `validate_provisional` * Compute outer most cycle * Set edges in more branches * More * Improve logging * Flatten dependencies of all heads, use lazy validation * Properly skip `complete_iteration` in the no cycle head case * Revert `struct_database_key_index` changes * Revert more changes * Revert even more * Rename * Fix memory leak * Clippy * Advoid recursing cycle heads * Add missing inventory gating * Only track queries in seen * Reuse allocations * Avoid copying current-revision input-outputs into `QueryRevisions` * Revert "Avoid copying current-revision input-outputs into `QueryRevisions`" This reverts commit e686db9. * Move more out of hot path * Reuse allocations more * Revert some Ingredient::maybe_changed_after changes * More cleanup * Back out disambiguator changes * Docs * Small reordering * Panic when using accumulator in fixpoint query * Revert unnecessary test changes * Simplify more * Code review feedback * Format
|
Hmm, the new rust release is hitting. |
2250e0f to
4b98bc0
Compare
Problem
A foundational assumption of
deep_verify_memois that it walks query dependencies in execution order (as seen by the query being validated). This PR fixes a bug related to cyclic queries where this invariant was violated because of how Salsa tracks dependencies:DatabaseKeyIndexbut not theIterationCountwhen adding a read dependency. The result is that Salsa adds a single dependency even if a query reads ax_1andx_2where_isignifies the iteration.i.The result is that
deep_verify_edgesverifies dependencies before the structs on which the query results are stored, which is unsound.Here an example where this happened before (this is a new test case):
Collected input/outputs by query:
query_aquery_bInterned::new(1)Interned::new(2)Interned::new(3)Interned::new(10)query_bInterned::new(0)query_aquery_x(Interned(0))query_x(Interned(1))query_x(Interned(2))query_x(Interned(10))query_x: Not importantTraversal order in
maybe_changed_after:query_aquery_bInterned::new(0)query_a(returns unchanged, fixpoint initial)query_x(0)query_x(1)query_x(2)query_x(...)query_x(10)Interned::new(1)Interned::new(2)Interned::new(...)Interned::new(10)Note how Salsa verifies
query_x(1), andquery_x(...)before verifying that the interned values haven't changed.The issue is that Salsa sees a dependency on
query_b(without any iteration!), but it then doesn't just validate the dependencies ofquery_bin iteration 0, it validates all its dependencies, even if they are from later iterations. Only once that's done does salsa verify the remaining dependencies fromquery_a, even though they were created before some of the dependencies that were just verified as part ofquery_b.Solution
This PR flattens the dependencies of all queries with cycle handling that participate in a cycle after each execution (each iteration) so that the cycle head depends only on inputs, tracked structs, interned values, and finalized queries. This ensures that the query's dependencies form a tree, not a graph (no cycles).
Not only does this fix the bug mentioned above, but it also greatly simplifies
maybe_changed_after, as it no longer needs to handle cyclic dependencies. This allows us to remove almost all fixpoint-related code frommaybe_changed_after(and friends).Downsides
There are a few downsides to this approach:
deep_verify_memofor a nested head needed to delegate to the outermost cycle head. This, in turn, required tracking more state to verify if the outermost cycle head is still "the same" as when the inner head was executed (a cycle finalized in the field), and it's also much more difficult to reason about in general. It also didn't work 😆 Specifically, it didn't correctly handle the case where an inner cycle converges before its outer cycle. The issue is that the inner cycle counted on the outer most cycle to do the dependency flattening. But the inner cycle now suddenly needs to know its own flattened dependencies across all previous iterations. We could probably make this approach work, but it would require that the inner cycle walks its dependencies from the previous iteration, and replaces its dependency on the outer-most query with its flattened dependencies. However, it's not clear to me how to detect the outer-most cycle without storing it in the Memo. Ultimately, this didn't seem worthwhile, given there's no meaningful performance regression on ty (one benchmark shows a 1% regression)Alternative solutions
I still think the ideal is for Salsa to store multiple memos for queries participating in fixpoint, and for the iteration count to be part of the
QueryEdge. This would remove much of the special casing required for fixpoint iteration and fit more naturally into Salsa's overall design. It probably would also set the foundation for speculative execution, where an iteration is "one speculative execution", but we could use other keys.I decided not to explore this further because there are some big open questions, where it was unclear to me how to resolve them in the limited time I had available:
DatabaseKeyIndexor separate? I think it must be a single key; the caller offetchdoesn't know what the "latest" iteration of a given query is. But it always wants the most recent value.QueryEdgewithout increasing its size?maybe_changed_aftershouldn't result in a cycle, but acquiring the lock for the key without an iteration should.How to support
diff_outputs?I don't think there's an open issue for it, but our fixpoint handling is prone to removing outputs and tracked structs too early for queries participating in a cycle. Salsa diffs the outputs immediately after completing an iteration. However, some outputs (including tracked structs) might only have been created in later iterations. Salsa deleting them in the first iteration is overly aggressive (and it used to be unsound #980). To fix this, it would be very helpful if there were a hook to finalize all cycle participants, rather than our current lazy validation.
My initial approach to this PR used eager validation, but that didn't handle the case where a nested cycle converges before the outer cycle. So I switched to flattening all cycle head dependencies; however, that also means that walking the dependency tree during finalization no longer contains the full dependency tree, making eager finalization impossible (or we'd have to retain the query dependencies in a different way). But I think there's still a solution to this problem, but we have to defer
diff_outputsto the next revision.I don't have this fully figured out but my thinking is along these lines:
old_memohasprev_revision_metadataset. If so, diff theprev_revision_metadatawith theold_memo.input_outputsand delete any no-longer-created outputs.Notes for reviewer
This hopefully fixes #1023
For some more context, see the write up in https://hackmd.io/XCIIyYjlQkuVg33THEr4iQ
This PR is based on top of #1063 because it removes the need to special-case fallback-immediate.