Coarse-grained tracked structs#657
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
CodSpeed Performance ReportMerging #657 will not alter performanceComparing Summary
|
c331b84 to
3e2d7f3
Compare
| pub name: FunctionId<'db>, | ||
|
|
||
| #[tracked] | ||
| name_span: Span<'db>, |
There was a problem hiding this comment.
I tried not adding this #[tracked] attribute because it seems unnecessary, but I actually got a test failure that I don't really understand:
---- type_check::fix_bad_variable_in_function stdout ----
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: parse_statements(Id(0)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: type_check_program(Id(1400)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: type_check_function(Id(1800)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: type_check_function(Id(1801)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: find_function(Id(1c00)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: find_function(Id(1c01)) } }
Event: Event { thread_id: ThreadId(11), kind: DidSetCancellationFlag }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillExecute { database_key: parse_statements(Id(0)) } }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
Event: Event { thread_id: ThreadId(11), kind: WillCheckCancellation }
thread 'type_check::fix_bad_variable_in_function' panicked at /home/ibraheem/dev/sync/salsa/src/table.rs:281:9:
assertion `left == right` failed: page has hidden type `"salsa::table::Page<salsa::tracked_struct::Value<calc::ir::Function>>"` but `"salsa::table::Page<salsa::tracked_struct::Value<calc::ir::Program>>"` was expected
left: TypeId(0xbeb3ca6f8846f92e6ae4fc7569b4bf5f)
right: TypeId(0xbbdce2561688eb1bfc9fa5f5006a8342)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
There was a problem hiding this comment.
I added some debug statements to function::maybe_chagned_after to log the dependencies and compared the logs between a run that succeeds and a run that fails.
The most interesting difference is
Successful
[src/tracked_struct/tracked_field.rs:58:9] self = salsa::tracked_struct::tracked_field::FieldIngredientImpl<calc::ir::Program> {
ingredient_index: IngredientIndex(
13,
),
field_index: 0,
}
Event: Event { thread_id: ThreadId(2), kind: WillCheckCancellation }
[src/function/maybe_changed_after.rs:154:9] &database_key_index = type_check_function(Id(1800))
[src/function/maybe_changed_after.rs:156:15] &old_memo.revisions.origin = Derived(
QueryEdges {
input_outputs: [
Input(
Function.args(Id(1000)),
),
Input(
Program(Id(1000)),
),
Input(
Span.start(Id(406)),
),
Input(
Span.end(Id(406)),
),
],
},
)
With Panic
[src/tracked_struct/tracked_field.rs:58:9] self = salsa::tracked_struct::tracked_field::FieldIngredientImpl<calc::ir::Program> {
ingredient_index: IngredientIndex(
12,
),
field_index: 0,
}
Event: Event { thread_id: ThreadId(2), kind: WillCheckCancellation }
[src/function/maybe_changed_after.rs:154:9] &database_key_index = type_check_function(Id(1800))
[src/function/maybe_changed_after.rs:156:15] &old_memo.revisions.origin = Derived(
QueryEdges {
input_outputs: [
Input(
Program(Id(1000)),
),
Input(
Program.statements(Id(1000)),
),
Input(
Span.start(Id(406)),
),
Input(
Span.end(Id(406)),
),
],
},
)
The main difference is that the ingredient on which we run maybe_changed_after is different, AND so are the inputs. Looking at type_check_function then the inputs of the panic case make no sense. The query never calls Program.statements but it does call Function.args. I don't understand how it is happening but I suspect that we're incorrectly reusing a memo and turn a Function memo into a Program which would align with the panic
There was a problem hiding this comment.
Okay, I think I know what the issue is...
You changed the tracked struct's Configuration::create_ingredients to only create ingredients for tracked fields. I think this change is correct. However, you call Ingredient::Index::successor in tracked_field with the field_index which is the absolute offset including all fields -- including untracked fields. What this means is that we now add incorrect dependencies as soon as we have a tracked struct where a tracked field follows an untracked field.
I hope that helps. I let you take a stab at correctly mapping the field indices. It would be great to have a unit test demonstrating the behavior and possibly an assertion somewhere that would help us catch this sooner.
There was a problem hiding this comment.
Ah, that is a tricky. I tried a couple things here:
- Creating the ingredients with the absolute indices directly. Unfortunately this causes a panic, as some of the ingredient code relies on indices being create sequentially.
- Passing the correct relative index to
successorintracked_field. Unfortunately this also did not seem to work correctly and didn't result in the correct fields being tracked, but I'm not exactly sure why. The logging code also relies on the indices being absolute in order to print field names, so this was a little awkward.
Eventually I think the easiest solution is just creating ingredients for all fields. I added a failing test that is now passing with this fix (and also fails with the second idea I tried).
There was a problem hiding this comment.
Hmm, that's unfortunate but seems reasonable. Would it be possible to avoid creating any field ingredients if the structure only has tracked fields? Doing this optimization seems worthwhile, considering that all-fields-untracked is the new default
There was a problem hiding this comment.
I assume that keeping a mapping table from field_index to relative_ingredient_index (a [Option<usize>] where the value is Some for all tracked fields) is annoying to write the macro code for? If not, maybe that's worth a try. I otherwise suggest we keep it as is for now (but optimize for the no-tracked fields case)
MichaReiser
left a comment
There was a problem hiding this comment.
This looks great to me. I think the idea of tracked and untracked fields is easier to understand than id (I'm fairly certain we used id incorrectly in Red Knot).
I think I have an understanding why the panic occurs and I added an inline comment with an explanation. I let you take a stab at fixing it.
| pub name: FunctionId<'db>, | ||
|
|
||
| #[tracked] | ||
| name_span: Span<'db>, |
There was a problem hiding this comment.
I added some debug statements to function::maybe_chagned_after to log the dependencies and compared the logs between a run that succeeds and a run that fails.
The most interesting difference is
Successful
[src/tracked_struct/tracked_field.rs:58:9] self = salsa::tracked_struct::tracked_field::FieldIngredientImpl<calc::ir::Program> {
ingredient_index: IngredientIndex(
13,
),
field_index: 0,
}
Event: Event { thread_id: ThreadId(2), kind: WillCheckCancellation }
[src/function/maybe_changed_after.rs:154:9] &database_key_index = type_check_function(Id(1800))
[src/function/maybe_changed_after.rs:156:15] &old_memo.revisions.origin = Derived(
QueryEdges {
input_outputs: [
Input(
Function.args(Id(1000)),
),
Input(
Program(Id(1000)),
),
Input(
Span.start(Id(406)),
),
Input(
Span.end(Id(406)),
),
],
},
)
With Panic
[src/tracked_struct/tracked_field.rs:58:9] self = salsa::tracked_struct::tracked_field::FieldIngredientImpl<calc::ir::Program> {
ingredient_index: IngredientIndex(
12,
),
field_index: 0,
}
Event: Event { thread_id: ThreadId(2), kind: WillCheckCancellation }
[src/function/maybe_changed_after.rs:154:9] &database_key_index = type_check_function(Id(1800))
[src/function/maybe_changed_after.rs:156:15] &old_memo.revisions.origin = Derived(
QueryEdges {
input_outputs: [
Input(
Program(Id(1000)),
),
Input(
Program.statements(Id(1000)),
),
Input(
Span.start(Id(406)),
),
Input(
Span.end(Id(406)),
),
],
},
)
The main difference is that the ingredient on which we run maybe_changed_after is different, AND so are the inputs. Looking at type_check_function then the inputs of the panic case make no sense. The query never calls Program.statements but it does call Function.args. I don't understand how it is happening but I suspect that we're incorrectly reusing a memo and turn a Function memo into a Program which would align with the panic
| pub name: FunctionId<'db>, | ||
|
|
||
| #[tracked] | ||
| name_span: Span<'db>, |
There was a problem hiding this comment.
Okay, I think I know what the issue is...
You changed the tracked struct's Configuration::create_ingredients to only create ingredients for tracked fields. I think this change is correct. However, you call Ingredient::Index::successor in tracked_field with the field_index which is the absolute offset including all fields -- including untracked fields. What this means is that we now add incorrect dependencies as soon as we have a tracked struct where a tracked field follows an untracked field.
I hope that helps. I let you take a stab at correctly mapping the field indices. It would be great to have a unit test demonstrating the behavior and possibly an assertion somewhere that would help us catch this sooner.
| let id = C::deref_struct(s); | ||
| let data = Self::data(zalsa.table(), id); | ||
|
|
||
| data.read_lock(zalsa.current_revision()); |
There was a problem hiding this comment.
Is the read_lock necessary here for an untracked field? It seems not because leak_fields does not call it.
There was a problem hiding this comment.
Not a 100% but the comment on Value::updated_at and the inline comment inside update does suggest to me that the lock is required before accessing any field (handing out references to field) and it not being present in leak_field seems like a bug.
MichaReiser
left a comment
There was a problem hiding this comment.
thanks. This is great. I think we should be able to optimize the implementation to avoid creating any field-ingredients if all fields are tracked.
|
Thanks. This looks good to me. I'll give @nikomatsakis a chance to review |
|
I realized where I missed remapping when I tried relative indices. I think I can get it to work now. |
|
@ibraheemdev OK |
|
I think my preference would be to either create ingredients only for tracked fields or ALWAYS create ingredients no matter what. The current "create nothing or create extra ingredients" feels a bit confusing. |
136e285 to
59c5172
Compare
|
I updated the implementation to only create ingredients for tracked fields. This should be good to go now. |
59c5172 to
c710526
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
This is great. I've one possible follow up improvement that we could make. Let me know if you want to tackle this as part of this PR or as a follow up. I otherwise think this is good to merge
src/tracked_struct.rs
Outdated
| &'db self, | ||
| db: &'db dyn crate::Database, | ||
| s: C::Struct<'db>, | ||
| _field_index: usize, |
There was a problem hiding this comment.
You removing the _field_index here makes me wonder if we should update revisions to only store the revisions for tracked fields because they seem wasted for untracked fields. Removing the bookkeeping for untracked fields could improve memory usage because it reduces the metadata stored for every tracked struct instance. It may also reduce the need to pass both relative and absolute indices to tracked_field.
However, this might also be something that's fine to follow up in a separate PR
|
I'll go ahead and merge it as is. We can follow up on the revision optimization in a separate PR |
| hash: crate::hash::hash(&C::id_fields(&fields)), | ||
| hash: crate::hash::hash(&C::untracked_fields(&fields)), |
There was a problem hiding this comment.
Note that IdentityHash's docs state
This includes the ingredient index of that struct type plus the hash of its id fields.
So if I see this right we have flipped the fields that we hash here now right? before we hashed the tracked fields, not hash the untracked fields which sounds wrong to me as the tracked ones are what gives us identity
There was a problem hiding this comment.
Flipping here is correct. We should have updated the documentation.
The identity of a tracked struct is now defined by all untracked fields. If any untracked field changes, then we just assume it's a different tracked struct. This "trick" ensures that we get away without tracking dependencies on individual fields because the id from rev a and rev b now no longer compare equal
There was a problem hiding this comment.
Yeah "id fields" have effectively been renamed to "untracked fields".
## Summary Transition to using coarse-grained tracked structs (depends on salsa-rs/salsa#657). For now, this PR doesn't add any `#[tracked]` fields, meaning that any changes cause the entire struct to be invalidated. It also changes `AstNodeRef` to be compared/hashed by pointer address, instead of performing a deep AST comparison. ## Test Plan This yields a 10-15% improvement on my machine (though weirdly some runs were 5-10% without being flagged as inconsistent by criterion, is there some non-determinism involved?). It's possible that some of this is unrelated, I'll try applying the patch to the current salsa version to make sure. --------- Co-authored-by: Micha Reiser <[email protected]>
This PR makes tracked structs coarse-grained by default. To declare a field independently tracked, you now use the
#[tracked]attribute. This is the opposite of the previous behavior, where#[id]effectively declared a field untracked. The#[id]attribute now has no effect. Importantly, reads of untracked fields do not require a read dependency to be declared.There is some duplicated code in the macros as we need to generate separate accessors for tracked vs. untracked fields and I would like these to be static (as opposed to branching at runtime).
I believe the guide documentation also needs to be updated.
Resolves #598.