[ty] improve complex TDD-based narrowing #23201
mtshiba wants to merge 78 commits into astral-sh:main
Conversation
Typing conformance results: No changes detected ✅

Memory usage report: significant changes detected for prefect, sphinx, trio, and flake8.

Merging this PR will improve performance by 19.82%.
My original idea didn't work, but I was able to suppress the exponential blowup by implementing a cache for `narrow_by_constraint` instead.
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
| invalid-key | 0 | 51 | 0 |
| unresolved-attribute | 0 | 4 | 32 |
| possibly-unresolved-reference | 0 | 20 | 0 |
| invalid-argument-type | 2 | 3 | 12 |
| unresolved-reference | 0 | 10 | 0 |
| unsupported-operator | 0 | 1 | 6 |
| invalid-return-type | 2 | 0 | 3 |
| not-iterable | 0 | 0 | 2 |
| unused-type-ignore-comment | 1 | 1 | 0 |
| invalid-assignment | 0 | 0 | 1 |
| no-matching-overload | 0 | 1 | 0 |
| possibly-missing-attribute | 0 | 1 | 0 |
| type-assertion-failure | 0 | 1 | 0 |
| **Total** | 5 | 93 | 56 |
```rust
}) => Some(!resolve_to_literal(operand)?),
_ => None,
```

```rust
#[derive(Copy, Clone)]
```

In the end, this didn't seem to do much to improve the performance degradation, but I don't think there's anything bad that can come from leaving it in.
```diff
-self.record_narrowing_constraint(negated_predicate);
-self.record_reachability_constraint(negated_predicate);
+let predicate_id = self.record_narrowing_constraint(negated_predicate);
+self.record_reachability_constraint_id(predicate_id);
```

Because the reachability constraint and the narrowing constraint point to the same predicate, we can reuse the same ID to save memory. That was the original intention, but this change turned out to be essential for this PR (specifically for the `Bindings::merge` change: https://github.com/astral-sh/ruff/pull/23201/changes#r2800954123).
Without this identification, narrowing would be mistakenly determined to be a non-no-op when it should be a no-op.
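To illustrate the point about sharing one ID, here is a minimal sketch; all names (`ConstraintTable`, `record`, `merge_is_noop`) are invented for illustration and are not ty's actual API. Once both constraint kinds hold the same interned ID for a predicate, deciding that a merge is a no-op becomes a cheap ID-equality check.

```rust
use std::collections::HashMap;

/// Hypothetical interning table: identical predicates receive one shared ID,
/// so the narrowing and reachability constraints for the same predicate
/// compare equal by ID.
#[derive(Default)]
struct ConstraintTable {
    ids: HashMap<String, u32>,
    predicates: Vec<String>,
}

impl ConstraintTable {
    fn record(&mut self, predicate: &str) -> u32 {
        if let Some(&id) = self.ids.get(predicate) {
            return id; // same predicate: reuse the existing ID
        }
        let id = self.predicates.len() as u32;
        self.predicates.push(predicate.to_string());
        self.ids.insert(predicate.to_string(), id);
        id
    }
}

/// With shared IDs, "merging these constraints changes nothing" reduces to
/// an integer comparison instead of a structural predicate comparison.
fn merge_is_noop(a: u32, b: u32) -> bool {
    a == b
}

fn main() {
    let mut table = ConstraintTable::default();
    let narrowing_id = table.record("x is None");
    let reachability_id = table.record("x is None");
    assert!(merge_is_noop(narrowing_id, reachability_id));
    assert!(!merge_is_noop(narrowing_id, table.record("y is None")));
}
```

Without the shared ID, two structurally identical predicates recorded separately would get distinct IDs and the equality check would spuriously fail, which matches the mis-detected non-no-op described above.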
```diff
     place: ScopedPlaceId,
 ) -> Type<'db> {
-    self.narrow_by_constraint_inner(db, predicates, id, base_ty, place, None)
+    let mut memo = FxHashMap::default();
```

Caching helps mitigate the exponential blowup reported in 5fd9a9c.
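A minimal illustration of why a memo table matters when recursing over a shared DAG such as a TDD; the node layout and all names here are invented, not ty's actual structures. With both children of every branch pointing at the same shared child, the number of root-to-terminal paths grows as 2^n, but the memoized walk visits each node only once.

```rust
use std::collections::HashMap;

// Toy decision-diagram node: terminals carry a boolean, branches point at
// child nodes by index. Sharing of children makes this a DAG.
#[derive(Clone, Copy)]
enum Node {
    Terminal(bool),
    Branch { if_true: usize, if_false: usize },
}

/// Count paths reaching a `true` terminal. The memo keyed by node index
/// turns an exponential traversal into a linear one on shared DAGs.
fn count_true_paths(nodes: &[Node], id: usize, memo: &mut HashMap<usize, u64>) -> u64 {
    if let Some(&cached) = memo.get(&id) {
        return cached;
    }
    let result = match nodes[id] {
        Node::Terminal(v) => v as u64,
        Node::Branch { if_true, if_false } => {
            count_true_paths(nodes, if_true, memo) + count_true_paths(nodes, if_false, memo)
        }
    };
    memo.insert(id, result);
    result
}

fn main() {
    // A chain of 40 branch nodes whose children both point at the previous
    // node: 2^40 paths, but only ~40 recursive evaluations with the memo.
    let mut nodes = vec![Node::Terminal(true), Node::Terminal(false)];
    let mut prev = 0usize;
    for _ in 0..40 {
        nodes.push(Node::Branch { if_true: prev, if_false: prev });
        prev = nodes.len() - 1;
    }
    let mut memo = HashMap::new();
    assert_eq!(count_true_paths(&nodes, prev, &mut memo), 1u64 << 40);
}
```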
| { | ||
| ScopedNarrowingConstraint::ALWAYS_TRUE | ||
| } else { | ||
| // A branch contributes narrowing only when it is reachable. |
There was a problem hiding this comment.
This technique is possible because narrowing and reachability are now managed in the same data structure. The narrowing of a/b is triggered when the reachability of a/b is ALWAYS_TRUE.
That is, gate each narrowing constraint with each reachability constraint with and.
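As a rough sketch of that gating idea (names and representation invented for illustration, not ty's actual code): when merging the bindings flowing out of several branches, each branch's narrowed type contributes to the merged union only if its reachability constraint holds, which is exactly "narrowing AND reachability".

```rust
/// Toy merge: each branch carries (is_reachable, narrowed_type). Gating
/// narrowing with reachability drops contributions from branches whose
/// reachability constraint is statically false.
fn merge_narrowed_types(branches: &[(bool, &str)]) -> Vec<String> {
    branches
        .iter()
        .filter(|(reachable, _)| *reachable) // the AND gate
        .map(|(_, ty)| ty.to_string())
        .collect()
}

fn main() {
    // e.g. `x: int | None` where the `None` arm is unreachable under the
    // current constraints: the merged type is `int`, not `int | None`.
    let merged = merge_narrowed_types(&[(true, "int"), (false, "None")]);
    assert_eq!(merged, vec!["int".to_string()]);
}
```

Without the gate, the merge would union the types of all syntactic paths, including ones the reachability analysis has already ruled out.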
I've tried various things, but this is the limit: I'm out of ideas for further optimization, and the codspeed profiling results show no abnormal hot spots. So I'll mark this PR as ready for review.
```rust
/// Inner recursive helper that accumulates narrowing constraints along each TDD path.
#[allow(clippy::too_many_arguments)]
fn narrow_by_constraint_inner<'db>(
```

This function was the hotspot in most of the performance-degradation cases: once we started building complex narrowing TDDs, the number of recursive calls grew significantly.
Resolved review thread: `crates/ty_python_semantic/src/semantic_index/reachability_constraints.rs`
```rust
// Fast path for intersection types: use set-based subset check instead of
// the full `has_relation_to` machinery. This is critical for narrowing where
// many intersection types with overlapping positive elements are produced.
if let (Type::Intersection(self_inter), Type::Intersection(other_inter)) = (self, other) {
```

I'm not sure I understand this comment: why is it important for this fast path to be in `Type::has_relation_to()` (an uncached function) rather than `Type::has_relation_to_impl` (a cached function)? Can this fast path not be applied to the `(Type::Intersection(...), Type::Intersection(...))` branch of `Type::has_relation_to_impl`?

(This change was split into #23555.)
Looking at the codspeed result, placing this check outside of `is_redundant_with_impl` (`has_relation_to_impl`) actually improves performance. I suspect the cache hit rate is low in the cases where this fast path is effective.
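A sketch of the set-based check being discussed, under the simplifying assumption that an intersection is just a set of positive conjuncts (the real representation in ty is richer). If every conjunct of `B` also appears in `A`, then `A` is at least as narrow as `B`, so `A <: B` holds; the converse is not true, which is why this can only be a fast path that falls back to the full machinery on failure.

```rust
use std::collections::BTreeSet;

/// Fast sufficient (not necessary) subtype check between two intersections
/// modeled as sets of positive element names: `a <: b` whenever b's
/// conjuncts are a subset of a's.
fn intersection_is_subtype_fast(a: &BTreeSet<&str>, b: &BTreeSet<&str>) -> bool {
    b.is_subset(a)
}

fn main() {
    let a: BTreeSet<_> = ["HasX", "HasY", "HasZ"].into_iter().collect();
    let b: BTreeSet<_> = ["HasX", "HasY"].into_iter().collect();
    // HasX & HasY & HasZ is narrower than HasX & HasY...
    assert!(intersection_is_subtype_fast(&a, &b));
    // ...but not the other way around via this syntactic check.
    assert!(!intersection_is_subtype_fast(&b, &a));
}
```

The appeal is that a set subset test is linear in the number of conjuncts, whereas the general relation check compares elements pairwise through much heavier machinery.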
Thank you very much for working on this.
I know this is a bit of work, but it would be really great if we could split out all of the optimizations that make sense in isolation; a lot of these sound like they might improve performance on `main` as well.
Co-Authored-By: Alex Waygood <[email protected]>
## Summary

From #23201 (comment). Let's merge this benchmark into main ahead of #23201 so that we can see the impact of the changes.

## Test Plan

`benchmark_large_union_narrowing` added to `ruff_benchmark/ty.rs`
## Summary

From #23201 (comment). This could bring a significant performance improvement to main, so I'm breaking it out as a standalone PR.

## Test Plan

N/A

Co-authored-by: David Peter <[email protected]>
@mtshiba I'm moving this to draft for now, as it is my understanding that we're still pulling things out of this PR. Please feel free to move it back to review once it's ready. If you do that before, say, March 6th, please feel free to unassign me as reviewer and close+reopen this PR to get someone else assigned, as I'm out for a few days.
## Summary

Implement hash-consing for several fields in `UseDefMap` to reduce memory usage.

The rationale for this optimization is the size and high duplication rate of the structs used in `UseDefMap`. The size of each struct is:

* `Bindings`: 40 bytes
* `PlaceState`: 64 bytes
* `ReachableDefinitions`: 64 bytes
* `EnclosingSnapshot`: 40 bytes

These elements are stored within `IndexVec`s where duplication of elements can occur. For example, taking these statistics (duplication rate) with hydra-zen yields the following:

- bindings_by_use: 65.13%
- end_of_scope_members: 57.37%
- enclosing_snapshots: 57.18%
- reachable_definitions_by_member: 34.58%
- declarations_by_binding: 32.70%
- bindings_by_definition: 27.84%
- end_of_scope_symbols: 21.25%
- reachable_definitions_by_symbol: 15.70%

The disparity in duplication rates between symbols and members was a bit surprising, but can be explained as follows: `obj.attr` and `obj["k"]` might be tracked, but they often end up in a similar undefined/unbound state, leading to a large number of identical `PlaceState` instances.

This PR implements hash-consing for these fields. Specifically, each `IndexVec` for these fields now stores only IDs, and the actual struct instances are stored in another `IndexVec` without duplication. According to the memory usage report, this change reduced the size of `SemanticIndex` by about 10%.

## Performance analysis

What this PR is trying to do involves a trade-off between time complexity and space complexity. In #23201, I aimed to achieve a more significant reduction in memory consumption by interning `Bindings`, but this resulted in a major regression in time performance, so I decided to manually intern only items that seemed to have a high effect.

This change is not very effective in microbenchmarks, and rather adds a time cost (due to hash calculations). However, looking at the trends in the memory report, it seems that for large codebases it can achieve significant memory savings with relatively little overhead. The larger the `UseDefMap`, the greater the reduction, so I think it's worth doing.

The commit that achieved the greatest memory-consumption reduction in this PR was b9d067a, but it was reverted because it exceeded the threshold for codspeed microbenchmarks. There was almost no impact on walltime benchmarks (in fact, some even showed slight improvements).

## Test Plan

N/A
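The hash-consing scheme described above can be sketched as follows; the `Interner` here is an invented stand-in (using `String` instead of the real `Bindings`/`PlaceState` structs), not ty's actual implementation. The per-use table stores 4-byte IDs while the unique values live once in a side table, which is where the savings come from when duplication rates are high.

```rust
use std::collections::HashMap;

/// Minimal hash-consing sketch: `intern` deduplicates values, `resolve`
/// maps an ID back to its value. Real implementations would intern the
/// large `UseDefMap` structs rather than strings.
#[derive(Default)]
struct Interner {
    lookup: HashMap<String, u32>,
    values: Vec<String>,
}

impl Interner {
    fn intern(&mut self, value: &str) -> u32 {
        if let Some(&id) = self.lookup.get(value) {
            return id; // duplicate: reuse the existing slot
        }
        let id = self.values.len() as u32;
        self.values.push(value.to_string());
        self.lookup.insert(value.to_string(), id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.values[id as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    // Many places end up in the same "unbound" state, so they share one ID.
    let ids: Vec<u32> = ["unbound", "bound:int", "unbound", "unbound"]
        .iter()
        .map(|v| interner.intern(v))
        .collect();
    assert_eq!(ids, vec![0, 1, 0, 0]);
    assert_eq!(interner.values.len(), 2); // only unique states are stored
    assert_eq!(interner.resolve(1), "bound:int");
}
```

The trade-off noted in the summary is visible here: every `intern` call pays for a hash lookup, so the win only materializes when the duplication rate is high enough that the smaller per-use table outweighs that cost.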
Summary

An unsolved issue in #23109 is that the following narrowing doesn't work properly:

A fix was attempted in ada1084, but it was abandoned due to the exponential blowup (5fd9a9c). I've come up with a different approach, so I'll try it out to see if it works.

What this PR does is simple: when merging bindings, it "gates" narrowing constraints with reachability constraints.
This allows us to exclude types from unreachable paths.

The problem is that this increases the size of the TDD used for narrowing, which causes serious performance degradation.
This PR also implements optimizations to prevent this.
In order of importance:

- `narrow_by_constraint` cache: without it, CI would be so slow that it would not end (989268d).
- `NarrowingConstraint` structure: `intersection_disjunct` now holds multiple types (`Conjunction`s), and intersection construction is now delayed (a1e303e).
- `all_negative_narrowing_constraints_for_expression`: `all_narrowing_constraints_for_expression` now calculates both positive and negative values simultaneously (8468a9e).
- `build_predicate`: this did not seem to have much effect on this PR.
↓ P.S. split out into separate PRs:

- `narrow_by_constraint_inner`
- `all_negative_narrowing_constraints_for_...`
- `NarrowingConstraint` has `Conjunction`s

ecosystem-analyzer timing results showed a significant regression in egglog-python that was missed by codspeed. Each commit gradually mitigates this regression.

Test Plan

mdtest updated