[ty] Recognize string-literal types as subtypes of Sequence[Literal[chars]]#22415
Conversation
Typing conformance resultsNo changes detected ✅ |
|
AlexWaygood
left a comment
There was a problem hiding this comment.
Thanks! I haven't looked in depth yet, just one thing I spotted
7ff9718 to
7c62605
Compare
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
invalid-parameter-default |
0 | 0 | 7 |
invalid-return-type |
2 | 0 | 4 |
invalid-argument-type |
2 | 1 | 2 |
unused-ignore-comment |
0 | 2 | 0 |
| Total | 4 | 3 | 13 |
7c62605 to
7a4a84f
Compare
…hars]] Implements astral-sh#2128: `Literal["abba"]` is now recognized as a subtype of `Sequence[Literal["a", "b"]]`. Changes: - Add `KnownClass::Sequence` as a known protocol class from `typing` - Extend `has_relation_to_impl` to check if a string literal's unique characters are all subtypes of the Sequence's element type - Add mdtests covering positive cases, negative cases, and edge cases Closes astral-sh#2128
7a4a84f to
6425aa0
Compare
Thanks for catching this! Fixed. |
Sequence[Literal[chars]]
Merging this PR will not alter performance
Comparing Footnotes
|
|
Uff. I think I know what I screwed up there. |
…mpatible targets Add could_be_sequence_supertype() to skip expensive Sequence[Literal[chars]] creation when target cannot possibly be a Sequence supertype (e.g., list[str], tuple[int], TypeVar). This complements the existing optimizations (FxHashSet dedup, compact_string, UnionType::new) by avoiding the closure call entirely for common cases like overload resolution with list[str] parameters.
This reverts commit 9e2c661.
…for incompatible targets" This reverts commit aeec2dc.
|
We may have to do a profiling run on Altair to see where we spend more time now. I suspect we run into lock contention because we keep re-interning the same characters over and over again (as string literals). It might be worth adding some debug logging to count how many |
|
The optimisations I pushed got the time on the multithreaded benchmark down from 2.3s to 1.8s, so it's much better than it was... but a 23% performance regression is obviously far from ideal... I have the same suspicion you do about the cause of the regression, but if that is the cause, what do we do about it? Can you think of any ways to avoid reinterning the same characters over and over here? I already pushed a change to only call |
|
The 5-10% memory usage increase is a bit concerning Maybe a feature we should get back to in the future and drop for now? |
|
I think this should help with the memory regression. The issue is we're doing expensive character processing for every Here's a quick fix that adds a cheap MRO check before any allocations: diff --git a/crates/ty_python_semantic/src/types/relation.rs b/crates/ty_python_semantic/src/types/relation.rs
index ab84708b65..e832ae59bf 100644
--- a/crates/ty_python_semantic/src/types/relation.rs
+++ b/crates/ty_python_semantic/src/types/relation.rs
@@ -3,6 +3,7 @@ use rustc_hash::FxHashSet;
use crate::place::{DefinedPlace, Place};
use crate::types::builder::RecursivelyDefined;
+use crate::types::class_base::ClassBase;
use crate::types::constraints::{IteratorConstraintsExtension, OptionConstraintsExtension};
use crate::types::enums::is_single_member_enum;
use crate::types::{
@@ -1047,6 +1048,22 @@ impl<'db> Type<'db> {
return ConstraintSet::from(true);
}
+ if let Some(sequence_class) = KnownClass::Sequence.try_to_class_literal(db) {
+ let is_sequence_subclass =
+ sequence_class.iter_mro(db, None).any(|base| match base {
+ ClassBase::Class(base_class) => {
+ base_class.class_literal(db).0 == other_class.class_literal(db).0
+ }
+ _ => false,
+ });
+
+ if !is_sequence_subclass {
+ return ConstraintSet::from(false);
+ }
+ } else {
+ return ConstraintSet::from(false);
+ }
+
let chars: FxHashSet<char> = value.value(db).chars().collect();This cuts the memory overhead dramatically by only doing character interning when the target class is actually in the Sequence hierarchy. On the |
|
Thanks @bxff! I applied a variant of that and it did indeed solve the memory-usage regression. This PR now applies consistent rules for assignability/subtyping/redundancy, does not add any new Salsa caching, does not have any reported memory-usage regressions, and does not have any reported performance regressions. So I think it's good to go. Thanks @jhartum!! Sorry that this one turned out to be a bit more complicated than we initially expected. |
Summary
Implements astral-sh/ty#2128:
Literal["abba"]is now recognized as a subtype ofSequence[Literal["a", "b"]].Changes
KnownClass::Sequence- New known class fortyping.Sequence(an ABC, not a protocol)has_relation_to_impl- When checking if a string literal is a subtype ofSequence[X], verify that each unique character in the string is a subtype ofXExample
Implementation Details
The implementation in
has_relation_to_impl:strfallback (for cases likeSequence[str])Sequence[X], extracts unique characters from the string literalXusingwhen_allto properly accumulate constraintsEdge cases handled:
Literal["ab"]≠Literal["a", "b"])Test Plan
is_subtype_of.mdCloses astral-sh/ty#2128