
[ty] Recognize string-literal types as subtypes of Sequence[Literal[chars]]#22415

Merged
AlexWaygood merged 22 commits into astral-sh:main from jhartum:feat/string-literal-sequence-subtype-v2
Jan 18, 2026

@jhartum jhartum commented Jan 6, 2026

Summary

Implements astral-sh/ty#2128: Literal["abba"] is now recognized as a subtype of Sequence[Literal["a", "b"]].

Changes

  • Add KnownClass::Sequence - New known class for typing.Sequence (an ABC, not a protocol)
  • Extend has_relation_to_impl - When checking if a string literal is a subtype of Sequence[X], verify that each unique character in the string is a subtype of X
  • Add mdtests - Cover positive cases, negative cases, and edge cases

Example

```python
from collections.abc import Sequence
from typing import Literal

def func(tags: Sequence[Literal["a", "b"]]) -> None:
    pass

func("abba")  # Now OK - was incorrectly flagged as error
```

Implementation Details

The implementation in has_relation_to_impl:

  1. First tries the standard str fallback (for cases like Sequence[str])
  2. If that fails and the target is Sequence[X], extracts unique characters from the string literal
  3. Verifies each character is a subtype of X using when_all to properly accumulate constraints
  4. Returns the combined constraint set
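The four steps above can be modelled in a few lines of Python. This is a toy sketch, not ty's Rust code: `StrType`, `LiteralStr`, `is_subtype` and `literal_is_subtype_of_sequence` are hypothetical names, element types are limited to `str` and unions of string literals, and a plain boolean `all()` stands in for `when_all` over constraint sets.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class StrType:
    """Hypothetical stand-in for the nominal `str` type."""


@dataclass(frozen=True)
class LiteralStr:
    """Hypothetical stand-in for a union of string literals, e.g. Literal["a", "b"]."""
    values: FrozenSet[str]


ElementType = Union[StrType, LiteralStr]


def is_subtype(sub: ElementType, sup: ElementType) -> bool:
    if isinstance(sup, StrType):
        return True  # str and every string literal are subtypes of str
    if isinstance(sub, StrType):
        return False  # plain str does not fit a finite set of literals
    return sub.values <= sup.values


def literal_is_subtype_of_sequence(value: str, element: ElementType) -> bool:
    """Is Literal[value] a subtype of Sequence[element] in this toy model?"""
    # Step 1: the ordinary fallback, since str is a Sequence[str].
    if is_subtype(StrType(), element):
        return True
    # Step 2: look at each distinct character only once.
    unique_chars = set(value)
    # Steps 3-4: every character must be a subtype of the element type.
    # all() of an empty iterable is True, so "" is always accepted.
    return all(is_subtype(LiteralStr(frozenset({ch})), element) for ch in unique_chars)
```

For example, `"abba"` checked against `LiteralStr(frozenset({"a", "b"}))` passes, while `"abc"` fails because `"c"` is not one of the allowed literals.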

Edge cases handled:

  • Empty strings (always valid)
  • Unicode characters
  • Multi-char literals in element type (correctly rejected: Literal["ab"] is not the same as Literal["a", "b"], so "ab" is not a Sequence[Literal["ab"]])
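The edge cases can be written as ordinary Python that a type checker would analyse. The accept/reject comments restate the behaviour the PR describes; they are not output from running ty, and the function names are made up for illustration. At runtime every active call simply succeeds.

```python
from collections.abc import Sequence
from typing import Literal


def join_tags(tags: Sequence[Literal["a", "b"]]) -> str:
    return "".join(tags)


def join_accents(chars: Sequence[Literal["é", "ß"]]) -> str:
    return "".join(chars)


def join_pairs(items: Sequence[Literal["ab"]]) -> str:
    return "".join(items)


join_tags("")          # empty string: no characters to check, accepted
join_tags("abba")      # every character is "a" or "b", accepted
join_accents("ßéé")    # non-ASCII characters are handled the same way
join_pairs(["ab"])     # a list whose elements are "ab" is fine
# join_tags("abc")     # rejected: "c" is not in Literal["a", "b"]
# join_pairs("ab")     # rejected: iterating "ab" yields "a" and "b", not "ab"
```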

Test Plan

  • Added mdtests in is_subtype_of.md
  • All existing mdtests pass

Closes astral-sh/ty#2128

@MichaReiser MichaReiser added the ty Multi-file analysis & type inference label Jan 6, 2026

astral-sh-bot bot commented Jan 6, 2026

Typing conformance results

No changes detected ✅


astral-sh-bot bot commented Jan 6, 2026

mypy_primer results

Changes were detected when running on open source projects
tornado (https://github.com/tornadoweb/tornado)
- tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _VT@next | _T@next`
+ tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _T@next | _VT@next`

static-frame (https://github.com/static-frame/static-frame)
- static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bus[Any] | Bottom[Index[Any]] | TypeBlocks | ... omitted 6 union elements, object_]`
+ static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bus[Any] | Bottom[Index[Any]] | Bottom[Series[Any, Any]] | ... omitted 6 union elements, object_]`
- static_frame/core/node_selector.py:526:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@InterfaceSelectQuartet, Any]`, found `InterGetItemLocReduces[Unknown | Bottom[Series[Any, Any]], Any]`
+ static_frame/core/node_selector.py:526:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@InterfaceSelectQuartet, Any]`, found `InterGetItemLocReduces[Bottom[Series[Any, Any]] | Unknown, Any]`
- static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | Bottom[Index[Any]] | TypeBlocks | ... omitted 7 union elements, TVDtype@SeriesHE]`
+ static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | ndarray[Never, Never] | TypeBlocks | ... omitted 7 union elements, TVDtype@SeriesHE]`
- static_frame/core/yarn.py:418:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Yarn[Any], object_]`, found `InterGetItemILocReduces[Yarn[Any] | Bottom[Index[Any]] | TypeBlocks | ... omitted 6 union elements, object_]`
+ static_frame/core/yarn.py:418:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Yarn[Any], object_]`, found `InterGetItemILocReduces[Yarn[Any] | ndarray[Never, Never] | TypeBlocks | ... omitted 6 union elements, object_]`

core (https://github.com/home-assistant/core)
+ homeassistant/util/variance.py:47:12: error[invalid-return-type] Return type does not match returned value: expected `(**_P@ignore_variance) -> _R@ignore_variance`, found `_Wrapped[_P@ignore_variance, _R@ignore_variance | int | float | datetime, _P@ignore_variance, _R@ignore_variance | int | float | datetime]`
- Found 14496 diagnostics
+ Found 14497 diagnostics

No memory usage changes detected ✅


@AlexWaygood AlexWaygood left a comment


Thanks! I haven't looked in depth yet, just one thing I spotted

@jhartum jhartum force-pushed the feat/string-literal-sequence-subtype-v2 branch 2 times, most recently from 7ff9718 to 7c62605 on January 6, 2026 10:11

astral-sh-bot bot commented Jan 6, 2026

ecosystem-analyzer results

| Lint rule | Added | Removed | Changed |
|---|---|---|---|
| invalid-parameter-default | 0 | 0 | 7 |
| invalid-return-type | 2 | 0 | 4 |
| invalid-argument-type | 2 | 1 | 2 |
| unused-ignore-comment | 0 | 2 | 0 |
| Total | 4 | 3 | 13 |

Full report with detailed diff (timing results)

@jhartum jhartum force-pushed the feat/string-literal-sequence-subtype-v2 branch from 7c62605 to 7a4a84f on January 6, 2026 10:21
…hars]]

Implements astral-sh#2128: `Literal["abba"]` is now recognized as a subtype of
`Sequence[Literal["a", "b"]]`.

Changes:
- Add `KnownClass::Sequence` as a known class from `typing` (an ABC, not a protocol)
- Extend `has_relation_to_impl` to check if a string literal's unique
  characters are all subtypes of the Sequence's element type
- Add mdtests covering positive cases, negative cases, and edge cases

Closes astral-sh#2128
@jhartum jhartum force-pushed the feat/string-literal-sequence-subtype-v2 branch from 7a4a84f to 6425aa0 on January 6, 2026 10:37

jhartum commented Jan 6, 2026

> Thanks! I haven't looked in depth yet, just one thing I spotted

Thanks for catching this! Fixed.

@AlexWaygood AlexWaygood changed the title feat(ty): Recognize string literals as subtypes of Sequence[Literal[chars]] [ty] Recognize string-literal types as subtypes of Sequence[Literal[chars]] Jan 6, 2026

@AlexWaygood AlexWaygood left a comment


Thanks!


codspeed-hq bot commented Jan 6, 2026

Merging this PR will not alter performance

✅ 23 untouched benchmarks
⏩ 30 skipped benchmarks¹


Comparing jhartum:feat/string-literal-sequence-subtype-v2 (528ded0) with main (57c98a1)


Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@AlexWaygood

Uff. I think I know what I screwed up there.

AlexWaygood and others added 5 commits January 6, 2026 15:35
…mpatible targets

Add could_be_sequence_supertype() to skip expensive Sequence[Literal[chars]]
creation when target cannot possibly be a Sequence supertype (e.g., list[str],
tuple[int], TypeVar).

This complements the existing optimizations (FxHashSet dedup, compact_string,
UnionType::new) by avoiding the closure call entirely for common cases like
overload resolution with list[str] parameters.
@MichaReiser

We may have to do a profiling run on Altair to see where we now spend more time. I suspect we run into lock contention because we keep re-interning the same characters (as string literals) over and over again. It might be worth adding some debug logging that counts how many StringLiteral types we create per character, to see whether any stand out.
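The suggested debug logging could look roughly like the toy interner below. It is a hypothetical Python stand-in, not ty's Salsa interner: `CountingInterner` keeps one canonical copy of each value and counts every request, so characters that are re-interned again and again show up as hot entries.

```python
from collections import Counter


class CountingInterner:
    """Toy interner that records how often each value is requested."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}
        self.requests: Counter[str] = Counter()

    def intern(self, value: str) -> str:
        self.requests[value] += 1
        # setdefault returns the existing canonical copy if one is stored.
        return self._table.setdefault(value, value)

    def __len__(self) -> int:
        return len(self._table)


interner = CountingInterner()
for literal in ["abba", "baba", "abc"]:
    for ch in literal:
        interner.intern(ch)
```

Here only three distinct characters are stored, but `"a"` and `"b"` are each requested five times; `interner.requests.most_common()` surfaces such entries.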


AlexWaygood commented Jan 6, 2026

The optimisations I pushed got the time on the multithreaded benchmark down from 2.3s to 1.8s, so it's much better than it was... but a 23% performance regression is obviously far from ideal...

I have the same suspicion you do about the cause of the regression, but if that is the cause, what do we do about it? Can you think of any ways to avoid reinterning the same characters over and over here? I already pushed a change to only call StringLiteralType::new() on the characters after they've been deduplicated, to try to avoid the lock contention issues (and it did lead to a big speedup, but there's still an overall regression).
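The effect of deduplicating before interning is easy to see in a small Python model. Both helpers are hypothetical: `intern` stands in for a call like `StringLiteralType::new()`, and a Python `set` plays the role of `FxHashSet`. The naive version calls `intern` once per character, the deduplicated one once per distinct character.

```python
from typing import Callable


def intern_chars_naive(value: str, intern: Callable[[str], str]) -> set[str]:
    # One intern call per character, repeats included.
    return {intern(ch) for ch in value}


def intern_chars_deduped(value: str, intern: Callable[[str], str]) -> set[str]:
    # Collect distinct characters first, then intern each one once.
    return {intern(ch) for ch in set(value)}


calls: list[str] = []


def recording_intern(ch: str) -> str:
    calls.append(ch)
    return ch


literal = "abba" * 1000
naive = intern_chars_naive(literal, recording_intern)
naive_calls = len(calls)
calls.clear()
deduped = intern_chars_deduped(literal, recording_intern)
deduped_calls = len(calls)
```

For this 4,000-character literal built from two distinct characters, the naive version makes 4,000 interning calls and the deduplicated one makes 2, with identical results.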

@AlexWaygood AlexWaygood requested a review from carljm January 6, 2026 18:41

MichaReiser commented Jan 8, 2026

The 5-10% memory usage increase is a bit concerning.

Maybe this is a feature we should drop for now and come back to in the future?


bxff commented Jan 9, 2026

I think this should help with the memory regression. The issue is that we're doing expensive character processing for every StringLiteral vs. NominalInstance comparison, even when the target clearly isn't a sequence type.

Here's a quick fix that adds a cheap MRO check before any allocations:

```diff
diff --git a/crates/ty_python_semantic/src/types/relation.rs b/crates/ty_python_semantic/src/types/relation.rs
index ab84708b65..e832ae59bf 100644
--- a/crates/ty_python_semantic/src/types/relation.rs
+++ b/crates/ty_python_semantic/src/types/relation.rs
@@ -3,6 +3,7 @@ use rustc_hash::FxHashSet;
 
 use crate::place::{DefinedPlace, Place};
 use crate::types::builder::RecursivelyDefined;
+use crate::types::class_base::ClassBase;
 use crate::types::constraints::{IteratorConstraintsExtension, OptionConstraintsExtension};
 use crate::types::enums::is_single_member_enum;
 use crate::types::{
@@ -1047,6 +1048,22 @@ impl<'db> Type<'db> {
                     return ConstraintSet::from(true);
                 }
 
+                if let Some(sequence_class) = KnownClass::Sequence.try_to_class_literal(db) {
+                    let is_sequence_subclass =
+                        sequence_class.iter_mro(db, None).any(|base| match base {
+                            ClassBase::Class(base_class) => {
+                                base_class.class_literal(db).0 == other_class.class_literal(db).0
+                            }
+                            _ => false,
+                        });
+
+                    if !is_sequence_subclass {
+                        return ConstraintSet::from(false);
+                    }
+                } else {
+                    return ConstraintSet::from(false);
+                }
+
                 let chars: FxHashSet<char> = value.value(db).chars().collect();
```

This cuts the memory overhead dramatically by only interning characters when the target class appears in Sequence's MRO (that is, when it is Sequence itself or one of its ancestors). On the attrs benchmark the overhead drops from +61 MB to +10 MB (from 16% to ~2.5%). All 308 mdtests pass.
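The same pre-check can be expressed against Python's runtime MRO. This is an illustrative analogue, not ty code: `is_sequence_ancestor` is a hypothetical name, and CPython's `collections.abc.Sequence.__mro__` may differ slightly from the MRO ty computes from typeshed stubs. Note that, despite the `is_sequence_subclass` name in the diff, walking `Sequence`'s MRO accepts `Sequence` and its ancestors, not its subclasses.

```python
import collections.abc


def is_sequence_ancestor(target: type) -> bool:
    """True if `target` appears in Sequence's MRO (Sequence itself or a base)."""
    return any(base is target for base in collections.abc.Sequence.__mro__)


# Ancestors such as Iterable qualify; subclasses such as MutableSequence do not.
print(is_sequence_ancestor(collections.abc.Iterable))         # True
print(is_sequence_ancestor(collections.abc.MutableSequence))  # False
```

That direction is the right one here: a string literal can only be assignable to `Sequence[...]` or a wider ABC such as `Iterable[...]`, never to a subclass like `MutableSequence` or `list`.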

@AlexWaygood AlexWaygood marked this pull request as draft January 18, 2026 16:33
@AlexWaygood AlexWaygood marked this pull request as ready for review January 18, 2026 17:40
@AlexWaygood

Thanks @bxff! I applied a variant of that and it did indeed solve the memory-usage regression.

This PR now applies consistent rules for assignability/subtyping/redundancy, does not add any new Salsa caching, does not have any reported memory-usage regressions, and does not have any reported performance regressions. So I think it's good to go.

Thanks @jhartum!! Sorry that this one turned out to be a bit more complicated than we initially expected.

@AlexWaygood AlexWaygood merged commit bab571c into astral-sh:main Jan 18, 2026
49 checks passed

Labels

ecosystem-analyzer ty Multi-file analysis & type inference

Development

Successfully merging this pull request may close these issues.

Literal["abba"] should be a subtype of Sequence[Literal["a", "b"]]

5 participants