Narrowing for class patterns in match statements by dcreager · Pull Request #15223 · astral-sh/ruff

dcreager · 2025-01-02T16:15:11Z

We now support class patterns in a match statement, adding a narrowing constraint that within the body of that match arm, we can assume that the subject is an instance of that class.

#14740

crates/red_knot_python_semantic/resources/mdtest/narrow/match.md

* main: [`ruff`] Avoid reporting when `ndigits` is possibly negative (`RUF057`) (#15234) Attribute panics to the mdtests that cause them (#15241) Show errors for attempted fixes only when passed `--verbose` (#15237) [`RUF`] Add rule to detect empty literal in deque call (`RUF025`) (#15104) TD003: remove issue code length restriction (#15175) Preserve multiline implicit concatenated strings in docstring positions (#15126) [`pyflakes`] Ignore errors in `@no_type_check` string annotations (`F722`, `F821`) (#15215) style(AIR302): rename removed_airflow_plugin_extension as check_airflow_plugin_extension (#15233) [`pylint`] Re-implement `unreachable` (`PLW0101`) (#10891) refactor(AIR303): move duplicate qualified_name.to_string() to Diagnostic argument (#15220) Misc. clean up to rounding rules (#15231) Avoid syntax error when removing int over multiple lines (#15230) Migrate renovate config (#15228) Remove `Type::tuple` in favor of `TupleType::from_elements` (#15218)

crates/red_knot_python_semantic/src/semantic_index/builder.rs

* main: (29 commits) [`ruff`] Dataclass enums (`RUF049`) (#15299) Better error message when `--config` is given a table key and a non-inline-table value (#15266) Update pre-commit dependencies (#15289) Don't fix in ecosystem check (#15267) Update Rust crate itertools to 0.14.0 (#15287) Remove accidental empty block at the bottom of `split-static-string (SIM905)` doc (#15290) Update Rust crate clearscreen to v4 (#15288) Update Rust crate insta to v1.42.0 (#15286) Update NPM Development dependencies (#15285) Update dependency uuid to v11.0.4 (#15284) Update dependency ruff to v0.8.6 (#15283) Update Rust crate syn to v2.0.95 (#15282) Update Rust crate matchit to v0.8.6 (#15281) Update Rust crate bstr to v1.11.3 (#15280) [red-knot] Future-proof `Type::is_disjoint_from()` (#15262) [red-knot] Improve `Type::is_disjoint_from()` for `KnownInstanceType`s (#15261) [red-knot] Minor simplifications and improvements to constraint narrowing logic (#15270) Allow assigning ellipsis literal as parameter default value (#14982) [red-knot] fix control flow for assignment expressions in elif tests (#15274) [`refurb`] Mark fix as unsafe when the right-hand side is a string (`FURB171`) (#15273) ...

carljm · 2025-01-06T18:38:24Z

Given the failing tests and the "WIP, not working yet" PR description, moving this to draft state for now? Let me know if you would like a review in the current state.

github-actions · 2025-01-06T19:35:31Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

dcreager · 2025-01-06T19:40:50Z

crates/red_knot_python_semantic/src/types/infer.rs

+        // TODO(dcreager): We're currently creating standalone expressions for top-level value and
+        // class patterns, but not for nested ones.  Consider always creating standalone
+        // expressions for these forms, even if they appear as subpatterns, to eliminate the
+        // difference here between `infer_match_pattern` and `infer_nested_match_pattern`.


This separation between infer_match_pattern and infer_match_pattern_impl has to do with how we handle nested subpatterns in the semantic index builder. (I've changed the name of the latter to infer_nested_match_pattern to better describe this.)

We currently only call add_pattern_constraint for top-level patterns, and do not recurse into subpatterns. That means, in particular, that we don't create a standalone expression for the value part of a value pattern nor for the class identifier in a class pattern. That means we need to know here whether we are looking at a top-level pattern (in which case we would call infer_standalone_expression) or a nested subpattern (in which case we would call infer_expression).

I've kept that separation, since it results in a smaller diff for this PR. But it seems brittle to have standalone expressions only for some value/class patterns, but not others. So I've added this TODO comment to consider updating the semantic index builder to always create standalone expressions. (And 252290e actually implements that, should we want to do that now.)

@sharkdp, do have strong opinions about the two choices?

I think the standalone-expressions system is brittle in general, not only in this this case. It requires hard-coded agreement between the semantic index builder and type inference about which expressions (in which syntactic positions) are standalone and which are not. This is just a particular case of that general problem. (In fact the brittleness problem is even more general than that; we also require hardcoded agreement between semantic-index builder and inference about e.g. what AST nodes we will/won't create a Definition for.)

The reason it works this way is because we want to minimize our overall number of Salsa ingredients, because they are associated with memory and performance overhead. But if we need to query the type of an expression as its own Salsa query (because it is either part of a definition or part of a constraint on a definition, which are things we need to query independently from the rest of the surrounding scope), then we need a Salsa ingredient for it. So we record it as a standalone expression.

In 252290e we would actually create both an Expression ingredient and a Constraint ingredient for every sub-pattern. I agree that this is less brittle and simpler to handle in inference. But it's also wasteful; we don't need those Salsa ingredients, because we will never need to query them separately from the top-level pattern. The logical conclusion of this approach would be to create an Expression for every expression, period. Which would indeed be simpler and less brittle (and is basically what I wanted to do in our bespoke inference-caching before we decided to adopt Salsa), but also prohibitively expensive.

(Creating a Constraint for each nested pattern is semantically odd as well, because nested patterns cannot function as an independent constraint on the type of any symbol, they are just potentially additional information -- e.g. information about the possible types of attributes -- within the context of the constraint applied by the top-level pattern.)

It's certainly possible that we could accept some inefficiency here for the sake of simpler code; we make that trade-off all the time. But I'm not convinced that it's warranted here; the current code is not that complex, and I think we arguably create more confusion by creating unused nested Constraint ingredients that aren't semantically sensible as standalone constraints.

So I would suggest to replace this TODO comment with something that explains why the distinction exists.

Thank you for the explanation! That's very helpful — I was able to figure out the "what", but the "why" is useful context.

Creating a Constraint for each nested pattern is semantically odd as well

Yes, definitely — I had also considered adding a different method that would have recursed into the subpatterns, creating the standalone expression but not the Constraint. But that would have just moved the complexity from the inference side to the builder.

So I would suggest to replace this TODO comment with something that explains why the distinction exists.

Done — please lmk if I got any of the details wrong

carljm

This is great, thank you! Sorry for the novel-length comments; I hope they are useful in clarifying some things.

crates/red_knot_python_semantic/resources/mdtest/narrow/match.md

carljm · 2025-01-06T21:09:18Z

crates/red_knot_python_semantic/src/types/infer.rs

+        // TODO(dcreager): We're currently creating standalone expressions for top-level value and
+        // class patterns, but not for nested ones.  Consider always creating standalone
+        // expressions for these forms, even if they appear as subpatterns, to eliminate the
+        // difference here between `infer_match_pattern` and `infer_nested_match_pattern`.


I think the standalone-expressions system is brittle in general, not only in this this case. It requires hard-coded agreement between the semantic index builder and type inference about which expressions (in which syntactic positions) are standalone and which are not. This is just a particular case of that general problem. (In fact the brittleness problem is even more general than that; we also require hardcoded agreement between semantic-index builder and inference about e.g. what AST nodes we will/won't create a Definition for.)

The reason it works this way is because we want to minimize our overall number of Salsa ingredients, because they are associated with memory and performance overhead. But if we need to query the type of an expression as its own Salsa query (because it is either part of a definition or part of a constraint on a definition, which are things we need to query independently from the rest of the surrounding scope), then we need a Salsa ingredient for it. So we record it as a standalone expression.

In 252290e we would actually create both an Expression ingredient and a Constraint ingredient for every sub-pattern. I agree that this is less brittle and simpler to handle in inference. But it's also wasteful; we don't need those Salsa ingredients, because we will never need to query them separately from the top-level pattern. The logical conclusion of this approach would be to create an Expression for every expression, period. Which would indeed be simpler and less brittle (and is basically what I wanted to do in our bespoke inference-caching before we decided to adopt Salsa), but also prohibitively expensive.

(Creating a Constraint for each nested pattern is semantically odd as well, because nested patterns cannot function as an independent constraint on the type of any symbol, they are just potentially additional information -- e.g. information about the possible types of attributes -- within the context of the constraint applied by the top-level pattern.)

It's certainly possible that we could accept some inefficiency here for the sake of simpler code; we make that trade-off all the time. But I'm not convinced that it's warranted here; the current code is not that complex, and I think we arguably create more confusion by creating unused nested Constraint ingredients that aren't semantically sensible as standalone constraints.

So I would suggest to replace this TODO comment with something that explains why the distinction exists.

carljm · 2025-01-06T21:18:55Z

crates/red_knot_python_semantic/src/semantic_index/builder.rs

                PatternConstraintKind::Singleton(singleton.value, guard)
            }
+            ast::Pattern::MatchClass(pattern) => {
+                let cls = self.add_standalone_expression(&pattern.cls);


Hmm, interesting. So normally the idea would be that when inferring types for a standalone-expression inference scope, we would only infer types for sub-expressions of the standalone expression. This is to ensure that we have a clear "owning inference scope" for every expression, and don't end up inferring some expressions multiple times, as part of different inference scopes.

But of course the inference code doesn't really care about what is actually nested inside what in the AST, as long as we do know which expressions belong to which inference scope, and only infer them there. So I don't think there is a problem with making pattern.cls be the standalone expression, as long as we know that pattern.arguments is "part of" the same inference scope.

The natural thing to do would be to have pattern itself be the standalone expression, since it is the parent of both cls and arguments. But since pattern is not an ast::Expr, we can't do that; at least not without reworking the Expression ingredient to be able to wrap nodes that are not actually expressions, which would be both invasive and semantically odd.

So I think what you've done here is the best option. But it may be worth an explicit comment over on the type inference side, that we consider pattern.arguments to be part of the inference scope of the standalone expression pattern.cls, even though they aren't actually nested in the AST. (Or an alternate phrasing would be that we use pattern.cls as a stand-in for the overall pattern, since its an expression.)

I've taken this into account in the new comment over in infer_match_pattern

The comment you wrote is excellent, and accurately reflects what I said! Unfortunately, after stepping away from the computer and going for a run, I realized that my description of the situation was wrong :/

The reason the nesting structure of the AST is important is that, given an AST node, we can only access its children; we have no way to walk up the AST, or find sibling nodes. And the whole point of a standalone expression is that we can enter type inference given only that node (when you call infer_expression_types in evaluate_match_pattern_class, in this case.)

In the current PR, when we are inferring types for the entire scope, we will hit the match-pattern node, infer its cls attribute as a standalone expression, and infer its arguments attribute as normal expressions. The arguments will not be inferred as part of the standalone expression inference region, they will be inferred as part of the outer scope inference region.

When we are inferring just the standalone expression region, we start with the expression that is the cls attribute of the pattern, and can only reach it and its sub-expressions.

For the purposes of the current PR, this works fine, because the only thing we care about for the type narrowing is the cls; we don't care about the arguments. But if you tried to pull out types for the argument expressions in evaluate_match_pattern_class, you would find they are not in the returned TypeInference.

I think, given that it's unclear exactly how much we will narrow based on arguments to a class patter, and how we will implement that, that we shouldn't go out out of our way for it now. So I still think the implementation you have here is good for now. We can cross the arguments bridge when we come to it. But we should update the comment(s) to accurately reflect that class pattern arguments are not currently inferred as part of the standalone expression inference region, but that's fine for now because at the moment we don't need them in our handling of the constraint.

Sorry for the confusion.

But we should update the comment(s) to accurately reflect that class pattern arguments are not currently inferred as part of the standalone expression inference region,

Done

carljm · 2025-01-06T21:29:01Z

crates/red_knot_python_semantic/src/visibility_constraints.rs

-                    Truthiness::Ambiguous
-                }
+                PatternConstraintKind::Singleton(..)
+                | PatternConstraintKind::Class(..)


We could sometimes infer a static truthiness here (and also for Singleton for that matter), but for now we can consider that a possible future feature, to be driven by actual demand. We don't need to add it as a TODO.

Co-authored-by: Carl Meyer <[email protected]>

dcreager · 2025-01-06T22:08:16Z

crates/red_knot_python_semantic/src/types/infer.rs

+        // TODO(dcreager): We're currently creating standalone expressions for top-level value and
+        // class patterns, but not for nested ones.  Consider always creating standalone
+        // expressions for these forms, even if they appear as subpatterns, to eliminate the
+        // difference here between `infer_match_pattern` and `infer_nested_match_pattern`.


Thank you for the explanation! That's very helpful — I was able to figure out the "what", but the "why" is useful context.

Creating a Constraint for each nested pattern is semantically odd as well

Yes, definitely — I had also considered adding a different method that would have recursed into the subpatterns, creating the standalone expression but not the Constraint. But that would have just moved the complexity from the inference side to the builder.

So I would suggest to replace this TODO comment with something that explains why the distinction exists.

Done — please lmk if I got any of the details wrong

dcreager · 2025-01-06T22:08:46Z

crates/red_knot_python_semantic/src/semantic_index/builder.rs

                PatternConstraintKind::Singleton(singleton.value, guard)
            }
+            ast::Pattern::MatchClass(pattern) => {
+                let cls = self.add_standalone_expression(&pattern.cls);


I've taken this into account in the new comment over in infer_match_pattern

dcreager · 2025-01-06T22:09:53Z

crates/red_knot_python_semantic/src/semantic_index/builder.rs

+        //
+        // See the comment in TypeInferenceBuilder::infer_match_pattern for more details.
+
        let guard = guard.map(|guard| self.add_standalone_expression(guard));


Does your comment below about inference scopes suggest that the guard of a pattern doesn't need to be wrapped in its own standalone expression? And instead that we can consider it part of the inference scope of the standalone expression that we create for the pattern? (That might require being exhaustive in creating a standalone expression for all pattern variants, even the ones we don't yet support for inference/narrowing?)

Currently the guard doesn't need to be its own standalone expression, because we don't actually do any narrowing based on it. But if/when we do, it will need to be its own standalone expression, because of what I described in my more recent comment; we can't actually include non-child nodes as part of the inference region of a standalone expression.

Does this suggest that it's worth a follow-on PR to allow standalone expressions to wrap ast::Pattern in addition to ast::Expr? (Or I guess ast::MatchCase, so that it can encompass both the pattern and the guard)

The other approach (which is already used for guards, although we don't use it on the inference side yet) is that the PatternConstraintKind can just store references to multiple expressions. (The code isn't super clear because it uses tuple variants instead of struct variants, but the optional second expression in both PatternConstraintKind::Singleton and PatternConstraintKind::Value is the guard expression, which is already recorded as a standalone expression in the semantic index builder.) So I don't think there's anything more to be done with guards, until we are ready to actually use them in narrowing.

The situation with class pattern arguments is more complex, since they aren't expressions either, but rather nested patterns. So we'll need to do something different there; one option is to allow an ast::Pattern itself to be an inference region (similar to a standalone expression). Another option might be to have PatternConstraintKind::Class recursively contain a vec of constraints for the arguments; this would be more similar to the recursive-standalone-expression approach you experimented with. It would result in more Constraint ingredients for large nested patterns, but in practice that might be fine, if such patterns aren't very common.

My suggestion was that we just punt on this until we actually want to narrow based on class-pattern arguments.

crates/red_knot_python_semantic/resources/mdtest/narrow/match.md

MichaReiser · 2025-01-07T07:30:25Z

crates/red_knot_python_semantic/src/semantic_index/builder.rs

                cases,
                range: _,
            }) => {
+                debug_assert!(self.current_match_case.is_none());


Nit: Using assert_eq gives more useful assertion messages if current_match_case is Some for whatever reason

Suggested change

debug_assert!(self.current_match_case.is_none());

debug_assert_eq!(self.current_match_case, None);

I had to also add Debug and PartialEq derivations to make this work

Co-authored-by: Micha Reiser <[email protected]>

* main: Use uv consistently throughout the documentation (#15302) [red-knot] Eagerly normalize `type[]` types (#15272) [`pyupgrade`] Split `UP007` to two individual rules for `Union` and `Optional` (`UP007`, `UP045`) (#15313) [red-knot] Improve symbol-lookup tracing (#14907) [red-knot] improve type shrinking coverage in red-knot property tests (#15297) [`flake8-return`] Recognize functions returning `Never` as non-returning (`RET503`) (#15298) [`flake8-bugbear`] Implement `class-as-data-structure` (`B903`) (#9601) Avoid treating newline-separated sections as sub-sections (#15311) Remove call when removing final argument from `format` (#15309) Don't enforce `object-without-hash-method` in stubs (#15310) Don't special-case class instances in binary expression inference (#15161) Upgrade zizmor to the latest version in CI (#15300)

carljm

Looks good, thank you!

* main: [`pylint`] Fix `unreachable` infinite loop (`PLW0101`) (#15278) fix invalid syntax in workflow file (#15357) [`pycodestyle`] Avoid false positives related to type aliases (`E252`) (#15356) [`flake8-builtins`] Disapply `A005` to stub files (#15350) Improve logging system using `logLevel`, avoid trace value (#15232) [`flake8-builtins`] Rename `A005` and improve its error message (#15348) Spruce up docs for pydoclint rules (#15325) [`flake8-type-checking`] Apply `TC008` more eagerly in `TYPE_CHECKING` blocks and disapply it in stubs (#15180) [red-knot] `knot_extensions` Python API (#15103) Display Union of Literals as a Literal (#14993) [red-knot] all types are assignable to object (#15332) [`ruff`] Parenthesize arguments to `int` when removing `int` would change semantics in `unnecessary-cast-to-int` (`RUF046`) (#15277) [`eradicate`] Correctly handle metadata blocks directly followed by normal blocks (`ERA001`) (#15330) Narrowing for class patterns in match statements (#15223) [red-knot] add call checking (#15200) Spruce up docs for `slice-to-remove-prefix-or-suffix` (`FURB188`) (#15328) [`internal`] Return statements in finally block point to end block for `unreachable` (`PLW0101`) (#15276) [`ruff`] Treat `)` as a regex metacharacter (`RUF043`, `RUF055`) (#15318) Use uv consistently throughout the documentation (#15302)

AlexWaygood added the ty Multi-file analysis & type inference label Jan 3, 2025

sharkdp reviewed Jan 3, 2025

View reviewed changes

crates/red_knot_python_semantic/resources/mdtest/narrow/match.md Outdated Show resolved Hide resolved

dcreager added 7 commits January 3, 2025 16:19

Start class pattern narrowing in match statement

fda0867

Don't special-case first match case

da6f519

Inline and reorder visit_match_case

b62e24b

Match subject is an _instance_ of the class guard

a208279

Remove tracing

dcc2646

Linter

a7bf2d7

dcreager force-pushed the dcreager/class-match branch from 147d53b to a7bf2d7 Compare January 3, 2025 21:34

Pattern constraint should apply to guard as well

8a05207

dcreager marked this pull request as ready for review January 3, 2025 21:54

dcreager requested review from AlexWaygood, MichaReiser and carljm as code owners January 3, 2025 21:54

dcreager commented Jan 3, 2025

View reviewed changes

crates/red_knot_python_semantic/src/semantic_index/builder.rs Show resolved Hide resolved

crates/red_knot_python_semantic/src/semantic_index/builder.rs Show resolved Hide resolved

dcreager added 2 commits January 6, 2025 10:25

Fix test case failures

9d2aefd

carljm marked this pull request as draft January 6, 2025 18:38

Create pattern constraints for subpatterns

252290e

Only top-level class patterns have standalone expressions

55c027a

dcreager commented Jan 6, 2025

View reviewed changes

dcreager marked this pull request as ready for review January 6, 2025 19:45

carljm approved these changes Jan 6, 2025

View reviewed changes

dcreager and others added 2 commits January 6, 2025 17:05

Replace TODO with explanation

da116f5

Apply suggestions from code review

be528fb

Co-authored-by: Carl Meyer <[email protected]>

dcreager commented Jan 6, 2025

View reviewed changes

MichaReiser reviewed Jan 7, 2025

View reviewed changes

dcreager and others added 4 commits January 7, 2025 10:38

Update crates/red_knot_python_semantic/src/semantic_index/builder.rs

474588c

Co-authored-by: Micha Reiser <[email protected]>

Update comment on standalone expression regions

2cb2c2b

Fix test failures

f6579fa

carljm approved these changes Jan 7, 2025

View reviewed changes

dcreager merged commit b2a0d68 into main Jan 7, 2025
21 checks passed

dcreager deleted the dcreager/class-match branch January 7, 2025 20:58

	debug_assert!(self.current_match_case.is_none());
	debug_assert_eq!(self.current_match_case, None);

Comments

Conversation

dcreager commented Jan 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

carljm commented Jan 6, 2025

Uh oh!

github-actions bot commented Jan 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichaReiser Jan 7, 2025 • edited by AlexWaygood Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

dcreager commented Jan 2, 2025 •

edited

Loading

github-actions bot commented Jan 6, 2025 •

edited

Loading

`ruff-ecosystem` results

MichaReiser Jan 7, 2025 •

edited by AlexWaygood

Loading