Narrowing for class patterns in match statements#15223
Conversation
crates/red_knot_python_semantic/resources/mdtest/narrow/match.md
Outdated
Show resolved
Hide resolved
* main: [`ruff`] Avoid reporting when `ndigits` is possibly negative (`RUF057`) (#15234) Attribute panics to the mdtests that cause them (#15241) Show errors for attempted fixes only when passed `--verbose` (#15237) [`RUF`] Add rule to detect empty literal in deque call (`RUF025`) (#15104) TD003: remove issue code length restriction (#15175) Preserve multiline implicit concatenated strings in docstring positions (#15126) [`pyflakes`] Ignore errors in `@no_type_check` string annotations (`F722`, `F821`) (#15215) style(AIR302): rename removed_airflow_plugin_extension as check_airflow_plugin_extension (#15233) [`pylint`] Re-implement `unreachable` (`PLW0101`) (#10891) refactor(AIR303): move duplicate qualified_name.to_string() to Diagnostic argument (#15220) Misc. clean up to rounding rules (#15231) Avoid syntax error when removing int over multiple lines (#15230) Migrate renovate config (#15228) Remove `Type::tuple` in favor of `TupleType::from_elements` (#15218)
147d53b to
a7bf2d7
Compare
* main: (29 commits) [`ruff`] Dataclass enums (`RUF049`) (#15299) Better error message when `--config` is given a table key and a non-inline-table value (#15266) Update pre-commit dependencies (#15289) Don't fix in ecosystem check (#15267) Update Rust crate itertools to 0.14.0 (#15287) Remove accidental empty block at the bottom of `split-static-string (SIM905)` doc (#15290) Update Rust crate clearscreen to v4 (#15288) Update Rust crate insta to v1.42.0 (#15286) Update NPM Development dependencies (#15285) Update dependency uuid to v11.0.4 (#15284) Update dependency ruff to v0.8.6 (#15283) Update Rust crate syn to v2.0.95 (#15282) Update Rust crate matchit to v0.8.6 (#15281) Update Rust crate bstr to v1.11.3 (#15280) [red-knot] Future-proof `Type::is_disjoint_from()` (#15262) [red-knot] Improve `Type::is_disjoint_from()` for `KnownInstanceType`s (#15261) [red-knot] Minor simplifications and improvements to constraint narrowing logic (#15270) Allow assigning ellipsis literal as parameter default value (#14982) [red-knot] fix control flow for assignment expressions in elif tests (#15274) [`refurb`] Mark fix as unsafe when the right-hand side is a string (`FURB171`) (#15273) ...
|
Given the failing tests and the "WIP, not working yet" PR description, moving this to draft state for now? Let me know if you would like a review in the current state. |
|
| // TODO(dcreager): We're currently creating standalone expressions for top-level value and | ||
| // class patterns, but not for nested ones. Consider always creating standalone | ||
| // expressions for these forms, even if they appear as subpatterns, to eliminate the | ||
| // difference here between `infer_match_pattern` and `infer_nested_match_pattern`. |
There was a problem hiding this comment.
This separation between infer_match_pattern and infer_match_pattern_impl has to do with how we handle nested subpatterns in the semantic index builder. (I've changed the name of the latter to infer_nested_match_pattern to better describe this.)
We currently only call add_pattern_constraint for top-level patterns, and do not recurse into subpatterns. That means, in particular, that we don't create a standalone expression for the value part of a value pattern nor for the class identifier in a class pattern. That means we need to know here whether we are looking at a top-level pattern (in which case we would call infer_standalone_expression) or a nested subpattern (in which case we would call infer_expression).
I've kept that separation, since it results in a smaller diff for this PR. But it seems brittle to have standalone expressions only for some value/class patterns, but not others. So I've added this TODO comment to consider updating the semantic index builder to always create standalone expressions. (And 252290e actually implements that, should we want to do that now.)
@sharkdp, do have strong opinions about the two choices?
There was a problem hiding this comment.
I think the standalone-expressions system is brittle in general, not only in this this case. It requires hard-coded agreement between the semantic index builder and type inference about which expressions (in which syntactic positions) are standalone and which are not. This is just a particular case of that general problem. (In fact the brittleness problem is even more general than that; we also require hardcoded agreement between semantic-index builder and inference about e.g. what AST nodes we will/won't create a Definition for.)
The reason it works this way is because we want to minimize our overall number of Salsa ingredients, because they are associated with memory and performance overhead. But if we need to query the type of an expression as its own Salsa query (because it is either part of a definition or part of a constraint on a definition, which are things we need to query independently from the rest of the surrounding scope), then we need a Salsa ingredient for it. So we record it as a standalone expression.
In 252290e we would actually create both an Expression ingredient and a Constraint ingredient for every sub-pattern. I agree that this is less brittle and simpler to handle in inference. But it's also wasteful; we don't need those Salsa ingredients, because we will never need to query them separately from the top-level pattern. The logical conclusion of this approach would be to create an Expression for every expression, period. Which would indeed be simpler and less brittle (and is basically what I wanted to do in our bespoke inference-caching before we decided to adopt Salsa), but also prohibitively expensive.
(Creating a Constraint for each nested pattern is semantically odd as well, because nested patterns cannot function as an independent constraint on the type of any symbol, they are just potentially additional information -- e.g. information about the possible types of attributes -- within the context of the constraint applied by the top-level pattern.)
It's certainly possible that we could accept some inefficiency here for the sake of simpler code; we make that trade-off all the time. But I'm not convinced that it's warranted here; the current code is not that complex, and I think we arguably create more confusion by creating unused nested Constraint ingredients that aren't semantically sensible as standalone constraints.
So I would suggest to replace this TODO comment with something that explains why the distinction exists.
There was a problem hiding this comment.
Thank you for the explanation! That's very helpful — I was able to figure out the "what", but the "why" is useful context.
Creating a
Constraintfor each nested pattern is semantically odd as well
Yes, definitely — I had also considered adding a different method that would have recursed into the subpatterns, creating the standalone expression but not the Constraint. But that would have just moved the complexity from the inference side to the builder.
So I would suggest to replace this
TODOcomment with something that explains why the distinction exists.
Done — please lmk if I got any of the details wrong
carljm
left a comment
There was a problem hiding this comment.
This is great, thank you! Sorry for the novel-length comments; I hope they are useful in clarifying some things.
| // TODO(dcreager): We're currently creating standalone expressions for top-level value and | ||
| // class patterns, but not for nested ones. Consider always creating standalone | ||
| // expressions for these forms, even if they appear as subpatterns, to eliminate the | ||
| // difference here between `infer_match_pattern` and `infer_nested_match_pattern`. |
There was a problem hiding this comment.
I think the standalone-expressions system is brittle in general, not only in this this case. It requires hard-coded agreement between the semantic index builder and type inference about which expressions (in which syntactic positions) are standalone and which are not. This is just a particular case of that general problem. (In fact the brittleness problem is even more general than that; we also require hardcoded agreement between semantic-index builder and inference about e.g. what AST nodes we will/won't create a Definition for.)
The reason it works this way is because we want to minimize our overall number of Salsa ingredients, because they are associated with memory and performance overhead. But if we need to query the type of an expression as its own Salsa query (because it is either part of a definition or part of a constraint on a definition, which are things we need to query independently from the rest of the surrounding scope), then we need a Salsa ingredient for it. So we record it as a standalone expression.
In 252290e we would actually create both an Expression ingredient and a Constraint ingredient for every sub-pattern. I agree that this is less brittle and simpler to handle in inference. But it's also wasteful; we don't need those Salsa ingredients, because we will never need to query them separately from the top-level pattern. The logical conclusion of this approach would be to create an Expression for every expression, period. Which would indeed be simpler and less brittle (and is basically what I wanted to do in our bespoke inference-caching before we decided to adopt Salsa), but also prohibitively expensive.
(Creating a Constraint for each nested pattern is semantically odd as well, because nested patterns cannot function as an independent constraint on the type of any symbol, they are just potentially additional information -- e.g. information about the possible types of attributes -- within the context of the constraint applied by the top-level pattern.)
It's certainly possible that we could accept some inefficiency here for the sake of simpler code; we make that trade-off all the time. But I'm not convinced that it's warranted here; the current code is not that complex, and I think we arguably create more confusion by creating unused nested Constraint ingredients that aren't semantically sensible as standalone constraints.
So I would suggest to replace this TODO comment with something that explains why the distinction exists.
| PatternConstraintKind::Singleton(singleton.value, guard) | ||
| } | ||
| ast::Pattern::MatchClass(pattern) => { | ||
| let cls = self.add_standalone_expression(&pattern.cls); |
There was a problem hiding this comment.
Hmm, interesting. So normally the idea would be that when inferring types for a standalone-expression inference scope, we would only infer types for sub-expressions of the standalone expression. This is to ensure that we have a clear "owning inference scope" for every expression, and don't end up inferring some expressions multiple times, as part of different inference scopes.
But of course the inference code doesn't really care about what is actually nested inside what in the AST, as long as we do know which expressions belong to which inference scope, and only infer them there. So I don't think there is a problem with making pattern.cls be the standalone expression, as long as we know that pattern.arguments is "part of" the same inference scope.
The natural thing to do would be to have pattern itself be the standalone expression, since it is the parent of both cls and arguments. But since pattern is not an ast::Expr, we can't do that; at least not without reworking the Expression ingredient to be able to wrap nodes that are not actually expressions, which would be both invasive and semantically odd.
So I think what you've done here is the best option. But it may be worth an explicit comment over on the type inference side, that we consider pattern.arguments to be part of the inference scope of the standalone expression pattern.cls, even though they aren't actually nested in the AST. (Or an alternate phrasing would be that we use pattern.cls as a stand-in for the overall pattern, since its an expression.)
There was a problem hiding this comment.
I've taken this into account in the new comment over in infer_match_pattern
There was a problem hiding this comment.
The comment you wrote is excellent, and accurately reflects what I said! Unfortunately, after stepping away from the computer and going for a run, I realized that my description of the situation was wrong :/
The reason the nesting structure of the AST is important is that, given an AST node, we can only access its children; we have no way to walk up the AST, or find sibling nodes. And the whole point of a standalone expression is that we can enter type inference given only that node (when you call infer_expression_types in evaluate_match_pattern_class, in this case.)
In the current PR, when we are inferring types for the entire scope, we will hit the match-pattern node, infer its cls attribute as a standalone expression, and infer its arguments attribute as normal expressions. The arguments will not be inferred as part of the standalone expression inference region, they will be inferred as part of the outer scope inference region.
When we are inferring just the standalone expression region, we start with the expression that is the cls attribute of the pattern, and can only reach it and its sub-expressions.
For the purposes of the current PR, this works fine, because the only thing we care about for the type narrowing is the cls; we don't care about the arguments. But if you tried to pull out types for the argument expressions in evaluate_match_pattern_class, you would find they are not in the returned TypeInference.
I think, given that it's unclear exactly how much we will narrow based on arguments to a class patter, and how we will implement that, that we shouldn't go out out of our way for it now. So I still think the implementation you have here is good for now. We can cross the arguments bridge when we come to it. But we should update the comment(s) to accurately reflect that class pattern arguments are not currently inferred as part of the standalone expression inference region, but that's fine for now because at the moment we don't need them in our handling of the constraint.
Sorry for the confusion.
There was a problem hiding this comment.
But we should update the comment(s) to accurately reflect that class pattern arguments are not currently inferred as part of the standalone expression inference region,
Done
| Truthiness::Ambiguous | ||
| } | ||
| PatternConstraintKind::Singleton(..) | ||
| | PatternConstraintKind::Class(..) |
There was a problem hiding this comment.
We could sometimes infer a static truthiness here (and also for Singleton for that matter), but for now we can consider that a possible future feature, to be driven by actual demand. We don't need to add it as a TODO.
Co-authored-by: Carl Meyer <[email protected]>
| // TODO(dcreager): We're currently creating standalone expressions for top-level value and | ||
| // class patterns, but not for nested ones. Consider always creating standalone | ||
| // expressions for these forms, even if they appear as subpatterns, to eliminate the | ||
| // difference here between `infer_match_pattern` and `infer_nested_match_pattern`. |
There was a problem hiding this comment.
Thank you for the explanation! That's very helpful — I was able to figure out the "what", but the "why" is useful context.
Creating a
Constraintfor each nested pattern is semantically odd as well
Yes, definitely — I had also considered adding a different method that would have recursed into the subpatterns, creating the standalone expression but not the Constraint. But that would have just moved the complexity from the inference side to the builder.
So I would suggest to replace this
TODOcomment with something that explains why the distinction exists.
Done — please lmk if I got any of the details wrong
| PatternConstraintKind::Singleton(singleton.value, guard) | ||
| } | ||
| ast::Pattern::MatchClass(pattern) => { | ||
| let cls = self.add_standalone_expression(&pattern.cls); |
There was a problem hiding this comment.
I've taken this into account in the new comment over in infer_match_pattern
| // | ||
| // See the comment in TypeInferenceBuilder::infer_match_pattern for more details. | ||
|
|
||
| let guard = guard.map(|guard| self.add_standalone_expression(guard)); |
There was a problem hiding this comment.
Does your comment below about inference scopes suggest that the guard of a pattern doesn't need to be wrapped in its own standalone expression? And instead that we can consider it part of the inference scope of the standalone expression that we create for the pattern? (That might require being exhaustive in creating a standalone expression for all pattern variants, even the ones we don't yet support for inference/narrowing?)
There was a problem hiding this comment.
Currently the guard doesn't need to be its own standalone expression, because we don't actually do any narrowing based on it. But if/when we do, it will need to be its own standalone expression, because of what I described in my more recent comment; we can't actually include non-child nodes as part of the inference region of a standalone expression.
There was a problem hiding this comment.
Does this suggest that it's worth a follow-on PR to allow standalone expressions to wrap ast::Pattern in addition to ast::Expr? (Or I guess ast::MatchCase, so that it can encompass both the pattern and the guard)
There was a problem hiding this comment.
The other approach (which is already used for guards, although we don't use it on the inference side yet) is that the PatternConstraintKind can just store references to multiple expressions. (The code isn't super clear because it uses tuple variants instead of struct variants, but the optional second expression in both PatternConstraintKind::Singleton and PatternConstraintKind::Value is the guard expression, which is already recorded as a standalone expression in the semantic index builder.) So I don't think there's anything more to be done with guards, until we are ready to actually use them in narrowing.
The situation with class pattern arguments is more complex, since they aren't expressions either, but rather nested patterns. So we'll need to do something different there; one option is to allow an ast::Pattern itself to be an inference region (similar to a standalone expression). Another option might be to have PatternConstraintKind::Class recursively contain a vec of constraints for the arguments; this would be more similar to the recursive-standalone-expression approach you experimented with. It would result in more Constraint ingredients for large nested patterns, but in practice that might be fine, if such patterns aren't very common.
My suggestion was that we just punt on this until we actually want to narrow based on class-pattern arguments.
| cases, | ||
| range: _, | ||
| }) => { | ||
| debug_assert!(self.current_match_case.is_none()); |
There was a problem hiding this comment.
Nit: Using assert_eq gives more useful assertion messages if current_match_case is Some for whatever reason
| debug_assert!(self.current_match_case.is_none()); | |
| debug_assert_eq!(self.current_match_case, None); |
There was a problem hiding this comment.
I had to also add Debug and PartialEq derivations to make this work
Co-authored-by: Micha Reiser <[email protected]>
* main: Use uv consistently throughout the documentation (#15302) [red-knot] Eagerly normalize `type[]` types (#15272) [`pyupgrade`] Split `UP007` to two individual rules for `Union` and `Optional` (`UP007`, `UP045`) (#15313) [red-knot] Improve symbol-lookup tracing (#14907) [red-knot] improve type shrinking coverage in red-knot property tests (#15297) [`flake8-return`] Recognize functions returning `Never` as non-returning (`RET503`) (#15298) [`flake8-bugbear`] Implement `class-as-data-structure` (`B903`) (#9601) Avoid treating newline-separated sections as sub-sections (#15311) Remove call when removing final argument from `format` (#15309) Don't enforce `object-without-hash-method` in stubs (#15310) Don't special-case class instances in binary expression inference (#15161) Upgrade zizmor to the latest version in CI (#15300)
* main: [`pylint`] Fix `unreachable` infinite loop (`PLW0101`) (#15278) fix invalid syntax in workflow file (#15357) [`pycodestyle`] Avoid false positives related to type aliases (`E252`) (#15356) [`flake8-builtins`] Disapply `A005` to stub files (#15350) Improve logging system using `logLevel`, avoid trace value (#15232) [`flake8-builtins`] Rename `A005` and improve its error message (#15348) Spruce up docs for pydoclint rules (#15325) [`flake8-type-checking`] Apply `TC008` more eagerly in `TYPE_CHECKING` blocks and disapply it in stubs (#15180) [red-knot] `knot_extensions` Python API (#15103) Display Union of Literals as a Literal (#14993) [red-knot] all types are assignable to object (#15332) [`ruff`] Parenthesize arguments to `int` when removing `int` would change semantics in `unnecessary-cast-to-int` (`RUF046`) (#15277) [`eradicate`] Correctly handle metadata blocks directly followed by normal blocks (`ERA001`) (#15330) Narrowing for class patterns in match statements (#15223) [red-knot] add call checking (#15200) Spruce up docs for `slice-to-remove-prefix-or-suffix` (`FURB188`) (#15328) [`internal`] Return statements in finally block point to end block for `unreachable` (`PLW0101`) (#15276) [`ruff`] Treat `)` as a regex metacharacter (`RUF043`, `RUF055`) (#15318) Use uv consistently throughout the documentation (#15302)
We now support class patterns in a match statement, adding a narrowing constraint that within the body of that match arm, we can assume that the subject is an instance of that class.
#14740