[ty] add support for nonlocal statements#19112
Conversation
|
CodSpeed WallTime Performance ReportMerging #19112 will not alter performanceComparing Summary
|
carljm
left a comment
There was a problem hiding this comment.
Great start! Tbh I think it would be fine to land something close to this (maybe get the easy syntax errors working) with TODOs for the impact on revealed type in outer scope, and handle that in separate follow-up.
| def g(): | ||
| nonlocal x | ||
| x += 1 | ||
| reveal_type(x) # revealed: Literal[2] |
There was a problem hiding this comment.
Yes, I think this is the revealed type we should be aiming for here
| nonlocal x | ||
| x += 1 | ||
| reveal_type(x) # revealed: Literal[2] | ||
| reveal_type(x) # revealed: Literal[2] |
There was a problem hiding this comment.
Here I think Literal[2] would be unsound. It's clearly wrong in this particular case, since g is never called, so x would still be 1 here. I don't think we want to get into detecting whether/when g is called, at least not for now, but we should aim to give a result that accurately encompasses the possibilities.
One option here is that we actually try to walk all nested scopes and find their nonlocal assignments to x, and union those into the possible type of x in this scope. This will often be cyclic, in that the type assigned to x in the inner scope may depend on its type in the outer scope, and so if we also introduce a dependency the other direction, we have a cycle. In fact, this very example would be cyclic, and iterating the cycle would build an ever-growing union of integer literals 1, 2, 3, ... until we hit the maximum size for such a union and fall back to int. (This would be accurately reflecting that we aren't trying to determine the maximum number of times g can be called, so we have to account for any number of possible calls.)
I think this is a level of accuracy that other type checkers don't attempt, and I'd rather not get into, certainly not in this PR. Pyright seems content to totally ignore the possibility of a nonlocal write from an inner scope, even in cases where it definitely occurs.
Another approach is that we just enforce that the nonlocal writes in the inner scope(s) respect the declared type (if any) of the name in the outer scope, and in the outer scope we refuse to narrow from that declared type, because we don't know where a call to the inner scope might occur and invalidate our narrowing. So in undeclared cases, like this one, the declared type would be Unknown. Ideally I think what we'd do is union the declared and inferred types, which would result in Unknown | Literal[1] here. This is saying "we know x is nonlocal in a nested scope, and we aren't trying to analyze what might be assigned to it there, therefore we don't know the upper bound on its possible type, but we know the lower bound is Literal[1], since we've seen that assigned in this scope."
In a case where x is declared in the outer scope (e.g. x: int = 1, which we'd also want to add a test for), inner scopes with nonlocal x would require that only things assignable to int can be assigned to x, and the outer scope would take int | Literal[1] as the type of x, which simplifies to int.
I would be inclined to go for the latter approach in this PR, but open to other suggestions/ideas.
Let me know if this doesn't make sense.
|
| global g, h | ||
|
|
||
| g: bool = True | ||
| g = True |
There was a problem hiding this comment.
This was actually a syntax error :)
nonlocal statementsnonlocal statements
|
@AlexWaygood this is now plausibly ready for a proper review :) I did a lot of copy-pasting here, from existing |
| | SemanticSyntaxErrorKind::LoadBeforeNonlocalDeclaration { .. } | ||
| | SemanticSyntaxErrorKind::NonlocalAndGlobal(_) | ||
| | SemanticSyntaxErrorKind::AnnotatedGlobal(_) | ||
| | SemanticSyntaxErrorKind::AnnotatedNonlocal(_) => { |
There was a problem hiding this comment.
@ntBre did you mention that you'd be willing to help me fill in "proper" Ruff support for these new cases? :)
There was a problem hiding this comment.
Yeah, for sure! I'll either handle them myself or at least add them to #17412.
| if let ast::Expr::Name(name) = &*node.target { | ||
| let symbol_id = self.add_symbol(name.id.clone()); | ||
| let scope_id = self.current_scope(); | ||
| // Check whether the variable has been declared global. | ||
| if let Some(globals) = self.globals_by_scope.get(&scope_id) { | ||
| if globals.contains(&symbol_id) { | ||
| self.report_semantic_error(SemanticSyntaxError { | ||
| kind: SemanticSyntaxErrorKind::AnnotatedGlobal( | ||
| name.id.as_str().into(), | ||
| ), | ||
| range: name.range, | ||
| python_version: self.python_version, | ||
| }); | ||
| } | ||
| } | ||
| // Check whether the variable has been declared nonlocal. | ||
| if let Some(nonlocals) = self.nonlocals_by_scope.get(&scope_id) { | ||
| if nonlocals.contains(&symbol_id) { | ||
| self.report_semantic_error(SemanticSyntaxError { | ||
| kind: SemanticSyntaxErrorKind::AnnotatedNonlocal( | ||
| name.id.as_str().into(), | ||
| ), | ||
| range: name.range, | ||
| python_version: self.python_version, | ||
| }); | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Our approach to semantic syntax errors up to now has been to implement them in the SemanticSyntaxChecker that is shared between Ruff and ty and expose any semantic information necessary to perform the check via the SemanticSyntaxContext. Is there a specific reason that you diverged from this pattern in this PR and decided to instead implement it in the SemanticIndexBuilder directly?
There was a problem hiding this comment.
Hah no of course there's no reason, I just haven't noticed SemanticSyntaxChecker yet :)
AlexWaygood
left a comment
There was a problem hiding this comment.
A review of the tests, which are overall fantastic!
There was a problem hiding this comment.
these tests are fantastically thorough -- thank you!
There was a problem hiding this comment.
Yeah, nice tests!
AlexWaygood
left a comment
There was a problem hiding this comment.
The code looks fantastic. I agree with @MichaReiser's comments that it feels like some of the logic in infer_nonlocal_statement might ideally go in the SemanticSyntaxChecker rather than ty, though -- then Ruff and ty can naturally share the same code rather than us having to implement it twice for the two static-analysis tools 😃 if that would be hard to do, though, I also think it would be fine to land an initial implementation here and then work towards porting it to the SemanticSyntaxChecker (or adding a standalone separate implementation in Ruff!) as a followup
|
I'll put this back in Draft. You can republish it when it is ready for another review. This makes it easier for reviewers to know when it's ready for another review. |
|
This PR is ready for another round of review. The big remaining question is whether to move the most complicated def f():
def g():
def h():
# This is allowed, because of the subsequent definition of `x` two scopes up.
nonlocal x
x = 1As it stands, I still have this check in
I think regardless of which of those three options we prefer, I'd rather leave it TODO in this PR and open a followup ticket. Let me know what you guys think? |
There was a problem hiding this comment.
Yeah, nice tests!
MichaReiser
left a comment
There was a problem hiding this comment.
Nice job with removing the extra hash map!
This is good to go once we removed the new lint (see my inline comment on how to accomplish that).
The best way to implement this check is probably to "defer" walking function bodies until after we've walked all the statements in enclosing scopes. This is what Ruff does today. This would still be one walk of the AST, but in the example above by the time we encountered nonlocal x we'd already have seen x = 1, and we could check the nonlocal requirement immediately. @erictraut mentioned in chat that that's how Pyright does it. However, even if we want to do that refactoring (any objections?), I'd rather not do it in this PR.
I think this is worth a try (scoped to a few hours). I also think it's fine to defer integrating this into the SemanticSyntaxChecker for now (maybe something @ntBre could help with as well). Would you mind opening an issue and adding TODO comments in the code (above the blocks that should be moved).
| /// Map from the file-local [`FileScopeId`] to the set of explicit-global symbols it contains. | ||
| globals_by_scope: FxHashMap<FileScopeId, FxHashSet<ScopedPlaceId>>, | ||
|
|
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
unresolved-reference |
0 | 333 | 0 |
possibly-unresolved-reference |
0 | 45 | 0 |
unused-ignore-comment |
8 | 0 | 0 |
invalid-argument-type |
0 | 1 | 0 |
| Total | 8 | 379 | 0 |
| pub(super) fn mark_place_global(&mut self, id: ScopedPlaceId) { | ||
| self.table.places[id].insert_flags(PlaceFlags::MARKED_GLOBAL); | ||
| } | ||
|
|
||
| pub(super) fn mark_place_nonlocal(&mut self, id: ScopedPlaceId) { | ||
| self.table.places[id].insert_flags(PlaceFlags::MARKED_NONLOCAL); | ||
| } |
There was a problem hiding this comment.
oh, this is a much better approach. Nice job!
233e9bb to
664a9a2
Compare
#19271 is a draft PR that tries to do this. It actually seems to work for the |
|
Haven't had a chance to look at the PR, but I expect that switching to a breadth first AST walk in semantic indexing would require significant reworking of our handling of eager nested scopes (finding the right bindings in an outer scope at the point where the eager nested scope is defined). |
|
My inclination would just be to land this as-is for now, and punt on questions such as how we implement this syntax error in Ruff, whether we should try to integrate this syntax error into the |
The initial implementation of `infer_nonlocal` landed in #19112 fails to report an error for this example: ```py x = 1 def f(): # This is only a usage of `x`, not a definition. It shouldn't be # enough to make the `nonlocal` statement below allowed. print(x) def g(): nonlocal x ``` Fix this by continuing to walk enclosing scopes when the place we've found isn't bound, declared, or `nonlocal`.
The initial implementation of `infer_nonlocal` landed in #19112 fails to report an error for this example: ```py x = 1 def f(): # This is only a usage of `x`, not a definition. It shouldn't be # enough to make the `nonlocal` statement below allowed. print(x) def g(): nonlocal x ``` Fix this by continuing to walk enclosing scopes when the place we've found isn't bound, declared, or `nonlocal`.
Related to: astral-sh/ty#220
Putting up a draft PR so I can ask questions about it. Comments below.This PR is now plausibly ready for a proper review.
After landing: