Skip over trivia tokens after re-lexing by MichaReiser · Pull Request #21895 · astral-sh/ruff

MichaReiser · 2025-12-10T14:10:26Z

Summary

Unlike the Lexer, the TokenSource skips over trivia token because they aren't semantically meaningful
during parsing. However, this hasn't been the case after re-lexing where we forgot to skip over any trivia,
which this PR adds

Fixes astral-sh/ty#1828

Test Plan

This PR extends our fixture to assert that a node start or end falls within a token range (because that woudl be weird).
I added regression tests for the two fixed bugs.

I'm not super happy with how the parser recovers but I'll create a separate PR with some discussion around that (I don't have a solution yet).

crates/ruff_python_parser/resources/valid/expressions/21538.py

MichaReiser · 2025-12-10T14:12:34Z

crates/ruff_python_parser/tests/snapshots/invalid_syntax@re_lex_logical_token.py.snap

 29 | if call(foo, [a, b
-   |                   ^ Syntax Error: Expected `]`, found NonLogicalNewline
+30 |     def bar():
+   |     ^^^ Syntax Error: Expected `]`, found `def`


I find those error messages more accurate. We only expect a ] because def is the start of a new keyword. It's otherwise perfectly fine to write a list over multiple lines.

Hmm, but then the following error message points at this location "expecting ), found Newline". I think this is still an improvement but just found this a bit inconsistent. No need to change anything here.

astral-sh-bot · 2025-12-10T14:12:51Z

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

astral-sh-bot · 2025-12-10T14:14:27Z

`mypy_primer` results

Changes were detected when running on open source projects

scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:98:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 42 diagnostics
+ Found 41 diagnostics

No memory usage changes detected ✅

MichaReiser · 2025-12-10T14:15:21Z

crates/ruff_python_parser/tests/snapshots/invalid_syntax@re_lex_logical_token_mac_eol.py.snap

+1 | if call(foo, [a, b    def bar():        pass
+  |                   ^ Syntax Error: Expected `)`, found newline


I think this is bad but it's inherent to how we do recovery. The issue is that nesting is still greater than 1 when we're recovering in [ so that the NonLogicalNewline isn't relexed as a Newline. However, it is later re-lexed when we recover from the unclosed (, which feels a bit inconsistent.

MichaReiser · 2025-12-10T14:16:38Z

crates/ruff_python_parser/tests/snapshots/invalid_syntax@re_lexing__ty_1828.py.snap

+  |
+4 | class A:
+5 |     pass
+  |         ^ Syntax Error: unexpected EOF while parsing


This error message is just confusing because the class itself is fine. But it's because of a previous error.

MichaReiser · 2025-12-10T14:17:22Z

crates/ruff_python_parser/tests/snapshots/invalid_syntax@re_lexing__ty_1828.py.snap

+  |  _______^
+4 | | class A:
+5 | |     pass
+  | |_________^ Syntax Error: f-string: unterminated triple-quoted string


This is also just funny but it's the same problem. The recovery in f-string has a nesting level > 0, so it doesn't actually do anything.

Probably related to #11946

astral-sh-bot · 2025-12-10T14:21:43Z

`ruff-ecosystem` results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+2 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

openai/openai-cookbook (+2 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --no-preview --select A,E703,F704,B015,B018,D100

+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `)`, found newline
- examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `}`, found NonLogicalNewline
+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:12:5: invalid-syntax: Expected `}`, found `return`

Changes by rule (1 rules affected)

code	total	+ violation	- violation	+ fix	- fix
invalid-syntax:	3	2	1	0	0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+2 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

openai/openai-cookbook (+2 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select A,E703,F704,B015,B018,D100

+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `)`, found newline
- examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `}`, found NonLogicalNewline
+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:12:5: invalid-syntax: Expected `}`, found `return`

Changes by rule (1 rules affected)

code	total	+ violation	- violation	+ fix	- fix
invalid-syntax:	3	2	1	0	0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dhruvmanila · 2025-12-11T05:27:23Z

crates/ruff_python_parser/src/token_source.rs

    pub(crate) fn re_lex_logical_token(&mut self) {
-        let mut non_logical_newline_start = None;
-        for token in self.tokens.iter().rev() {
+        let mut non_logical_line = None;


nit: non_logical_newline ?

dhruvmanila · 2025-12-11T05:33:01Z

crates/ruff_python_parser/src/token_source.rs

+        // Ensure `current` is positioned at a non-trivia token.
+        if self.current_kind().is_trivia() {
+            self.bump(self.current_kind());
        }


Is this the main part of the fix based on the PR description?

Yes, the other changes are mainly added assertions and some safe-guarding that we never remove any tokens that came before the non logical newline

* origin/main: (29 commits) Document range suppressions, reorganize suppression docs (#21884) Ignore ruff:isort like ruff:noqa in new suppressions (#21922) [ty] Handle `Definition`s in `SemanticModel::scope` (#21919) [ty] Attach salsa db when running ide tests for easier debugging (#21917) [ty] Don't show hover for expressions with no inferred type (#21924) [ty] avoid unions of generic aliases of the same class in fixpoint (#21909) [ty] Squash false positive logs for failing to find `builtins` as a real module [ty] Uniformly use "not supported" in diagnostics (#21916) [ty] Reduce size of ty-ide snapshots (#21915) [ty] Adjust scope completions to use all reachable symbols [ty] Rename `all_members_of_scope` to `all_end_of_scope_members` [ty] Remove `all_` prefix from some routines on UseDefMap Enable `--document-private-items` for `ruff_python_formatter` (#21903) Remove `BackwardsTokenizer` based `parenthesized_range` references in `ruff_linter` (#21836) [ty] Revert "Do not infer types for invalid binary expressions in annotations" (#21914) Skip over trivia tokens after re-lexing (#21895) [ty] Avoid inferring types for invalid binary expressions in string annotations (#21911) [ty] Improve overload call resolution tracing (#21913) [ty] fix missing heap_size on Salsa query (#21912) [ty] Support implicit type of `cls` in signatures (#21771) ...

* origin/main: (36 commits) [ty] Defer all parameter and return type annotations (#21906) [ty] Fix workspace symbols to return members too (#21926) Document range suppressions, reorganize suppression docs (#21884) Ignore ruff:isort like ruff:noqa in new suppressions (#21922) [ty] Handle `Definition`s in `SemanticModel::scope` (#21919) [ty] Attach salsa db when running ide tests for easier debugging (#21917) [ty] Don't show hover for expressions with no inferred type (#21924) [ty] avoid unions of generic aliases of the same class in fixpoint (#21909) [ty] Squash false positive logs for failing to find `builtins` as a real module [ty] Uniformly use "not supported" in diagnostics (#21916) [ty] Reduce size of ty-ide snapshots (#21915) [ty] Adjust scope completions to use all reachable symbols [ty] Rename `all_members_of_scope` to `all_end_of_scope_members` [ty] Remove `all_` prefix from some routines on UseDefMap Enable `--document-private-items` for `ruff_python_formatter` (#21903) Remove `BackwardsTokenizer` based `parenthesized_range` references in `ruff_linter` (#21836) [ty] Revert "Do not infer types for invalid binary expressions in annotations" (#21914) Skip over trivia tokens after re-lexing (#21895) [ty] Avoid inferring types for invalid binary expressions in string annotations (#21911) [ty] Improve overload call resolution tracing (#21913) ...

dhruvmanila · 2025-12-12T05:20:02Z

Do we need to change anything in TokenIterWithContext here:

ruff/crates/ruff_python_ast/src/token/tokens.rs

Lines 300 to 305 in 0138cd2

    
           // This mimics the behavior of re-lexing which reduces the nesting level on the lexer. 
        
           // We don't need to reduce it by 1 because unlike the lexer we see the final token 
        
           // after recovering from every unclosed parenthesis. 
        
           TokenKind::Newline if self.nesting > 0 => { 
        
               self.nesting = 0; 
        
           }

which tries to "mimic" this behavior of recovery. I think no, but just wanted to check-in with you.

MichaReiser · 2025-12-12T06:13:22Z

I don't think so

MichaReiser requested a review from dhruvmanila as a code owner December 10, 2025 14:10

MichaReiser added bug Something isn't working parser Related to the parser labels Dec 10, 2025

dylwil3 reviewed Dec 10, 2025

View reviewed changes

crates/ruff_python_parser/resources/valid/expressions/21538.py Outdated Show resolved Hide resolved

MichaReiser commented Dec 10, 2025

View reviewed changes

MichaReiser force-pushed the micha/parser-trivia-relex branch from 1b77b58 to 05eded6 Compare December 10, 2025 14:21

Skip over trivia tokens after re-lexing

7d47f71

MichaReiser force-pushed the micha/parser-trivia-relex branch from 05eded6 to 7d47f71 Compare December 10, 2025 14:24

dhruvmanila approved these changes Dec 11, 2025

View reviewed changes

MichaReiser mentioned this pull request Dec 11, 2025

Remove BackwardsTokenizer based parenthesized_range references in ruff_linter #21836

Merged

Rename variable

5ecf3c3

MichaReiser enabled auto-merge (squash) December 11, 2025 10:40

MichaReiser merged commit aa27925 into main Dec 11, 2025
41 checks passed

MichaReiser deleted the micha/parser-trivia-relex branch December 11, 2025 10:45

BrewTestBot mentioned this pull request Dec 11, 2025

ruff 0.14.9 Homebrew/homebrew-core#258289

Merged

MichaReiser mentioned this pull request Dec 14, 2025

parser: remove invalid node from empty & unclosed container types #21703

Closed

		1 \| if call(foo, [a, b def bar(): pass
		\| ^ Syntax Error: Expected `)`, found newline

Comments

Conversation

MichaReiser commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

astral-sh-bot bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Diagnostic diff on typing conformance tests

Uh oh!

astral-sh-bot bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mypy_primer results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

astral-sh-bot bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dhruvmanila commented Dec 12, 2025

Uh oh!

MichaReiser commented Dec 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

MichaReiser commented Dec 10, 2025 •

edited

Loading

astral-sh-bot bot commented Dec 10, 2025 •

edited

Loading

astral-sh-bot bot commented Dec 10, 2025 •

edited

Loading

`mypy_primer` results

astral-sh-bot bot commented Dec 10, 2025 •

edited

Loading

`ruff-ecosystem` results