Skip to content

Comments

Skip over trivia tokens after re-lexing#21895

Merged
MichaReiser merged 2 commits intomainfrom
micha/parser-trivia-relex
Dec 11, 2025
Merged

Skip over trivia tokens after re-lexing#21895
MichaReiser merged 2 commits intomainfrom
micha/parser-trivia-relex

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Dec 10, 2025

Summary

Unlike the Lexer, the TokenSource skips over trivia token because they aren't semantically meaningful
during parsing. However, this hasn't been the case after re-lexing where we forgot to skip over any trivia,
which this PR adds

Fixes astral-sh/ty#1828

Test Plan

This PR extends our fixture to assert that a node start or end falls within a token range (because that woudl be weird).
I added regression tests for the two fixed bugs.

I'm not super happy with how the parser recovers but I'll create a separate PR with some discussion around that (I don't have a solution yet).

@MichaReiser MichaReiser added bug Something isn't working parser Related to the parser labels Dec 10, 2025
29 | if call(foo, [a, b
| ^ Syntax Error: Expected `]`, found NonLogicalNewline
30 | def bar():
| ^^^ Syntax Error: Expected `]`, found `def`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find those error messages more accurate. We only expect a ] because def is the start of a new keyword. It's otherwise perfectly fine to write a list over multiple lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but then the following error message points at this location "expecting ), found Newline". I think this is still an improvement but just found this a bit inconsistent. No need to change anything here.

@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 10, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 10, 2025

mypy_primer results

Changes were detected when running on open source projects
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:98:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 42 diagnostics
+ Found 41 diagnostics

No memory usage changes detected ✅

Comment on lines +121 to +122
1 | if call(foo, [a, b def bar(): pass
| ^ Syntax Error: Expected `)`, found newline
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is bad but it's inherent to how we do recovery. The issue is that nesting is still greater than 1 when we're recovering in [ so that the NonLogicalNewline isn't relexed as a Newline. However, it is later re-lexed when we recover from the unclosed (, which feels a bit inconsistent.

|
4 | class A:
5 | pass
| ^ Syntax Error: unexpected EOF while parsing
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is just confusing because the class itself is fine. But it's because of a previous error.

Comment on lines +196 to +199
| _______^
4 | | class A:
5 | | pass
| |_________^ Syntax Error: f-string: unterminated triple-quoted string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also just funny but it's the same problem. The recovery in f-string has a nesting level > 0, so it doesn't actually do anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably related to #11946

@MichaReiser MichaReiser force-pushed the micha/parser-trivia-relex branch from 1b77b58 to 05eded6 Compare December 10, 2025 14:21
@astral-sh-bot
Copy link

astral-sh-bot bot commented Dec 10, 2025

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+2 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

openai/openai-cookbook (+2 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --no-preview --select A,E703,F704,B015,B018,D100

+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `)`, found newline
- examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `}`, found NonLogicalNewline
+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:12:5: invalid-syntax: Expected `}`, found `return`

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
invalid-syntax: 3 2 1 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+2 -1 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

openai/openai-cookbook (+2 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select A,E703,F704,B015,B018,D100

+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `)`, found newline
- examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:11:49: invalid-syntax: Expected `}`, found NonLogicalNewline
+ examples/mcp/databricks_mcp_cookbook.ipynb:cell 30:12:5: invalid-syntax: Expected `}`, found `return`

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
invalid-syntax: 3 2 1 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser force-pushed the micha/parser-trivia-relex branch from 05eded6 to 7d47f71 Compare December 10, 2025 14:24
pub(crate) fn re_lex_logical_token(&mut self) {
let mut non_logical_newline_start = None;
for token in self.tokens.iter().rev() {
let mut non_logical_line = None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: non_logical_newline ?

// Ensure `current` is positioned at a non-trivia token.
if self.current_kind().is_trivia() {
self.bump(self.current_kind());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the main part of the fix based on the PR description?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the other changes are mainly added assertions and some safe-guarding that we never remove any tokens that came before the non logical newline

@MichaReiser MichaReiser enabled auto-merge (squash) December 11, 2025 10:40
@MichaReiser MichaReiser merged commit aa27925 into main Dec 11, 2025
41 checks passed
@MichaReiser MichaReiser deleted the micha/parser-trivia-relex branch December 11, 2025 10:45
dcreager added a commit that referenced this pull request Dec 11, 2025
* origin/main: (29 commits)
  Document range suppressions, reorganize suppression docs (#21884)
  Ignore ruff:isort like ruff:noqa in new suppressions (#21922)
  [ty] Handle `Definition`s in `SemanticModel::scope` (#21919)
  [ty] Attach salsa db when running ide tests for easier debugging (#21917)
  [ty] Don't show hover for expressions with no inferred type (#21924)
  [ty] avoid unions of generic aliases of the same class in fixpoint (#21909)
  [ty] Squash false positive logs for failing to find `builtins` as a real module
  [ty] Uniformly use "not supported" in diagnostics (#21916)
  [ty] Reduce size of ty-ide snapshots (#21915)
  [ty] Adjust scope completions to use all reachable symbols
  [ty] Rename `all_members_of_scope` to `all_end_of_scope_members`
  [ty] Remove `all_` prefix from some routines on UseDefMap
  Enable `--document-private-items` for `ruff_python_formatter` (#21903)
  Remove `BackwardsTokenizer` based `parenthesized_range` references in `ruff_linter` (#21836)
  [ty] Revert "Do not infer types for invalid binary expressions in annotations" (#21914)
  Skip over trivia tokens after re-lexing (#21895)
  [ty] Avoid inferring types for invalid binary expressions in string annotations (#21911)
  [ty] Improve overload call resolution tracing (#21913)
  [ty] fix missing heap_size on Salsa query (#21912)
  [ty] Support implicit type of `cls` in signatures (#21771)
  ...
dcreager added a commit that referenced this pull request Dec 11, 2025
* origin/main: (36 commits)
  [ty] Defer all parameter and return type annotations (#21906)
  [ty] Fix workspace symbols to return members too (#21926)
  Document range suppressions, reorganize suppression docs (#21884)
  Ignore ruff:isort like ruff:noqa in new suppressions (#21922)
  [ty] Handle `Definition`s in `SemanticModel::scope` (#21919)
  [ty] Attach salsa db when running ide tests for easier debugging (#21917)
  [ty] Don't show hover for expressions with no inferred type (#21924)
  [ty] avoid unions of generic aliases of the same class in fixpoint (#21909)
  [ty] Squash false positive logs for failing to find `builtins` as a real module
  [ty] Uniformly use "not supported" in diagnostics (#21916)
  [ty] Reduce size of ty-ide snapshots (#21915)
  [ty] Adjust scope completions to use all reachable symbols
  [ty] Rename `all_members_of_scope` to `all_end_of_scope_members`
  [ty] Remove `all_` prefix from some routines on UseDefMap
  Enable `--document-private-items` for `ruff_python_formatter` (#21903)
  Remove `BackwardsTokenizer` based `parenthesized_range` references in `ruff_linter` (#21836)
  [ty] Revert "Do not infer types for invalid binary expressions in annotations" (#21914)
  Skip over trivia tokens after re-lexing (#21895)
  [ty] Avoid inferring types for invalid binary expressions in string annotations (#21911)
  [ty] Improve overload call resolution tracing (#21913)
  ...
@dhruvmanila
Copy link
Member

Do we need to change anything in TokenIterWithContext here:

// This mimics the behavior of re-lexing which reduces the nesting level on the lexer.
// We don't need to reduce it by 1 because unlike the lexer we see the final token
// after recovering from every unclosed parenthesis.
TokenKind::Newline if self.nesting > 0 => {
self.nesting = 0;
}

which tries to "mimic" this behavior of recovery. I think no, but just wanted to check-in with you.

@MichaReiser
Copy link
Member Author

I don't think so

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working parser Related to the parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

panic: Offset n is inside token FStringMiddle

3 participants