Skip to content

Drop excess capacity from Suites during parsing#25368

Merged
MichaReiser merged 8 commits into
mainfrom
micha/parser-shrink-to-fit
May 29, 2026
Merged

Drop excess capacity from Suites during parsing#25368
MichaReiser merged 8 commits into
mainfrom
micha/parser-shrink-to-fit

Conversation

@MichaReiser
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser commented May 24, 2026

Summary

Shrink Suite vectors to drop the excess capacity during parsing.

I excluded other nodes for now to get a better sense of the performance and memory impact.

I'm a bit conflicted on this. This is a pretty huge memory improvement for ty, but there are a few linter benchmarks that regress by 1-2% because some vectors need to be copied to smaller allocations, which hurts performance. For the linter and formatter, it's also not important to shrink the vectors, because the AST is never stored for long. But this is different for ty where we cache the AST.

I'm curious to hear what others think on this. I could also try to reduce the places where we call shrink_to_fit, e.g., only for statements?

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 24, 2026

Merging this PR will improve performance by 5.67%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 23 improved benchmarks
✅ 102 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Memory ty_check_file[cold] 21.7 MB 20.8 MB +4.32%
Memory DateType 25.4 MB 24.3 MB +4.78%
Memory ty_micro[many_string_assignments] 16.2 MB 15.5 MB +4.77%
Memory ty_micro[pandas_tdd] 44.2 MB 42 MB +5.29%
Memory ty_micro[very_large_tuple] 14.9 MB 14.1 MB +5.12%
Memory ty_micro[large_isinstance_narrowing] 16.2 MB 15.4 MB +4.88%
Memory ty_micro[many_enum_members] 17.5 MB 16.7 MB +4.36%
Memory ty_micro[literal_match_fallthrough] 13.5 MB 12.8 MB +4.94%
Memory ty_micro[vararg_parameter_type_accumulation] 13.1 MB 12.5 MB +5%
Memory ty_micro[many_tuple_assignments] 14.3 MB 13.6 MB +4.59%
Memory ty_micro[complex_constrained_attributes_2] 15.1 MB 14.4 MB +5.05%
Memory ty_micro[literal_equality_fallthrough_guarded_any] 15.7 MB 15.1 MB +4.21%
Memory ty_micro[gradual_vararg_call] 14.7 MB 13.9 MB +5.19%
Memory ty_micro[complex_constrained_attributes_3] 16.6 MB 15.9 MB +4.63%
Memory ty_micro[complex_constrained_attributes_1] 15.1 MB 14.4 MB +5.05%
Memory ty_micro[many_enum_members_2] 16.3 MB 15.5 MB +4.79%
Memory ty_micro[many_protocol_members_mismatch] 21.1 MB 20.3 MB +4.21%
Memory ty_micro[many_tuple_assignments] 15.6 MB 14.9 MB +4.87%
Memory parser[pydantic/types.py] 403.2 KB 368.5 KB +9.41%
Memory parser[large/dataset.py] 862.2 KB 816.8 KB +5.56%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing micha/parser-shrink-to-fit (3ade505) with main (572e4b5)

Open in CodSpeed

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 24, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser reopened this May 24, 2026
Base automatically changed from micha/thin-vec-stmt to main May 24, 2026 15:23
@MichaReiser MichaReiser reopened this May 24, 2026
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 24, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 91.94%. The percentage of expected errors that received a diagnostic held steady at 87.09%. The number of fully passing files held steady at 92/134.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 24, 2026

Memory usage report

Summary

Project Old New Diff Outcome
flake8 45.94MB 44.10MB -4.02% (1.85MB) ⬇️
trio 112.21MB 109.53MB -2.39% (2.68MB) ⬇️
sphinx 264.77MB 261.54MB -1.22% (3.22MB) ⬇️
prefect 714.89MB 711.43MB -0.48% (3.45MB) ⬇️

Significant changes

Click to expand detailed breakdown

flake8

Name Old New Diff Outcome
parsed_module 15.55MB 13.71MB -11.87% (1.85MB) ⬇️

trio

Name Old New Diff Outcome
parsed_module 23.70MB 21.02MB -11.31% (2.68MB) ⬇️

sphinx

Name Old New Diff Outcome
parsed_module 28.84MB 25.61MB -11.18% (3.22MB) ⬇️

prefect

Name Old New Diff Outcome
parsed_module 30.35MB 26.89MB -11.38% (3.45MB) ⬇️

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 24, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

@MichaReiser MichaReiser marked this pull request as ready for review May 24, 2026 15:37
@MichaReiser MichaReiser requested a review from dhruvmanila as a code owner May 24, 2026 15:37
@astral-sh-bot astral-sh-bot Bot requested a review from ntBre May 24, 2026 15:37
@MichaReiser MichaReiser force-pushed the micha/parser-shrink-to-fit branch from 605800b to 974d9d7 Compare May 24, 2026 15:39
@dhruvmanila dhruvmanila added the parser Related to the parser label May 25, 2026
Copy link
Copy Markdown
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supportive of this. Related to the linter regression, possible alternatives could be to introduce a ty-specific API in the parser crate or have a new field in the ParseOptions but I'm not a fan of either options given that they'll be part of the public API.

@charliermarsh
Copy link
Copy Markdown
Member

IMO the memory gains here are clearly worth a 1-2% linter regression.

@MichaReiser MichaReiser changed the title Drop excess capacity from Vecs when parsing Drop excess capacity from Vecs in parser May 26, 2026
@MichaReiser MichaReiser force-pushed the micha/parser-shrink-to-fit branch 4 times, most recently from cd1b259 to 67a08d4 Compare May 28, 2026 09:47
@MichaReiser MichaReiser marked this pull request as draft May 28, 2026 10:04
@MichaReiser MichaReiser force-pushed the micha/parser-shrink-to-fit branch from bb45a9d to 60d29d6 Compare May 29, 2026 12:22
@MichaReiser MichaReiser changed the title Drop excess capacity from Vecs in parser Drop excess capacity from Suite's during parsing May 29, 2026
@MichaReiser MichaReiser changed the title Drop excess capacity from Suite's during parsing Drop excess capacity from Suites during parsing May 29, 2026
@MichaReiser
Copy link
Copy Markdown
Member Author

I scoped this PR down to only cover statements to get a better sense for the performance trade-off when adding the same treatment to some other nodes.

I also tried a few different techniques to mitigate the performance regression with very limited success:

  • Use a pool of Suite buffers that ruff parses into. This allows us to have a few vecs with large capacity. The downside is that it's always necessary to copy all elements (minus module level where I did not apply this optimization). This was maybe marginally faster, but not worth the complexity.
  • Special case a body with a single statement because they are more common than I thought. This was maybe marginally faster, but not worth the complexity
  • Use a SmallVec of size 8

In short, Vec is pretty fast

@MichaReiser MichaReiser marked this pull request as ready for review May 29, 2026 18:11
@MichaReiser MichaReiser enabled auto-merge (squash) May 29, 2026 18:11
@MichaReiser MichaReiser merged commit 1518f1d into main May 29, 2026
57 of 58 checks passed
@MichaReiser MichaReiser deleted the micha/parser-shrink-to-fit branch May 29, 2026 18:17
@dhruvmanila dhruvmanila added the performance Potential performance improvement label Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parser Related to the parser performance Potential performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants