Skip to content

Use ThinVec in AST to shrink Stmt#25361

Merged
MichaReiser merged 6 commits into
mainfrom
micha/thin-vec-stmt
May 24, 2026
Merged

Use ThinVec in AST to shrink Stmt#25361
MichaReiser merged 6 commits into
mainfrom
micha/thin-vec-stmt

Conversation

@MichaReiser
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser commented May 23, 2026

Summary

Use a ThinVec wherever it lets us reduce the enclosing container's size.

Most runtime benchmarks are neutral. There are a couple where this improves performance by 1-2%, and very few where performance regresses by the same amount.

Given the memory reduction, I think this is a worthwhile tradeoff.

This is a memory usage optimization, not a performance optimization.

Test plan

Existing tests

@MichaReiser MichaReiser added the parser Related to the parser label May 23, 2026
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 23, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 89.52%. The percentage of expected errors that received a diagnostic held steady at 86.99%. The number of fully passing files held steady at 90/134.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 23, 2026

Memory usage report

Summary

Project Old New Diff Outcome
flake8 47.99MB 46.91MB -2.25% (1.08MB) ⬇️
trio 114.47MB 112.89MB -1.38% (1.58MB) ⬇️
sphinx 262.81MB 260.92MB -0.72% (1.89MB) ⬇️
prefect 698.61MB 696.58MB -0.29% (2.03MB) ⬇️

Significant changes

Click to expand detailed breakdown

flake8

Name Old New Diff Outcome
parsed_module 17.42MB 16.34MB -6.19% (1.08MB) ⬇️

trio

Name Old New Diff Outcome
parsed_module 26.53MB 24.95MB -5.96% (1.58MB) ⬇️

sphinx

Name Old New Diff Outcome
parsed_module 32.27MB 30.37MB -5.86% (1.89MB) ⬇️

prefect

Name Old New Diff Outcome
parsed_module 34.02MB 31.98MB -5.97% (2.03MB) ⬇️

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 23, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented May 23, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 23, 2026

Merging this PR will improve performance by 6.88%

⚡ 5 improved benchmarks
✅ 118 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Memory parser[numpy/globals.py] 19.7 KB 18 KB +9.29%
Memory parser[large/dataset.py] 912.2 KB 862.2 KB +5.8%
Memory parser[unicode/pypinyin.py] 58.7 KB 55.5 KB +5.77%
Memory parser[numpy/ctypeslib.py] 200.6 KB 186.7 KB +7.45%
Memory parser[pydantic/types.py] 427.9 KB 403.2 KB +6.12%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing micha/thin-vec-stmt (fecd00b) with main (0909c7e)

Open in CodSpeed

# * `Expr` - A single expression.
# * `Expr?` - Option type.
# * `Expr*` - A vector of Expr.
# * `&Expr*` - A boxed slice of Expr.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed &Expr* in favor of Box<[Expr]> because I couldn't think of a clear symbol to express a thin vec (and & is also somewhat obscure ;))

@MichaReiser MichaReiser force-pushed the micha/thin-vec-stmt branch from 391f781 to 0b2d5a3 Compare May 24, 2026 06:31
@MichaReiser MichaReiser force-pushed the micha/thin-vec-stmt branch from 0b2d5a3 to fecd00b Compare May 24, 2026 06:37
@MichaReiser MichaReiser marked this pull request as ready for review May 24, 2026 06:50
@MichaReiser MichaReiser requested a review from dhruvmanila as a code owner May 24, 2026 06:50
@astral-sh-bot astral-sh-bot Bot requested a review from ntBre May 24, 2026 06:50
Copy link
Copy Markdown
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you for looking into this!

@MichaReiser MichaReiser merged commit 6ab8b73 into main May 24, 2026
59 checks passed
@MichaReiser MichaReiser deleted the micha/thin-vec-stmt branch May 24, 2026 15:23
@AlexWaygood
Copy link
Copy Markdown
Member

For my own education: how do you decide whether to use ThinVec or a boxed slice for things like this?

@MichaReiser
Copy link
Copy Markdown
Member Author

MichaReiser commented May 26, 2026

I only used ThinVec in T when it helped to reduce the size of Sequence<T> or the size of an enum where T is a variant. E.g. Using a ThinVec for StmtClassDef helped reduce the size of Stmt. I didn't not change items in ExprDict, because ExprDict isn't the largest Expr variant and reducing its size is, therefore, not very meaningful.

@charliermarsh
Copy link
Copy Markdown
Member

I naively would've expected: ThinVec where we need mutability, boxed slice where the data is now immutable.

@MichaReiser
Copy link
Copy Markdown
Member Author

ThinVec is smaller than a boxed slice, because it moves the capacity and length into the heap allocation. This is also one of the main downsides. Reading the length now requires a pointer dereference

@AlexWaygood
Copy link
Copy Markdown
Member

These are AST nodes, so I think the data is always immutable (and I would also have naively gone for boxed slices here as well, for the same reason!). But taking up less space on the stack is obviously a very good reason to go for ThinVec, given the big memory savings here. Nice optimisation! 😃

@charliermarsh
Copy link
Copy Markdown
Member

Oh yeah I didn't realize it was even smaller, nice.

anishgirianish pushed a commit to anishgirianish/ruff that referenced this pull request May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parser Related to the parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants