[ty] Pluck some low hanging performance fruit for completions #22630

Merged: BurntSushi merged 10 commits into main from ag/auto-import-perf1 on Jan 20, 2026
Conversation

@BurntSushi (Member)

This PR adds a new ad hoc benchmarking CLI for completions and implements a few low-hanging optimizations. All told, for my particular benchmark (asking for completions on `r` in a blank file within a checkout of Home Assistant), we get about a 40% improvement (~250ms down to ~140ms) in the cached case and a more modest ~12.5% improvement (~600ms down to ~526ms) in the uncached case.

Before:

$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 603.491595ms
total elapsed: 7.36554807s, time per completion request: 245.518269ms

After:

$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 526.743638ms
total elapsed: 4.268009725s, time per completion request: 142.26699ms

Closes astral-sh/ty#2298

@BurntSushi BurntSushi requested review from Gankra and removed request for AlexWaygood, Gankra, carljm, dcreager and sharkdp January 16, 2026 20:09
@BurntSushi BurntSushi added the performance (Potential performance improvement), server (Related to the LSP server), and ty (Multi-file analysis & type inference) labels Jan 16, 2026
astral-sh-bot bot commented Jan 16, 2026

Typing conformance results

No changes detected ✅

astral-sh-bot bot commented Jan 16, 2026

mypy_primer results

Changes were detected when running on open source projects
tornado (https://github.com/tornadoweb/tornado)
- tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _T@next | _VT@next`
+ tornado/gen.py:255:62: error[invalid-argument-type] Argument to bound method `__init__` is incorrect: Expected `None | Awaitable[Unknown] | list[Awaitable[Unknown]] | dict[Any, Awaitable[Unknown]] | Future[Unknown]`, found `_T@next | _VT@next | _T@next`

prefect (https://github.com/PrefectHQ/prefect)
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:94:28: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/integrations/prefect-dbt/prefect_dbt/core/settings.py:94:28: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any] | str | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:99:28: error[invalid-assignment] Object of type `T@resolve_variables | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/integrations/prefect-dbt/prefect_dbt/core/settings.py:99:28: error[invalid-assignment] Object of type `T@resolve_variables | str | int | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/cli/deploy/_core.py:86:21: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/prefect/cli/deploy/_core.py:86:21: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any] | str | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/cli/deploy/_core.py:87:21: error[invalid-assignment] Object of type `T@resolve_variables` is not assignable to `dict[str, Any]`
+ src/prefect/cli/deploy/_core.py:87:21: error[invalid-assignment] Object of type `T@resolve_variables | str | int | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/deployments/steps/core.py:137:38: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any]`
+ src/prefect/deployments/steps/core.py:137:38: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any] | str | ... omitted 4 union elements`
- src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `T@resolve_block_document_references | dict[str, Any]` on object of type `dict[str, Any]`
+ src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `T@resolve_block_document_references | dict[str, Any] | str | ... omitted 4 union elements` on object of type `dict[str, Any]`
- src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[T@resolve_block_document_references | dict[str, Any] | Unknown]`
+ src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[T@resolve_block_document_references | dict[str, Any] | str | ... omitted 5 union elements]`
- src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, T@resolve_variables | Unknown]`
+ src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, T@resolve_variables | str | int | ... omitted 5 union elements]`
- src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[T@resolve_variables | Unknown]`
+ src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[T@resolve_variables | str | int | ... omitted 5 union elements]`
- src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any]`
+ src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any] | str | ... omitted 4 union elements`
- src/prefect/workers/base.py:234:20: error[invalid-argument-type] Argument expression after ** must be a mapping type: Found `T@resolve_variables`
+ src/prefect/workers/base.py:234:20: error[invalid-argument-type] Argument expression after ** must be a mapping type: Found `T@resolve_variables | str | int | ... omitted 4 union elements`

scikit-build-core (https://github.com/scikit-build/scikit-build-core)
+ src/scikit_build_core/build/wheel.py:99:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 46 diagnostics
+ Found 47 diagnostics

static-frame (https://github.com/static-frame/static-frame)
- static_frame/core/bus.py:671:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[Bus[Any], object_]`, found `InterGetItemLocReduces[Bus[Any] | Bottom[Index[Any]] | TypeBlocks | ... omitted 6 union elements, object_]`
- static_frame/core/bus.py:675:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Bus[Any] | Bottom[Index[Any]] | TypeBlocks | ... omitted 6 union elements, object_ | Self@iloc]`
+ static_frame/core/bus.py:675:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Bus[Any], object_]`, found `InterGetItemILocReduces[Self@iloc | Bus[Any], object_ | Self@iloc]`
- static_frame/core/series.py:772:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Series[Any, Any], TVDtype@Series]`, found `InterGetItemILocReduces[Series[Any, Any] | ndarray[Never, Never] | TypeBlocks | ... omitted 6 union elements, TVDtype@Series]`
- static_frame/core/series.py:4072:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[SeriesHE[Any, Any], TVDtype@SeriesHE]`, found `InterGetItemILocReduces[Bottom[Series[Any, Any]] | ndarray[Never, Never] | TypeBlocks | ... omitted 7 union elements, TVDtype@SeriesHE]`
- static_frame/core/yarn.py:418:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemILocReduces[Yarn[Any], object_]`, found `InterGetItemILocReduces[Yarn[Any] | ndarray[Never, Never] | TypeBlocks | ... omitted 6 union elements, object_]`
- Found 1825 diagnostics
+ Found 1821 diagnostics

No memory usage changes detected ✅

@BurntSushi BurntSushi force-pushed the ag/auto-import-perf1 branch from a5ac57d to 68e9b49 Compare January 16, 2026 20:15
astral-sh-bot bot commented Jan 16, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Member

Can you say more about why you chose to create an entirely separate crate over integrating the benchmarks into ty_walltime or the Python benchmarks in scripts/ty_benchmark (which even includes code to benchmark an LSP server)? Integrating into ty_walltime has the advantage that we could decide to run the benchmarks as part of our CI pipeline. Integrating into ty_benchmark has the advantage that they're easier to discover. Both benchmarks also already provide the necessary infrastructure to install dependencies and definitions for common ecosystem projects.

@BurntSushi (Member, Author) commented Jan 20, 2026

Honestly, the reason is partially just that I wasn't aware of those. :-) And this particular program is just a copy of ty_completion_eval but stripped down for ad hoc benchmarking of completions specifically. So it was extremely quick and easy to add.

Looking at scripts/ty_benchmark, it looks like that requires doing a full build of ty, which means longer iteration times:

$ touch ./crates/ty_ide/src/completion.rs && time cargo build --bin ty_completion_bench --profile profiling
   Compiling ty_ide v0.0.0 (/home/andrew/astral/ruff/pr1/crates/ty_ide)
   Compiling ty_completion_bench v0.0.0 (/home/andrew/astral/ruff/pr1/crates/ty_completion_bench)
    Finished `profiling` profile [optimized + debuginfo] target(s) in 3.24s

real    3.313
user    16.869
sys     1.102
maxmem  1227 MB
faults  1767

$ touch ./crates/ty_ide/src/completion.rs && time cargo build --bin ty --profile profiling
   Compiling ty_ide v0.0.0 (/home/andrew/astral/ruff/pr1/crates/ty_ide)
   Compiling ty_server v0.0.0 (/home/andrew/astral/ruff/pr1/crates/ty_server)
   Compiling ty v0.0.0 (/home/andrew/astral/ruff/pr1/crates/ty)
    Finished `profiling` profile [optimized + debuginfo] target(s) in 8.93s

real    9.005
user    1:15.21
sys     1.702
maxmem  1539 MB
faults  2140

The program was also made with profiling and ad hoc testing in mind. I'd have to actually go through the process of adding completions to ty_benchmark to figure out whether it would work for that. But my program does the minimal amount of work necessary to do a completion request, and it also lets you repeat that completion request to test the cached case. And you can see the actual completion output, which can be important for understanding what is actually happening. Similarly for ty_walltime, which uses divan and doesn't seem to have a good profiling story.

Now to be clear, I do think we should add completion benchmarks to both ty_benchmark and ty_walltime. Because, yes, running the benchmarks in CI and running them as part of a more complete LSP flow seems like a wise thing to do. But I'd still want this little CLI for ad hoc testing and profiling.

I can add a README noting all of this. Maybe we can get to a point where the CLI isn't needed any more.

@BurntSushi (Member, Author)

I added a README.

Member

Thanks for the context.

I think my concern was mainly this:

> Honestly, the reason is partially just that I wasn't aware of those. :-)

But that's addressed if the plan is to ultimately add those benchmarks to ty_walltime, and this is more of an ad hoc script.

  db: &'db dyn Db,
  context: CollectionContext<'db>,
- items: Vec<Completion<'db>>,
+ items: BinaryHeap<CompletionRanker<'db>>,
Member

Nice! That's a much better approach than my brute-force truncate call.

It's sometimes useful to just create a scratch environment for ad hoc testing.

This is mostly just a stripped down version of `ty_completion_eval`. Basically, you point it at a Python file, give it a byte offset, and it does the rest. It also lets one repeat the completion request after the initial request, for benchmarking the cached case.

A surprising amount of time is spent sorting and comparing
completions. This commit does a very simple thing: it switches
the sorts to unstable and optimizes some comparisons to avoid
retrieving data we don't need for sorting.
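
As a rough illustration of the kind of change involved (the struct and fields below are hypothetical stand-ins, not the actual ty types): `sort_unstable_by` avoids the buffer allocation a stable sort needs, and comparing data we already hold avoids re-deriving anything per comparison.

```rust
// Hypothetical completion type for illustration only.
struct Completion {
    name: String,
    relevance: u32,
}

fn rank(completions: &mut [Completion]) {
    // Unstable sort: no auxiliary allocation. Ties on relevance are broken
    // by name, so we don't rely on sort stability for a deterministic order.
    completions.sort_unstable_by(|a, b| {
        b.relevance
            .cmp(&a.relevance)
            .then_with(|| a.name.cmp(&b.name))
    });
}
```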

Before:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 603.491595ms
total elapsed: 7.36554807s, time per completion request: 245.518269ms
```

After:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 570.040193ms
total elapsed: 5.740541847s, time per completion request: 191.351394ms
```

Instead of locking and unlocking the mutex for every symbol, we gather what we can and only lock the mutex after processing a module.
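
A minimal sketch of that batching pattern, with stand-in types (the actual ty code differs):

```rust
use std::sync::Mutex;

// `modules` stands in for per-module symbol tables; `shared` for the
// mutex-protected accumulator.
fn collect_symbols(modules: &[Vec<String>], shared: &Mutex<Vec<String>>) {
    for module in modules {
        let mut local = Vec::with_capacity(module.len());
        for symbol in module {
            // Per-symbol work happens with no lock held.
            local.push(symbol.clone());
        }
        // One lock acquisition per module instead of one per symbol.
        shared.lock().unwrap().append(&mut local);
    }
}
```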

Before:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 579.057075ms
total elapsed: 5.627666455s, time per completion request: 187.588881ms
```

After:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 565.911487ms
total elapsed: 5.254306446s, time per completion request: 175.143548ms
```

This is a very naive/simplistic "improvement." We can probably do better.

This should be a non-behavior-changing commit that tweaks how we rank completions. We now use a wrapper type and include the module name for more precise sorting when relevance and symbol name are identical.

The purpose of this is to prepare the code to switch to a max-heap.
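
A sketch of what such a wrapper can look like (field names are guesses for illustration; the real `CompletionRanker` differs). In a max-heap the greatest element pops first, so relevance compares ascending while name and module compare descending: ties resolve to the alphabetically earlier symbol.

```rust
use std::cmp::Ordering;

// Illustrative ranking wrapper; not the actual ty type.
struct CompletionRanker {
    relevance: u32,
    name: String,
    module: String,
}

impl Ord for CompletionRanker {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher relevance is "greater" (pops first from a max-heap);
        // reversed name/module comparisons make the alphabetically earlier
        // entry win ties, giving a total, deterministic order.
        self.relevance
            .cmp(&other.relevance)
            .then_with(|| other.name.cmp(&self.name))
            .then_with(|| other.module.cmp(&self.module))
    }
}

impl PartialOrd for CompletionRanker {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for CompletionRanker {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for CompletionRanker {}
```
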
It turns out this wasn't actually doing much and was just papering over the possibility of duplicates being returned from `SemanticModel`. So this was doing a fair bit of work for no good reason.

Instead, we push deduplication down into semantic completions directly. This will run over a (typically) much smaller number of completions.

With that said, this doesn't seem to lead to an improvement in ad hoc benchmarking. However, perf wasn't really the main motivation here: this change is primarily to prep for switching to a max-heap. The old deduplication was problematic because it was being done *after* sorting, which meant it depended on having the entire set of completions in memory. But with a max-heap, we'll only keep around the top-K completions, so deduplication has to be done "earlier" (if at all).
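
A sketch of dedup-while-collecting under those constraints (hypothetical names; the real code works over ty's completion types):

```rust
use std::collections::HashSet;

// First occurrence wins; later duplicates are dropped before they can reach
// the (eventually bounded) ranking structure, so correctness no longer
// depends on having the fully sorted list in memory.
fn dedup_collect(candidates: impl IntoIterator<Item = String>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for name in candidates {
        if seen.insert(name.clone()) {
            out.push(name);
        }
    }
    out
}
```
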
This is effectively what's going to happen when we switch to a max-heap for collecting completions. This commit isolates that behavioral change to ensure nothing unexpected happens. (I generally wouldn't expect it to, since that would imply that imports/qualifications care about results beyond the first 1,000.)

The previous commits made this change deliciously simple. The key point is that we don't collect a completion if it has no way of being ranked into the top 1,000 that we return.
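
A minimal sketch of that key point, using a `(score, name)` pair in place of the real `CompletionRanker`: Rust's `BinaryHeap` is a max-heap, so wrapping entries in `Reverse` keeps the *worst* retained candidate on top, where a single `peek` decides whether a new candidate can possibly make the cut.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

const LIMIT: usize = 1000;

// Sketch only: `(score, name)` stands in for the real ranking type.
fn push_ranked(heap: &mut BinaryHeap<Reverse<(u32, String)>>, score: u32, name: String) {
    if heap.len() < LIMIT {
        heap.push(Reverse((score, name)));
    } else if let Some(Reverse((worst, _))) = heap.peek() {
        // The weakest retained candidate sits at the top of the reversed
        // heap; anything that can't beat it is dropped with no further work.
        if score > *worst {
            heap.pop();
            heap.push(Reverse((score, name)));
        }
    }
}
```

With `Reverse`, the final `into_sorted_vec` comes out best-first, so producing the returned list needs no separate sort.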

Before:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 554.746713ms
total elapsed: 5.435318983s, time per completion request: 181.177299ms
```

After:

```
$ ./target/profiling/ty_completion_bench ~/astral/relatedclones/scratch-home-assistant/homeassistant/scratch.py 1 -q --iters 30
total elapsed for initial completions request: 526.743638ms
total elapsed: 4.268009725s, time per completion request: 142.26699ms
```

This is an especially nice speed-up for the cached case, where lots of time (proportionally) was being spent sorting a potentially very large list of completions.

There are potential future wins here. For example, every auto-import completion has an `import` statement generated for it. We could instead defer that work until the very end, when we convert the max-heap into a `Vec`. Before then, the completion might get dropped, so there's no point in doing the extra work eagerly. But that will require more refactoring (and possibly another `CompletionIntermediate` type? blech).

I didn't dig into where this change came from, but 1) it's a small change, and 2) this one tends to be noisy since there are so many things that match `array`.
@BurntSushi BurntSushi force-pushed the ag/auto-import-perf1 branch from 68e9b49 to 0fba952 Compare January 20, 2026 13:58
@BurntSushi (Member, Author)

I created astral-sh/ty#2570 to track adding benchmarks to ty_walltime, at minimum.

@BurntSushi BurntSushi force-pushed the ag/auto-import-perf1 branch from 1e49a13 to 7402377 Compare January 20, 2026 14:22
@BurntSushi BurntSushi merged commit 7e1f07e into main Jan 20, 2026
49 checks passed
@BurntSushi BurntSushi deleted the ag/auto-import-perf1 branch January 20, 2026 14:37

Labels

performance (Potential performance improvement), server (Related to the LSP server), ty (Multi-file analysis & type inference)

Development

Successfully merging this pull request may close these issues.

Initial optimization pass at auto-import symbol extraction
