[red-knot] Add `FunctionType::to_overloaded` by dhruvmanila · Pull Request #17585 · astral-sh/ruff

dhruvmanila · 2025-04-23T16:09:36Z

Summary

This PR adds a new method FunctionType::to_overloaded which converts a FunctionType into an OverloadedFunction which contains all the @overload-ed FunctionType and the implementation FunctionType if it exists.

There's a big caveat here (it's the way overloads work) which is that this method can only "see" all the overloads that comes before itself. Consider the following example:

from typing import overload

@overload
def foo() -> None: ...
@overload
def foo(x: int) -> int: ...
def foo(x: int | None) -> int | None:
	return x

Here, when the to_overloaded method is invoked on the

first foo definition, it would only contain a single overload which is itself and no implementation.
second foo definition, it would contain both overloads and still no implementation
third foo definition, it would contain both overloads and the implementation which is itself

Usages

This method will be used in the logic for checking invalid overload usages. It can also be used for #17541.

Test Plan

Make sure that existing tests pass.

github-actions · 2025-04-23T16:14:11Z

`mypy_primer` results

No ecosystem changes detected ✅

crates/red_knot_python_semantic/src/types.rs

carljm

This looks great!

crates/red_knot_python_semantic/src/types.rs

MichaReiser · 2025-04-23T20:08:37Z

crates/red_knot_python_semantic/src/types.rs

+            // The semantic model records a use for each function on the name node. This is used here
+            // to get the previous function definition with the same name.


Does that mean that we call to_overloaded 4 times if a function has 3 overloads (and one impl). 4 times because:

to_overloaded on the implementation

which calls into to_overloaded on the last overload

which, in turn, calls into `to_overloaded on the second overload

which, in turn, calls into to_overloaded on the first overload

Or is it at most two to_overloaded calls, one for the impl (root) which then calls into the to_overloaded for the first overload?

Yes, the former is correct. It will call 4 times.

This kind of gives me a thought as to whether this is being wasting resources in terms of memory because we don't really need to cache the calls on the function that isn't the implementation or the last overload (if implementation doesn't exists).

Only caching the last result would be great. Not only reduces it overall memory consumption. It also removes the need to clone the overloads vec multiple times only to add one extra element.

Would it be possible to rewrite this logic as a loop where we store the current function type that we look up instead of a recursive function call?

It should be possible to do this iteratively instead of recursively. It will be less efficient (will repeat work) if we did end up calling it on intermediate overloads too, but it should be somewhat more efficient if we only end up calling it on the last overload. Which I think should usually be the case?

The memory used here is tiny, and there's no cold regression shown; I doubt that overloads will ever be a significant part of our overall performance profile (cold check time or memory). There is a 1% incremental regression, probably due to the new memoized query results; an iterative approach might be able to bring that down if it reduces the number of memos?

I don't think we should spend more than an hour or two right now on reworking this.

t should be possible to do this iteratively instead of recursively. It will be less efficient (will repeat work) if we did end up calling it on intermediate overloads too, but it should be somewhat more efficient if we only end up calling it on the last overload. Which I think should usually be the case?

Could we remove the query all-together, considering that they're rare and because to_overloaded is only called from signature which itself is cached?

Yeah, I don't think we would face any regression if we don't cache the to_overloaded right now as most of the usages of overloads would come from signature which is cached.

Currently, there isn't any usages of to_overloaded but that will happen when I add the checks for valid overloads in a follow-up PR. I think I'd prefer to see if removing the cache would cause any regression after that and not now. What do you think?

There will likely be another upcoming use for dataclass transforms that wouldn't be from within signature? But would also be within a different query.

Yeah, for now I'll just follow-up with an iterative approach instead. I'll merge this PR after expanding the documentation for to_overloaded method.

crates/red_knot_python_semantic/src/types.rs

dhruvmanila · 2025-04-23T21:26:59Z

Merging this PR, I'll follow-up with an iterative approach (instead of the current recursive approach) to collect the overloads. Later, once we have a couple of usages of to_overloaded we can check if not caching the method regresses or not.

* main: [red-knot] fix collapsing literal and its negation to object (#17605) [red-knot] Add more tests for protocols (#17603) [red-knot] Ban direct instantiations of `Protocol` classes (#17597) [`pyupgrade`] Preserve parenthesis when fixing native literals containing newlines (`UP018`) (#17220) [`airflow`] fix typos (`AIR302`, `AIR312`) (#17574) [red-knot] Special case `@abstractmethod` for function type (#17591) [red-knot] Emit diagnostics for isinstance() and issubclass() calls where a non-runtime-checkable protocol is the second argument (#17561) [red-knot] Infer the members of a protocol class (#17556) [red-knot] Add `FunctionType::to_overloaded` (#17585) [red-knot] Add mdtests for `global` statement (#17563) [syntax-errors] Make duplicate parameter names a semantic error (#17131)

## Summary This PR updates the `to_overloaded` method to use an iterative approach instead of a recursive one. Refer to #17585 (comment) for context. The main benefit here is that it avoids calling the `to_overloaded` function in a recursive manner which is a salsa query. So, this is a bit hand wavy but we should also see less memory used because the cache will only contain a single entry which should be the entire overload chain. Previously, the recursive approach would mean that each of the function involved in an overload chain would have a cache entry. This reduce in memory shouldn't be too much and I haven't looked at the actual data for it. ## Test Plan Existing test cases should pass.

dhruvmanila added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Apr 23, 2025

dhruvmanila marked this pull request as ready for review April 23, 2025 17:32

dhruvmanila requested review from AlexWaygood, carljm, dcreager and sharkdp as code owners April 23, 2025 17:32

dhruvmanila force-pushed the dhruv/overloaded-function branch from da7e00b to dfdcef0 Compare April 23, 2025 17:33

MichaReiser reviewed Apr 23, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

carljm approved these changes Apr 23, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

MichaReiser reviewed Apr 23, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Show resolved Hide resolved

dhruvmanila added 2 commits April 23, 2025 16:00

[red-knot] Add FunctionType::to_overloaded

e00d148

merge the impl blocks

53d594d

dhruvmanila force-pushed the dhruv/overloaded-function branch from 2fafb6b to 53d594d Compare April 23, 2025 21:00

Add note to to_overloaded

734a708

dhruvmanila merged commit 7b62227 into main Apr 23, 2025
34 checks passed

dhruvmanila deleted the dhruv/overloaded-function branch April 23, 2025 21:27

dhruvmanila mentioned this pull request Apr 24, 2025

[red-knot] Use iterative approach to collect overloads #17607

Merged

		// The semantic model records a use for each function on the name node. This is used here
		// to get the previous function definition with the same name.

Comments

Conversation

dhruvmanila commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Usages

Test Plan

Uh oh!

github-actions bot commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mypy_primer results

Uh oh!

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichaReiser Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

dhruvmanila Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruvmanila Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

MichaReiser Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

carljm Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

dhruvmanila Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

carljm Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

dhruvmanila Apr 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dhruvmanila commented Apr 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dhruvmanila commented Apr 23, 2025 •

edited

Loading

github-actions bot commented Apr 23, 2025 •

edited

Loading

`mypy_primer` results

dhruvmanila Apr 23, 2025 •

edited

Loading

carljm Apr 23, 2025 •

edited

Loading