Skip to content

Comments

[red-knot] Add FunctionType::to_overloaded#17585

Merged
dhruvmanila merged 3 commits intomainfrom
dhruv/overloaded-function
Apr 23, 2025
Merged

[red-knot] Add FunctionType::to_overloaded#17585
dhruvmanila merged 3 commits intomainfrom
dhruv/overloaded-function

Conversation

@dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Apr 23, 2025

Summary

This PR adds a new method FunctionType::to_overloaded which converts a FunctionType into an OverloadedFunction which contains all the @overload-ed FunctionType and the implementation FunctionType if it exists.

There's a big caveat here (it's the way overloads work) which is that this method can only "see" all the overloads that comes before itself. Consider the following example:

from typing import overload

@overload
def foo() -> None: ...
@overload
def foo(x: int) -> int: ...
def foo(x: int | None) -> int | None:
	return x

Here, when the to_overloaded method is invoked on the

  1. first foo definition, it would only contain a single overload which is itself and no implementation.
  2. second foo definition, it would contain both overloads and still no implementation
  3. third foo definition, it would contain both overloads and the implementation which is itself

Usages

This method will be used in the logic for checking invalid overload usages. It can also be used for #17541.

Test Plan

Make sure that existing tests pass.

@dhruvmanila dhruvmanila added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Apr 23, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Apr 23, 2025

mypy_primer results

No ecosystem changes detected ✅

@dhruvmanila dhruvmanila marked this pull request as ready for review April 23, 2025 17:32
@dhruvmanila dhruvmanila force-pushed the dhruv/overloaded-function branch from da7e00b to dfdcef0 Compare April 23, 2025 17:33
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Comment on lines +6109 to +6139
// The semantic model records a use for each function on the name node. This is used here
// to get the previous function definition with the same name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we call to_overloaded 4 times if a function has 3 overloads (and one impl). 4 times because:

  1. to_overloaded on the implementation
  2. which calls into to_overloaded on the last overload
  3. which, in turn, calls into `to_overloaded on the second overload
  4. which, in turn, calls into to_overloaded on the first overload

Or is it at most two to_overloaded calls, one for the impl (root) which then calls into the to_overloaded for the first overload?

Copy link
Member Author

@dhruvmanila dhruvmanila Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the former is correct. It will call 4 times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of gives me a thought as to whether this is being wasting resources in terms of memory because we don't really need to cache the calls on the function that isn't the implementation or the last overload (if implementation doesn't exists).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only caching the last result would be great. Not only reduces it overall memory consumption. It also removes the need to clone the overloads vec multiple times only to add one extra element.

Would it be possible to rewrite this logic as a loop where we store the current function type that we look up instead of a recursive function call?

Copy link
Contributor

@carljm carljm Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to do this iteratively instead of recursively. It will be less efficient (will repeat work) if we did end up calling it on intermediate overloads too, but it should be somewhat more efficient if we only end up calling it on the last overload. Which I think should usually be the case?

The memory used here is tiny, and there's no cold regression shown; I doubt that overloads will ever be a significant part of our overall performance profile (cold check time or memory). There is a 1% incremental regression, probably due to the new memoized query results; an iterative approach might be able to bring that down if it reduces the number of memos?

I don't think we should spend more than an hour or two right now on reworking this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t should be possible to do this iteratively instead of recursively. It will be less efficient (will repeat work) if we did end up calling it on intermediate overloads too, but it should be somewhat more efficient if we only end up calling it on the last overload. Which I think should usually be the case?

Could we remove the query all-together, considering that they're rare and because to_overloaded is only called from signature which itself is cached?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we would face any regression if we don't cache the to_overloaded right now as most of the usages of overloads would come from signature which is cached.

Currently, there isn't any usages of to_overloaded but that will happen when I add the checks for valid overloads in a follow-up PR. I think I'd prefer to see if removing the cache would cause any regression after that and not now. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will likely be another upcoming use for dataclass transforms that wouldn't be from within signature? But would also be within a different query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for now I'll just follow-up with an iterative approach instead. I'll merge this PR after expanding the documentation for to_overloaded method.

@dhruvmanila dhruvmanila force-pushed the dhruv/overloaded-function branch from 2fafb6b to 53d594d Compare April 23, 2025 21:00
@dhruvmanila
Copy link
Member Author

Merging this PR, I'll follow-up with an iterative approach (instead of the current recursive approach) to collect the overloads. Later, once we have a couple of usages of to_overloaded we can check if not caching the method regresses or not.

@dhruvmanila dhruvmanila merged commit 7b62227 into main Apr 23, 2025
34 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/overloaded-function branch April 23, 2025 21:27
dcreager added a commit that referenced this pull request Apr 24, 2025
* main:
  [red-knot] fix collapsing literal and its negation to object (#17605)
  [red-knot] Add more tests for protocols (#17603)
  [red-knot] Ban direct instantiations of `Protocol` classes (#17597)
  [`pyupgrade`] Preserve parenthesis when fixing native literals containing newlines (`UP018`) (#17220)
  [`airflow`] fix typos (`AIR302`, `AIR312`) (#17574)
  [red-knot] Special case `@abstractmethod` for function type (#17591)
  [red-knot] Emit diagnostics for isinstance() and issubclass() calls where a non-runtime-checkable protocol is the second argument (#17561)
  [red-knot] Infer the members of a protocol class (#17556)
  [red-knot] Add `FunctionType::to_overloaded` (#17585)
  [red-knot] Add mdtests for `global` statement (#17563)
  [syntax-errors] Make duplicate parameter names a semantic error (#17131)
dhruvmanila added a commit that referenced this pull request Apr 24, 2025
## Summary

This PR updates the `to_overloaded` method to use an iterative approach
instead of a recursive one.

Refer to
#17585 (comment) for
context.

The main benefit here is that it avoids calling the `to_overloaded`
function in a recursive manner which is a salsa query. So, this is a bit
hand wavy but we should also see less memory used because the cache will
only contain a single entry which should be the entire overload chain.
Previously, the recursive approach would mean that each of the function
involved in an overload chain would have a cache entry. This reduce in
memory shouldn't be too much and I haven't looked at the actual data for
it.

## Test Plan

Existing test cases should pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants