Skip to content

Conversation

@luxedo
Copy link
Contributor

@luxedo luxedo commented Jun 5, 2024

Add static typing to ma.min. Part of #26404.

@luxedo
Copy link
Contributor Author

luxedo commented Jun 5, 2024

I'd appreciate a little help here. This is my first time reading numpy's typing system.

  1. How to validate typing? I did not understand why np.min has three typing declarations:

    @overload
    def amin(
    a: _ArrayLike[_SCT],
    axis: None = ...,
    out: None = ...,
    keepdims: Literal[False] = ...,
    initial: _NumberLike_co = ...,
    where: _ArrayLikeBool_co = ...,
    ) -> _SCT: ...
    @overload
    def amin(
    a: ArrayLike,
    axis: None | _ShapeLike = ...,
    out: None = ...,
    keepdims: bool = ...,
    initial: _NumberLike_co = ...,
    where: _ArrayLikeBool_co = ...,
    ) -> Any: ...
    @overload
    def amin(
    a: ArrayLike,
    axis: None | _ShapeLike = ...,
    out: _ArrayType = ...,
    keepdims: bool = ...,
    initial: _NumberLike_co = ...,
    where: _ArrayLikeBool_co = ...,
    ) -> _ArrayType: ...

  2. Is following the documentation and the examples enough?

  3. I see there's a test dir for typing. Should I create tests as well?

  4. I'd love some review of this PR to check if I'm in the right direction.

I'm quite confident that once I got the resources I need I can keep typing the other functions by myself

@luxedo luxedo marked this pull request as ready for review June 9, 2024 23:28
@luxedo
Copy link
Contributor Author

luxedo commented Jun 18, 2024

Any updates on this?

Copy link
Contributor

@Jacob-Stevens-Haas Jacob-Stevens-Haas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm working on #26081 and while not a numpy dev, I can hopefully ask and answer some questions.

  1. np.amin has three overloads to tell the type checker three different cases
    • with no axis argument, amin will return the argmin of the entire array as a python (not numpy) object, so tell the type checker that.
    • With an axis argument and no out, it's impossible to tell what _ArrayLike object a is, so tell type checker Any
    • With an argument out, we know that the returned variable will be out, so the type checker knows the return type matches out
      These are probably also relevant for masked arrays.
  2. I don't think you need more documentation, because there's no new types, other than an entry in doc/release/upcoming_changes/26624.improvement.rst.
  3. Yes, most likely. A file numpy/typing/tests/data/pass/masked.py and likewise .../fail/masked.py that will fill up with type checks as #26404 is implemented. This is a good place to test-drive assumptions you might have about how your type annotations work as you develop the PR (e.g. the comment on axis and obj return types)

Comment on lines 379 to 383
axis: None | _ShapeLike = ...,
out: None = ...,
fill_value: None | _ScalarLike_co = ...,
keepdims: None | bool = ...,
) -> _SCT: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if axis is _ShapeLike and not None, the return type will be the same type as obj, not _SCT.

@luxedo
Copy link
Contributor Author

luxedo commented Jul 4, 2024

@Jacob-Stevens-Haas thank you for the feedback. I'll be working on this next week.

@charris
Copy link
Member

charris commented Aug 30, 2024

The test failure can be ignored.

@charris
Copy link
Member

charris commented Aug 30, 2024

Need rebase.

@luxedo
Copy link
Contributor Author

luxedo commented Sep 1, 2024

Rebased and upated with @Jacob-Stevens-Haas hints. Also added passing typing tests.

I couldn't understand how to make failing tests. When calling spin mypy somehow it kept throwing me old errors even after deleting the test file.

Copy link
Contributor

@Jacob-Stevens-Haas Jacob-Stevens-Haas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is great, a few comments about tests (with a grain of salt - I'm learning too):

I couldn't understand how to make failing tests.

Fail tests are written with arguments that fail type checking, with a comment afterwards for the expected type checking error (mostly either "no overload variant" or "argument has incompatible type" e.g.

np.amin(a, axis=1.0) # E: No overload variant
np.amin(a, keepdims=1.0) # E: No overload variant
np.amin(a, out=1.0) # E: No overload variant
np.amin(a, initial=[1.0]) # E: No overload variant
np.amin(a, where=[1.0]) # E: incompatible type

Edit: But thinking about it more, there's no illegal combination I can think of worth testing. E.g. the overloads allow np.min(1.0) and np.min(arr, axis=0). If they prevented something like np.ma.min(1.0, axis=0), that would be a good fail test. I don't think there are any such holes in the overloads for this PR though, so fail tests may not be necessary

There's also reveal tests that verify the type that gets returned with assert_type. My mental comparison is that pass/fail tests check arguments, reveal tests check return value. In this case, it might be helpful to check the returned type is not Any in overloads 1 and 3.

When calling spin mypy somehow it kept throwing me old errors even after deleting the test file.

Did you try running $ git clean -xdf?

m : np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.masked_array([1.5, 2, 3], mask=[True, False, True])
m: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.masked_array([1.5, 2, 3], mask=[True, False, True])

A = np.array(True, ndmin=2, dtype=bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no expert on which tests to write for pass/, but this may be more than is required? IIUC, the PR doesn't make a distinction between bool, np.uint8, np.float32, np.uint8, so does the distinction between A/B and C/D really test anything? (happy to change my mind though!)

Comment on lines 13 to 15
A.setflags(write=False)
B.setflags(write=False)
C.setflags(write=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that setflags is called in fromnumeric.py, but does it do anything important for the test?

Comment on lines +26 to +27
np.ma.min(A, axis=0)
np.ma.min(B, axis=0)
Copy link
Contributor

@Jacob-Stevens-Haas Jacob-Stevens-Haas Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, axis being None, an int, or _ShapeLike does make a difference. It might make sense to also run this with axis=(0,)

@luxedo
Copy link
Contributor Author

luxedo commented Sep 26, 2024

@Jacob-Stevens-Haas I updated the PR with your suggestions

Copy link
Contributor

@Jacob-Stevens-Haas Jacob-Stevens-Haas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this! Can you add a reveal test to check the return values? I.e. numpy/typing/tests/data/reveal/ma.pyi like:

import numpy as np

arr: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.masked_array([[1.0, 2.0]], mask=[True, False])
result = np.ma.min(arr, axis=None)
assert_type(result, np.float64)

tgt: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.empty(shape=())
result = np.ma.min(arr, axis=None, out=tgt)
assert_type(result, np.ma.MaskedArray)

This verifies that the result type changes with the out parameter. Since reveal tests even need to be executable, so you can remove the assignment for readability.

luxedo and others added 2 commits September 30, 2024 00:10
Co-Authored-By: Jakob Stevens Haas <[email protected]>
Co-Authored-By: Jakob Stevens Haas <[email protected]>
@luxedo
Copy link
Contributor Author

luxedo commented Sep 30, 2024

Both my numpy's and scipy's builds are broken since last week :(

I pushed this test and it failed. I'll try to make spin mypy work and then I fix the typing.

@Jacob-Stevens-Haas
Copy link
Contributor

Jacob-Stevens-Haas commented Oct 9, 2024

Hmm, this is curious. keepdims defaults to _NoValue, not None or False, even though the default behavior is False. Thus the static typing sees the failing test as overload 2 instead of overload 1.

If the default value is passed, then keepdims will not be passed through to the min method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

In other words, _NoValueType exists in order to allow subclasses to break Liskov Substitution, which means might be a problem in and of itself. It looks like _NoValueType was first discussed as a sort of third option for an argument and has come up again in a few PRs and issues 1, 2, and 3.

The long and short of this is that you can probably include _NoValueType in overload 1, e.g. keep_dims: Literal[False] | _NoValueType. But it would also be good to check whether we really need to allow _NoValue as the default or we can just default to false.

Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I left a couple of minor suggestions, but in general, I like the way this looks 🙁.

def min(obj, axis=..., out=..., fill_value=..., keepdims=...): ...
@overload
def min(
obj: MaskedArray[_ShapeLike, _SCT] | _ArrayLike[_SCT],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shape type is bound to tuple[int, ...], to which _ShapeLike isn't assignable (i.e. it's a broader type).

Additionally, the second type parameter of MaskedArray is bound to np.dtype, but _SCT is bound to np.generic.

Luckily this is easy to fix in this case, because MaskedArray is assignable to _ArrayLike (for a specific scalar type _SCT). So that means we simply write it as:

Suggested change
obj: MaskedArray[_ShapeLike, _SCT] | _ArrayLike[_SCT],
obj: _ArrayLike[_SCT],

axis: None = ...,
out: None = ...,
fill_value: None | _ScalarLike_co = ...,
keepdims: Literal[False] = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was discussed earlier, but keepdims uses a default sentinel, and it's impossible to type it.
However, that's not a problem at all. Because from a typing perspective, not passing keepdims is indistinguishable from keepdims=False. So we can just pretend that Literal[False] is the default.
So in other words; this overload is perfectly fine as-is.

out: None = ...,
fill_value: None | _ScalarLike_co = ...,
keepdims: Literal[False] = ...,
) -> _SCT: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the array has an object_ as scalar-type, then this implies that an instance of np.object_ is returned. However, those don't exist. And because we don't know what the actual type is "within" that object_, the best we could do is to return an object, and let the user cast it to the specific type.

But having said that, I don't think that it's really a priority at the moment. So I personally wouldn't mind if you'd label the returned object_ as a "proxy type", and keep it the way it is 🤷🏻.
Plus, a little bird told me that object_ might become a generic type in the near future 😏.


@overload
def min(
obj: MaskedArray[_ShapeLike, _SCT] | _ArrayLike[_SCT],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
obj: MaskedArray[_ShapeLike, _SCT] | _ArrayLike[_SCT],
obj: _ArrayLike[_SCT],

out: None = ...,
fill_value: None | _ScalarLike_co = ...,
keepdims: None | bool = ...,
) -> Any: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this is in line with the current annotations of numpy.min, but we could also narrow this down do _SCT | MaskedArray[tuple[int, ...], np.dtype[_SCT]] while we're at it 🤷🏻.

Like I mentioned before, this technically this isn't correct in the case of object_ (and StringDType as well, BTW). But because this is already an improvement in general, I think it's not too big of a problem if we just ignore this for now.


from typing_extensions import assert_type

arr: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.masked_array([[1.0, 2.0]], mask=[True, False])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to do the actual function call within a .pyi stub, especially since the return type of np.ma.masked_array was already verified in the "pass" tests.

Suggested change
arr: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.masked_array([[1.0, 2.0]], mask=[True, False])
arr: np.ma.MaskedArray[Any, np.dtype[np.float64]]

Comment on lines +8 to +9
result = np.ma.min(arr, axis=None)
assert_type(result, np.float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by inlining the result, you avoid changing the type result later on, which technically isn't valid (mypy allowing this is a bug IMHO).

Suggested change
result = np.ma.min(arr, axis=None)
assert_type(result, np.float64)
assert_type(np.ma.min(arr, axis=None), np.float64)

Comment on lines +11 to +13
tgt: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.empty(shape=())
result = np.ma.min(arr, axis=None, out=tgt)
assert_type(result, np.ma.MaskedArray)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of tgt is equivalent to that of arr, so there's no need for it (so it doesn't matter that at runtime this wouldn't make a lot of sense).

Additionally, np.ma.MaskedArray should have type arguments, as its type parameters have no defaults (PEP 696).

Suggested change
tgt: np.ma.MaskedArray[Any, np.dtype[np.float64]] = np.ma.empty(shape=())
result = np.ma.min(arr, axis=None, out=tgt)
assert_type(result, np.ma.MaskedArray)
assert_type(np.ma.min(arr, axis=None, out=arr), np.ma.MaskedArray[Any, np.dtype[np.float64]])

zeros_like as zeros_like,
angle as angle
)
from numpy._typing import _ArrayLike, _ScalarLike_co, _ShapeLike, overload
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably safer to import overload directly from typing instead.


_ShapeType_co = TypeVar("_ShapeType_co", bound=tuple[int, ...], covariant=True)
_DType_co = TypeVar("_DType_co", bound=dtype[Any], covariant=True)
_ArrayType = TypeVar("_ArrayType", bound="MaskedArray[Any, Any]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for stringified annotations within .pyi stubs (and disallowed by e.g. flake8-pyi and several type-checkers)

@jorenham
Copy link
Member

jorenham commented Nov 4, 2024

I see that there's also a (tiny) merge conflict.

@jorenham
Copy link
Member

This has been superseded by #28593, so I'm going to close this.

@jorenham jorenham closed this Mar 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants