-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Type annotations, part 1 #8295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type annotations, part 1 #8295
Conversation
| assert len(v.dask) < len(vv.dask) | ||
| assert_eq(v, vv) | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated test
| def checkpoint(*collections, split_every: SplitEvery = None) -> Delayed: | ||
| def checkpoint( | ||
| *collections, | ||
| split_every: float | Literal[False] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alias would require the Python 3.10 TypeAlias, which is available in typing_extensions, but it felt like overkill.
Also, using float instead of Number. Note the differences:
isinstance(1, Number) # True
isinstance(1.1, Number) # True
isinstance(1, float) # False
x: Number
x = 1 # mypy error
x = 1.1 # mypy error
y: float
y = 1 # mypy passesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also flagging that typing_extensions~=3.10 is not necessarily widely supported yet. For example, until very recently, tensorflow couldn't be used with typing_extensions > 3.7.
Quite annoying, but a reason to be a bit cagey around very recent typing features.
| new_layers: Dict[str, Layer] = {} | ||
| new_deps: Dict[str, AbstractSet[str]] = {} | ||
| new_layers: dict[str, Layer] = {} | ||
| new_deps: dict[str, Set[str]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collections.abc.Set === typing.AbstractSet
set built-in === typing.Set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice contravariance discipline
| expected = "".join([files[v] for v in sorted(files)]) | ||
|
|
||
| fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] | ||
| fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really unfortunate limitation in mypy that it can't handle things like
arr: list[str | int] = ["a"] + [1]I see several long running issues about it, and don't really see any forward progress :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:
fmt_bs: list[tuple[str | None, str | None]] = [(fmt, None) for fmt in compr]
fmt_bs += [(None, "10 B")]| def checkpoint(*collections, split_every: SplitEvery = None) -> Delayed: | ||
| def checkpoint( | ||
| *collections, | ||
| split_every: float | Literal[False] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also flagging that typing_extensions~=3.10 is not necessarily widely supported yet. For example, until very recently, tensorflow couldn't be used with typing_extensions > 3.7.
Quite annoying, but a reason to be a bit cagey around very recent typing features.
| new_layers: Dict[str, Layer] = {} | ||
| new_deps: Dict[str, AbstractSet[str]] = {} | ||
| new_layers: dict[str, Layer] = {} | ||
| new_deps: dict[str, Set[str]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice contravariance discipline
|
I've decided to have a first tranche of this work merged. A few files remain that fail mypy, but I'll tackle them later on. |
| @pytest.mark.parametrize("idx_chunks", [None, 3, 2, 1]) | ||
| @pytest.mark.parametrize("x_chunks", [(3, 5), (2, 3), (1, 2), (1, 1)]) | ||
| def test_index_with_int_dask_array(x_chunks, idx_chunks): | ||
| def test_index_with_int_dask_array_nep35(x_chunks, idx_chunks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid test name conflict
ian-r-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky, this is awesome. Can't wait to start using it. I have a few questions/suggestions, but nothing that I'd consider blocking.
Can you remind me what the problem was with configuring mypy to ignore problematic files? If we can, I'd love to start turning this on and just exclude some things for the time being.
|
|
||
| @pytest.mark.parametrize( | ||
| "tup", [(1, 2), collections.namedtuple("foo", ["a", "b"])(1, 2)] | ||
| "tup", [(1, 2), collections.namedtuple("foo", ["a", "b"])(1, 2)] # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an upstream bug report for this? Seems pretty clearly to be a mypy bug to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but then again, we're using namedtuple improperly.
This works:
T = collections.namedtuple("T", ["a", "b"])
T(1, 2)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why it's improper to not capture the type, is that documented anywhere? I don't see it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy checks that the name of the namedtuple matches the variable it's assigned to. This trips it:
foo = collections.namedtuple("bar", ["a", "b"])| expected = "".join([files[v] for v in sorted(files)]) | ||
|
|
||
| fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] | ||
| fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:
fmt_bs: list[tuple[str | None, str | None]] = [(fmt, None) for fmt in compr]
fmt_bs += [(None, "10 B")]
dask/callbacks.py
Outdated
| """ | ||
|
|
||
| active = set() | ||
| active: ClassVar[set[tuple]] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome to write down a Protocol for a callback here, especially since we are about to drop 3.7, but in the meantime, can we do
| active: ClassVar[set[tuple]] = set() | |
| active: ClassVar[set[tuple[Callable | None, ...]]] = set() |
here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| import yaml | ||
|
|
||
| if TYPE_CHECKING: | ||
| from typing_extensions import Literal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on when #8572 lands we could remove this conditional import
| class Foo: | ||
| @globalmethod(key="f") | ||
| def f(): | ||
| def f(): # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit, I have no idea what the error mypy is reporting here is. Seems perfectly fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @globalmethod decorator is too magic for mypy to handle.
| CI. Developers should run them locally before they submit a PR, through the single | ||
| command ``pre-commit run --all-files``. This makes sure that linter versions and options | ||
| are aligned for all developers. | ||
| Dask uses several code linters (flake8, black, isort, pyupgrade, mypy), which are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
| dsk, keys = trivial(int(n), 5) | ||
| start = time() | ||
| get(dsk, keys) | ||
| get(dsk, keys) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh, another weird one from mypy. Do you understand this? Pyright seems okay with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that
- the functions in that list are not typed and/or
- the functions in that list don't have the exact same signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for sure it's related to having untyped signatures in the above functions, but my understanding was that mypy will typically infer Any for such a case, and instead I'm seeing cannot call function of unknown type. But not worth going into, I just thought it was weird.
Anything importing the problematic files will stop working.
You can add to .pre-commit-config.yaml: - repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.931
hooks:
- id: mypy
additional_dependencies:
# Type stubs
- types-docutils
- types-pkg_resources
- types-PyYAML
- types-requests
- types-setuptools
# Libraries exclusively imported under `if TYPE_CHECKING:`
- typing_extensions # To be reviewed after dropping Python 3.7
# Typed libraries
- numpyand then run
Your proposed alternative hurts readability and is overkill for a unit test, IMHO I think I addressed all review comments |
Fair enough. I don't love the proposed solution either, but might advocate for it more strongly if it were not in a unit test. |
ian-r-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky!
| CI. Developers should run them locally before they submit a PR, through the single | ||
| command ``pre-commit run --all-files``. This makes sure that linter versions and options | ||
| are aligned for all developers. | ||
| Dask uses several code linters (flake8, black, isort, pyupgrade, mypy), which are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be missing something, but it looks like we're not enforcing mypy at the moment. Should we drop mypy from this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right. That's from the initial version with the checker. Yes this paragraph should be reverted for now.
[PEP 544](https://www.python.org/dev/peps/pep-0544/) introduces the `Protocol` class to the `typing` module in Python 3.8 (the soon be the minimum supported version, dask/community#213). Writing new Dask collections for [dask-awkward](https://github.com/ContinuumIO/dask-awkward/) has had me thinking about working on a `DaskCollection` protocol. I imagine the benefits to be: - usage with static type checkers - other activity in this area at - #8295 - #8706 - #8854 - Python supporting IDEs take advantage of typing - self-documenting; some improvements to [the custom collections page](https://docs.dask.org/en/latest/custom-collections.html) of the docs. The protocol docs can be autogenerated and added to that page. - purely opt-in feature The `typing.runtime_checkable` decorator allows use of `isinstance(x, DaskCollection)` in any code base that uses Dask collections; for example: ```python >>> from dask.typing import DaskCollection >>> import dask.array as da >>> x = da.zeros((10, 3)) >>> isinstance(x, DaskCollection) True ``` (though this is an order of magnitude slower than `dask.base.is_dask_collection` which only checks for `x.__dask_graph__() is not None`; static typing checking & built-in interface documentation are the core benefits IMO) Something else that came up in the brief discussion on a call last week was having `{Scheduler,Worker,Nanny}Plugin` protocols in `distributed`; and perhaps those are better places to start introducing protocols to Dask since on the user side typically more folks would write plugins than new collections.
[PEP 544](https://www.python.org/dev/peps/pep-0544/) introduces the `Protocol` class to the `typing` module in Python 3.8 (the soon be the minimum supported version, dask/community#213). Writing new Dask collections for [dask-awkward](https://github.com/ContinuumIO/dask-awkward/) has had me thinking about working on a `DaskCollection` protocol. I imagine the benefits to be: - usage with static type checkers - other activity in this area at - dask#8295 - dask#8706 - dask#8854 - Python supporting IDEs take advantage of typing - self-documenting; some improvements to [the custom collections page](https://docs.dask.org/en/latest/custom-collections.html) of the docs. The protocol docs can be autogenerated and added to that page. - purely opt-in feature The `typing.runtime_checkable` decorator allows use of `isinstance(x, DaskCollection)` in any code base that uses Dask collections; for example: ```python >>> from dask.typing import DaskCollection >>> import dask.array as da >>> x = da.zeros((10, 3)) >>> isinstance(x, DaskCollection) True ``` (though this is an order of magnitude slower than `dask.base.is_dask_collection` which only checks for `x.__dask_graph__() is not None`; static typing checking & built-in interface documentation are the core benefits IMO) Something else that came up in the brief discussion on a call last week was having `{Scheduler,Worker,Nanny}Plugin` protocols in `distributed`; and perhaps those are better places to start introducing protocols to Dask since on the user side typically more folks would write plugins than new collections.
Fix mypy errors in most files; a handful of files remain for later.
Does not enable actual mypy yet.