Skip to content

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Oct 25, 2021

Fix mypy errors in most files; a handful of files remain for later.
Does not enable actual mypy yet.

assert len(v.dask) < len(vv.dask)
assert_eq(v, vv)


Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated test

@github-actions github-actions bot added the dispatch Related to `Dispatch` extension objects label Oct 26, 2021
def checkpoint(*collections, split_every: SplitEvery = None) -> Delayed:
def checkpoint(
*collections,
split_every: float | Literal[False] | None = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alias would require the Python 3.10 TypeAlias, which is available in typing_extensions, but it felt like overkill.
Also, using float instead of Number. Note the differences:

isinstance(1, Number)  # True
isinstance(1.1, Number)  # True
isinstance(1, float)  # False

x: Number
x = 1  # mypy error
x = 1.1  # mypy error

y: float
y = 1  # mypy passes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also flagging that typing_extensions~=3.10 is not necessarily widely supported yet. For example, until very recently, tensorflow couldn't be used with typing_extensions > 3.7.

Quite annoying, but a reason to be a bit cagey around very recent typing features.

new_layers: Dict[str, Layer] = {}
new_deps: Dict[str, AbstractSet[str]] = {}
new_layers: dict[str, Layer] = {}
new_deps: dict[str, Set[str]] = {}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collections.abc.Set === typing.AbstractSet
set built-in === typing.Set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice contravariance discipline

expected = "".join([files[v] for v in sorted(files)])

fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")]
fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really unfortunate limitation in mypy that it can't handle things like

arr: list[str | int] = ["a"] + [1]

I see several long running issues about it, and don't really see any forward progress :/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:

fmt_bs: list[tuple[str | None, str | None]] = [(fmt, None) for fmt in compr]
fmt_bs += [(None, "10 B")]

def checkpoint(*collections, split_every: SplitEvery = None) -> Delayed:
def checkpoint(
*collections,
split_every: float | Literal[False] | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also flagging that typing_extensions~=3.10 is not necessarily widely supported yet. For example, until very recently, tensorflow couldn't be used with typing_extensions > 3.7.

Quite annoying, but a reason to be a bit cagey around very recent typing features.

new_layers: Dict[str, Layer] = {}
new_deps: Dict[str, AbstractSet[str]] = {}
new_layers: dict[str, Layer] = {}
new_deps: dict[str, Set[str]] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice contravariance discipline

@github-actions github-actions bot added the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label Nov 29, 2021
@crusaderky crusaderky changed the title WIP: mypy integration in CI Type annotations, part 1 Feb 4, 2022
@crusaderky
Copy link
Collaborator Author

I've decided to have a first tranche of this work merged. A few files remain that fail mypy, but I'll tackle them later on.
All test failures are already visible in main. Ready for review and merge.

@crusaderky crusaderky marked this pull request as ready for review February 4, 2022 15:25
@crusaderky crusaderky self-assigned this Feb 4, 2022
@pytest.mark.parametrize("idx_chunks", [None, 3, 2, 1])
@pytest.mark.parametrize("x_chunks", [(3, 5), (2, 3), (1, 2), (1, 1)])
def test_index_with_int_dask_array(x_chunks, idx_chunks):
def test_index_with_int_dask_array_nep35(x_chunks, idx_chunks):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid test name conflict

@crusaderky crusaderky removed the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label Feb 7, 2022
@douglasdavis douglasdavis mentioned this pull request Feb 7, 2022
@ian-r-rose ian-r-rose self-requested a review February 8, 2022 17:06
Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @crusaderky, this is awesome. Can't wait to start using it. I have a few questions/suggestions, but nothing that I'd consider blocking.

Can you remind me what the problem was with configuring mypy to ignore problematic files? If we can, I'd love to start turning this on and just exclude some things for the time being.


@pytest.mark.parametrize(
"tup", [(1, 2), collections.namedtuple("foo", ["a", "b"])(1, 2)]
"tup", [(1, 2), collections.namedtuple("foo", ["a", "b"])(1, 2)] # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an upstream bug report for this? Seems pretty clearly to be a mypy bug to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but then again, we're using namedtuple improperly.
This works:

T = collections.namedtuple("T", ["a", "b"])
T(1, 2)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why it's improper to not capture the type, is that documented anywhere? I don't see it here.

Copy link
Collaborator Author

@crusaderky crusaderky Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy checks that the name of the namedtuple matches the variable it's assigned to. This trips it:

foo = collections.namedtuple("bar", ["a", "b"])

expected = "".join([files[v] for v in sorted(files)])

fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")]
fmt_bs = [(fmt, None) for fmt in compr] + [(None, "10 B")] # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:

fmt_bs: list[tuple[str | None, str | None]] = [(fmt, None) for fmt in compr]
fmt_bs += [(None, "10 B")]

"""

active = set()
active: ClassVar[set[tuple]] = set()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be awesome to write down a Protocol for a callback here, especially since we are about to drop 3.7, but in the meantime, can we do

Suggested change
active: ClassVar[set[tuple]] = set()
active: ClassVar[set[tuple[Callable | None, ...]]] = set()

here and below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import yaml

if TYPE_CHECKING:
from typing_extensions import Literal
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on when #8572 lands we could remove this conditional import

class Foo:
@globalmethod(key="f")
def f():
def f(): # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit, I have no idea what the error mypy is reporting here is. Seems perfectly fine to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @globalmethod decorator is too magic for mypy to handle.

CI. Developers should run them locally before they submit a PR, through the single
command ``pre-commit run --all-files``. This makes sure that linter versions and options
are aligned for all developers.
Dask uses several code linters (flake8, black, isort, pyupgrade, mypy), which are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

dsk, keys = trivial(int(n), 5)
start = time()
get(dsk, keys)
get(dsk, keys) # type: ignore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh, another weird one from mypy. Do you understand this? Pyright seems okay with it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that

  • the functions in that list are not typed and/or
  • the functions in that list don't have the exact same signature

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for sure it's related to having untyped signatures in the above functions, but my understanding was that mypy will typically infer Any for such a case, and instead I'm seeing cannot call function of unknown type. But not worth going into, I just thought it was weird.

@crusaderky
Copy link
Collaborator Author

Can you remind me what the problem was with configuring mypy to ignore problematic files?

Anything importing the problematic files will stop working.

If we can, I'd love to start turning this on and just exclude some things for the time being.

You can add to .pre-commit-config.yaml:

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.931
    hooks:
      - id: mypy
        additional_dependencies:
          # Type stubs
          - types-docutils
          - types-pkg_resources
          - types-PyYAML
          - types-requests
          - types-setuptools
          # Libraries exclusively imported under `if TYPE_CHECKING:`
          - typing_extensions  # To be reviewed after dropping Python 3.7
          # Typed libraries
          - numpy

and then run pre-commit run --all-files | grep -v <known bad files>

Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:

Your proposed alternative hurts readability and is overkill for a unit test, IMHO

I think I addressed all review comments

@ian-r-rose
Copy link
Collaborator

Here and elsewhere with the same pattern: it's annoying, but we could remove the ignore by splitting into two lines:

Your proposed alternative hurts readability and is overkill for a unit test, IMHO

Fair enough. I don't love the proposed solution either, but might advocate for it more strongly if it were not in a unit test.

Copy link
Collaborator

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @crusaderky!

@crusaderky crusaderky merged commit 20e9246 into dask:main Feb 9, 2022
@crusaderky crusaderky deleted the mypy branch February 9, 2022 20:25
CI. Developers should run them locally before they submit a PR, through the single
command ``pre-commit run --all-files``. This makes sure that linter versions and options
are aligned for all developers.
Dask uses several code linters (flake8, black, isort, pyupgrade, mypy), which are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be missing something, but it looks like we're not enforcing mypy at the moment. Should we drop mypy from this list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. That's from the initial version with the checker. Yes this paragraph should be reverted for now.

jsignell pushed a commit that referenced this pull request May 6, 2022
[PEP 544](https://www.python.org/dev/peps/pep-0544/) introduces the `Protocol` class to the `typing` module in Python 3.8 (the soon be the minimum supported version, dask/community#213). Writing new Dask collections for [dask-awkward](https://github.com/ContinuumIO/dask-awkward/) has had me thinking about working on a `DaskCollection` protocol. I imagine the benefits to be:

- usage with static type checkers
  - other activity in this area at
    - #8295 
    - #8706 
    -  #8854
  - Python supporting IDEs take advantage of typing
- self-documenting; some improvements to [the custom collections page](https://docs.dask.org/en/latest/custom-collections.html) of the docs. The protocol docs can be autogenerated and added to that page.
- purely opt-in feature

The `typing.runtime_checkable` decorator allows use of `isinstance(x, DaskCollection)` in any code base
that uses Dask collections; for example:

```python
>>> from dask.typing import DaskCollection
>>> import dask.array as da
>>> x = da.zeros((10, 3))
>>> isinstance(x, DaskCollection)
True
```
(though this is an order of magnitude slower than `dask.base.is_dask_collection` which only checks for `x.__dask_graph__() is not None`; static typing checking & built-in interface documentation are the core benefits IMO)

Something else that came up in the brief discussion on a call last week was having `{Scheduler,Worker,Nanny}Plugin` protocols in `distributed`; and perhaps those are better places to start introducing protocols to Dask since on the user side typically more folks would write plugins than new collections.
erayaslan pushed a commit to erayaslan/dask that referenced this pull request May 12, 2022
[PEP 544](https://www.python.org/dev/peps/pep-0544/) introduces the `Protocol` class to the `typing` module in Python 3.8 (the soon be the minimum supported version, dask/community#213). Writing new Dask collections for [dask-awkward](https://github.com/ContinuumIO/dask-awkward/) has had me thinking about working on a `DaskCollection` protocol. I imagine the benefits to be:

- usage with static type checkers
  - other activity in this area at
    - dask#8295 
    - dask#8706 
    -  dask#8854
  - Python supporting IDEs take advantage of typing
- self-documenting; some improvements to [the custom collections page](https://docs.dask.org/en/latest/custom-collections.html) of the docs. The protocol docs can be autogenerated and added to that page.
- purely opt-in feature

The `typing.runtime_checkable` decorator allows use of `isinstance(x, DaskCollection)` in any code base
that uses Dask collections; for example:

```python
>>> from dask.typing import DaskCollection
>>> import dask.array as da
>>> x = da.zeros((10, 3))
>>> isinstance(x, DaskCollection)
True
```
(though this is an order of magnitude slower than `dask.base.is_dask_collection` which only checks for `x.__dask_graph__() is not None`; static typing checking & built-in interface documentation are the core benefits IMO)

Something else that came up in the brief discussion on a call last week was having `{Scheduler,Worker,Nanny}Plugin` protocols in `distributed`; and perhaps those are better places to start introducing protocols to Dask since on the user side typically more folks would write plugins than new collections.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

array dataframe dispatch Related to `Dispatch` extension objects io

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants