-
-
Notifications
You must be signed in to change notification settings - Fork 3
Description
There's been some discordant opinions re. annotations recently, so I'd like to reach a consensus on project-wide guidelines.
The below is a first draft and represents my personal opinion only - please comment if you would like to add/amend.
General guidelines
- Dask code is annotated according to PEP 484 and later revisions.
This means that the style for annotating dask can change over time as new PEPs are created or amended upstream. - In addition to the PEPs, dask complies with typeshed and mypy guidelines. Notably:
__init__return type is not annotated- return types should not be unions to avoid burdening the caller with type checking, unless you want the caller to go through type checking at runtime.
e.g.def f() -> int | strshould be replaced withdef f() -> Any.
When designing new API, you should avoid having multiple return types unless you want the user to explicitly check for them; consider instead using TypeVars or breaking your function into two separate ones.
- All new code should be annotated, if sensible. Code should not be annotated if adding annotations would make it unreasonably cumbersome to read.
- When tweaking existing code, it is recommended (but not required) to spend some time adding annotations to the updated functions, unless it would substantially blow up the scope of the PR and/or make it much harder to review.
- This guidelines document should not be enforced in code review. If you are reviewing a PR and you see a violation, before asking it to be amended please ask yourself how much friction you'd introduce by demanding it to be compliant. Consider using a github suggestion so that the developer just has to click on "accept" instead of doing the work themselves.
mypy
- mypy version and settings are hardcoded project-wide. For the sake of coherence and reproducibility, you should call mypy exclusively through pre-commit.
- You should use
# type: ignoreonly when working around it would be unreasonable. - If you use
# type: ignoreto work around a bug in mypy, you should add a FIXME with a cross-reference to the mypy issue on github.
Specific vs. generic
Specific is better than generic, unless it unreasonably harms readability.
For example, this is bad:
d: dict = {}This is good:
d: dict[str, Callable[[str], bool]] = {}However, this is bad:
d: dict[
str | MyClass1 | MyClass2 | tuple[int, str, MyClass3],
Callable[[str, SomethingElse, AndOneMore], bool]
| Callable[[int, numpy.ndarray | dask.array.Array], bool]
] = {}in the above case, it is better to be more generic for the sake of readability:
d: dict[Any, Callable[..., bool]] = {}Frequently, components of very complex signatures are used repeatedly across a module; you should consider using TypeAlias (in the example above, there could be a TypeAlias for the dict keys).
You should use @overload, but only when the added verbosity is outweighed by the benefit.
Parameters and return types
Parameter types should be as abstract as possible - anything that is valid.
Return types should be as concrete as possible (however, do not use unions in return types - as explained above).
Prefer immutable collections in parameters whenever possible.
e.g.
def f(d: Mapping[str, int | None]) -> dict[str, int]:
return {k: v for k, v in d.items() if v is not None}Backporting
You may use annotations from the very latest version of Python, as follows:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
# TODO import from typing (requires Python >=3.10)
from typing_extensions import TypeAlias
UserID: TypeAlias = int | strThe TYPE_CHECKING guard is necessary as typing_extensions is not a runtime dependency.
The TODO comment is strongly recommended to make future upgrade work easier.
TYPE_CHECKING should be used only when actually necessary.
Delayed annotations
At the moment of writing, dask supports Python 3.8+ at runtime, but uses Python 3.9+ annotations.
from __future__ import annotations
x: int | dict[str, int]Don't use Union, Optional, Dict, etc. or quoted annotations unless necessary.
Notable exception are cast and subclassing (see next paragraph), which are interpreted at runtime:
x = cast("int | dict[str, int]", x)In this case, quoted annotations are preferred to Union etc.
typing vs. collections.abc
Always import everything that's possible from collections.abc.
Import from typing only when an object doesn't exist elsewhere.
Again, this requires from __future__ import annotations to work in Python 3.8:
from __future__ import annotations
from collections.abc import Mapping
x: Mapping[str, int]The only exception is when subclassing a collection. This is the only way to make it work in Python 3.8:
# TODO: import from collections.abc (requires Python >=3.9)
from typing import Mapping
class MyDict(Mapping[str, int]):
...In this case, you should import the Mapping class only from typing. Don't import it from collections.abc as well.
Class annotations
You should always annotate all instance variables, both public and private, in the class header (C++ style).
Class variables should be explicitly marked with ClassVar.
mypy implements some intelligence to infer the type of instance variables from the __init__ method; it should not be relied upon.
Don't:
class C:
x = 1
def __init__(self):
self.d: dict[str, int] = {}Do:
class C:
x: ClassVar[int] = 1
d: dict[str, int]
def __init__(self):
self.d = {}It is a good idea to keep Sphinx documentation together with annotations.
It is a good idea to use annotations to avoid explicitly declaring slots.
e.g.
class C:
#: This is rendered by sphinx; don't forget the `:`!
#: Line 2
x: int
__slots__ = tuple(__annotations__)In sphinx:
.. autoclass:: mymodule.C
:members:Redundant annotations
Redundancy should be avoided.
Type specifications should also be avoided in Sphinx documentation when redundant with the annotations in the code.
For example:
class C:
d: dict[str, int]
def f(self, x: str):
"""Some description
Parameters
----------
x: int # BAD - it's redundant with the signature; just use `x: `
Some description
"""
# BAD - the type of y can be fully inferred by the declaration of the instance variable
y: int = self.d[x]None
Defaults to None should be explicit, e.g. x: int | None = None.
You should always prefer | None to Optional or | NoneType.