Description
This is a follow-up to a very old discussion in #5511, which resurfaced today in #7243.
dask features several trivial optional dependencies: cloudpickle, fsspec, toolz, and partd. "trivial" here means that
- they are either pure-python or with a transparent pure-python option,
- they are tiny,
- their own dependencies, if any, all meet the two criteria above.
Because they are optional, they are not installed when somebody types pip install dask (as a distracted user likely will).
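Code that may run without these packages therefore has to guard its imports. The following is a generic sketch of such a guard, not code taken from the dask codebase; the helper name optional_import is hypothetical.

```python
import importlib


def optional_import(name):
    """Return the module called `name`, or None if it cannot be imported.

    Hypothetical helper illustrating the conditional-import pattern.
    """
    try:
        return importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


# Every caller must then handle the "not installed" case separately.
cloudpickle = optional_import("cloudpickle")
if cloudpickle is None:
    pass  # e.g. disable a feature or raise a helpful error on first use
```

Under the proposal below, a guard like this would collapse into a plain import cloudpickle.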
There is nothing in our CI, except for the superficial continuous_integration/scripts/test_imports.sh, verifying that dask can actually work without them.
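One way to actually exercise the "package missing" case is sketched below in Python. It assumes that spawning a fresh interpreter per check is acceptable, and relies on the CPython behavior that a None entry in sys.modules makes the corresponding import raise ModuleNotFoundError. The helper name imports_without is hypothetical and is not part of dask's CI.

```python
import subprocess
import sys


def imports_without(target, blocked):
    """Return True if `target` imports cleanly in a fresh interpreter
    in which every module named in `blocked` is made unimportable."""
    code = (
        "import sys\n"
        f"for name in {list(blocked)!r}:\n"
        # A None entry in sys.modules makes 'import name' fail,
        # including imports of its submodules.
        "    sys.modules[name] = None\n"
        f"import {target}\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode == 0
```

For example, imports_without("dask.base", ["toolz"]) should return False if toolz is indeed a hard dependency of dask.base. Running each check in a subprocess avoids having to purge already-imported modules from the current interpreter.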
toolz is a hard dependency of dask.base, so I strongly suspect there is nobody out there who doesn't use it: we're talking about a hypothetical user who either builds dask graphs manually or has written a custom dask collection, and who doesn't use any of the many fundamental tools in dask.base.
Proposed change
- Make cloudpickle, fsspec, toolz, and partd mandatory dependencies.
- The [bag] and [delayed] targets in setup.py would become empty, for backwards compatibility purposes only.
- Remove from the codebase all conditional imports of dask.bag, dask.delayed, or any of the above libraries.
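The first two points could look roughly like this in setup.py. This is a hypothetical fragment, not dask's actual file: other requirements, extras, and version pins are omitted.

```python
# Hypothetical setup.py fragment; the real file has more entries and pins.
install_requires = [
    # ...existing hard requirements...
    "cloudpickle",
    "fsspec",
    "toolz",
    "partd",
]

extras_require = {
    # Kept empty purely for backwards compatibility, so that existing
    # commands such as pip install dask[bag] remain valid.
    "bag": [],
    "delayed": [],
    # ...other extras unchanged...
}

# Both would then be passed to
# setuptools.setup(install_requires=install_requires, extras_require=extras_require).
```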
CC @ryan-williams; please pull in any other known "light" dask users who install neither numpy nor pandas.