Duck typed dask collections for use with dask.imperative #700

@shoyer

If an xray.Dataset consists of dask arrays, it should work like a first-class dask collection for integration with dask.imperative. That is, when you call a delayed function (constructed with dask.imperative.do) on an xray Dataset consisting of dask arrays, dask should merge the Dataset's task graph into the delayed computation's graph when it executes.

The logic in dask.imperative that does this currently checks if the object inherits from the dask collection base class:
https://github.com/blaze/dask/blob/7e9bff047894c2bc9539370898dc135508739d38/dask/imperative.py#L67-L72

Strictly speaking, xray objects aren't always dask collections (dask isn't even a required dependency), so some type of duck typing solution seems appropriate. What should this look like?

The simplest way to adapt existing code would be to check for dask, _keys, _optimize and _finalize attributes, but I would prefer more specific names for the private methods to make it clearer what they refer to. I'm happy with a .dask attribute (the presence of which alone might be enough to signal a dask collection), but for the others, maybe __dask_keys__, __dask_optimize__ and __dask_finalize__ would be appropriate names?
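
To make the proposal concrete, here is a rough sketch of what an attribute-based check could look like. Everything in it is an assumption for illustration: the helper name `is_dask_collection`, the toy `FakeCollection` class, and the method signatures are hypothetical and are not existing dask or xray API. Only the attribute names come from the suggestion above.

```python
# Hypothetical sketch: none of these names exist in dask or xray today.
REQUIRED_ATTRS = ("dask", "__dask_keys__", "__dask_optimize__", "__dask_finalize__")


def is_dask_collection(obj):
    """Return True if obj exposes the proposed duck-typed collection protocol."""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRS)


class FakeCollection:
    """Toy stand-in for a dask-backed xray.Dataset (does not import dask)."""

    def __init__(self, graph, keys):
        self.dask = graph       # the task graph, as a plain dict
        self._key_list = keys   # the output keys within that graph

    def __dask_keys__(self):
        return self._key_list

    @staticmethod
    def __dask_optimize__(graph, keys):
        # Assumed signature; this toy version performs no optimization.
        return graph

    @staticmethod
    def __dask_finalize__(results):
        # Assumed signature; a real collection would rebuild its result here.
        return list(results)


collection = FakeCollection({"x": 1}, ["x"])
plain = object()
```

With a check like this, dask.imperative would not need an isinstance test against the collection base class, and an object without a task graph (such as `plain` above) would simply fail the check.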
