
Type checking or duck typing inside dask.array.Array.__array_function__ #4583

@pentschev

A broad discussion on how __array_function__ handling (and possible mixins for it) should work started in #4567. I suggest we continue it here.

Some highlights:

@shoyer said:

The __array_function__ protocol guarantees that it will only get called if dask.array.Array is in types. So if that's all you wanted to check, you could drop this entirely.

The problem is that you want to support operations involving some, but not all, other array types. For example, xarray.DataArray wraps dask arrays, not the other way around. So dask.Array.__array_function__ should return NotImplemented in that case, leaving it up to xarray.DataArray.__array_function__ to define the operation.
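
A minimal sketch of the behaviour described above. ChunkedArray is a hypothetical toy stand-in for dask.array.Array, and OtherWrapper stands in for a type like xarray.DataArray that wraps arrays from the outside. The allow-list of handled types (_HANDLED_TYPES) is an assumption for illustration, not dask's actual policy.

```python
import numpy as np


class ChunkedArray:
    """Hypothetical toy stand-in for dask.array.Array: wraps one NumPy array."""

    # Other array types this class is willing to absorb into its own result.
    _HANDLED_TYPES = (np.ndarray,)

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy only calls this when ChunkedArray is in `types`, so the real
        # question is whether *every* participating type is one we can handle.
        if not all(issubclass(t, (ChunkedArray,) + self._HANDLED_TYPES) for t in types):
            # Defer, so a wrapper type (e.g. xarray.DataArray) gets its turn.
            return NotImplemented
        result = func(*_unwrap(args), **_unwrap(kwargs))
        return ChunkedArray(result) if isinstance(result, np.ndarray) else result


def _unwrap(obj):
    """Replace ChunkedArray instances (also inside lists/tuples/dicts) with their data."""
    if isinstance(obj, ChunkedArray):
        return obj.data
    if isinstance(obj, (list, tuple)):
        return type(obj)(_unwrap(x) for x in obj)
    if isinstance(obj, dict):
        return {k: _unwrap(v) for k, v in obj.items()}
    return obj


class OtherWrapper:
    """Hypothetical stand-in for a type that wraps arrays, like xarray.DataArray."""

    def __array_function__(self, func, types, args, kwargs):
        return "handled by OtherWrapper"
```

With this policy, np.dot(ChunkedArray([1, 2]), np.array([3, 4])) is computed by ChunkedArray, while np.dot(ChunkedArray([1, 2]), OtherWrapper()) makes ChunkedArray return NotImplemented, so NumPy falls through to OtherWrapper.__array_function__.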

We basically need some protocol or registration system to mark arrays as "can be coerced into dask array", e.g., either

sparse.Array.__dask_chunk_compatible__ = True

or

dask.array.register_chunk_type(sparse.Array)
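
A minimal sketch of how dask might check either option, assuming a module-level registry and the attribute name __dask_chunk_compatible__. register_chunk_type and is_chunk_type here are hypothetical helpers, not existing dask functions, and SparseLike, FlaggedLike and WrapperLike are illustrative stand-ins for third-party classes.

```python
import numpy as np

# Hypothetical registry: neither this list nor the two functions below are
# existing dask API; they only illustrate the two proposed options.
_CHUNK_TYPES = [np.ndarray]


def register_chunk_type(cls):
    """Registry option: explicitly mark cls as safe for dask to hold as a chunk."""
    if cls not in _CHUNK_TYPES:
        _CHUNK_TYPES.append(cls)
    return cls  # returning cls lets this double as a class decorator


def is_chunk_type(cls):
    """True if dask may coerce instances of cls into its own chunks."""
    # Attribute option: the class opts in by setting a flag.
    if getattr(cls, "__dask_chunk_compatible__", False):
        return True
    # Registry option: the class, or one of its bases, was registered.
    return issubclass(cls, tuple(_CHUNK_TYPES))


# Stand-ins for third-party classes (names are illustrative only).
@register_chunk_type
class SparseLike:
    pass


class FlaggedLike:
    __dask_chunk_compatible__ = True


class WrapperLike:  # wraps dask arrays rather than being wrapped by them
    pass
```

The registry lets dask (or a third party) opt a type in without touching its source, while the attribute puts the decision in the array library itself. Subclasses of a registered type are accepted automatically because of the issubclass check.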

@hameerabbasi said:

I guess Dask could safely coerce anything that implements InMemory and NumpyDuckArray mixins?
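
A minimal sketch of the mixin idea, assuming InMemory and NumpyDuckArray are empty marker classes whose only job is to be checked with issubclass. Both mixins, can_coerce_into_dask, and the two stand-in classes are hypothetical; no such classes exist in NumPy or dask.

```python
class InMemory:
    """Hypothetical marker mixin: the data is materialized in memory."""


class NumpyDuckArray:
    """Hypothetical marker mixin: the type follows NumPy array semantics."""


def can_coerce_into_dask(cls):
    # Dask would accept a type only if it opts into both promises.
    return issubclass(cls, InMemory) and issubclass(cls, NumpyDuckArray)


class SparseLike(InMemory, NumpyDuckArray):
    """Stand-in for an in-memory duck array such as a sparse array."""


class LazyLike(NumpyDuckArray):
    """Stand-in for a lazy array: NumPy-like, but not in memory."""
```

Compared with a registry, this is a nominal check: the array library opts in by inheriting, so the decision travels with the class.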

I was thinking of three classes of (NumPy-specific) mixins, aimed at implementing what I see as the three major protocols:

  • NumPyUfuncMixin
  • NumPyIndexingMixin (for NumPy-style indexing; xarray and pandas would skip this one).
  • NumPyArrayFunctionMixin (for implementing array methods via __array_function__, such as sum and mean).
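
A minimal sketch of the third mixin, assuming it simply forwards array methods such as sum and mean to the matching NumPy function, which then dispatches back through the class's own __array_function__. NumPyArrayFunctionMixin and ToyArray are hypothetical. For the ufunc side, NumPy already ships numpy.lib.mixins.NDArrayOperatorsMixin, which implements Python operators via __array_ufunc__ and is close in spirit to the proposed NumPyUfuncMixin.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class NumPyArrayFunctionMixin:
    """Hypothetical mixin: array methods implemented by calling NumPy functions,
    which in turn dispatch to the subclass's __array_function__."""

    def sum(self, axis=None, **kwargs):
        return np.sum(self, axis=axis, **kwargs)

    def mean(self, axis=None, **kwargs):
        return np.mean(self, axis=axis, **kwargs)


class ToyArray(NDArrayOperatorsMixin, NumPyArrayFunctionMixin):
    """Toy duck array: implements only the two protocols; methods come from mixins."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Used by NDArrayOperatorsMixin for +, *, etc.
        inputs = [x.data if isinstance(x, ToyArray) else x for x in inputs]
        return ToyArray(getattr(ufunc, method)(*inputs, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Used by np.sum / np.mean called from NumPyArrayFunctionMixin.
        args = [a.data if isinstance(a, ToyArray) else a for a in args]
        return func(*args, **kwargs)
```

The duck array only has to implement the two protocols; x + 1, x.sum() and x.mean() then come for free from the mixins.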

I see the use of your InMemory mixin, but I would rename it to InCore, as GPU computations don't take place in main memory. :)

For now, these would be provisional, just like __array_function__. The hope is that they would be adopted and that type-specific checking code would go away.

Another @hameerabbasi comment:

Then how about a "numpy/pydata community repo" (not in the main NumPy codebase) containing these mixins? I would be happy to maintain one. The reason I like the idea of mixins is that they allow "duck-array"-like objects to implement NumPy-like functionality without much effort.

We can also have protocols:

class DuckArray:
    __array_indexing__ = True
    __array_in_core__ = True
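
A minimal sketch of how a consumer such as dask might read these flags, assuming a missing flag means "not supported" and assuming a policy that requires both flags. supports, dask_can_wrap and LazyArray are hypothetical names used only for illustration.

```python
class DuckArray:
    __array_indexing__ = True
    __array_in_core__ = True


class LazyArray:
    # NumPy-style indexing, but the data is not held in core.
    __array_indexing__ = True


def supports(obj, protocol):
    # Look the flag up on the type, not the instance, as Python does for
    # special methods; a missing flag defaults to "not supported".
    return bool(getattr(type(obj), protocol, False))


def dask_can_wrap(obj):
    # Hypothetical policy: only wrap in-core arrays with NumPy-style indexing.
    return supports(obj, "__array_in_core__") and supports(obj, "__array_indexing__")
```

Unlike the mixins, these flags need no shared base class, so a library can adopt them without depending on a common package.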

@mrocklin