Skip to content

Conversation

@gjoseph92
Copy link
Collaborator

#6393 breaks many operations on subclasses of Array, since the inherited __array_function__ no longer considers them to be Dask-supported types.

#6393 (comment)

  • Tests added / passed
  • Passes black dask / flake8 dask

@TomAugspurger
Copy link
Member

cc @jthielen.

@TomAugspurger
Copy link
Member

I will say, in general Python gives subclasses a chance to determine the outcome of binary operations by calling the subclasses' implementation first.

IIUC the current implementation is consistent with that. dask.Array binops will return NotImplemented when there are any subtypes in types.

@gjoseph92
Copy link
Collaborator Author

@TomAugspurger good point, that makes sense.

Python gives subclasses a chance to determine the outcome of binary operations by calling the subclasses' implementation first

The thing I'm trying make easier is when you don't want to change the implementation in a subclass, you don't want to copy the whole boilerplate __array_function__—you'd expect it to work via inheritance.

How about type_ is type(self) here, instead of hardcoding Array? That should both cause Array to defer to subclasses, and allow them to inherit a working __array_function__.

@TomAugspurger
Copy link
Member

Checking type(self) rather than Array seems reasonable I think... Let see if @jthielen has a chance to reply.

FYI @jrbourbeau we might consider including this in the release if we're doing one today since it's a regression from previous behavior.

@gjoseph92
Copy link
Collaborator Author

Yeah @TomAugspurger this regression actually broke some things for us, since internally we use an Array subclass, so we've had to switch to running off my fork for now. Though it's probably not that common, getting a fix in today's release would be appreciated by anyone else who'd run into this.

@jthielen
Copy link
Contributor

jthielen commented Oct 6, 2020

Thanks for looping me into the discussion on this!

My reasoning in #6393 for the comparison with type is Array vs. a issubclass is 1) composition being preferred over inheritance made me not think about the issue of subclasses all that deeply, and 2) even if someone was subclassing dask.array.Array, it would always be an "upcast type" of Array in the context of the type casting hierarchy of NEPs 13 and 18, so Array should always defer to it, and subclasses should handle types themselves just like any other upcast type.

But, it makes a lot of sense for subclasses to not have to entirely re-implement the type checking, so the fix currently proposed here (type_ is type(self), rather than the previously suggested issubclass check which could break the type casting hierarchy by making it cyclic between subclasses) looks good to me!

Just to check, while on the topic of subclasses: do binops and ufuncs work as expected or not (i.e., having easy inheritance while also preserving type casting order)? They still have slightly different checks (#6672, #6673), so it would be worth confirming.

@jsignell
Copy link
Member

jsignell commented Oct 6, 2020

Does this PR need more tests? The proposed change seems reasonable to me.

@gjoseph92
Copy link
Collaborator Author

@jthielen it looks to me that binops should both inherit and preserve casting order, since they all use @check_if_handled_given_other, which just checks whether other has a (different) __array_ufunc__ method.

However, I'm realizing ufuncs (and array funcs) don't really subclass in the way you'd hope—they only work when all the inputs are your subclass (you can't mix it with plan Arrays):

class SubArray(da.Array):
    def __array_ufunc__(self, numpy_ufunc, method, *inputs, **kwargs):
        print("in custom __array_ufunc__")
        return super().__array_ufunc__(numpy_ufunc, method, *inputs, **kwargs)

    def __array_function__(self, func, types, args, kwargs):
        print("in custom __array_function__")
        return super().__array_function__(func, types, args, kwargs)

arr = da.arange(5)
sa = SubArray(arr.dask, "sa", arr.chunks, meta=arr._meta)
>>> np.broadcast_arrays(sa, sa)  # this works
in custom __array_function__
[dask.array<sa, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>,
 dask.array<sa, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>]

>>> np.broadcast_arrays(sa, arr)  # mixing subclass and Array doesn't work; both defer to each other
in custom __array_function__
TypeError: no implementation found for 'numpy.broadcast_arrays' on types that implement __array_function__: [<class '__main__.SubArray'>, <class 'dask.array.core.Array'>]

My feeling is basically that I'd expect this to work out of the box:

class SubArray(da.Array):
    pass

Is that reasonable to expect from a NumPy dispatching perspective?

Or maybe "work" here isn't well-enough defined. Because ideally I'd also expect ufuncs operating on my subclass to return instances of SubArray, but they certainly will not without many more changes across Dask.

@jrbourbeau
Copy link
Member

FYI @jrbourbeau we might consider including this in the release if we're doing one today since it's a regression from previous behavior

I have a slight preference for not including this in today's release as it looks like the conversation here is still ongoing and since we'll be holding off on releasing for 2-3 weeks to thoroughly vet recent PRs (dask/community#99) I'd rather include this PR with those we vet in case the changes here have any unintended consequences

@gjoseph92
Copy link
Collaborator Author

an __array_ufunc__ implementation of a class should generally return NotImplemented unless the inputs are instances of the same class or superclasses [emphasis mine]
NEP 13

So maybe we should do issubclass(type(self), type_)? (That's inverted from my original issubclass—checking that the other is a superclass of us.)

With that change, this subclass works with both array functions and ufuncs:

class SubArray(da.Array):
    pass

However, my first SubArray version, with a custom __array_ufunc__ which just prints and defers to super(). __array_ufunc__, raises a TypeError with ufuncs, since the _should_delegate implementation only checks whether the __array_ufunc__ on the other object is the same function object or not:

dask/dask/array/core.py

Lines 177 to 182 in 0c72cf9

elif (
hasattr(other, "__array_ufunc__")
and not is_valid_array_chunk(other)
and type(other).__array_ufunc__ is not Array.__array_ufunc__
):
return True

(versus checking the type, like __array_function__ does.)

This may not be a big deal, but it means if you do want to change ufunc behavior in a subclass, you can't effectively call super().__array_ufunc__—you'd usually have to rewrite the whole thing.

Basically, __array_ufunc__ and __array_function__ are using two different methods to determine whether to defer to other: if the function object is the same, versus if the type is the same.

I feel like they should probably use one method, for consistency? And the "inputs are instances of the same class or superclasses" check recommended in NEP-13 seems like the way to go to me. I know type(item).__array_ufunc__ is not ndarray.__array_ufunc__) is copied directly from the NEP, but it seems to me that (confusingly) this isn't actually the recommended implementation, but an example of how ndarray's implementation is different.

Here's an alternative PR which uses that method for decide on delegation for both __array_ufunc__ and __array_function__: #6710.

@shoyer
Copy link
Member

shoyer commented Oct 6, 2020

I think this fix is probably fine, but I think it would be a good idea for the broader dask development team to make an intentional decision about whether subclassing dask data structures is a supported feature or not. Implementation inheritance imposes a cost in complexity that may not be worth the trouble.

@jthielen
Copy link
Contributor

jthielen commented Oct 7, 2020

[...] Basically, __array_ufunc__ and __array_function__ are using two different methods to determine whether to defer to other: if the function object is the same, versus if the type is the same.

I feel like they should probably use one method, for consistency? [...]

Indeed, in retrospect (and based on #6632), #6393 probably should have done it this way from the start. In any case, @ravwojdyla opened an issue for it: #6672

@gjoseph92
Copy link
Collaborator Author

Closing in favor of #6710.

@gjoseph92 gjoseph92 closed this Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants