-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Support __array_function__ dispatching on Array subclasses #6706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support __array_function__ dispatching on Array subclasses #6706
Conversation
|
cc @jthielen. |
|
I will say, in general Python gives subclasses a chance to determine the outcome of binary operations by calling the subclasses' implementation first. IIUC the current implementation is consistent with that. |
|
@TomAugspurger good point, that makes sense.
The thing I'm trying make easier is when you don't want to change the implementation in a subclass, you don't want to copy the whole boilerplate How about |
|
Checking FYI @jrbourbeau we might consider including this in the release if we're doing one today since it's a regression from previous behavior. |
|
Yeah @TomAugspurger this regression actually broke some things for us, since internally we use an Array subclass, so we've had to switch to running off my fork for now. Though it's probably not that common, getting a fix in today's release would be appreciated by anyone else who'd run into this. |
|
Thanks for looping me into the discussion on this! My reasoning in #6393 for the comparison with But, it makes a lot of sense for subclasses to not have to entirely re-implement the type checking, so the fix currently proposed here ( Just to check, while on the topic of subclasses: do binops and ufuncs work as expected or not (i.e., having easy inheritance while also preserving type casting order)? They still have slightly different checks (#6672, #6673), so it would be worth confirming. |
|
Does this PR need more tests? The proposed change seems reasonable to me. |
|
@jthielen it looks to me that binops should both inherit and preserve casting order, since they all use However, I'm realizing ufuncs (and array funcs) don't really subclass in the way you'd hope—they only work when all the inputs are your subclass (you can't mix it with plan Arrays): class SubArray(da.Array):
def __array_ufunc__(self, numpy_ufunc, method, *inputs, **kwargs):
print("in custom __array_ufunc__")
return super().__array_ufunc__(numpy_ufunc, method, *inputs, **kwargs)
def __array_function__(self, func, types, args, kwargs):
print("in custom __array_function__")
return super().__array_function__(func, types, args, kwargs)
arr = da.arange(5)
sa = SubArray(arr.dask, "sa", arr.chunks, meta=arr._meta)>>> np.broadcast_arrays(sa, sa) # this works
in custom __array_function__
[dask.array<sa, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>,
dask.array<sa, shape=(5,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>]
>>> np.broadcast_arrays(sa, arr) # mixing subclass and Array doesn't work; both defer to each other
in custom __array_function__
TypeError: no implementation found for 'numpy.broadcast_arrays' on types that implement __array_function__: [<class '__main__.SubArray'>, <class 'dask.array.core.Array'>]My feeling is basically that I'd expect this to work out of the box: class SubArray(da.Array):
passIs that reasonable to expect from a NumPy dispatching perspective? Or maybe "work" here isn't well-enough defined. Because ideally I'd also expect ufuncs operating on my subclass to return instances of |
I have a slight preference for not including this in today's release as it looks like the conversation here is still ongoing and since we'll be holding off on releasing for 2-3 weeks to thoroughly vet recent PRs (dask/community#99) I'd rather include this PR with those we vet in case the changes here have any unintended consequences |
So maybe we should do With that change, this subclass works with both array functions and ufuncs: class SubArray(da.Array):
passHowever, my first Lines 177 to 182 in 0c72cf9
(versus checking the type, like __array_function__ does.)
This may not be a big deal, but it means if you do want to change ufunc behavior in a subclass, you can't effectively call Basically, I feel like they should probably use one method, for consistency? And the "inputs are instances of the same class or superclasses" check recommended in NEP-13 seems like the way to go to me. I know Here's an alternative PR which uses that method for decide on delegation for both |
|
I think this fix is probably fine, but I think it would be a good idea for the broader dask development team to make an intentional decision about whether subclassing dask data structures is a supported feature or not. Implementation inheritance imposes a cost in complexity that may not be worth the trouble. |
Indeed, in retrospect (and based on #6632), #6393 probably should have done it this way from the start. In any case, @ravwojdyla opened an issue for it: #6672 |
|
Closing in favor of #6710. |
#6393 breaks many operations on subclasses of
Array, since the inherited__array_function__no longer considers them to be Dask-supported types.#6393 (comment)
black dask/flake8 dask