Fix scalar dispatching, fixes #6631 and #6611 #6632
Conversation
I can take a closer look when I get more time, but here's my initial impression: #6393 was too strict in its dunder operation type check when it comes to scalars, since it essentially ignores non-numeric scalars (something that wasn't even on my mind, as all my own use cases are numeric!). This is definitely an issue that needs to be resolved as soon as possible, particularly for the dominant use case of strings (#6611 and #6631). However, the approach of this PR goes too far the other way towards permissiveness, which breaks my primary motivating use case for #6393: Pint compatibility. With #6393, Dask properly defers to Pint.

My general take on this is that Dask Arrays need to defer to unknown scalars just like they do to unknown arrays. So, we should probably do a more comprehensive survey of types that Dask should handle out of the box (most definitely strings, but I'm not sure if there are others?), add those to the current check, and then update the documentation to emphasize that other unknown types (both arrays and scalars) need to be registered in order for Dask to operate with them. Perhaps a name other than
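The deferral pattern described here (handle an allow-list of known scalar and array types, return `NotImplemented` for everything else so the other operand's reflected method runs) can be sketched roughly as follows. `WrapperArray`, `UnknownScalar`, and `HANDLED_TYPES` are hypothetical names for illustration, not Dask's actual implementation:

```python
import numbers

import numpy as np

# Hypothetical allow-list of scalar/array types the wrapper handles itself.
HANDLED_TYPES = (numbers.Number, str, np.ndarray)


class WrapperArray:
    """Illustrative array wrapper, not Dask's real Array class."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __eq__(self, other):
        if isinstance(other, WrapperArray):
            return WrapperArray(self.data == other.data)
        if isinstance(other, HANDLED_TYPES):
            # Known scalars (including strings) are broadcast elementwise.
            return WrapperArray(self.data == other)
        # Unknown scalar or array: defer so the other type's reflected
        # method gets a chance to handle the operation.
        return NotImplemented


class UnknownScalar:
    """A scalar type the wrapper knows nothing about."""

    def __eq__(self, other):
        return "UnknownScalar handled it"
```

With this shape, `WrapperArray(["a", "b"]) == "a"` broadcasts over the elements, while `WrapperArray([1]) == UnknownScalar()` falls through to `UnknownScalar.__eq__`.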
👋 @jthielen thanks for taking a look, I'm obviously open to suggestions. Quick question: could you please point me to where this is currently tested?
Right now, it isn't being tested in Dask, but in Pint: https://github.com/hgrecco/pint/blob/master/pint/testsuite/test_compat_downcast.py#L95-L111. It's still xfailed, however, due to Pint's slower release cadence and its grouping with the similar issue of numpy/numpy#15200 with masked arrays. When writing the comment, though, that was based on just a quick, informal test in the REPL. Perhaps a mock scalar test should also be added to Dask in this regard (it shouldn't be Pint-specific, but rather cover any generic upcast type), or, better yet, effort could be put towards a more robust collection of these array/scalar interoperability tests across the ecosystem?
@jthielen I see, we should definitely add a test that covers that use case. Now looking at the implementation of Pint, I see why this PR would no longer delegate to Pint. It's a bigger question of whether Dask should or should not delegate to "unknown scalars", and more importantly how that change should be introduced in Dask (because it is a breaking change). Is there an agreement on this @TomAugspurger @mrocklin @jthielen?
I didn't follow the original issue closely so I'm not sure what's best here. Do you have a recommendation?
Force-pushed 78e452a to c1d6929
@jthielen added a simple test to validate/cover your original issue (we can obviously make this look better, I just wanted to have it covered in this PR).
Some context:

There are at least 2 solutions:

There are obviously pluses and minuses to both. Looking at this from the user perspective, in my opinion scalar handling before #6393 made more sense (which this PR "reverts" to). That said, we should probably still allow for a custom scalar (like Pint's
Do we have one other option? We could try manually deferring to

I'm somewhat pessimistic about a system that relies on third-party libraries registering things, at least specifically with Dask.
My hope is that this doesn't affect most users at all :) If we ironed out all the issues with the current implementation by having Dask take over for more types, would the two implementations be equivalent from your point of view?
Just to be clear, in my mind that would not be a "manual" registration (akin to the current design of
Quoting Guido van Rossum from this issue:
afaiu #6393 follows that design (+ NEP-13) for array-likes (with an explicit downcast hierarchy, which is great and makes sense). For scalars there is no explicit hierarchy, so it seems like we should follow the pythonic way (subclasses before superclass, left to right)? We could diverge, I'm just not sure what kind of dragons to expect once we cross that threshold. Another alternative could be for scalars to "force" their own op on a (Dask) array by setting `__array_ufunc__ = None`:

```python
class TestScalar:
    __array_ufunc__ = None

    def __add__(self, other):
        # other could be a Dask array, which this implementation knows how to handle
        return TestScalar()

    __radd__ = __add__


@pytest.mark.parametrize("arr", [da.from_array([1, 2]), np.asarray([1, 2])])
def test_unknown_scalar_delegation(arr):
    assert type(arr + TestScalar()) == TestScalar
    assert type(TestScalar() + arr) == TestScalar
```

The test above works, but actually NOT for the reasons that it should (if you step through it in the debugger you will understand why, more on that below). No change in Dask is needed (apart from this PR, which reverts scalar handling to prior to #6393), and we delegate the responsibility to the scalar classes. BUT it seems Dask Arrays do not follow NEP-13 when it comes to handling classes with disabled `__array_ufunc__`.

Which is a slightly modified version of the recommended solution from NEP-13.

Also I'm wondering, how many custom scalars apart from Pint's

Btw on Pint's
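For reference, the NEP-13 recommended check being discussed here can be sketched as follows: before handling a binary op, test whether the other operand disabled `__array_ufunc__` by setting it to `None`, and if so return `NotImplemented` so its reflected method runs. The names `_should_defer`, `SimpleArray`, and `OptOutScalar` are illustrative, not Dask's actual code:

```python
import numpy as np


def _should_defer(other) -> bool:
    # NEP-13: a class that sets __array_ufunc__ = None is asking
    # array implementations to defer to its reflected dunder methods.
    return hasattr(other, "__array_ufunc__") and other.__array_ufunc__ is None


class SimpleArray:
    """Illustrative array wrapper following the NEP-13 recommendation."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __add__(self, other):
        if _should_defer(other):
            return NotImplemented  # yield to other.__radd__
        return SimpleArray(self.data + other)

    __radd__ = __add__


class OptOutScalar:
    """A scalar that opts out of __array_ufunc__ dispatch."""

    __array_ufunc__ = None

    def __add__(self, other):
        return OptOutScalar()

    __radd__ = __add__
```

With this check in place, `SimpleArray([1, 2]) + OptOutScalar()` returns an `OptOutScalar`, while plain numeric scalars are still handled by the array.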
Thanks for that analysis @ravwojdyla. It'll take me some time to digest it :) Input from others (possibly @shoyer if you have time, since you've thought about this a lot) would be welcome.
I agree, thank you for the thoughtful analysis @ravwojdyla! I must admit that my resistance to the change in this PR is that one of the main reasons I put in the effort to make #6393 happen is so that

That being said, my impression now is that the main point of contention/disagreement is that I haven't considered scalars to be any different than arrays when it comes to the type casting hierarchy dispatching of NEP-13/18, whereas you are considering them to be fully distinct. The main reasons I think that scalars should be treated under NEP-13-style dispatching just like arrays (particularly when one type is an array and another is a scalar) are:

```python
xarray.DataArray * pint.Unit == pint.Unit * xarray.DataArray == xarray.DataArray(pint.Quantity)
dask.array.Array * pint.Unit == pint.Unit * dask.array.Array == pint.Quantity(dask.array.Array)
sparse.COO * pint.Unit == pint.Unit * sparse.COO == pint.Quantity(sparse.COO)
numpy.ndarray * pint.Unit == pint.Unit * numpy.ndarray == pint.Quantity
```

This is only really broken when downcast types don't respect the type casting hierarchy (Dask pre-#6393, MaskedArrays). Unless I'm missing something, since composition is preferred over inheritance, these are all essentially wrapper classes rather than subclasses, which makes it difficult to ensure commutativity with the standard Python behavior you mentioned (subclasses before superclass, left to right).

Hopefully this helps move the discussion along, and hopefully we can find a resolution that works for everyone's needs!
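For context, the "subclasses before superclass, left to right" behavior mentioned above is Python's built-in rule for binary operators: the right operand's reflected method is tried first only when its type is a proper subclass of the left operand's type and overrides that method. Wrapper types built by composition never trigger it, which is the commutativity problem being described. A minimal illustration:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Sub(Base):
    # Sub overrides both methods, so when Sub appears on the right of a
    # Base instance, Python tries Sub.__radd__ before Base.__add__.
    def __add__(self, other):
        return "Sub.__add__"

    def __radd__(self, other):
        return "Sub.__radd__"
```

`Base() + Sub()` resolves to `Sub.__radd__`, while `Sub() + Base()` resolves to `Sub.__add__`; two unrelated wrapper classes would simply go left to right.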
@jthielen thanks for the feedback. To reiterate, I'm not against Pint's use of

@jthielen @TomAugspurger on the second glance at Pint's

This implementation of the

First point is fulfilled by the

The above implementation passes
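The general shape of an `__array_ufunc__` implementation like the one under discussion (handle known input types, return `NotImplemented` otherwise so dispatch can fall through to the other operand) might look like this rough sketch; `QuantityLike` is a made-up name and this is not Pint's actual code:

```python
import numpy as np


class QuantityLike:
    """Illustrative wrapper implementing NEP-13 dispatch, not Pint's Quantity."""

    def __init__(self, magnitude):
        self.magnitude = np.asarray(magnitude)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        mags = []
        for x in inputs:
            if isinstance(x, QuantityLike):
                mags.append(x.magnitude)
            elif isinstance(x, (np.ndarray, int, float, complex)):
                mags.append(x)
            else:
                # Unknown operand: defer rather than guessing.
                return NotImplemented
        return QuantityLike(getattr(ufunc, method)(*mags, **kwargs))
```

Because the wrapper defines `__array_ufunc__`, NumPy hands it the original inputs; returning `NotImplemented` for unrecognized ones lets the usual `TypeError`/reflected-dispatch machinery take over.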
@ravwojdyla That fix looks almost perfect to me! It aims to preserve the hierarchical dispatching for any array/scalar implementer of

To me, a more detailed discussion about scalar handling is only needed if a use case for a non-
Force-pushed c1d6929 to e4ca6a5
I'm happy to defer to others here.
```python
def __dask_graph__(self):
    # Note: make sure that dask dusk arrays do not interfere with the
    # dispatch mechanism. The return value here doesn't matter.
    return ...
```
Wow, I've never seen this syntax before. I had no idea that it was valid.
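For anyone else surprised by it: `...` is the built-in `Ellipsis` singleton, so `return ...` is an ordinary return of a (truthy) sentinel value:

```python
def placeholder():
    # `...` evaluates to the Ellipsis singleton, much like `None` or `True`
    return ...
```

`placeholder() is Ellipsis` holds, which is why it works as a cheap "don't care" return value here.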
I've taken a brief look through this and I'm also going to pass, I think. If no one shows up to review, I'm OK to trust @ravwojdyla and @jthielen and merge. However, I think that it would be good to get more perspectives in here if possible. The code changes here are not large. In addition to @shoyer, I'm also going to ping @pentschev, who I think has a good understanding of cross-array interactions.
```python
(0.0, True),
(0, True),
```
Is there a reason why these cases were removed? If we are changing the behavior of `is_valid_array_chunk` to not include scalars, then at least we should test that by making these `False`.
Since `float` and `int` are not valid chunks but they are still handled by Dask, it could be counterintuitive to have them here, but if you'd like to have them here, I can add them back with a comment.
Brought them back in 1ce03b5
Ah, I see your point. Though, since this is an explicit test of the `is_valid_array_chunk` function (and not a higher-level usage/integration test), I'd still like to have them here to confirm the expected behavior of the underlying function, especially since it has changed since the original implementation (which in retrospect didn't correspond well with the function name). Perhaps adding `("", False)` for a `str` test would be useful as well.
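The case table being discussed could be exercised along these lines; `is_valid_array_chunk_like` is a hypothetical stand-in predicate (mirroring the discussion, not Dask's actual `is_valid_array_chunk`), and the pytest parametrize scaffolding is replaced by a plain loop for brevity:

```python
import numpy as np


def is_valid_array_chunk_like(x):
    # Hypothetical predicate: None, numeric scalars, and ndarrays count as
    # valid chunks; strings do not (the ("", False) case suggested above).
    return x is None or isinstance(x, (bool, int, float, complex, np.ndarray))


CASES = [
    (0.0, True),
    (0, True),
    ("", False),
    (np.array([1, 2]), True),
]


def check_cases():
    for value, expected in CASES:
        assert is_valid_array_chunk_like(value) is expected
    return True
```

Keeping the scalar cases explicit in the table documents the predicate's behavior even when higher-level dispatch handles those types elsewhere.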
@jthielen added.
The most recent updates generally look good to me! Just one suggestion for the modification to an existing test (see inline comment), and one question: given the refactor of binops and
@jthielen I suggest we keep NEP-18 separate for now, and create a couple of issues:
Force-pushed 1ce03b5 to a522201
pentschev left a comment:
I also didn't follow the work prior to this too closely, so I'll also pass and trust @jthielen's judgment here. I'm only requesting a change on a typo that makes things hard to grep.
dask/array/core.py (Outdated):

```python
    return decorator


def _should_delgate(other) -> bool:
```
Suggested change:

```diff
-def _should_delgate(other) -> bool:
+def _should_delegate(other) -> bool:
```
I usually don't like to nitpick, but typos in code make it hard to grep.
@pentschev oh, yeah, thanks! Good catch, will fix in a second.
@pentschev fixed, could you please take another look?
Force-pushed a522201 to 5a5e33a
pentschev left a comment:
Looks good, thanks for fixing the typos @ravwojdyla .
Thanks for looking, Peter. I'll plan to merge this later today (@jcrist was maybe going to look if he had time). We'll hopefully have a release tomorrow that should include these changes.
The implementation looks good to me, but I confess I don't have a strong understanding of
Dask 2.18.0 has the fix included via dask/dask#6632. This reverts commit ffd85e0.
Linked issues:

Be more permissive on the check_if_handled_given_other check, related to #6631 and #6611

`in` operator for char arrays not working with master branch anymore #6611