-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Move blockwise_meta to more general compute_meta function #4954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also check for invalid arguments
|
In general, this looks like a good improvement towards #4952, but I think in the future we may still try to improve this further to allow passing other exceptions to the user as well. |
|
I'm still working on this. It got a bit deeper than I expected. I'll take another crack at this tomorrow. |
|
What is still left that you want to tackle? |
1. We now allow non-array meta. This ends up being common in temporary
arrays for gufuncs and reductions
2. Users can now specify explicit meta= in blockwise
3. We no longer expect metas to support the astype method
|
I'd like to do another review once you're done, could you please ping me then @mrocklin ? |
This got a bit more complex than it was previously. Review would be welcome. |
|
|
||
| if (isinstance(chunks, str) or | ||
| isinstance(chunks, tuple) and | ||
| chunks and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't isinstance(chunks, str) or isinstance(chunks, tuple) and chunks redundant? If chunks is either str or tuple, then it can't be None or False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line can be removed for code clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it can be (), which is falsey :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't know about that, thanks for clarifying!
| inds = tuple(range(x.ndim)) | ||
| # The dtype of `tmp` doesn't actually matter, and may be incorrect. | ||
| tmp = blockwise(chunk, inds, x, inds, axis=axis, keepdims=True, dtype=x.dtype) | ||
| tmp = blockwise(chunk, inds, x, inds, axis=axis, keepdims=True, dtype=dtype or float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't x.dtype be used anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we construct Dask arrays that don't have array-like metadata. This tends to happen in reductions a lot where we pass through tuples or dicts rather than arrays. In these cases, calling x.dtype will err.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better than to have getattr(x, 'dtype', dtype) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't really matter here. The dtype of tmp isn't very relevant. We ignore it in the following lines. We just need to not call x.dtype, which can now sometimes err.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks for clarifying.
|
Testing this branch with pydata/xarray#3027 still shows one test failure: Details |
| meta = np.empty((0,) * ndim, dtype=dtype or x.dtype) | ||
|
|
||
| if np.isscalar(meta): | ||
| meta = np.array(meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this block? It breaks 3 CuPy tests: https://github.com/dask/dask/blob/master/dask/array/tests/test_cupy.py#L15-L17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that I have resolved this in 83ff1d4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can confirm. Thanks for that!
| meta = meta.astype(_dtype) | ||
|
|
||
| if np.isscalar(meta): | ||
| meta = np.array(meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this may cause problems for libraries such as CuPy, just like the entry in meta_from_array, but there's no test at the moment to confirm my suspicion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've resolved this elsewhere, but I'm not entirely certain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave it for now, we will address it if we find a failing case in the future.
|
@shoyer I just quickly tried to reproduce the error from |
|
@pentschev the test failure only appears on the xarray branch from pydata/xarray#3027 (with the dask branch from this PR) It would probably make sense to try to only use NumPy functions + indexing/ |
Right, so replacing the Having Xarray indexer objects as metadata in dask arrays probably doesn't make sense. Meta should represent the type of our blocks once they're computed rather than the type of data that we started with. I just tested this with h5py in ad55df3 and it works there because we slice the object down and that produces a numpy array. The lazy slicing of Xarray indexer objects fouls things up here. We can make Dask array operations more robust to this, which is good regardless, but it might also make sense for Xarray to explicitly populate a |
I would like to say: never. But that's probably unrealistic. At the very least it would be nice to increase
If that's an option on the xarray side, I think it would be a nice -- and probably not so dirty -- way of solving it. I don't really have a better idea at the moment. |
Yes, this sounds like a good option. Our "lazy backend array" classes are somewhat unusual -- most classes like this create NumPy arrays when indexed. It seems pretty ugly to need to set Even if you're using backend arrays where the default |
|
I've added |
|
How are you so fast @mrocklin? I was writing an answer and you were ready with a commit for that. |
I would recommend using |
|
This looks good to me, assuming I can write something like from_array(...,
meta=np.ndarray)? We'll need to merge this before I can finish the fix on
the xarray side and reenable our dask-dev tests.
…On Sat, Jun 22, 2019 at 11:06 AM Matthew Rocklin ***@***.***> wrote:
Maybe even supporting a short-cut form like meta=np.ndarray, so I don't
need to get really in the weeds of dask's data model by writing something
like np.empty((0,)*x.ndim, x.dtype)?
Done in 99a83d3
<99a83d3>
I would like to merge this soonish if that's possible. @shoyer
<https://github.com/shoyer> are we ok on the Xarray side?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4954?email_source=notifications&email_token=AAJJFVVVOF5TTFCDK4PEISDP3XMQRA5CNFSM4HYXFZA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYKDVJQ#issuecomment-504642214>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJJFVSQ5Q7FFJ6QDZFDDJTP3XMQRANCNFSM4HYXFZAQ>
.
|
In [1]: import dask.array as da
In [2]: import numpy as np
In [3]: da.from_array(np.ones(5), meta=np.ndarray)
Out[3]: dask.array<array, shape=(5,), dtype=float64, chunksize=(5,)>
In [4]: _._meta
Out[4]: array([], dtype=float64) |
|
This is in. Thanks @shoyer and @pentschev for review. @shoyer please speak up if there are further Xarray issues. |
Also check for invalid arguments
cc @pentschev
I ran into this while trying to get Xarray tests back up and running.