-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix/apply ufunc meta dtype #4022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks @mathause , this works for me. Also with a change of dtype in func, as long as "output_dtypes" is set correctly. |
dcherian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mathause. I just have one minor suggestion. LGTM otherwise.
| # work around dask bug computing meta with vectorized functions: GH5642 | ||
| meta = np.ndarray | ||
| # defer raising errors to _apply_blockwise (e.g. if output_dtypes is None) | ||
| meta = np.ndarray((0, 0), dtype=output_dtypes[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we still set meta = np.ndarray if no output dtype is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_dtypes is required for dask="parallelized" and will error if it is missing:
xarray/xarray/core/computation.py
Lines 670 to 674 in 8051c47
| if output_dtypes is None: | |
| raise ValueError( | |
| "output dtypes (output_dtypes) must be supplied to " | |
| "apply_func when using dask='parallelized'" | |
| ) |
so this wont take effect. I am also not very happy with my approach, but didn't want to copy the checks from apply_blockwise up here - suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the cleaner workaround is to move this down in to _apply_blockwise? Would it be enough to pass vectorize down to that level and then set meta as you are doing here?
Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree it would be cleaner to thread vectorize through to _apply_blockwise.
Also, it seems like we should raise that error about output_dtypes only if meta.dtype has not been set?
Depends how important output_dtypes is for np.vectorize.
I am happy to work more on this, but I think it would be good to discuss #4060 first, which might make this obsolete.
|
I'll close this in favor of #4060 |
* ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` for multiple outputs when `dask='parallelized'`, add/fix tests * DOC: Update docstring and whats-new.rst * WIP: apply_gufunc * WIP: apply_gufunc -> reinstate dask='allowed' as per @mathause, adapting tests * WIP: apply_gufunc -> add test for GH #4015, fix test for sparse meta checking * WIP: apply_gufunc -> remove unused `input_dims` * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * WIP: use dask_gufunc_kwargs, keep vectorize first but only for non-dask-gufunc, rework docstrings, adapt tests * DOC: add reference to internal changes in whats-new.rst * FIX: mypy * FIX: vectorize inside `apply_variable_ufunc` * TST: add tests from #4022 from @mathause * FIX: address black issue * FIX: xfail test for dask < 2.3 * WIP: apply changes in response to @mathause's review comments * WIP: remove line * WIP: catch different chunksize error and allow_rechunk, docstring fixes * WIP: remove comment * WIP: style issues * WIP: revert catch, revert test, add tests without output_dtypes * WIP: fix signature in apply_ufunc->apply_gufunc, handle output_sizes, handle dask version, fix tests * WIP: fix tuple * WIP: add dims_map to _UFuncSignature, adapt output_sizes to fit for apply_gufunc * WIP: black * WIP: raise ValueError if output_sizes dimension mismatch * WIP: raise ValueError if output_sizes is missing for given output_core_dims * WIP: simplify if/else * FIX: resolve conflicts prior merge with master * FIX: combine if's as per review * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `apply_variable_ufunc` as per review suggestion * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `da.apply_gufunc` * FIX: address review comments of @keewis and @mathause * FIX: black * FIX: `vectorize` not needed in if-clause * FIX: set DeprecationWarning and stacklevel=2 * FIX: use FutureWarning for user visibility * FIX: remove comment as suggested Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Mathias Hauser <[email protected]> Co-authored-by: Deepak Cherian <[email protected]>
isort -rc . && black . && mypy . && flake8whats-new.rstfor all changes andapi.rstfor new API