Add Dask Array implementation of pad #3578
I should add that this is currently a rough implementation. There are no tests yet (obviously those will be needed), nor are there API docs. Many padding options are not supported yet. Also there are
08a8203 to 760e9a4
dask/array/creation.py
Outdated
    array = asarray(array)

    if mode not in ["constant", "edge", "linear_ramp"]:
        raise NotImplementedError("`pad` does not support the given `mode`.")
There are various statistical paddings like `maximum`, `mean`, and `minimum`. These could be handled by computing the statistic on the global array (possibly sliced if `stat_length` is provided) and then treating the result like the constant padding case. The `median` case gets a little tricky, though it should be doable as it only gets computed along 1-D pieces of the array.
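To make the statistics idea concrete, here is a minimal sketch using plain NumPy as a stand-in for Dask Array (`broadcast_to` and `concatenate` also exist in `dask.array`). The helper name `pad_stat_along_axis` is made up for illustration, it only pads one axis, and it ignores `stat_length`. It is not the code in this PR.

```python
import numpy as np


def pad_stat_along_axis(array, before, after, axis, stat=np.mean):
    """Pad one axis with a statistic computed along that axis (hypothetical helper)."""
    # One value per 1-D slice along `axis`; keepdims keeps it broadcastable.
    value = stat(array, axis=axis, keepdims=True)

    def padding_block(width):
        # Broadcast the statistic to the shape of the padded region.
        shape = list(array.shape)
        shape[axis] = width
        return np.broadcast_to(value, shape)

    # From here on this is just the constant padding case: stitch the pieces together.
    return np.concatenate(
        [padding_block(before), array, padding_block(after)], axis=axis
    )


a = np.arange(12, dtype=float).reshape(3, 4)
padded = pad_stat_along_axis(a, 2, 1, axis=0, stat=np.max)
```

For a single axis this should agree with `np.pad(a, ((2, 1), (0, 0)), mode="maximum")`. Padding several axes would need the pieces applied axis by axis so the corners come out right.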
Things like `reflect`, `symmetric`, and `wrap` can be constructed by slicing the Dask Array into pieces, moving them around, and using `block` to stitch them all together into one Dask Array. Given how this differs from the `pad` behavior already included, we may opt to make this a separate internal function that `pad` calls.
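As a rough illustration of the slice-and-stitch approach, here is `wrap` for a 2-D array written with NumPy's `block` (standing in for the Dask Array `block`). The function name `pad_wrap_2d` is hypothetical, and the sketch assumes each pad width is at least 1 and no larger than the array along that axis.

```python
import numpy as np


def pad_wrap_2d(array, pad_rows, pad_cols):
    """Wrap-pad a 2-D array by slicing pieces and stitching them with block."""
    (top, bottom), (left, right) = pad_rows, pad_cols
    n_rows, n_cols = array.shape

    # For each axis: the trailing piece goes in front, the leading piece goes behind.
    row_slices = [slice(n_rows - top, n_rows), slice(None), slice(0, bottom)]
    col_slices = [slice(n_cols - left, n_cols), slice(None), slice(0, right)]

    # 3x3 grid of pieces with the original array in the middle.
    grid = [[array[r, c] for c in col_slices] for r in row_slices]
    return np.block(grid)


a = np.arange(12).reshape(3, 4)
wrapped = pad_wrap_2d(a, (1, 2), (2, 1))
```

`reflect` and `symmetric` would follow the same pattern, with the sliced pieces reversed along the padded axis before stitching.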
Finally there is the possibility of a user supplying an arbitrary function. We probably could implement this. Have some thoughts on it, but it is enough of a special case that we might want to wait and see if people ask for this feature and see what they expect from it.
ac70f12 to b6e7e79
CI failure appears to be caused by numpy/numpy#7353. Bumping to NumPy 1.11.1 to see if that fixes the issue.
    if any(map(any, pad_chunk_width)):
        dsk[result_chunk_key] = (
            np_pad, array_chunk_key, pad_chunk_width, mode
Currently we reuse NumPy's pad here. This increases the chunks on the edge of the array and simplifies the logic we need to handle.
An alternative would be to do the padding ourselves manually in Dask. The benefit is that the input array's chunks would be unchanged and the padded data would appear as new chunks on the exterior of the array.
All of the other padding cases already add the padding using new, distinct chunks while not changing the array itself. Could be useful if we wanted to provide a user option to override the chunking of the padded array.
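To compare the two options, here is a small sketch that models a 1-D Dask Array as a plain list of NumPy chunks. Both helper names are made up for illustration, and the second one only covers `mode="edge"`. Neither is the code in this PR.

```python
import numpy as np


def pad_edge_chunks(chunks, before, after, mode="edge"):
    """Option 1: grow the first and last chunks with NumPy's pad."""
    padded = list(chunks)
    padded[0] = np.pad(padded[0], (before, 0), mode=mode)
    padded[-1] = np.pad(padded[-1], (0, after), mode=mode)
    return padded


def pad_as_new_chunks(chunks, before, after):
    """Option 2: leave the input chunks alone and add new exterior chunks."""
    head = np.full(before, chunks[0][0], dtype=chunks[0].dtype)
    tail = np.full(after, chunks[-1][-1], dtype=chunks[-1].dtype)
    return [head, *chunks, tail]


chunks = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8])]
grown = pad_edge_chunks(chunks, 2, 1)
added = pad_as_new_chunks(chunks, 2, 1)

grown_sizes = [len(c) for c in grown]  # edge chunks get bigger
added_sizes = [len(c) for c in added]  # original chunk sizes are preserved
```

Both give the same values once concatenated. The difference is only in the resulting chunking, which is what a user-facing chunking option would control.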
This provides a Dask Array implementation of NumPy's `pad`. It dispatches to one of four functions depending on the type of padding (i.e. `mode`) used. If padding only involves edge chunks, then NumPy's `pad` is applied to the edge chunks and internal chunks are left untouched. If padding involves some sort of tiling, then the array is sliced into pieces that are oriented and arranged as needed, which are then combined with the original array using `block`. If the padding involves computing some statistics of the array, the statistics are computed and broadcast to match the padding, then combined with the original array using `block`. If a user-defined function is provided, then the array is padded with 0s and `map_blocks` is used (with some `rechunk`ing) to apply the user function and get the resulting array. Some special cases, like computing the `median` and using `reflect_type="odd"`, are currently not supported. The former could be supported within some reasonable constraints. The latter has somewhat mystifying behavior, which would need to be understood before it can be implemented.
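For the user-defined function case, the "pad with 0s, then apply the function" idea can be sketched in plain NumPy. This assumes the NumPy callable convention for `pad`, where the function receives a 1-D vector, the pad widths for that axis, the axis index, and a dict of keyword arguments, and modifies the vector in place. The names `fill_with` and `pad_with_function` are hypothetical, and the explicit loops stand in for what the PR does with `map_blocks`.

```python
import numpy as np


def fill_with(vector, iaxis_pad_width, iaxis, kwargs):
    """Example user function: overwrite the padded ends of a 1-D vector in place."""
    before, after = iaxis_pad_width
    value = kwargs.get("padder", 0)
    if before:
        vector[:before] = value
    if after:
        vector[-after:] = value


def pad_with_function(array, pad_width, func, **kwargs):
    """Zero-pad first, then let the user function fill in the padded regions."""
    padded = np.pad(array, pad_width, mode="constant", constant_values=0)
    for axis, widths in enumerate(pad_width):
        # Move the current axis last so each index selects one 1-D view.
        vectors = np.moveaxis(padded, axis, -1)
        for index in np.ndindex(vectors.shape[:-1]):
            func(vectors[index], widths, axis, kwargs)
    return padded


a = np.arange(6).reshape(2, 3)
result = pad_with_function(a, ((1, 2), (2, 1)), fill_with, padder=7)
```

Because the function needs complete 1-D vectors along the axis being padded, a chunked version would presumably have to make each axis a single chunk before applying it, which is one reason rechunking comes into play.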
It appears that a bug fix we need for NumPy's `pad` function was added in NumPy 1.11.1, so try bumping to NumPy 1.11.1 to see if that resolves the issue.
This is now a pretty complete implementation of `pad`. Currently we handle all cases except the following.
Will plan on merging this end of day Friday if there are no comments.

Went ahead and merged. That said, if there are any issues, please let me know. We can fix them in a follow-up.

@jakirkham all Travis tests are failing. Please revert until fixed.

Saw some failures related to
Fixes #1926
Fixes #2415