Skip to content

Conversation

@mark-boer
Copy link
Contributor

This PR makes a couple of small improvements to dask.array.pad

  • Fixes bug of mode="mean", where it would change dtype to float un-numpy like behavior of da.pad  #6127
  • Add default value for mode
  • Refactor error message if invalid kwargs are passed to pad function. Uses some of the same code from the numpy project.
  • Tests added / passed
  • Passes black dask / flake8 dask

This PR does not address the issue with wrap, reflect and symmetric. That will require a rewrite of pad_reuse, I'll have a look at this in a next PR.

Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mark-boer, overall this looks good to me.

result_idx = broadcast_to(result_idx, pad_shape, chunks=pad_chunks)

if mode == "mean":
result_idx = _round_if_needed(result_idx, array.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _round_if_needed is only used here and is rather short, I have a small preference to inline it here. There is a maintenance and understanding cost to many-small-methods (see http://matthewrocklin.com/blog/work/2019/06/23/avoid-indirection).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no worries. I saw this function in the numpy code and I copied the basic functionality.

def pad_stats(array, pad_width, mode, *args):
def _round_if_needed(arr, dtype):
if np.issubdtype(dtype, np.integer):
rint(arr, out=arr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight preference to avoid using out=arr here. It doesn't make things more efficient, and I find it harder to read (I have to know that rint is a ufunc, and that the out parameter means I'm mutating arr).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether to use the out=arr functionality. Personally I don't know what the performance gain is to doing it this way. I'll just use the out=rint(arr) instead.

@mark-boer
Copy link
Contributor Author

Thx, for the quick review. I changed your suggestions.

@jcrist jcrist merged commit b6e79b4 into dask:master May 15, 2020
@jcrist
Copy link
Member

jcrist commented May 15, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants