-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix the average function when there is a masked array #4236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @dkillick in case you have time to review |
DPeterK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @garaud!
|
Thanks! Travis failed on The Python 3.5 build uses numpy 1.11.1 whereas the other build, e.g. 3.6 uses numpy 1.14.1 |
It did... I'm thinking about that right now - I'll get back to you about that soon! |
|
So the failure is being caused by this import of I'm not quite sure what to do about this. The obvious easy fix is to increase the NumPy version in the failing test, but I'm not sure whether that version of NumPy was chosen for a good, but not immediately obvious, reason. Otherwise we need to remove the problem import line, but that too has problems: do we duplicate the @jcrist do you have any thoughts on this? |
|
Perhaps we guard the import on this line? If the Numpy version number is less than that version then presumably the array isn't a dask masked array anyway, so this should be safe? |
|
I think that code duplicate is not a good idea. In that case, I think pragmatic: you can set the minor numpy version to 1.11.2 here https://github.com/dask/dask/blob/master/continuous_integration/travis/install.sh#L68 instead of 1.11 The latest digit is for bug fixes, this change should have a tiny impact. For the other Python versions: 2.7, 3.6 and 3.7, the version of numpy is > 1.11.2. We can do the same for py35. I think we can try a If the minor version of numpy is |
|
I don't personally know why the minimum version is set the way it
is. @jakirkham might have thoughts.
…On Fri, Nov 23, 2018 at 12:04 PM Damien Garaud ***@***.***> wrote:
I think that code duplicate is not a good idea.
In that case, I think pragmatic: you can set the minor numpy version to
1.11.2 here
https://github.com/dask/dask/blob/master/continuous_integration/travis/install.sh#L68
instead of 1.11 The latest digit is for bug fixes, this change should have
a tiny impact.
For the other Python versions: 2.7, 3.6 and 3.7, the version of numpy is >
1.11.2. We can do the same for py35. I think we can try a numpy >= 1.11.2
in the travis build and let's see what's happened.
If the minor version of numpy is 1.11 for some good reasons and we don't
want to upgrade to the minor version 1.11.2, I'll update the PR with the
suggestion of @mrocklin <https://github.com/mrocklin>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4236 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AASszLcWNLz2n9g8zlQXck85HD0LLxPGks5uyCq7gaJpZM4YwpMA>
.
|
I agree - the only reason I suggested it at all is because that's what NumPy does, I think (ma has a separate |
|
If the numpy version is updated, we also have to update the This change implies removing the support for numpy < |
|
All tests passed with:
|
|
Curious. For unknown reasons, some tests where there are some |
dask/array/routines.py
Outdated
| wgt = wgt.swapaxes(-1, axis) | ||
|
|
||
| # if there is a masked array | ||
| wgt = wgt * (~getmaskarray(a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this here, I'd rather that a thin wrapper around the existing functionality was implemented in the da.ma module. numpy's average function doesn't work for masked arrays, but np.ma.average does.
The best way to handle this might be to extract this full method into a helper method that takes an additional is_masked boolean (or something), and optionally applies this line.
| wgt = wgt * (~getmaskarray(a)) | |
| if is_masked: | |
| from .ma import getmaskarray | |
| wgt = wgt * (~getmaskarray(a)) |
Then you'd define da.average with is_masked set to False, and da.ma.average with is_masked set to True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also fix the numpy version import issue, as this line would only execute if called from within da.ma.average.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the review process, I don't know if it's me or you which click on the "Resolve conversation" button. I left it as it is for now.
This was probably just easy to do at the time. If bumping the patch version helps, I'd recommend doing that. |
|
Sounds good o me
…On Mon, Nov 26, 2018 at 12:38 PM jakirkham ***@***.***> wrote:
I don't personally know why the minimum version is set the way it is.
@jakirkham <https://github.com/jakirkham> might have thoughts.
This was probably just easy to do at the time. If bumping the patch
version helps, I'd recommend doing that.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4236 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AASszGntcAQCpgGUIAtvwsPX32Xz2sBzks5uzCb_gaJpZM4YwpMA>
.
|
|
Thanks for your suggestion @jakirkham ! I'll update it this week. |
The Dask average function does not take into account the masked when the sum of weights is computed.
Sum of weights takes into account the masked array if exists. close dask#3846
e991e7c to
cfd2e01
Compare
add the _average wrapper which shared the average implementation between da.average and da.ma.average
I remove two commits from my branch about the numpy version and the authors.md file which was removed. |
DPeterK
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I like the implementation here with the thin wrapper 👍
|
@jcrist is there anything left here that you think needs to be handled? |
|
Nope, looks good to me. Thanks! |
The Dask average function does not take into account the masked when the sum of weights is computed.
Add a unit test dedicated to this issue.
flake8 daskclose #3846