-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Allow cov & corr to handle missing values #4351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mathause
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I missed this when I reviewed the PR. Anyways, LGTM. Do you want to add a comment to the docstring that NaN values are always skipped? I'd say a skipna option is overkill.
xarray/tests/test_computation.py
Outdated
| ts2 = ts2.where(valid_values) | ||
| # While dropping isn't ideal here, numpy will return nan | ||
| # if any segment contains a NaN. | ||
| ts1 = ts1.where(valid_values, drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not much "nicer" to use
np.ma.cov(
np.ma.masked_invalid(tst1.sel(a=a, x=x).data.flatten()),
np.ma.masked_invalid(tst2.sel(a=a, x=x).data.flatten())
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so familiar with numpy's masking. Will that work?
In [8]: import numpy.ma as ma
...:
...: a = np.arange(5, dtype=float)
...:
...: a[2] = np.NaN
...:
...: a[3] = np.PINF
...:
...: a
Out[8]: array([ 0., 1., nan, inf, 4.])
In [9]: ma.masked_invalid(a)
Out[9]:
masked_array(data=[0.0, 1.0, --, --, 4.0],
mask=[False, False, True, True, False],
fill_value=1e+20)
In [10]: np.cov(_,_)
Out[10]:
array([[nan, nan],
[nan, nan]])There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need np.ma.cov(...). However, we don't skip inf (xr.DataArray([np.inf]).isnull() is False) but masked_invalid does. Thus, np.cov(masked_a, masked_a) will probably not evaluate to the same as your fixed xr.cov(da_a, da_a).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah great, but inf aside, it works! I reckon this is better, I'll make the change
|
BTW: There is a bare test with xarray/xarray/tests/test_computation.py Lines 898 to 899 in a7fb5a9
could you also fix this? |
Sure — though what's wrong with it? |
|
The test is not in a function. I think the link to the test looks like it is indented but it's not... |
My bad ;) |
Ha, no stress @AndrewWilliams3142 ! |
Co-authored-by: keewis <[email protected]>
# Conflicts: # xarray/tests/test_computation.py
|
This is ready to go on green |
|
Lmk any thoughts. I had added some more tests for nd reductions. |
|
Hello @max-sixty! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-24 15:41:53 UTC |
|
Thanks @max-sixty |
Allow reductions with min_count over multiple axes by correctly calculating the product of sizes along all reduced axes. Adds tests to verify min_count now works for multi-dimensional reductions.
isort . && black . && mypy . && flake8whats-new.rstI'm not that excited about how this is tested, very open to alternatives. And the mechanics could take another set of eyes too.