Skip to content

Conversation

@max-sixty
Copy link
Collaborator

@max-sixty max-sixty commented Aug 19, 2020

  • Closes NaN in cov & cov? #4349
  • Tests added
  • Passes isort . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

I'm not that excited about how this is tested, very open to alternatives. And the mechanics could take another set of eyes too.

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I missed this when I reviewed the PR. Anyways, LGTM. Do you want to add a comment to the docstring that NaN values are always skipped? I'd say a skipna option is overkill.

ts2 = ts2.where(valid_values)
# While dropping isn't ideal here, numpy will return nan
# if any segment contains a NaN.
ts1 = ts1.where(valid_values, drop=True)
Copy link
Collaborator

@mathause mathause Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not much "nicer" to use

np.ma.cov(
    np.ma.masked_invalid(tst1.sel(a=a, x=x).data.flatten()),
    np.ma.masked_invalid(tst2.sel(a=a, x=x).data.flatten())
)

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so familiar with numpy's masking. Will that work?

In [8]: import numpy.ma as ma
   ...:
   ...: a = np.arange(5, dtype=float)
   ...:
   ...: a[2] = np.NaN
   ...:
   ...: a[3] = np.PINF
   ...:
   ...: a
Out[8]: array([ 0.,  1., nan, inf,  4.])

In [9]: ma.masked_invalid(a)
Out[9]:
masked_array(data=[0.0, 1.0, --, --, 4.0],
             mask=[False, False,  True,  True, False],
       fill_value=1e+20)

In [10]: np.cov(_,_)
Out[10]:
array([[nan, nan],
       [nan, nan]])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need np.ma.cov(...). However, we don't skip inf (xr.DataArray([np.inf]).isnull() is False) but masked_invalid does. Thus, np.cov(masked_a, masked_a) will probably not evaluate to the same as your fixed xr.cov(da_a, da_a).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great, but inf aside, it works! I reckon this is better, I'll make the change

@mathause
Copy link
Collaborator

BTW: There is a bare test with xr.corr that made its way into test_computation.py:

with raises_regex(TypeError, "Only xr.DataArray is supported"):
xr.corr(xr.Dataset(), xr.Dataset())

could you also fix this?

@max-sixty
Copy link
Collaborator Author

BTW: There is a bare test with xr.corr that made its way into test_computation.py:

with raises_regex(TypeError, "Only xr.DataArray is supported"):
xr.corr(xr.Dataset(), xr.Dataset())

could you also fix this?

Sure — though what's wrong with it?

@mathause
Copy link
Collaborator

The test is not in a function. I think the link to the test looks like it is indented but it's not...

@AndrewILWilliams
Copy link
Contributor

The test is not in a function. I think the link to the test looks like it is indented but it's not...

My bad ;)

@max-sixty
Copy link
Collaborator Author

My bad ;)

Ha, no stress @AndrewWilliams3142 !

@max-sixty
Copy link
Collaborator Author

This is ready to go on green

@max-sixty
Copy link
Collaborator Author

Lmk any thoughts. I had added some more tests for nd reductions.
Otherwise will merge later!

@pep8speaks
Copy link

pep8speaks commented Aug 24, 2020

Hello @max-sixty! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-24 15:41:53 UTC

@max-sixty max-sixty merged commit 1a11d24 into pydata:master Aug 24, 2020
@max-sixty max-sixty deleted the cov-nan branch August 24, 2020 16:39
@dcherian
Copy link
Contributor

Thanks @max-sixty

@max-sixty max-sixty mentioned this pull request Aug 30, 2020
HarshitZom added a commit to HarshitZom/xarray that referenced this pull request Nov 23, 2025
Allow reductions with min_count over multiple axes by correctly
calculating the product of sizes along all reduced axes. Adds tests
to verify min_count now works for multi-dimensional reductions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NaN in cov & cov?

6 participants