Skip to content

Conversation

@aulemahal
Copy link
Contributor

@aulemahal aulemahal commented Jul 2, 2020

  • Closes Polyfit fails with few non-NaN values #4190
  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

Fixes #4190. In cases where the input matrix had a deficient rank (matrix rank != order) because of the number of NaN values, polyfit would fail, simply because numpy's lstsq returned an empty array for the residuals (instead of a size 1 array). This fixes the problem by catching the case and returning np.nan instead.

The other point in the issue was that RankWarning is also not raised in that case. That was due to the fact that da.polyfit was computing the rank from the coordinate (Vandermonde) matrix, instead of the masked data. Thus, is a given line has too many NaN values, its deficient rank was not detected. I added a test and warning at all places where a rank is computed (5 different lines). Also, to match np.polyfit behaviour of no warning when full=True, I changed the warning filters using a context manager, ignoring the RankWarning in that case. Overall, it feels a bi ugly because of the duplicated code and it will print the warning for every line of an array that has a deficient rank, which can be a lot...

@mathause
Copy link
Collaborator

Overall, it feels a bi ugly because of the duplicated code

Looks fine to me.

and it will print the warning for every line of an array that has a deficient rank, which can be a lot...

That could be annoying as a user. This happens in results = da.apply_along_axis(nputils._nanpolyfit_1d, ...), right? Can you use a warnings.simplefilter("once") so the error is only issued once - something along the lines of:

with warnings.catch_warnings():
    warnings.simplefilter("once", np.RankWarning)
    results = da.apply_along_axis(
        nputils._nanpolyfit_1d,
        ...
    )

I think this also warrants tests and a whats-new entry.

@aulemahal
Copy link
Contributor Author

aulemahal commented Aug 14, 2020

So while adding tests, I realized there were more bugs concerning deficient rank matrices and full=True output. I believe I fixed all I could find.
However, I was not able to reduce the number of warnings for the case using dask : the computation occurs outside of the catch_warnings() context...
Also, there is a bug in the dask.array.linalg.lstsq output (see issue dask/dask#6516), so if full=True we must use skipna=True with dask. Same when rank != order (deficient rank), dask.array.linalg.lstsq will fail, so we work around by using our (slower) nan-skipping method.

@mathause
Copy link
Collaborator

Pity that the dask warnings are not caught but I guess there's not much we can do about that.

Does this now work for the originally posted issue? Could you add it as a test?

@aulemahal
Copy link
Contributor Author

@mathause Fixed the tests to catch the warning and added a case for the specific of the issue : too many nans causing a rank deficiency.

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I wait a few more days before merging to see if someone else has comments.

@mathause mathause merged commit efabe74 into pydata:master Aug 20, 2020
@mathause
Copy link
Collaborator

thanks @aulemahal!

@aulemahal aulemahal deleted the fix-4190 branch August 20, 2020 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Polyfit fails with few non-NaN values

3 participants