-
-
Notifications
You must be signed in to change notification settings - Fork 12k
BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122
Conversation
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a full docstring. I think the original name is a bit better. Maybe change the signature to (x, y, func, hasval='nan'). Do we really need both hasval and func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func does not always have a name (used with a lambda below to test for +/- inf). I'll change to func_chk_same_pos.
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this function - can't it just be written return x[~flag], y[~flag]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, x or y can be a scalar while the other is an array. This is why the flags were treated separately before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is avoiding arr_0d[bool_nd], right? Or is it about avoiding non_generic_scalar[bool_0d]?
Would x, y = np.broadcast_arrays(x, y) be off the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not crazy, but I'd rather stick with what I have here...
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like micro-optimization here - can't we just write in place of these 7 lines:
to_remove = xy_isinf | xy_isnan
x = x[~to_remove]
y = y[~to_remove]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't work for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was that xy_isinf could just be False, in which case the flags would not broadcast correctly. But of course, this is easily checked in remove_flagged, so I did that now.
4237e61 to
d324b60
Compare
|
@charris, @eric-wieser - thanks for the comments; I made some changes which I think make the use clearer. |
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd use [] here for a size-0 array - I'm used to seeing () for a 0dim array, so this is a little jarring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python or numpy scalar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy, I'll add to the comment.
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we spell this check instead of chk? Or even assert_same_predicate_pos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, assert_same_pos
d324b60 to
c1544bf
Compare
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is backward - you should return the scalar boolean mask here, not the array one (which would at this point be equal to the scalar at every position anyway). That allows you to remove all of your special cases in remove_flagged (and replace it with the one-liner I suggest below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, you are completely right - that gets rid of remove_flagged altogether. Now implemented.
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.isposinf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learn something new every day...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't work for complex numbers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does what you have here work for complex numbers?
c1544bf to
46324d1
Compare
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove this function now
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this attribute guaranteed to exist? As long as x and y are numpy scalars, then you don't need this guard at all - scalar[bool_scalar] works just fine.
Plus then x and y are always 1d, so you can get rid of the isinstance(val, bool) and ravel() below too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can be False -- initialized as such. I tried changing, but cannot get rid of the if flagged stanza, nor of the isinstance - I suggest we leave this to another PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just initialize it to flagged = np.False_, to at least eliminates the getattr). I'd then spell that if flagged.ndim != 0 for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but I do feel that I'm then decreasing the clarity of the rest of the code - at least in my emacs, False is coloured very nicely, and false_ is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that keeping a variable type consistent is more important than having the color look like in your editor.
I tried changing, but cannot get rid of the if flagged stanza
I'm gonna do a checkout to see what I'm missing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about bool_(False), which meets both requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see why the 0d case still has special handling - I'd forgotten that arr[bool_0d].shape == (?,) + arr.shape, and is not 1D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or an alternate spelling - np.ndim(flagged) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like the ndim - that is much clearer why we do it. Pushed up.
|
Does the inf-checking behave on complex numbers? |
46324d1 to
ceb6e5d
Compare
|
On the |
|
Complex number checking is out of scope here - that's fine. Maybe open an issue about it if you can produce a test-case |
|
Cannot immediately produce one. Partially because of an oddity: But even accounting for that, no, I cannot seem to break it! |
|
Interesting oddity there. I suppose you have to use |
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this is clearer as x_id.ndim == 0 rather than not x_id.shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
|
Just the nits about |
ceb6e5d to
66d22aa
Compare
The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.
66d22aa to
5718b33
Compare
|
Took the liberty of adding and explicit |
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these two lines needed? Aren't they covered by the below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd think not, but numerous recursion failures if I leave them out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what happens is that it does recurse, but the if statement prevents it from doing it more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is simple to solve - just use any - will push fix up shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't work either, but passing in equal_[nan|inf]=False does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment saying that this is needed to prevent recursion or something?
Good to go either way
21dd5aa to
34bc491
Compare
|
What a pain! But finally got it working with a more sensible implementation. Turns out I was partially misled by test that failed because they used subclasses that were not written correctly (one by me...). |
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the cast to bool to the scalar case below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to cast both x_id and y_id, but that can indeed be done at the end. That does look a bit nicer, so done.
numpy/lib/tests/test_ufunclike.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe in the commit message why these changes were needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This was indeed a non-trivial one - I had to look again to remind me why it was needed. I wondered about the use of stating it in the commit message until I recalled that with git blame one can find why this was added.
34bc491 to
7eb1533
Compare
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably clearer as just a normal if / else chain at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; done.
numpy/testing/_private/utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I agree with this - what if x and y are quantities of different dimensions? The test should still compare the empty arrays so that the correct error is raised, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same case as the if x.size == 0 stanza above, but for the case where x and y are both scalar. In principle, I agree the comparison should work for empty elements, but in practice, this is not the case (e.g., matrix changing its shape being one problem...)
Since this was here before, I propose to leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i agree this is out of scope for the patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it sadly remains a chunk of rather fragile code...
|
Sorry for dropping the ball on this one. Would be great to get this into 1.15. |
This brought to light two bugs in tests, which are fixed here, viz., that a sample ndarray subclass that tested propagation of an added parameter was incomplete, in that in propagating the parameter in __array_wrap__ it assumed it was there on self, but that assumption could be broken when a view of self was taken (as is done by x[~flagged] in the test routine), since there was no __array_finalize__ defined. The other subclass bug counted, incorrectly, on only needing to provide one type of comparison, the __lt__ being explicitly tested. But flags are compared with __eq__ and those flags will have the same subclass.
7eb1533 to
3ad49aa
Compare
|
Out of curiosity, how many of the affected tests are already shadowed in |
|
Thanks Marten. |
|
Wow thanks @mhvk and the rest, this indeed was not trivial! |
| # compared usefully, and for which .all() yields masked. | ||
| x_id = func(x) | ||
| y_id = func(y) | ||
| if (x_id == y_id).all() != True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm,
In [10]: ma.masked != True
Out[10]: masked
In [11]: ma.masked == True
Out[11]: masked
In [12]: bool(ma.masked)
Out[12]: False
@mhvk The camparison to True seems to achieve nothing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose is to ensure that the stanza is not executed if x_id or y_id is masked, yet is executed if they are different.
The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.
fixes #11121