-
-
Notifications
You must be signed in to change notification settings - Fork 12k
ENH: make some masked array methods behave more like ndarray methods #5706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d11dbf5 to
3cd5514
Compare
|
Closing and reopening to see if the tests can be rejiggered. |
|
Yep, that works. Restarting the tests doesn't have the same effect. |
|
Try again. |
3cd5514 to
dff1dce
Compare
|
Updated Also I noticed that many ma operations will show a warning if you perform an invalid operation on an element even if that element is masked. Eg, will raise a warning, which doesn't seem right. (The first of these also fails when using But actually the correct fix may instead be to do the domain check with the masked array, not the raw data. However, in the case of |
Keeping masked arrays interfaces in sync with ndarray has been a loosing battle in the past because most of the people who enhance ndarray either don't know or don't care about masked arrays. The things only got worse when masked array were reimplemented (over objections of original designers) to inherit from ndarray. After that, any new method added to ndarray has automatically became a broken method of masked arrays. Case in point: the dot method. See #5185. I can see only one way to fix this situation for the long term: add a test that would introspect andarray and masked arrays methods and make sure they match. Unfortunately, introspecting ndarray methods are not always possible because some of them are implemented in C, but in those cases we can probably parse the docstring to extract the signature. See #4356. Finally, I think this issue overlaps with #4537. |
numpy/ma/core.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more idiomatic to use ... when assigning to the entire array. Empty tuple indexing should only be used in generic code where ndim can happen to be 0. Literal a[()] is confusing and unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the second thought, what was wrong with the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm you're right using .flat works. I was worried that a non-1d newmask would cause problems.
I still think outmask[...] = is better (thanks for explaining () vs ...). I guess I am biased against .flat after reading some other devs say they didn't like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always found assigning to .flat confusing, so as someone who might read your code at some point in the future I'd suggest keeping your change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I only objected to the use of [()]. Getting rid of .flat is an improvement, I was just curious if that change was necessary to implement keepdims.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end I don't think I can replace .flat = with [...] =.
Using .flat is currently necessary for interopreability of ma and matrices. See #4585 (comment), #4615.
Eg, if you sum a 3x3 masked-matrix along the 2nd axis, the resulting data shape is (3,1), but the mask shape becomes (3,) (since mask is an ndarray). [...] = fails when trying to assign a (3,) shape to a (3,1) shape, but .flat = doesn't.
If we really want to get rid of a.flat = b here, something like this might be a replacement:
a[...] = b.reshape(a.shape) if not np.isscalar(b) else b
but that seems a bit ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaldane - ah, bitten by matrices again... Yes, in that case, I would also just stick with .flat.
|
In my opinion, subclassing from |
numpy/ma/core.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I'd really like to get rid of most of these imports, plain old np.* seems to work in most cases.
|
Might want to look at the corresponding nan functions, I expect there is some commonality. There needs to be tests for all the new functionality, might be able to copy some of those from the nanfunction tests also. Making the masked functions conform is good. We are looking for a masked array maintainer if anyone is interested in pursuing that ;) |
dff1dce to
9baf7a9
Compare
|
Updated. Summary so far:
I haven't tested it much yet. By the way, for a separate PR, I noticed the ndarray doctrings are missing keepdim args. Eg in I also notices that the C-Api function |
6814cfa to
31afad5
Compare
|
More changes: I got burned due to the So I made |
|
Test failure is spurious, happens because http://www.rabbitmq.com is down. |
|
Apart from the two small comments (one of which can be ignored if you wish), to me this seems all ready to go in. Sorry for it having been such a long slog... |
ef527a1 to
451614d
Compare
|
Seems like this is a bit cleaned up/less compared to before. I will try to look over it again sooner rather then later, so hopefully by the weekend, don't feel like it now, but poke me if I don't (though if someone else beats me to it, would be more than happy), and since @mhvk already reviewed it, I think it should be good anyway. That way the changes can hopefully sit at least a little bit on master to notice oddities. |
|
@ahaldane - OK, all seems fine now! |
numpy/ma/core.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are single ticks OK here (honestly not sure)? The sentence reads a bit funny. "is to perform" or "performs" sound fine to me, though who knows english has some weird grammar ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs as I read them, single-backticks go around variable, module, function, and class names, while double-backticks go around inline code. So I think it's right here. (Edit: and note that this docstring was copied from the np.any docstring, though slightly mangled)
I'll fix the grammar though.
|
Just some silly questions/nitpicks. Then I guess I will put it in and hope nothing annoying crops up (with MA I won't try to guess). |
451614d to
a455a22
Compare
Updated any, all, sum, prod, cumsum, cumprod, min, max, argmin, argmax, mean, var
a455a22 to
798dd4f
Compare
|
Updated.
|
|
OK, lets give it a shot. Thanks a lot @ahaldane was a long work.... |
|
Whew! Good thing I had some tenacious reviewers to help, thanks both of you. |
Follow up to numpy#5706. Fixes numpy#7509
Follow up to numpy#5706. Fixes numpy#7509
I recently learned about masked arrays and was trying them out in a project, and found that the masked array functions
mean,var,sum,amaxand similar do not behave quite the same as their ndarray counterparts.They don't support multiple axes like ndarrays, eg
they don't have a
keepdimsargument, and looking at the code the behavior is a little different.In this PR I modified
np.ma.mean,np.ma.sumandnp.ma.countto behave more like ndarray methods, as proof of concept.Does it look like a good idea, and is the code OK so far? If so I'll do the same thing for the rest of the functions (
var,sumetc).Note that I modified a test to get it to pass: I removed the test that the return type of
np.ma.countwas of typenp.intp. The return type can now be an ndarray (for partial axes) or an int. The return type was discussed in #4698, and I think the problem at that time was thatarr.sizecould return variable types, but I removed dependence onarr.sizeso I don't think that's a problem any more.