ENH: make some masked array methods behave more like ndarray methods #5706

ahaldane · 2015-03-22T03:41:57Z

I recently learned about masked arrays and was trying them out in a project, and found that the masked array functions mean, var, sum, amax and similar do not behave quite the same as their ndarray counterparts.

They don't support multiple axes like ndarrays, eg

>>> a = np.random.rand(4,4)
>>> a.mean(axis=(0,1)) # works
>>> b = np.ma.array(a, mask=np.random.rand(4,4)<0.5)
>>> b.mean(axis=(0,1)) # fails with IndexError

They also don't have a keepdims argument, and, looking at the code, the behavior is a little different.
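For reference, the ndarray behaviour being targeted can be sketched roughly as below. This is only an illustration: the deterministic mask and the hand-written masked mean are assumptions made for the example, not code from this PR.

```python
import numpy as np

a = np.arange(16, dtype=float).reshape(4, 4)
mask = (a % 3 == 0)          # illustrative mask, True means "masked"

# ndarray reference: tuple axes and keepdims are both accepted.
full_mean = a.mean(axis=(0, 1))                 # 0-d result
kept_mean = a.mean(axis=(0, 1), keepdims=True)  # shape (1, 1)

# The same reduction over unmasked entries only, written by hand.
valid = ~mask
masked_sum = np.where(valid, a, 0.0).sum(axis=(0, 1), keepdims=True)
n_valid = valid.sum(axis=(0, 1), keepdims=True)
masked_mean = masked_sum / n_valid              # shape (1, 1)
```

A masked mean that matches ndarray semantics would be expected to give the same value and shape as `masked_mean` here.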

In this PR I modified np.ma.mean, np.ma.sum and np.ma.count to behave more like ndarray methods, as proof of concept.

Does it look like a good idea, and is the code OK so far? If so I'll do the same thing for the rest of the functions (var, sum etc).

Note that I modified a test to get it to pass: I removed the test that the return type of np.ma.count was of type np.intp. The return type can now be an ndarray (for partial axes) or an int. The return type was discussed in #4698, and I think the problem at that time was that arr.size could return variable types, but I removed dependence on arr.size so I don't think that's a problem any more.
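To make the return-type point concrete, here is a minimal sketch of counting unmasked elements without relying on arr.size. The helper name `count_unmasked` is hypothetical and this is not the implementation in the PR; it only shows why a full reduction gives a scalar while a partial reduction gives an ndarray.

```python
import numpy as np

def count_unmasked(mask, axis=None, keepdims=False):
    """Count False (unmasked) entries of a boolean mask along `axis`."""
    valid = ~np.asarray(mask, dtype=bool)
    return valid.sum(axis=axis, dtype=np.intp, keepdims=keepdims)

mask = np.array([[True, False, False],
                 [False, False, True]])

total = count_unmasked(mask)                # scalar count over all axes
per_column = count_unmasked(mask, axis=0)   # ndarray, one count per column
```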

charris · 2015-03-22T15:58:06Z

Closing and reopening to see if the tests can be rejiggered.

charris · 2015-03-22T15:59:16Z

Yep, that works. Restarting the tests doesn't have the same effect.

charris · 2015-03-22T16:07:08Z

Try again.

ahaldane · 2015-03-23T01:05:16Z

Updated np.ma.mean to treat dtype parameter more carefully (like in np.mean).

Also I noticed that many ma operations will show a warning if you perform an invalid operation on an element even if that element is masked. Eg,

>>> np.log(np.ma.array([nan, 1, 2], mask=[True, False, False]))
>>> np.log(np.ma.array([0, 1, 2], mask=[True, False, False]))

will raise a warning, which doesn't seem right. (The first of these also fails when using np.ma.log, but not the second). I started to fix this by disabling warnings during the domain calculation. I don't think I'm disabling any legitimate warnings.

But actually the correct fix may instead be to do the domain check with the masked array, not the raw data. However, in the case of _DomainSafeDivide this isn't currently possible since it converts to ndarrays, but that is a recent change that might be wrong.
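The warning-suppression idea can be sketched with np.errstate, as below. The helper `quiet_masked_log` is hypothetical and much simpler than the domain classes in numpy/ma/core.py; it assumes the domain of log is "strictly positive" and that NaN counts as out of domain.

```python
import numpy as np

def quiet_masked_log(data, mask):
    """Log of `data`, masking out-of-domain entries without warnings."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Silence floating-point warnings only while the raw data is evaluated.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.log(data)
        out_of_domain = ~(data > 0)   # NaN compares False, so it is flagged too
    return np.ma.array(result, mask=mask | out_of_domain)

res = quiet_masked_log([0.0, 1.0, 2.0], [True, False, False])
```

Note that this also hides warnings for unmasked invalid elements; those end up masked in the result instead.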

abalkin · 2015-03-23T01:36:01Z

> I recently learned about masked arrays and was trying them out in a project, and found that the masked array functions mean, var, sum, amax and similar do not behave quite the same as their ndarray counterparts.

Keeping masked array interfaces in sync with ndarray has been a losing battle in the past because most of the people who enhance ndarray either don't know or don't care about masked arrays.

Things only got worse when masked arrays were reimplemented (over the objections of the original designers) to inherit from ndarray. After that, any new method added to ndarray automatically became a broken method of masked arrays. Case in point: the dot method. See #5185.

I can see only one way to fix this situation for the long term: add a test that would introspect ndarray and masked array methods and make sure they match. Unfortunately, introspecting ndarray methods is not always possible because some of them are implemented in C, but in those cases we can probably parse the docstring to extract the signature. See #4356.

Finally, I think this issue overlaps with #4537.
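A very rough sketch of the kind of introspection check suggested above: since MaskedArray subclasses ndarray, one can at least list the public ndarray methods that MaskedArray does not redefine itself. The function name `inherited_unchanged` is hypothetical, and comparing signatures (the harder part mentioned above) is not attempted here.

```python
import numpy as np

def inherited_unchanged(base=np.ndarray, derived=np.ma.MaskedArray):
    """Public callables of `base` that `derived` does not redefine itself."""
    names = []
    for name in dir(base):
        if name.startswith("_"):
            continue                      # skip private and dunder names
        if not callable(getattr(base, name)):
            continue                      # skip properties such as shape
        if name not in vars(derived):
            names.append(name)            # inherited as-is from base
    return names

candidates = inherited_unchanged()
```

Each name in `candidates` is a method that masked arrays inherit untouched, so it is a candidate for review rather than a confirmed bug.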

abalkin · 2015-03-23T01:46:51Z

numpy/ma/core.py

It is more idiomatic to use ... when assigning to the entire array. Empty tuple indexing should only be used in generic code where ndim can happen to be 0. Literal a[()] is confusing and unnecessary.

On second thought, what was wrong with the old code?

Hmm, you're right, using .flat works. I was worried that a non-1d newmask would cause problems.

I still think outmask[...] = is better (thanks for explaining () vs ...). I guess I am biased against .flat after reading some other devs say they didn't like it!

I always found assigning to .flat confusing, so as someone who might read your code at some point in the future I'd suggest keeping your change here.

FWIW, I only objected to the use of [()]. Getting rid of .flat is an improvement, I was just curious if that change was necessary to implement keepdims.

In the end I don't think I can replace .flat = with [...] =.

Using .flat is currently necessary for interoperability of ma and matrices. See #4585 (comment), #4615.

Eg, if you sum a 3x3 masked-matrix along the 2nd axis, the resulting data shape is (3,1), but the mask shape becomes (3,) (since mask is an ndarray). [...] = fails when trying to assign a (3,) shape to a (3,1) shape, but .flat = doesn't.

If we really want to get rid of a.flat = b here, something like this might be a replacement:

a[...] = b.reshape(a.shape) if not np.isscalar(b) else b

but that seems a bit ugly.

@ahaldane - ah, bitten by matrices again... Yes, in that case, I would also just stick with .flat.
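The shape mismatch discussed in this thread can be reproduced with plain ndarrays, as in the sketch below. The (3, 1) array stands in for the result of a matrix reduction; no np.matrix is used, and the variable names are made up for the example.

```python
import numpy as np

data_mask = np.zeros((3, 1), dtype=bool)   # shape a matrix reduction would leave
new_mask = np.array([True, False, True])   # plain 1-d mask, shape (3,)

try:
    data_mask[...] = new_mask              # (3,) cannot broadcast into (3, 1)
    ellipsis_ok = True
except ValueError:
    ellipsis_ok = False

via_flat = np.zeros((3, 1), dtype=bool)
via_flat.flat = new_mask                   # element-wise copy, ignores shape

via_reshape = np.zeros((3, 1), dtype=bool)
via_reshape[...] = new_mask.reshape(via_reshape.shape)
```

Both `.flat =` and the explicit reshape fill the (3, 1) target, while the bare `[...] =` assignment raises.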

mhvk · 2015-03-23T15:55:28Z

In my opinion, subclassing from ndarray was a logical thing to do, and not bad in itself, though from my trials to get MaskedArray to work with astropy's Quantity (#3907, #4586, #4617), I would have to say it was not done quite carefully enough. In #3907 (comment) @charris suggested an overhaul using __numpy_ufunc__ -- I think that would indeed be the way forward (though don't let that detract from at least getting some things to work better!).

charris · 2015-03-25T17:52:26Z

numpy/ma/core.py

At some point I'd really like to get rid of most of these imports, plain old np.* seems to work in most cases.

charris · 2015-03-25T18:04:00Z

Might want to look at the corresponding nan functions, I expect there is some commonality. There need to be tests for all the new functionality; you might be able to copy some of those from the nanfunction tests also. Making the masked functions conform is good. We are looking for a masked array maintainer if anyone is interested in pursuing that ;)

ahaldane · 2015-03-27T19:13:11Z

@mhvk, @charris Using __numpy_ufunc__ sounds like a good idea. I'll keep it in mind, but I don't think I'll touch that here just to keep things manageable.

ahaldane · 2015-03-28T21:32:08Z

Updated. Summary so far:

reworked count method (to allow multiple axes)
added keepdims arg to all, any, sum, prod, mean, var, std, min, max
added (unused) arg to same methods in matrix class
updated docstring for those funcs, plus cumsum, cumprod, std, round.
rewrote internals of mean and var. (var was pretty buggy!)
removed anom method, which was just used in var before
hid warnings during domain calculations

I haven't tested it much yet.

By the way, for a separate PR, I noticed the ndarray docstrings are missing the keepdims arg, e.g. in ndarray.mean and others.

I also noticed that the C-API function PyArray_Mean is implemented (in calculations.c) quite differently from ndarray.mean (in core._methods.py), and the same goes for var and others. Probably it should simply call array_mean.

ahaldane · 2015-03-29T00:53:50Z

More changes: I got burned due to the np.ma.masked constant being writeable, which caused some tests to fail depending on the order the tests were run.

So I made np.ma.masked readonly, which uncovered a bug in np.ma.ptp, which (for now) I fixed with some modifications of the MaskedConstant class, but probably something nicer can be done.
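The general mechanism for protecting a shared array from accidental writes is the writeable flag, sketched below on a plain 0-d array. This is only an illustration of the flag; it is not the MaskedConstant change made in this PR, and the name `shared` is made up.

```python
import numpy as np

shared = np.array(0.0)          # stand-in for a module-level constant
shared.flags.writeable = False  # mark the buffer read-only

try:
    shared[...] = 1.0           # any write now raises ValueError
    write_blocked = False
except ValueError:
    write_blocked = True
```

With the flag cleared, a test that accidentally writes to the constant fails immediately instead of silently affecting later tests.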

ahaldane · 2015-03-29T00:54:40Z

Test failure is spurious, happens because http://www.rabbitmq.com is down.

mhvk · 2016-03-30T18:53:01Z

Apart from the two small comments (one of which can be ignored if you wish), to me this seems all ready to go in. Sorry for it having been such a long slog...

seberg · 2016-03-30T19:30:49Z

Seems like this is a bit cleaned up/less compared to before. I will try to look over it again sooner rather than later, hopefully by the weekend. I don't feel like it now, but poke me if I don't (though if someone else beats me to it, I would be more than happy), and since @mhvk already reviewed it, I think it should be good anyway. That way the changes can hopefully sit at least a little while on master so we notice oddities.

mhvk · 2016-03-30T20:39:33Z

@ahaldane - OK, all seems fine now!

seberg · 2016-04-02T14:03:24Z

numpy/ma/core.py

Are single ticks OK here (honestly not sure)? The sentence reads a bit funny. "is to perform" or "performs" sound fine to me, though who knows, English has some weird grammar ;).

According to the docs as I read them, single-backticks go around variable, module, function, and class names, while double-backticks go around inline code. So I think it's right here. (Edit: and note that this docstring was copied from the np.any docstring, though slightly mangled)

I'll fix the grammar though.

seberg · 2016-04-02T14:31:28Z

Just some silly questions/nitpicks. Then I guess I will put it in and hope nothing annoying crops up (with MA I won't try to guess).

Updated any, all, sum, prod, cumsum, cumprod, min, max, argmin, argmax, mean, var

ahaldane · 2016-04-04T18:31:20Z

Updated.

count now raises ValueError for 0-d arrays if axis > 0
fixed grammar in average and count docstrings.
I think backtick usage in docstrings is correct
fixed read-only return value of scl in average
added keepdims argument to _check_mask_axis, and used it everywhere

seberg · 2016-04-04T21:00:23Z

OK, let's give it a shot. Thanks a lot @ahaldane, this was a lot of work....

ahaldane · 2016-04-04T21:22:05Z

Whew! Good thing I had some tenacious reviewers to help, thanks both of you.

ahaldane force-pushed the ma_methods_args branch 3 times, most recently from d11dbf5 to 3cd5514 Compare March 22, 2015 05:28

charris closed this Mar 22, 2015

charris reopened this Mar 22, 2015

charris closed this Mar 22, 2015

charris reopened this Mar 22, 2015

ahaldane force-pushed the ma_methods_args branch from 3cd5514 to dff1dce Compare March 23, 2015 01:03

abalkin reviewed Mar 23, 2015

charris reviewed Mar 25, 2015

ahaldane mentioned this pull request Mar 27, 2015

ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

Merged

charris added 01 - Enhancement component: numpy.ma masked arrays labels Mar 28, 2015

ahaldane force-pushed the ma_methods_args branch from dff1dce to 9baf7a9 Compare March 28, 2015 21:30

ahaldane force-pushed the ma_methods_args branch 4 times, most recently from 6814cfa to 31afad5 Compare March 29, 2015 00:44

ahaldane force-pushed the ma_methods_args branch from ef527a1 to 451614d Compare March 30, 2016 19:18

seberg reviewed Apr 2, 2016

ahaldane force-pushed the ma_methods_args branch from 451614d to a455a22 Compare April 4, 2016 16:59

ahaldane mentioned this pull request Apr 4, 2016

MAIN: fix to #7382, make scl in np.average writeable #7505

Merged

ahaldane added 3 commits April 4, 2016 13:20

ENH: add extra kwargs and update doc of many MA methods

36f76ea

Updated any, all, sum, prod, cumsum, cumprod, min, max, argmin, argmax, mean, var

ENH: update MA average, median

f1c3521

TST: Unit tests for new kwd args in MA methods

798dd4f

ahaldane force-pushed the ma_methods_args branch from a455a22 to 798dd4f Compare April 4, 2016 17:20

seberg merged commit c6e65b7 into numpy:master Apr 4, 2016

ev-br mentioned this pull request Apr 5, 2016

a possible MA regression #7509

Closed

ahaldane added a commit to ahaldane/numpy that referenced this pull request Apr 5, 2016

BUG: MaskedArray.count treats negative axes incorrectly

5ba2007

Follow up to numpy#5706. Fixes numpy#7509

ahaldane mentioned this pull request Apr 5, 2016

BUG: MaskedArray.count treats negative axes incorrectly #7515

Merged

This was referenced Jun 13, 2016

Masked arrays: keepdims is not enforced in numpy.ma.amax #7720

Closed

Backport 5706, ENH: add extra kwargs and update doc of many MA methods #7738

Merged

charris pushed a commit to charris/numpy that referenced this pull request Jun 30, 2016

BUG: MaskedArray.count treats negative axes incorrectly

b775376

Follow up to numpy#5706. Fixes numpy#7509

charris mentioned this pull request Jun 30, 2016

Backport 7515, BUG: MaskedArray.count treats negative axes incorrectly #7793

Merged

9 participants