Description
When taking the median of a list of masked arrays (using np.ma.median),
the masks are ignored and the median of the underlying, unmasked data is returned. An example of this is below:
In [1]: import numpy as np
In [2]: data1 = np.array([[1,2,3,4],[5,6,7,8]])
In [3]: masked1 = np.ma.masked_where(data1 == 4, data1)
In [4]: data2 = np.array([[8,7,6,5],[4,3,2,1]])
In [5]: masked2 = np.ma.masked_where(data2==4,data2)
In [8]: list = [masked1,masked2]
In [9]: list
Out[9]:
[masked_array(data =
[[1 2 3 --]
[5 6 7 8]],
mask =
[[False False False True]
[False False False False]],
fill_value = 999999), masked_array(data =
[[8 7 6 5]
[-- 3 2 1]],
mask =
[[False False False False]
[ True False False False]],
fill_value = 999999)]
In [10]: np.ma.median(list,axis=0).data
Out[10]:
array([[ 4.5, 4.5, 4.5, 4.5],
[ 4.5, 4.5, 4.5, 4.5]])
but the output of np.ma.median(list,axis=0).data should actually be:
array([[ 4.5, 4.5, 4.5, 5],
[ 5, 4.5, 4.5, 4.5]])
because the np.ma.median function should be ignoring the data values that are masked.
I have since found a workaround for this that works for me, which is to create the mask after making an array of arrays, e.g.:
In [16]: dataarray = np.array([data1,data2])
In [19]: maskedarray = np.ma.masked_where(dataarray==4,dataarray)
In [20]: maskedarray
Out[20]:
masked_array(data =
[[[1 2 3 --]
[5 6 7 8]]
[[8 7 6 5]
[-- 3 2 1]]],
mask =
[[[False False False True]
[False False False False]]
[[False False False False]
[ True False False False]]],
fill_value = 999999)
In [21]: np.ma.median(maskedarray,axis=0).data
Out[21]:
array([[ 4.5, 4.5, 4.5, 5. ],
[ 5. , 4.5, 4.5, 4.5]])
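If the list of masked arrays already exists, a similar workaround is to combine it into a single masked array before taking the median, rather than rebuilding the mask from the raw data. The sketch below assumes that `np.ma.array` carries over the mask of each element when it is given a list of masked arrays; the variable names `arrays` and `stacked` are only illustrative.

```python
import numpy as np

data1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
data2 = np.array([[8, 7, 6, 5], [4, 3, 2, 1]])
masked1 = np.ma.masked_where(data1 == 4, data1)
masked2 = np.ma.masked_where(data2 == 4, data2)

# The same list as in the report (named `arrays` here to avoid
# shadowing the built-in `list`).
arrays = [masked1, masked2]

# Build one 3-D masked array from the list, so that np.ma.median
# receives an object that actually has a mask.
stacked = np.ma.array(arrays)

# Median across the two arrays, skipping masked entries.
result = np.ma.median(stacked, axis=0)
print(result)
```

This keeps the masks that were set on the individual arrays, which helps when each array was masked with a different condition.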
But this was a bug in my code for years (ah!), and because it did not raise an error, I never knew that the masks were being ignored when taking the medians of the data.
Is there a way to change the np.ma.median code so that taking the median of a list of masked arrays at least raises an error, so that future coders know that it is not working the way they assume it will?
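Until something like that exists in numpy itself, the requested behaviour can be approximated in user code. The sketch below is only an illustration: `strict_ma_median` is a hypothetical helper, not part of numpy, and it only checks the top level of a list or tuple for masked arrays.

```python
import numpy as np


def strict_ma_median(a, axis=None):
    """Hypothetical wrapper around np.ma.median (not a numpy function).

    Raises TypeError when given a plain list or tuple that contains
    masked arrays, instead of letting the masks be silently dropped.
    """
    if isinstance(a, (list, tuple)) and any(
        isinstance(item, np.ma.MaskedArray) for item in a
    ):
        raise TypeError(
            "got a list/tuple of masked arrays; combine them into a "
            "single masked array first, e.g. with np.ma.array(a)"
        )
    return np.ma.median(a, axis=axis)


values = np.array([1, 2, 4])
masked = np.ma.masked_where(values == 4, values)

# A single masked array is passed straight through to np.ma.median.
single_median = strict_ma_median(masked)

# A list of masked arrays is rejected instead of silently losing its masks.
try:
    strict_ma_median([masked, masked], axis=0)
    list_was_rejected = False
except TypeError:
    list_was_rejected = True
```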
Thanks all!
Versions:
numpy: 1.11.2/1.11.3
python: 2.7.12/2.7.13
on MacOSX