Taking median (np.ma.median) of a list of masked arrays. #10757

@lokhorst

When I take the median of a list of masked arrays with np.ma.median,
the masks are ignored and the median of the underlying unmasked data is returned. For example:

In [1]: import numpy as np
In [2]: data1 = np.array([[1,2,3,4],[5,6,7,8]])
In [3]: masked1 = np.ma.masked_where(data1 == 4, data1)
In [4]: data2 = np.array([[8,7,6,5],[4,3,2,1]])
In [5]: masked2 = np.ma.masked_where(data2==4,data2)
In [8]: list = [masked1,masked2]
In [9]: list
Out[9]: 
[masked_array(data =
  [[1 2 3 --]
  [5 6 7 8]],
              mask =
  [[False False False  True]
  [False False False False]],
        fill_value = 999999), masked_array(data =
  [[8 7 6 5]
  [-- 3 2 1]],
              mask =
  [[False False False False]
  [ True False False False]],
        fill_value = 999999)]
In [10]: np.ma.median(list,axis=0).data
Out[10]: 
array([[ 4.5,  4.5,  4.5,  4.5],
       [ 4.5,  4.5,  4.5,  4.5]])

The output of np.ma.median(list,axis=0).data should instead be:

array([[ 4.5,  4.5,  4.5,  5. ],
       [ 5. ,  4.5,  4.5,  4.5]])

because np.ma.median should ignore the data values that are masked.

I have since found a workaround that works for me: stack the data into a single array first, and only then create the mask, e.g.:

In [16]: dataarray = np.array([data1,data2])
In [19]: maskedarray = np.ma.masked_where(dataarray==4,dataarray)
In [20]: maskedarray
Out[20]: 
masked_array(data =
 [[[1 2 3 --]
  [5 6 7 8]]

 [[8 7 6 5]
  [-- 3 2 1]]],
             mask =
 [[[False False False  True]
  [False False False False]]

 [[False False False False]
  [ True False False False]]],
       fill_value = 999999)
In [21]: np.ma.median(maskedarray,axis=0).data
Out[21]: 
array([[ 4.5,  4.5,  4.5,  5. ],
       [ 5. ,  4.5,  4.5,  4.5]])
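
A variant of this workaround keeps the masks already attached to each array instead of rebuilding them. The sketch below assumes that np.ma.array, when given a list of masked arrays, combines their individual masks into one mask on the resulting 3-D array. The names `stacked`, `result` and `values` are only illustrative.

```python
import numpy as np

data1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
data2 = np.array([[8, 7, 6, 5], [4, 3, 2, 1]])
masked1 = np.ma.masked_where(data1 == 4, data1)
masked2 = np.ma.masked_where(data2 == 4, data2)

# Build one 3-D masked array from the list, so the per-array masks
# travel with the data instead of being dropped.
stacked = np.ma.array([masked1, masked2])

# The median along axis 0 now skips the masked entries.
result = np.ma.median(stacked, axis=0)

# Convert to a plain ndarray; any fully masked position would become NaN.
values = np.ma.filled(result, np.nan)
print(values)
```

With the example data, each masked 4 is excluded, so the two affected positions should come out as 5 rather than 4.5, matching the expected output given earlier.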

But this bug sat in my code for years (ah!): because no error was raised, I never knew that the masks were being ignored when the medians of the data were taken.

Could np.ma.median be changed so that passing it a list of masked arrays at least raises an error, so that future users know it does not work the way they assume?
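
Until something like that exists, a user-side guard is one possible stopgap. The sketch below is not part of NumPy: `strict_ma_median` is a hypothetical wrapper name, and it checks only the top level of a list or tuple for masked arrays before delegating to np.ma.median.

```python
import numpy as np


def strict_ma_median(a, axis=None):
    """Hypothetical guard: refuse a plain list/tuple of masked arrays."""
    if isinstance(a, (list, tuple)) and any(
        isinstance(item, np.ma.MaskedArray) for item in a
    ):
        raise TypeError(
            "got a list/tuple of masked arrays; combine them into a single "
            "masked array first, e.g. with np.ma.array(...)"
        )
    # Anything else is passed through unchanged.
    return np.ma.median(a, axis=axis)


data = np.array([1, 2, 3, 4])
masked = np.ma.masked_where(data == 4, data)

try:
    strict_ma_median([masked, masked], axis=0)
except TypeError as exc:
    print("refused:", exc)
```

Calling the wrapper on a single masked array behaves like np.ma.median itself, so only the ambiguous list-of-masked-arrays case is rejected.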

Thanks all!

Versions:
numpy: 1.11.2/1.11.3
python: 2.7.12/2.7.13
on MacOSX
