Description
✨ Feature Request
Enable the fast_percentile_method of the PERCENTILE aggregator to work on masked arrays if mdtol is set to zero.
Motivation
fast_percentile_method is, well, fast, so it would be good to be able to use it in a wider range of contexts. The particular use case I have in mind is data where all land points (and no other points) are masked, aggregated over every dimension except the spatial ones. For each grid cell we are interested in, the values along the collapsed dimensions are then complete (unmasked).
Additional context
The fast_percentile_method of the PERCENTILE aggregator uses numpy.percentile, which just ignores any mask, so in general we'd get the wrong values out:
```python
import numpy as np
import numpy.ma as ma

arr = ma.array(np.tile(np.arange(6), 2).reshape(2, 6))
arr[0, 4:] = ma.masked
print(arr)
print(np.percentile(arr, 50, axis=1))
```

Output:

```
[[0 1 2 3 -- --]
 [0 1 2 3 4 5]]
/net/project/ukmo/scitools/opt_scitools/conda/deployments/default-2022_04_05/lib/python3.8/site-packages/numpy/lib/function_base.py:4650: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
  arr.partition(
[2.5 2.5]
```
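For comparison, the correct median of row 0 is that of [0, 1, 2, 3], i.e. 1.5. The sketch below shows one mask-aware way to get it: fill the masked points with NaN and use np.nanpercentile. This is only an illustration of what the "right" answer looks like, not what Iris does internally.

```python
import numpy as np
import numpy.ma as ma

# Same example array as above: row 0 has its last two points masked.
arr = ma.array(np.tile(np.arange(6), 2).reshape(2, 6))
arr[0, 4:] = ma.masked

# np.percentile only sees the underlying data, so both rows give 2.5.
naive = np.percentile(arr.data, 50, axis=1)

# Mask-aware alternative (illustrative only): turn masked points into NaN
# and let np.nanpercentile skip them.
mask_aware = np.nanpercentile(arr.astype(float).filled(np.nan), 50, axis=1)

print(naive)       # [2.5 2.5]
print(mask_aware)  # [1.5 2.5]
```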
So we currently just don't allow it to run on masked arrays:
iris/lib/iris/analysis/__init__.py
Lines 1283 to 1286 in 1be3a54

```python
if fast_percentile_method:
    msg = "Cannot use fast np.percentile method with masked array."
    if ma.is_masked(data):
        raise TypeError(msg)
```
However, if mdtol is set to zero, the result is masked wherever any input point along the collapsed axis is masked, so those wrong values would always end up masked in the result:
iris/lib/iris/analysis/__init__.py
Lines 588 to 598 in 1be3a54

```python
if (
    mdtol is not None
    and ma.is_masked(data)
    and result is not ma.masked
):
    fraction_not_missing = data.count(axis=axis) / data.shape[axis]
    mask_update = 1 - mdtol > fraction_not_missing
    if ma.isMaskedArray(result):
        result.mask = result.mask | mask_update
    else:
        result = ma.array(result, mask=mask_update)
```
or
iris/lib/iris/analysis/__init__.py
Line 1197 in 1be3a54

```python
def _build_dask_mdtol_function(dask_stats_function):
```
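To check that reasoning on the example array, here is a standalone reproduction that runs the fast np.percentile on the raw data and then applies the same mdtol mask update as the quoted snippet, with mdtol = 0. It is a sketch outside Iris: in the real aggregator this logic is applied to whatever the aggregation function returns.

```python
import numpy as np
import numpy.ma as ma

data = ma.array(np.tile(np.arange(6), 2).reshape(2, 6))
data[0, 4:] = ma.masked

axis = 1
mdtol = 0

# Fast path: the mask is ignored, so row 0 comes out as 2.5 instead of 1.5.
result = np.percentile(data.data, 50, axis=axis)

# Same mask update as the mdtol code quoted above.
if mdtol is not None and ma.is_masked(data) and result is not ma.masked:
    fraction_not_missing = data.count(axis=axis) / data.shape[axis]
    # With mdtol = 0 this is True wherever any input point is masked.
    mask_update = 1 - mdtol > fraction_not_missing
    if ma.isMaskedArray(result):
        result.mask = result.mask | mask_update
    else:
        result = ma.array(result, mask=mask_update)

print(result)  # [-- 2.5]
```

The wrong value for row 0 is hidden, while row 1 (which had no masked inputs) keeps its correct median.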
So I think it would be safe to relax the check so that the fast method can still run on masked data when mdtol is zero.
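A minimal sketch of what the relaxed check might look like. The function name check_fast_percentile is hypothetical, and it assumes mdtol is available at the point where the check runs, which may need extra plumbing in Iris.

```python
import numpy as np
import numpy.ma as ma


def check_fast_percentile(data, fast_percentile_method, mdtol):
    """Hypothetical relaxed version of the check quoted above.

    Masked data is still refused unless mdtol is 0. With mdtol == 0, every
    result point with any masked input is masked afterwards, which hides the
    values np.percentile computed from masked points. mdtol=None also raises,
    because no mask update is applied in that case.
    """
    if fast_percentile_method and ma.is_masked(data) and mdtol != 0:
        raise TypeError(
            "Cannot use fast np.percentile method with masked array "
            "unless mdtol is 0."
        )


masked = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
check_fast_percentile(masked, fast_percentile_method=True, mdtol=0)  # allowed
```

Unmasked input passes regardless of mdtol, so existing behaviour for plain arrays would be unchanged.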