ds.mean('dim') drops strings dataarrays, even when the 'dim' is not dimension of the string dataarray #5368

@rcaneill

What happened:

I have a dataset with several dimensions, e.g. time and experiments. Some of its dataarrays contain only strings and have experiments as their only dimension. When I call ds.mean('time'), these dataarrays are dropped from the result.

What you expected to happen:

I would expect the mean over the full dataset to treat them the same way as float dataarrays that do not have the dimension: return them unchanged. As shown in the minimal example below, ds.min produces the result I would expect.

Minimal Complete Verifiable Example:

Here da1 corresponds to the dataarrays that get lost. da3 shows the behaviour I would expect for da1 as well, whatever the data type: it is returned unchanged.

import xarray as xr

da1 = xr.DataArray(['a','b']).rename('da1')
print(da1, '\n')

da3 = xr.DataArray([-1, -2]).rename('da3')
print(da3, '\n')

da2 = xr.DataArray([[0,1],[2,3]]).rename('da2')
print(da2, '\n')

ds = xr.merge([da1,da2,da3])
print(ds, '\n')

print('mean:', ds.mean('dim_1'))
print('min:', ds.min('dim_1'))

And the output is:

<xarray.DataArray 'da1' (dim_0: 2)>
array(['a', 'b'], dtype='<U1')
Dimensions without coordinates: dim_0 

<xarray.DataArray 'da3' (dim_0: 2)>
array([-1, -2])
Dimensions without coordinates: dim_0 

<xarray.DataArray 'da2' (dim_0: 2, dim_1: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: dim_0, dim_1 

<xarray.Dataset>
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    da1      (dim_0) <U1 'a' 'b'
    da2      (dim_0, dim_1) int64 0 1 2 3
    da3      (dim_0) int64 -1 -2 

mean: <xarray.Dataset>
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    da2      (dim_0) float64 0.5 2.5
    da3      (dim_0) int64 -1 -2
min: <xarray.Dataset>
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    da1      (dim_0) <U1 'a' 'b'
    da2      (dim_0) int64 0 2
    da3      (dim_0) int64 -1 -2

I searched the open issues but did not find a similar one. I hope I did not miss anything there or in the docs.
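
In the meantime, a possible workaround is to take the mean and then re-attach any variable that was dropped even though it does not have the reduced dimension. The sketch below is only an illustration under that assumption: mean_keeping_untouched is a hypothetical helper name, not part of xarray, and it relies only on Dataset.mean, Dataset.data_vars and Dataset.assign.

```python
import xarray as xr


def mean_keeping_untouched(ds, dim):
    """Average over `dim`, re-attaching variables that do not have `dim`.

    Hypothetical helper for illustration, not part of xarray.
    """
    reduced = ds.mean(dim)
    # Variables missing from the result although they do not depend on `dim`.
    dropped = {
        name: ds[name]
        for name in ds.data_vars
        if name not in reduced.data_vars and dim not in ds[name].dims
    }
    # With nothing dropped, assign() simply returns an equivalent dataset.
    return reduced.assign(dropped)


# Same data as in the minimal example above.
da1 = xr.DataArray(['a', 'b']).rename('da1')
da2 = xr.DataArray([[0, 1], [2, 3]]).rename('da2')
da3 = xr.DataArray([-1, -2]).rename('da3')
ds = xr.merge([da1, da2, da3])

result = mean_keeping_untouched(ds, 'dim_1')
print(result)
```

With this helper, da1 is kept alongside the averaged da2 and the unchanged da3. Variables that do have the reduced dimension but cannot be averaged (e.g. strings along dim_1) are still left out, which seems reasonable to me.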

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 44.0.0
pip: 20.0.2
conda: None
pytest: 6.2.4
IPython: 7.23.1
sphinx: None
