Set one-dimensional data variable as dimension coordinate? #2461

@nedclimaterisk

Code Sample

I have this dataset, and I'd like to make it indexable by time:

<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Dimensions without coordinates: station_observations
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

Problem description

I expected to be able to use ds.set_coords to make the time variable an indexable coordinate. The variable IS converted to a coordinate, but it is not a dimension coordinate, so I can't index with it. I can use assign_coords(station_observations=ds.time) to make station_observations indexable by time, but then the name is semantically wrong, and the time variable still exists, which makes the code harder to maintain.
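For anyone who wants to reproduce this without the original file, here is a minimal sketch. The toy dataset below is a hypothetical stand-in (ten made-up daily values, only one of the data variables), not the real station data. It only shows that set_coords turns time into a coordinate without giving it an index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the station dataset: `time` is a plain data
# variable along the `station_observations` dimension.
times = pd.date_range("1895-12-30", periods=10, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(10.0)),
    },
    attrs={"elevation": 15.0},
)

# set_coords promotes `time` to a coordinate ...
with_coord = ds.set_coords("time")

# ... but it stays a non-dimension coordinate: the dimension is still
# `station_observations`, and `time` has no index that .sel() could use.
print("time" in with_coord.coords)   # coordinate: yes
print("time" in with_coord.sizes)    # dimension: no
print("time" in with_coord.indexes)  # indexed: no
```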

Expected Output

ds.set_coords('time', inplace=True)
<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Coordinates:
    time                   (station_observations) datetime64[ns] ...
Dimensions without coordinates: station_observations
Data variables:
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

In [95]: ds.sel(time='1896')
ValueError: dimensions or multi-index levels ['time'] do not exist

with assign_coords:

In [97]: ds=ds.assign_coords(station_observations=ds.time)

In [98]: ds.sel(station_observations='1896')
Out[98]: 
<xarray.Dataset>
Dimensions:                (station_observations: 366)
Coordinates:
  * station_observations   (station_observations) datetime64[ns] 1896-01-01 ...
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

This works correctly, but looks ugly. It would be nice if the time variable could be assigned as a dimension directly. I can drop the time variable and rename station_observations, but it's a little annoying to do so.
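One possible way to get there in a single step is Dataset.swap_dims, which replaces a dimension with a variable defined along it. The sketch below uses a hypothetical toy dataset again (made-up values, one data variable) and is only an illustration of that approach. It assumes an xarray version that provides swap_dims and has not been checked against the 0.10.2 install listed below.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the station dataset (two days in 1895,
# eight days in 1896).
times = pd.date_range("1895-12-30", periods=10, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(10.0)),
    },
    attrs={"elevation": 15.0},
)

# Promote `time` to a coordinate, then swap it in as the dimension.
# `station_observations` disappears and `time` becomes a dimension
# coordinate with an index.
swapped = ds.set_coords("time").swap_dims({"station_observations": "time"})

# Label-based selection by year now works on `time` directly.
subset = swapped.sel(time="1896")
print(subset)
```

With this there is no leftover time data variable and no misnamed station_observations coordinate, so the drop-and-rename step should not be needed.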

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.0-041600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.16.0
distributed: None
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
