-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
What is your issue?
In xarray-contrib/datatree#309 @marcel-goldschen-ohm pointed out some counterintuitive behaviour of the DataTree.name setter.
# using xarray main
In [2]: from xarray.core.datatree import DataTree
In [3]: root = DataTree.from_dict(
...: {
...: 'child': DataTree(),
...: },
...: name='root',
...: )
In [4]: root['child'].name = 'childish'
In [5]: root
Out[5]:
<xarray.DataTree 'root'>
Group: /
└── Group: /childishthis looks fine, but
In [6]: root.children
Out[6]:
Frozen({'child': <xarray.DataTree 'childish'>
Group: /childish})which means that
In [14]: root['childish']KeyError: 'Could not find node at childish'Okay, so we just fix it so that setting .name changes the key in the parent node right? But as @etienneschalk pointed out in xarray-contrib/datatree#309 (comment), that wouldn't be consistent with DataArray and Dataset names behave: they don't update the key in the wrapping object either. Instead they just completely ignore that you asked to update it at all:
In [15]: ds = xr.Dataset({'a': 0})
In [17]: ds['a'].name = 'b'
In [18]: ds
Out[18]:
<xarray.Dataset> Size: 8B
Dimensions: ()
Data variables:
a int64 8B 0
In [19]: ds.variables
Out[19]:
Frozen({'a': <xarray.Variable ()> Size: 8B
array(0)})This happens because Dataset objects do not actually store DataArray objects, they only store Variables (which are un-named), and reconstruct a new DataArray upon __getitem__ calls. So if you use .name to alter the name of that newly-constructed DataArray you're not actually touching the wrapping Dataset at all, and therefore have no way to update its keys.
Indeed the same thing happens (for the same reason) when altering the name of a Variable that is part of a DataTree node:
In [8]: import xarray as xr
In [9]: root['a'] = xr.DataArray(0)
In [10]: root
Out[10]:
<xarray.DataTree 'root'>
Group: /
│ Dimensions: ()
│ Data variables:
│ a int64 8B 0
└── Group: /child
In [11]: root['a'].name
Out[11]: 'a'
In [12]: root['a'].name = 'b'
In [13]: root['a']
Out[13]:
<xarray.DataArray 'a' ()> Size: 8B
array(0)Is this ignoring of the name setter intended, or a bug? And should DataTree follow the behaviour of Dataset here or not? If it doesn't follow the behaviour of Dataset then we're going to end up with a situation where
root['child'].name = 'new_name'actually makes a change to root, but
root['a'].name = 'new_name'does not, so the behaviour of .name depends on whether or not your key corresponds to a DataTree value or a Variable value. 🫤
In the DataTree case I think it is a lot easier to update the key in the wrapping object when .name is set because there is a bi-directional linkage back to the parent node, which doesn't exist to link the DataArray instance back to Dataset.
cc @shoyer - I would appreciate your thoughts here.