-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
This ends up calling fillna() in a loop inside xarray.core.merge.unique_variable(), something like:
out = variables[0]
for var in variables[1:]:
out = out.fillna(var)Lines 147 to 149 in 55e5b5a
| if combine_method: | |
| for var in variables[1:]: | |
| out = getattr(out, combine_method)(var) |
This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for merge() (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside concat(), which should be able to handle very long lists of arguments.
I encountered this because compat='no_conflicts' is the default for xarray.combine_nested().
I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized.