Description
Discussed in #9421
Originally posted by KBodolai September 2, 2024
Hi there! I have a question about the chunking behaviour when using idxmin / idxmax for a chunked array.
What is the expected behaviour for the chunks after we run idxmin over one of the dimensions? Naively I'd expect it to keep the chunks along the other dimensions, but that doesn't seem to be what happens: (Example below with time, x, y)
import numpy as np
import xarray as xr
# create some dummy data and chunk
x, y, t = 1000, 1000, 57
rang = np.arange(t*x*y)
da = xr.DataArray(rang.reshape(t, x, y), coords={'time':range(t), 'x': range(x), 'y':range(y)})
da = da.chunk(dict(time=-1, x=256, y=256))
Now when I look at the array, it looks something like this:
da.idxmin('time')
But after doing idxmin I get the outputs below

My understanding is that it seems to be trying to keep the number of chunks. But oddly, when we do it for floats:
da = da.astype('float32')
the chunks before and after doing the idxmin look like this:
Is this the expected behaviour for this operation? I'm guessing the reshaping in the source code happens here, but I haven't been able to figure out how yet.
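For anyone who wants to reproduce the comparison as text rather than screenshots, here is a minimal sketch that prints the chunk layout before and after idxmin for both dtypes. It assumes dask is installed, uses a much smaller array than the one above so it runs quickly, and passes dims explicitly; chunk_report is a hypothetical helper written for this post, not part of xarray. No expected output is shown, because the resulting chunking is exactly what the question is about.

```python
import numpy as np
import xarray as xr

# Illustrative sizes only: much smaller than the 57 x 1000 x 1000 array above.
t, x, y = 5, 64, 64
data = np.arange(t * x * y).reshape(t, x, y)
da = xr.DataArray(
    data,
    dims=("time", "x", "y"),
    coords={"time": range(t), "x": range(x), "y": range(y)},
)
# One chunk along time, 16 x 16 tiles along x and y.
da = da.chunk(dict(time=-1, x=16, y=16))


def chunk_report(arr):
    """Map each dimension name to its chunk sizes (hypothetical helper)."""
    # arr.chunks is None for arrays that are not dask-backed.
    return dict(zip(arr.dims, arr.chunks or ()))


for dtype in ("int64", "float32"):
    src = da.astype(dtype)
    out = src.idxmin("time")
    print(dtype, "before:", chunk_report(src))
    print(dtype, "after: ", chunk_report(out))
```

Comparing the "after" lines for the two dtypes should show whether the chunks along x and y are preserved or regrouped on a given xarray/dask version.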
Thanks!
K.