idxmin / idxmax is not parallel friendly #9425

@dcherian

Discussed in #9421

Originally posted by KBodolai September 2, 2024
Hi there! I have a question about the chunking behaviour when using idxmin / idxmax for a chunked array.

What is the expected behaviour for the chunks after we run idxmin over one of the dimensions? Naively I'd expect it to keep the chunks along the other dimensions, but that doesn't seem to be what happens: (Example below with time, x, y)

import numpy as np
import xarray as xr

# create some dummy data and chunk
x, y, t = 1000, 1000, 57
rang = np.arange(t*x*y)
da = xr.DataArray(rang.reshape(t, x, y), coords={'time':range(t), 'x': range(x), 'y':range(y)})
da = da.chunk(dict(time=-1, x=256, y=256))

Now when I look at the array, it looks something like this:

Screenshot 2024-09-02 at 17 06 22
da.idxmin('time')

But after doing idxmin I get the outputs below
Screenshot 2024-09-02 at 17 00 17
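
For comparison, here is a minimal sketch of the chunk layout I would naively expect after reducing over time: the reduced dimension disappears and the x and y chunks stay as they were. This is plain Python bookkeeping only; `regular_chunks` and `expected_chunks_after_reduction` are hypothetical helper names, not xarray or dask functions, and nothing here reflects how idxmin is actually implemented.

```python
# Sketch of the chunk layout the question expects: reducing over one
# dimension drops that dimension and leaves the other chunks untouched.
# Both helper names are hypothetical, not part of xarray or dask.

def regular_chunks(size, chunk):
    """Split `size` into blocks of `chunk`, with a smaller final block."""
    full, rest = divmod(size, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


def expected_chunks_after_reduction(dims, chunks, dim):
    """Drop the reduced dimension; keep every other dimension's chunks."""
    return tuple(c for d, c in zip(dims, chunks) if d != dim)


dims = ("time", "x", "y")
chunks = (
    regular_chunks(57, 57),      # time=-1 means a single chunk of 57
    regular_chunks(1000, 256),   # x=256
    regular_chunks(1000, 256),   # y=256
)
expected = expected_chunks_after_reduction(dims, chunks, "time")
print(expected)  # ((256, 256, 256, 232), (256, 256, 256, 232))
```

The printed tuple can be compared against `da.idxmin('time').chunks` to see where the actual result differs from this expectation.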

My understanding is that it seems to be trying to keep the number of chunks. But oddly, when we do it for floats:

da = da.astype('float32')

the chunking before and after idxmin looks like this:

Screenshot 2024-09-02 at 17 10 25 Screenshot 2024-09-02 at 17 10 11

Is this the expected behaviour for this operation? I'm guessing the reshaping in the source code happens here, but I haven't been able to figure out how yet.

Thanks!
K.
