Description
Is your feature request related to a problem?
It is pretty common to want to run `cumsum` and have the sum reset whenever a boolean flag array is 1. This is so common that it has its own Wikipedia page (segmented scan) and is discussed in Blelloch (1993), Section 1.5.
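A minimal sequential sketch of these semantics follows. It assumes the usual segmented-scan convention that a set flag marks the first element of a new segment, so that element is included in the restarted sum. `segmented_cumsum` is a hypothetical name used only for illustration, not an xarray API. When the flags are `cube == 0`, as in the example below, the flagged value is zero, so the result at each reset position is 0.

```python
def segmented_cumsum(values, flags):
    """Running sum that restarts at every flagged element (sequential sketch).

    Assumed convention: a True flag starts a new segment, and the flagged
    element itself is the first term of the new running sum.
    """
    out = []
    running = 0
    for value, flag in zip(values, flags):
        # Start over at a flag, otherwise keep accumulating.
        running = value if flag else running + value
        out.append(running)
    return out


print(segmented_cumsum([1, 2, 3, 4, 5], [False, False, True, False, True]))
# [1, 3, 3, 7, 5]
```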
Here's a real example of someone trying to implement it in a fairly roundabout way:

```python
time_cumsum = cube.cumsum(dim="time")
cumsum = time_cumsum - time_cumsum.where(cube == 0).ffill(dim="time").fillna(0)
```

We have a few options to implement it:
1. We could introduce a new method `DataArray.segmented_scan(flags, op="sum")` or a new class `DataArray.segment.cumsum()`. A dask/cubed-friendly version that does all of this in a single scan should be fairly straightforward to write (and similar to our `ffill` and `bfill` wrappers).
2. In a way this generalizes `resample`, and it just struck me that the example above could be written as the following, which should work once flox adds scans:
   ```python
   group_idx = (cube == 0).cumsum("time")
   cube.groupby(group_idx).cumsum()
   ```
   1. We could use our new `Grouper` functionality to expose a "flag" grouper that hides the `group_idx = (cube == 0).cumsum("time")` line.
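Option 1 asks for a single scan that dask or cubed could run chunk by chunk. One way to see why that is feasible is Blelloch's trick of turning the segmented sum into an ordinary scan over `(flag, value)` pairs with an associative operator. The sketch below is pure Python and illustrative only: `combine`, `scan_segmented_sum` and `chunked_scan_segmented_sum` are hypothetical names, not xarray, dask, or flox APIs, and it assumes a set flag starts a new segment. Because `combine` is associative, each chunk can be scanned independently and then fixed up with one carry from the chunks before it.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Pair = Tuple[bool, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Associative operator for a segmented sum over (flag, value) pairs.

    If the right-hand pair starts a new segment, the left-hand running sum
    is discarded; otherwise the two sums are added.
    """
    left_flag, left_sum = left
    right_flag, right_sum = right
    return (left_flag or right_flag, right_sum if right_flag else left_sum + right_sum)


def scan_segmented_sum(values: Sequence[float], flags: Sequence[bool]) -> List[float]:
    """Inclusive segmented cumsum expressed as one ordinary scan with `combine`."""
    return [total for _, total in accumulate(zip(flags, values), combine)]


def chunked_scan_segmented_sum(
    values: Sequence[float], flags: Sequence[bool], chunk_size: int
) -> List[float]:
    """Same result, computed chunk by chunk the way a blocked scan would."""
    out: List[float] = []
    carry = None  # combined pair covering everything before the current chunk
    for start in range(0, len(values), chunk_size):
        stop = start + chunk_size
        # Local scan of this chunk only; chunks could be scanned in parallel.
        local = list(accumulate(zip(flags[start:stop], values[start:stop]), combine))
        if carry is not None:
            # Associativity lets us prepend the carry to every local prefix.
            local = [combine(carry, pair) for pair in local]
        out.extend(total for _, total in local)
        carry = local[-1]
    return out


values = [1, 2, 3, 4, 5]
flags = [False, False, True, False, True]
print(scan_segmented_sum(values, flags))             # [1, 3, 3, 7, 5]
print(chunked_scan_segmented_sum(values, flags, 2))  # [1, 3, 3, 7, 5]
```

A running count of the flags gives the same segment labels that option 2 computes with `(cube == 0).cumsum("time")`, and a per-group cumsum over those labels produces the same result as the scan above.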
My concern with (2) and (2.i) is that they are not at all obvious to most of our user base.