The current `unique` implementation, while lazy, does not leverage tree reductions akin to those used for operations like `sum`. These tree reductions compute an operation over each chunk and then repeatedly combine small groups of partial results into new, smaller chunks until only one small chunk remains. Through this process it is possible to work with much larger pieces of data. To allow `unique` to work on larger data, it would be nice to leverage this tree-reduction strategy in its implementation.
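For reference, the tree-reduction pattern might be sketched in numpy-only form like this (the `tree_reduce` helper here is a hypothetical illustration, not dask's actual API, which builds a lazy task graph instead of evaluating eagerly):

```python
import numpy as np


def tree_reduce(chunks, chunk_fn, combine_fn, split_every=2):
    """Apply chunk_fn to each chunk, then pairwise-combine partial
    results until only one remains. Eager, simplified sketch of the
    pattern dask expresses lazily in its task graph."""
    results = [chunk_fn(c) for c in chunks]
    while len(results) > 1:
        results = [
            combine_fn(results[i:i + split_every])
            for i in range(0, len(results), split_every)
        ]
    return results[0]


chunks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
# Sum each chunk, then sum the partial sums pairwise.
total = tree_reduce(chunks, np.sum, sum)  # equals np.arange(12).sum() == 66
```

Because each round shrinks the number of partial results by `split_every`, no single step ever needs to hold more than a small group of chunks in memory.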
Currently `unique` makes use of a wrapper function, `_unique_internal`, which makes it straightforward to support `unique` with its various optional extra results (indices, inverse mapping, counts). This function is well suited to more complex reductions, since it already supports reducing over those other arguments as well. So little if any restructuring of `unique` itself should be needed for this change.
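A chunkwise/combine split for `unique` could look roughly like the following numpy-only sketch (the function names `unique_chunk` and `unique_combine` are hypothetical; this mirrors the shape of the computation, not `_unique_internal`'s actual signature, and only handles counts for brevity):

```python
import numpy as np


def unique_chunk(x):
    # Per-chunk partial result: unique values and their counts.
    values, counts = np.unique(x, return_counts=True)
    return values, counts


def unique_combine(partials):
    # Merge several partial results into one of the same shape;
    # counts for values seen in multiple partials are summed.
    values = np.concatenate([v for v, _ in partials])
    counts = np.concatenate([c for _, c in partials])
    merged, inverse = np.unique(values, return_inverse=True)
    merged_counts = np.zeros(len(merged), dtype=counts.dtype)
    np.add.at(merged_counts, inverse, counts)
    return merged, merged_counts


parts = [unique_chunk(np.array(c)) for c in ([1, 2, 2], [2, 3], [1, 3, 3])]
values, counts = unique_combine(parts)
```

Since the combine step produces a result of the same form as the chunk step, it can be applied repeatedly up the tree.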
It is almost possible to implement `unique` using the existing reduction utilities in `dask.array.reductions`, except that these utilities seem to assume that a reduction step yields a chunk with a single element (or one element per reduced dimension). However `unique` may return an unknown number of elements, anywhere from one up to the length of the data itself. Hence we would need a way to express that the chunks produced by each reduction step may have a different or indeterminate size. Perhaps an additional argument to `reduction` and `_tree_reduce` is all that is needed.
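The size mismatch can be seen directly: a `sum`-style partial always has a fixed, known shape, while a `unique` partial's length depends on the data itself, which is why the graph would need chunk sizes marked as unknown (e.g. `np.nan`-sized chunks) rather than fixed ones:

```python
import numpy as np

# A sum partial is always a single (0-d) element, regardless of input:
assert np.sum(np.array([5, 5, 5])).ndim == 0

# A unique partial's length is data-dependent, from 1 up to len(chunk):
assert len(np.unique(np.array([7, 7, 7]))) == 1
assert len(np.unique(np.array([1, 2, 3]))) == 3
```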