ENH: Introduce multiple pair parameters in the 'repeat' function #23937

Open
pedro-sl wants to merge 5 commits into numpy:main from pedro-sl:repeat-feature

@pedro-sl
Contributor

pedro-sl commented Jun 14, 2023

This feature lets pairs of parameters be passed to NumPy's repeat function: the 'repeats' parameter now accepts a tuple of arguments, and the 'axis' parameter now also accepts a sequence of integers. Data types are properly checked. Flattened output arrays (unspecified axes) and repeats broadcast to the size of the paired axis are also handled. For instance, if two pairs of arguments are used and the second one has no axis specified, a flat output array is returned. This feature was first suggested in #21435.

Moreover, the multiple repeats are processed in ascending order: the repeats that produce the smallest size along their axis in the intermediate output array are processed first. This reordering reduces processing time by approximately 50% for significantly large repeats (i.e. over 100 repeats per axis).
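
The ordering idea described above can be sketched with the existing np.repeat. This is only an illustration, not the code in this PR: the names `resulting_size` and `repeat_ascending` are hypothetical, and the sketch assumes every pair names a distinct, non-None axis, in which case the order of application does not change the result and only affects the size of the intermediate arrays.

```python
import numpy as np


def resulting_size(shape, rep, axis):
    """Length of `axis` after np.repeat(..., rep, axis=axis) (hypothetical helper)."""
    rep = np.asarray(rep)
    if rep.size == 1:
        # A scalar (or length-1) repeat is broadcast over the whole axis.
        return int(rep.ravel()[0]) * shape[axis]
    # One repeat count per element along the axis.
    return int(rep.sum())


def repeat_ascending(a, repeats, axes):
    """Apply (repeats, axis) pairs, smallest resulting axis size first.

    Assumption: all axes are distinct and not None, so the sizes computed
    from the original shape stay valid while the pairs are applied.
    """
    a = np.asarray(a)
    pairs = sorted(
        zip(repeats, axes),
        key=lambda pair: resulting_size(a.shape, pair[0], pair[1]),
    )
    out = a
    for rep, axis in pairs:
        out = np.repeat(out, rep, axis=axis)
    return out


x = np.arange(6).reshape(2, 3)
# Axis 1 grows to 6 and axis 0 grows to 400, so axis 1 is processed first.
y = repeat_ascending(x, (200, [1, 2, 3]), (0, 1))
```

Whether this ordering gives the reported speed-up depends on the array sizes involved; the sketch only shows that the reordering is safe under the stated assumption.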

This enhancement makes the repeat function more versatile and elegant. The more dimensions of the input array that are to be repeated along an axis, the more useful this feature is.

Usage example:

>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, ([3, 3], [1, 2]), (1, 0))
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4],
       [3, 3, 3, 4, 4, 4]])

>>> np.repeat(x, (3, [1, 2], 1), (1, 0))
array([1, 1, 1, 2, 2, 2,
       3, 3, 3, 4, 4, 4,
       3, 3, 3, 4, 4, 4])
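
For comparison, the same two results can be obtained in current NumPy by chaining ordinary np.repeat calls, one per (repeats, axis) pair. The snippet below is a sketch of that equivalence only; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# First example: repeat every column 3 times (axis 1),
# then repeat row 0 once and row 1 twice (axis 0).
first = np.repeat(np.repeat(x, [3, 3], axis=1), [1, 2], axis=0)

# Second example: same two steps with a scalar repeat on axis 1,
# followed by a third repeat with no axis, which flattens the result.
second = np.repeat(np.repeat(x, 3, axis=1), [1, 2], axis=0)
second = np.repeat(second, 1)  # axis=None returns a flat array
```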

@rkern
Member

rkern commented Jun 14, 2023

This seems to me to be better as a separate function that uses np.repeat() as a primitive, rather than extending and complicating the semantics of np.repeat() itself.
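
A minimal sketch of what such a separate function might look like, built only on the existing np.repeat(). The name `repeat_pairs` and its signature are hypothetical; the padding of missing axes with None follows the flattening behaviour described in the PR text and is an assumption about the intended semantics.

```python
import numpy as np


def repeat_pairs(a, repeats, axes=()):
    """Apply several (repeats, axis) pairs in sequence (hypothetical helper).

    `repeats` is a sequence of values accepted by np.repeat; `axes` is a
    sequence of ints. A pair whose axis is missing uses axis=None, so the
    array is flattened at that step, as with plain np.repeat.
    """
    repeats = list(repeats)
    axes = list(axes)
    if len(axes) > len(repeats):
        raise ValueError("more axes than repeats were given")
    # Pad missing axes with None (assumed meaning: flatten at that step).
    axes += [None] * (len(repeats) - len(axes))

    out = np.asarray(a)
    for rep, axis in zip(repeats, axes):
        out = np.repeat(out, rep, axis=axis)
    return out


x = np.array([[1, 2], [3, 4]])
two_d = repeat_pairs(x, ([3, 3], [1, 2]), (1, 0))
flat = repeat_pairs(x, (3, [1, 2], 1), (1, 0))
```

Keeping the loop outside np.repeat() leaves the existing signature untouched, which is the point of the suggestion above.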

Either way, this is the kind of expansion of the API that needs to be discussed on the mailing list first.

@mattip
Member

mattip commented Jun 14, 2023

Expanding a bit on @rkern's comment:

There is some discussion of adding repeat to the Array API in a future revision, along with a number of other commonly-used APIs. Extending the signature would move NumPy further away from the signature used in other array-processing libraries, and would need to be considered carefully.

@pedro-sl
Contributor Author

Thank you for your input!
I would like to point out that this feature can halve the processing time for repeats that produce a very large size along the repeated dimension, by processing the multiple repeats in ascending order (smallest resulting axis size first).
Here are some of the improvements that were consistently observed locally:

image

There is no problem in creating a new separate function - 'repeats' - that uses 'repeat' and preserves all the new functionalities that were implemented.
I can either bring forward those changes or keep the current feature as it is.
Either way, any input is greatly appreciated.
