
Allow cupy.ndarray as repeats argument to cupy.repeat#9828

Open
eriknw wants to merge 1 commit into cupy:main from eriknw:repeat_array_repeats

Conversation

@eriknw

@eriknw eriknw commented Mar 25, 2026

cupy.repeat now accepts a CuPy ndarray for the repeats parameter, matching NumPy's API. Scalar-like inputs (size-1 or 0-D arrays) use an efficient broadcast-copy kernel. Multi-element inputs use cumsum + searchsorted + take on the GPU. List/tuple inputs with len > 1, which previously cost one kernel launch per element, now use the same GPU path.
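For reference, the multi-element path described above can be sketched with NumPy's identical array API (the actual PR runs these steps as CuPy kernels on the GPU; the function name here is just illustrative):

```python
import numpy as np

def repeat_via_searchsorted(a, reps):
    """Sketch of the cumsum + searchsorted + take approach (1-D case).

    For each output position k, a parallel binary search in the
    cumulative boundaries finds which input element k belongs to.
    """
    reps = np.asarray(reps, dtype=np.intp)
    boundaries = np.cumsum(reps)         # end offset of each element's run
    total = int(boundaries[-1]) if reps.size else 0
    # side='right' maps k in [boundaries[i-1], boundaries[i]) to index i
    idx = np.searchsorted(boundaries, np.arange(total), side='right')
    return np.take(a, idx)

print(repeat_via_searchsorted([10, 20, 30], [2, 1, 3]))
# -> [10 10 20 30 30 30], same as np.repeat([10, 20, 30], [2, 1, 3])
```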

Closes #3849. There are probably good reasons why this addition hasn't been done before (there is a D2H transfer), given how long #3849 has been open and how much interest there has been in it, so I'm curious to find out why! This feature has actually been on my wishlist for a few years too.

Claude Code (Opus 4.6) was used to assist in the development of this PR, but I was also heavily involved. These changes were actually spawned from my efforts to improve sparse matrix classes in #9825, and like in that PR I'm still doing my final human review of these changes.


numpy.ndarray inputs are rejected with TypeError (CuPy convention).

Closes cupy#3849
@eriknw eriknw requested a review from a team as a code owner March 25, 2026 13:46
@seberg seberg self-assigned this Mar 25, 2026
@leofang leofang added cat:enhancement Improvements to existing features to-be-backported Pull-requests to be backported to stable branch labels Mar 25, 2026
Member

@seberg seberg left a comment


Looks good in general. I have a lot of nitpicking comments; some can be ignored and/or are more general "could be a bit nicer" remarks. As usual, the tests seem a bit verbose, but I don't mind much.

(I should look through it once more, but thought I'd post.)

faster than scatter-add + cumsum because CUDA atomic operations have
high per-operation overhead relative to parallel binary search.
"""
import cupy as _cupy
Member


Seems a bit strange to do it here. Might as well do it top-level?

if not numpy.issubdtype(reps.dtype, numpy.integer):
    raise ValueError(
        "'repeats' array must have integer dtype, got {}".format(
            reps.dtype))
Member


I might suggest to just use reps = reps.astype(np.intp, casting="safe")
(safe is what NumPy seems to use, but same_kind actually makes more sense maybe, since it allows casting from uint64 even if shady in theory.)

That replaces the != intp check below and raises the error here also.
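For illustration (not code from the PR), the difference between the two casting modes in NumPy:

```python
import numpy as np

reps = np.array([2, 1, 3], dtype=np.int32)
reps = reps.astype(np.intp, casting="safe")  # int32 -> intp widens safely

# Float input is rejected by both modes, giving the dtype error "for free":
try:
    np.array([1.5, 2.5]).astype(np.intp, casting="safe")
except TypeError:
    print("float64 -> intp rejected")

# uint64 is where they differ: rejected by "safe", allowed by "same_kind"
print(np.can_cast(np.uint64, np.intp, casting="safe"))
print(np.can_cast(np.uint64, np.intp, casting="same_kind"))
```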

Member


But looking below, it also feels like defensive nonsense to do this here. At this point it should already be guaranteed a 1-D intp array.

        "'repeats' array must have integer dtype, got {}".format(
            reps.dtype))

reps = reps.ravel()
Member


I don't think NumPy does this ravel? It's a bit unclear why it is here.

Comment on lines +570 to +571

axis = internal._normalize_axis_index(axis, a.ndim)
Member


Suggested change
axis = internal._normalize_axis_index(axis, a.ndim)
else:
    axis = internal._normalize_axis_index(axis, a.ndim)

if reps_cpu.min() < 0:
    raise ValueError(
        "all elements of 'repeats' should not be negative")
total = int(reps_cpu.sum())
Member


Not sure I understand why we would do this on the CPU rather than the GPU if we already made it a GPU array? Sure, it may be pretty fast, but we do a cumsum below on the GPU, so it seems like a weird mix of things.

Also, boundaries[-1] is the total sum, so we don't have to compute it twice.
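A small NumPy sketch of the reuse being suggested (the PR does the cumsum on the GPU; NumPy is used here only so the snippet is runnable anywhere):

```python
import numpy as np

reps = np.asarray([2, 1, 3])
boundaries = np.cumsum(reps)   # needed anyway for the searchsorted step
total = int(boundaries[-1])    # same value as reps.sum(), one reduction
assert total == int(reps.sum()) == 6
```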

elif isinstance(repeats, int) or (
        hasattr(repeats, 'dtype') and
        numpy.ndim(repeats) == 0 and
        numpy.issubdtype(repeats.dtype, numpy.integer)):
Member


I would suggest to just use:

try:
    reps = operator.index(repeats)
except TypeError:
    pass

Admittedly, that is a bit more broad, but not by much, so it's fine if you don't like it.
(I don't love try/excepts, but things like numpy.ndim and numpy.issubdtype are also all shady and slow.)

This would let numpy ndarrays pass maybe...
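For context, operator.index accepts anything implementing __index__, which includes Python ints, NumPy integer scalars, and 0-d integer ndarrays (hence the note about NumPy arrays slipping through):

```python
import operator
import numpy as np

print(operator.index(3))            # 3
print(operator.index(np.int64(3)))  # 3
print(operator.index(np.array(3)))  # 3 -- a 0-d integer ndarray passes too

# Floats, 1-d arrays, and strings all raise TypeError:
for bad in (3.0, np.array([3]), "3"):
    try:
        operator.index(bad)
    except TypeError:
        print(f"{bad!r}: TypeError")
```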

axis = 0
# Non-broadcast sequence (including empty): GPU per-element path.
return _repeat_ndarray_repeats(
    a, core.array(repeats, dtype=numpy.intp), axis)
Member


Using core.array(repeats, dtype=numpy.intp) means we actually accept a list of floats indirectly.
This is maybe fine; not doing so is actually a bit tedious (you need to special-case len(repeats) == 0, otherwise use numpy.asarray(), and then do a same_kind cast).
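A NumPy illustration of the indirect float acceptance being pointed out: the array constructor casts unsafely, while an explicit same_kind cast would reject the floats instead:

```python
import numpy as np

# Constructing with dtype=intp silently truncates float inputs:
print(np.array([1.9, 2.0, 0.5], dtype=np.intp))   # [1 2 0]

# The tedious-but-strict alternative rejects them:
try:
    np.asarray([1.9, 2.0, 0.5]).astype(np.intp, casting="same_kind")
except TypeError:
    print("float64 -> intp rejected under same_kind")
```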

"""Make a reps array on the right platform."""
if xp is cupy:
return cupy.array(vals)
return numpy.array(vals)
Member


This function is just silly :), since it is just xp.array(vals)

{'shape': (2, 3, 4, 5), 'reps': [2, 1, 3], 'axis': 1},
{'shape': (2, 3, 4, 5), 'reps': [3], 'axis': 3},
)
class TestRepeatNdarrayRepeats(unittest.TestCase):
Member


Let's get rid of unittest and use a normal @pytest.mark.parametrize (all it does differently is pass shape as an argument rather than using self.shape).

Just as a long-term hope to move toward a more typical pattern in general.
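A possible shape for that conversion, sketched with plain NumPy standing in for CuPy's test machinery (the parameter cases are taken from the test above; the shape assertion is just a placeholder for the real comparison):

```python
import numpy as np
import pytest

@pytest.mark.parametrize("shape, reps, axis", [
    ((2, 3, 4, 5), [2, 1, 3], 1),
    ((2, 3, 4, 5), [3], 3),
])
def test_repeat_ndarray_repeats(shape, reps, axis):
    a = np.arange(np.prod(shape)).reshape(shape)
    out = np.repeat(a, reps, axis=axis)
    expected = list(shape)
    # a length-1 reps broadcasts to every element along the axis
    expected[axis] = sum(reps) if len(reps) > 1 else reps[0] * shape[axis]
    assert out.shape == tuple(expected)
```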

"""Make an integer reps array (int32 dtype), even for empty lists."""
if xp is cupy:
    return cupy.array(vals, dtype=numpy.int32)
return numpy.array(vals, dtype=numpy.int32)
Member


Also feels like an unnecessary helper?

@eriknw
Author

eriknw commented Mar 27, 2026

Thanks for the initial review and nits -- keep 'em coming! They seem valid to me. I'll give my human replies on Monday.



Development

Successfully merging this pull request may close these issues.

Support array objects as the repeats argument to cupy.repeat

3 participants