Allow cupy.ndarray as repeats argument to cupy.repeat #9828
Conversation
cupy.repeat now accepts a CuPy ndarray for the repeats parameter, matching NumPy's API. Scalar-like inputs (size-1 or 0-D arrays) use an efficient broadcast-copy kernel. Multi-element inputs use cumsum + searchsorted + take on the GPU. List/tuple inputs with len > 1, which previously cost one kernel launch per element, now use the same GPU path. numpy.ndarray inputs are rejected with a TypeError (CuPy convention). Closes cupy#3849.
seberg
left a comment
Looks good in general, I have a lot of nitpicking comments, some can be ignored and/or are more general "could be a bit nicer". As usual, tests seem a bit verbose, but I don't mind much.
(I should look through it once more, but thought I'd post.)
```python
    faster than scatter-add + cumsum because CUDA atomic operations have
    high per-operation overhead relative to parallel binary search.
    """
    import cupy as _cupy
```
Seems a bit strange to do it here. Might as well do it top-level?
```python
    if not numpy.issubdtype(reps.dtype, numpy.integer):
        raise ValueError(
            "'repeats' array must have integer dtype, got {}".format(
                reps.dtype))
```
I might suggest just using `reps = reps.astype(np.intp, casting="safe")`.
(`"safe"` is what NumPy seems to use, but `"same_kind"` actually makes more sense maybe, since it allows casting from uint64 even if shady in theory.)
That replaces the `!= intp` check below and raises the error here as well.
But looking below, it also feels like defensive nonsense to do this here. At this point it should already be guaranteed to be a 1-D intp array.
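A quick NumPy sketch of that `astype` suggestion (illustrative values, not from the PR), showing why a single cast can replace the manual dtype check:

```python
import numpy as np

# "safe" casting accepts any integer dtype that fits into intp...
reps = np.array([1, 2, 3], dtype=np.int32).astype(np.intp, casting="safe")

# ...and raises TypeError for float dtypes, so the manual
# issubdtype check and its hand-written error message go away.
try:
    np.array([1.5, 2.0]).astype(np.intp, casting="safe")
except TypeError:
    print("float repeats rejected")

# uint64 -> intp is not "safe" (large values may not fit), which is
# why "same_kind" may make more sense despite being shadier in theory.
assert np.can_cast(np.uint64, np.intp, casting="same_kind")
assert not np.can_cast(np.uint64, np.intp, casting="safe")
```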
```python
            "'repeats' array must have integer dtype, got {}".format(
                reps.dtype))

    reps = reps.ravel()
```
I don't think NumPy does this ravel? A bit unclear why it is here.
```python
    axis = internal._normalize_axis_index(axis, a.ndim)
```
Suggested change:

```diff
-    axis = internal._normalize_axis_index(axis, a.ndim)
+    else:
+        axis = internal._normalize_axis_index(axis, a.ndim)
```
```python
    if reps_cpu.min() < 0:
        raise ValueError(
            "all elements of 'repeats' should not be negative")
    total = int(reps_cpu.sum())
```
Not sure I understand why we would do this on the CPU rather than the GPU if we already made it a GPU array?
I mean, sure, it may be pretty fast, but we do a cumsum below on the GPU, so it seems like a weird mix of things.
Also, `boundaries[-1]` is already the total sum, so we don't have to compute it twice.
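For reference, here is a NumPy sketch of the cumsum + searchsorted + take path the PR describes (the name `boundaries` comes from the review comment; the data is illustrative). It also shows that the last cumsum element already gives the output length:

```python
import numpy as np

a = np.array([10, 20, 30])
reps = np.array([2, 0, 3])

# Cumulative boundaries of each element's output range.
boundaries = np.cumsum(reps)           # [2, 2, 5]
# The last boundary is already the total output length, so a
# separate reps.sum() would be redundant.
total = int(boundaries[-1])            # 5
# For each output position, binary-search which source element owns it.
idx = np.searchsorted(boundaries, np.arange(total), side='right')
out = np.take(a, idx)                  # [10, 10, 30, 30, 30]
```

On the GPU the same three calls run as parallel kernels, which is the whole point of replacing the per-element launch loop.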
```python
    elif isinstance(repeats, int) or (
            hasattr(repeats, 'dtype') and
            numpy.ndim(repeats) == 0 and
            numpy.issubdtype(repeats.dtype, numpy.integer)):
```
I would suggest just using:

```python
try:
    repeats = operator.index(repeats)
except TypeError:
    pass
```

Admittedly, that is a bit broader, but not by much, so skip it if you don't like it.
(I don't love try/excepts, but things like `numpy.ndim` and `numpy.issubdtype` are also all shady and slow.)
This would let numpy ndarrays pass, maybe...
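A quick sketch of what `operator.index` accepts and rejects (the `normalize_repeats` wrapper is hypothetical, purely for illustration):

```python
import operator
import numpy as np

def normalize_repeats(repeats):
    # Hypothetical wrapper: return a Python int when repeats is
    # integer-like, otherwise None (fall through to the array path).
    try:
        return operator.index(repeats)
    except TypeError:
        return None

print(normalize_repeats(3))            # plain int -> 3
print(normalize_repeats(np.int64(3)))  # NumPy integer scalar -> 3
print(normalize_repeats(np.array(3)))  # 0-D int ndarray also passes -> 3
print(normalize_repeats(3.0))          # floats are rejected -> None
print(normalize_repeats([1, 2]))       # sequences are rejected -> None
```

Note the last two `print`s confirm the "bit broader" caveat: 0-D integer ndarrays implement `__index__`, so they would slip through this check.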
```python
        axis = 0
    # Non-broadcast sequence (including empty): GPU per-element path.
    return _repeat_ndarray_repeats(
        a, core.array(repeats, dtype=numpy.intp), axis)
```
Using `core.array(repeats, dtype=numpy.intp)` means we actually accept a list of floats indirectly.
This is maybe fine; not doing so is actually a bit tedious (you need to special-case `len(repeats) == 0`, and otherwise use `numpy.asarray()` and then do a same_kind cast).
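The tedious stricter path described here might look like this (a hypothetical `as_intp_repeats` helper, sketched with NumPy):

```python
import numpy as np

def as_intp_repeats(repeats):
    # Hypothetical helper: convert a sequence to an intp array while
    # rejecting floats, unlike a plain array(..., dtype=intp) call,
    # which would silently truncate them.
    if len(repeats) == 0:
        # Empty sequences have no element dtype to inspect; special-case.
        return np.empty(0, dtype=np.intp)
    reps = np.asarray(repeats)
    # "same_kind" allows integer-to-integer casts (including the shady
    # uint64 -> intp one) but raises TypeError for float inputs.
    return reps.astype(np.intp, casting="same_kind")
```

For comparison, `np.array([1.5], dtype=np.intp)` quietly produces `[1]`, while `as_intp_repeats([1.5])` raises TypeError.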
```python
    """Make a reps array on the right platform."""
    if xp is cupy:
        return cupy.array(vals)
    return numpy.array(vals)
```
This function is just silly :), since it is just `xp.array(vals)`.
```python
    {'shape': (2, 3, 4, 5), 'reps': [2, 1, 3], 'axis': 1},
    {'shape': (2, 3, 4, 5), 'reps': [3], 'axis': 3},
)
class TestRepeatNdarrayRepeats(unittest.TestCase):
```
Let's get rid of unittest and use a normal `@pytest.mark.parametrize` (all it does is pass `shape` as an argument rather than `self.shape`).
Just as a long-term hope to move toward a more typical pattern in general.
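A sketch of what the parametrized version might look like (the test body is illustrative and uses NumPy so it runs without a GPU; the real test would call `cupy.repeat`):

```python
import numpy as np
import pytest

@pytest.mark.parametrize('shape, reps, axis', [
    ((2, 3, 4, 5), [2, 1, 3], 1),
    ((2, 3, 4, 5), [3], 3),
])
def test_repeat_ndarray_repeats(shape, reps, axis):
    # Parameters arrive as plain function arguments, not as
    # self.shape / self.reps / self.axis attributes on a TestCase.
    a = np.arange(np.prod(shape)).reshape(shape)
    out = np.repeat(a, reps, axis=axis)
    # A len-1 reps list is broadcast across the whole axis.
    expected_len = sum(reps) * (shape[axis] // len(reps))
    assert out.shape[axis] == expected_len
```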
```python
    """Make an integer reps array (int32 dtype), even for empty lists."""
    if xp is cupy:
        return cupy.array(vals, dtype=numpy.int32)
    return numpy.array(vals, dtype=numpy.int32)
```
Also feels like an unnecessary helper?
Thanks for the initial review and nits -- keep 'em coming! They seem valid to me. I'll give my human replies on Monday.
Closes #3849. There are probably good reasons why this addition hasn't been done before (there is a D2H transfer), given how long #3849 has been open and how much interest there has been in it, so I'm curious to find out why! This feature has actually been on my wishlist for a few years too.
Claude Code (Opus 4.6) was used to assist in the development of this PR, but I was also heavily involved. These changes were actually spawned from my efforts to improve sparse matrix classes in #9825, and like in that PR I'm still doing my final human review of these changes.