BUG: Fix np.strings.slice if stop=None or start and stop >= len #29944

aaronkollasch · 2025-10-13T22:58:02Z

Python treats slice(-1) differently from slice(-1, None): The first is interpreted as slice(None, -1, None), while the second becomes slice(-1, None, None), according to the logic in slice_new.
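The difference can be checked in plain Python without NumPy. The snippet below is only an illustration of the built-in `slice` semantics described above; the variable `word` is made up for the example.

```python
# A single positional argument to slice() is taken as `stop`, not `start`.
assert slice(-1) == slice(None, -1, None)

# Passing None explicitly as the second argument keeps -1 as `start`.
assert slice(-1, None) == slice(-1, None, None)

word = "hello"
print(word[slice(-2)])        # hel  (everything except the last two characters)
print(word[slice(-2, None)])  # lo   (the last two characters)
```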

However, np.strings.slice treats these identically, as it cannot distinguish unset arguments from arguments set to None. This makes it impossible to get the last characters of each string, for example:

>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')

This PR fixes that behavior:

>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop given
array(['hel', 'wor'], dtype='<U5')

It does this by adding a stop=np._NoValue default argument to np.strings.slice, which can be overridden with None.
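A minimal sketch of the sentinel-default pattern, assuming nothing about NumPy's internals: `_NO_VALUE` is a hypothetical stand-in for `np._NoValue`, and `resolve_slice` is an illustrative helper, not a NumPy function.

```python
_NO_VALUE = object()  # hypothetical stand-in for np._NoValue


def resolve_slice(start=None, stop=_NO_VALUE, step=None):
    """Build a slice the way slice() does: a lone argument is the stop."""
    if stop is _NO_VALUE:
        # Caller passed no stop at all, so treat `start` as the stop,
        # matching slice(x) == slice(None, x, None).
        start, stop = None, start
    # An explicit stop=None falls through unchanged and means "to the end".
    return slice(start, stop, step)


print("hello"[resolve_slice(-2)])        # hel
print("hello"[resolve_slice(-2, None)])  # lo
```

Using a unique sentinel object instead of `None` as the default is what lets the function tell "argument omitted" apart from "argument explicitly set to `None`".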

The PR adds test conditions to numpy/_core/tests/test_strings.py::TestMethods::test_slice, to verify that the slicing behavior matches Python's slice with these arguments.

It also fixes an error when start and stop are >= the string length and the dtype is StringDType(). To reproduce:

>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice

When running spin ipython based on main, these commands cause the python process to exit with code 251.
To fix this, I added bounds checks to codepoint_offsets[start] in numpy/_core/src/umath/stringdtype_ufuncs.cpp.
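The idea behind the bounds check can be sketched in Python. This is not the C++ code from `stringdtype_ufuncs.cpp`; the helpers `codepoint_offsets` and `slice_utf8` are hypothetical, and the sketch handles only non-negative `start`/`stop` with a step of 1.

```python
def codepoint_offsets(data: bytes) -> list[int]:
    """Byte offset of each UTF-8 code point start, plus the end offset."""
    # Continuation bytes have the bit pattern 10xxxxxx; skip them.
    offsets = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    offsets.append(len(data))
    return offsets


def slice_utf8(data: bytes, start: int, stop: int) -> bytes:
    """Slice by code point index, clamping indices to the string length."""
    offsets = codepoint_offsets(data)
    n = len(offsets) - 1  # number of code points
    # Without this clamp, offsets[start] would index past the table
    # when start >= n, which is the failure mode described above.
    start = max(0, min(start, n))
    stop = max(start, min(stop, n))
    return data[offsets[start]:offsets[stop]]


print(slice_utf8("hello".encode("utf-8"), 5, 7))  # b''
```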

seberg

I am slightly curious if the second branch is correct, but I think it should be since it iterates through one by one so that the iteration stopping criteria encodes the incorrect logic.

Thanks for looking into both issues! I wonder if the None change should be mentioned, but overall it seems clear enough (and this function is still fairly new also anyway).

numpy/_core/src/umath/stringdtype_ufuncs.cpp

numpy/_core/strings.py

Python treats `slice(-1)` differently from `slice(-1, None)`: The first is interpreted as `slice(None, -1, None)`, while the second becomes `slice(-1, None, None)`, according to the logic in `slice_new`.

However, `np.strings.slice` treats these identically, as it cannot distinguish unset arguments from arguments set to None. This makes it impossible to get the last characters of each string, for example:

```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')
```

This commit fixes that behavior:

```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop
array(['hel', 'wor'], dtype='<U5')
```

It does this by adding a `stop=np._NoValue` default argument to `np.strings.slice`, which can be overridden with `None`.

This commit also adds test conditions to `numpy/_core/tests/test_strings.py::TestMethods::test_slice` to verify that the slicing behavior matches Python's `slice`.

Note that 4 newly added test conditions are commented out for now, as they cause errors with the "T" dtype. To reproduce:

```
>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice
```

This causes either a MemoryError or kills the process with code 251.

Allows commented test_slice conditions to be uncommented.

seberg

LGTM, thanks @aaronkollasch. I think this is good to go in if tests pass (I assume they will).

aaronkollasch · 2025-10-15T14:53:43Z

Sounds great, thanks for reviewing @seberg!


ngoldbaum · 2025-10-23T17:25:58Z

Thanks so much for the fix and for expanding the tests!

BUG: Fix np.strings.slice if stop=None or start and stop >= len (#29944)


github-actions bot added the 00 - Bug label Oct 13, 2025

aaronkollasch force-pushed the fix-string-slice-stop branch from db9748c to 3998c3b Compare October 13, 2025 23:33

seberg reviewed Oct 14, 2025


aaronkollasch added 2 commits October 14, 2025 20:38

BUG: Fix np.strings.slice when start and stop >= len

7b9c5fc

Allows commented test_slice conditions to be uncommented.

aaronkollasch force-pushed the fix-string-slice-stop branch from 3998c3b to 7b9c5fc Compare October 15, 2025 00:43

seberg approved these changes Oct 15, 2025


seberg merged commit 3958757 into numpy:main Oct 15, 2025
77 checks passed

aaronkollasch deleted the fix-string-slice-stop branch October 16, 2025 00:17

aaronkollasch mentioned this pull request Oct 21, 2025

BUG: Fix np.strings.slice if start > stop #29989

Merged

charris added the "09 - Backport-Candidate" label (PRs tagged should be backported) Oct 21, 2025

charris removed the "09 - Backport-Candidate" label (PRs tagged should be backported) Oct 23, 2025

charris added a commit that referenced this pull request Oct 23, 2025

Merge pull request #30059 from charris/backport-29944

0902aa6

BUG: Fix np.strings.slice if stop=None or start and stop >= len (#29944)
