Skip to content

Conversation

@aaronkollasch
Copy link
Contributor

@aaronkollasch aaronkollasch commented Oct 13, 2025

Python treats slice(-1) differently from slice(-1, None): The first is interpreted as slice(None, -1, None), while the second becomes slice(-1, None, None), according to the logic in slice_new.

However, np.strings.slice treats these identically, as it cannot distinguish unset arguments from arguments set to None. This makes it impossible to get the last characters of each string, for example:

>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')

This PR fixes that behavior:

>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop given
array(['hel', 'wor'], dtype='<U5')

It does this by adding a stop=np._NoValue default argument to np.strings.slice, which can be overridden with None.

The PR adds test conditions to numpy/_core/tests/test_strings.py::TestMethods::test_slice, to verify that the slicing behavior matches Python's slice with these arguments.

It also fixes an error when start and stop are >= the string length and the dtype is StringDType(). To reproduce:

>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice

When running spin ipython based on main, these commands cause the python process to exit with code 251.
To fix this, I added bounds checks to codepoint_offsets[start] in numpy/_core/src/umath/stringdtype_ufuncs.cpp.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly curious if the second branch is correct, but I think it should be since it iterates through one by one so that the iteration stopping criteria encodes the incorrect logic.

Thanks for looking into both issues! I wonder if the None change should be mentioned, but overall it seems clear enough (and this function is still fairly new also anyway).

Python treats `slice(-1)` differently from `slice(-1, None)`:
The first is interpreted as `slice(None, -1, None)`, while the second
becomes `slice(-1, None, None)`, according to the logic in `slice_new`.

However, `np.strings.slice` treats these identically, as it cannot
distinguish unset arguments from arguments set to None. This makes it
impossible to get the last characters of each string, for example:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')
```

This commit fixes that behavior:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop
array(['hel', 'wor'], dtype='<U5')
```

It does this by adding a `stop=np._NoValue` default argument to
`np.strings.slice`, which can be overridden with `None`.

This commit also adds test conditions to
`numpy/_core/tests/test_strings.py::TestMethods::test_slice`
to verify that the slicing behavior matches Python's `slice`.
Note that 4 newly added test conditions are commented out for now,
as they cause errors with the "T" dtype. To reproduce:
```
>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice
```
This causes either a MemoryError or kills the process with code 251.
Allows commented test_slice conditions to be uncommented.
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @aaronkollasch. I think this is good to go in if tests pass (I assume they will).

@aaronkollasch
Copy link
Contributor Author

Sounds great, thanks for reviewing @seberg!

@seberg seberg merged commit 3958757 into numpy:main Oct 15, 2025
77 checks passed
@aaronkollasch aaronkollasch deleted the fix-string-slice-stop branch October 16, 2025 00:17
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Oct 21, 2025
charris pushed a commit to charris/numpy that referenced this pull request Oct 23, 2025
…y#29944)

Python treats `slice(-1)` differently from `slice(-1, None)`:
The first is interpreted as `slice(None, -1, None)`, while the second
becomes `slice(-1, None, None)`, according to the logic in `slice_new`.

However, `np.strings.slice` treats these identically, as it cannot
distinguish unset arguments from arguments set to None. This makes it
impossible to get the last characters of each string, for example:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')
```

This commit fixes that behavior:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop
array(['hel', 'wor'], dtype='<U5')
```

It does this by adding a `stop=np._NoValue` default argument to
`np.strings.slice`, which can be overridden with `None`.

This commit also adds test conditions to
`numpy/_core/tests/test_strings.py::TestMethods::test_slice`
to verify that the slicing behavior matches Python's `slice`.
Note that 4 newly added test conditions are commented out for now,
as they cause errors with the "T" dtype. To reproduce:
```
>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice
```
This causes either a MemoryError or kills the process with code 251.

* BUG: Fix np.strings.slice when start and stop >= len

Allows commented test_slice conditions to be uncommented.
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Oct 23, 2025
@ngoldbaum
Copy link
Member

Thanks so much for the fix and for expanding the tests!

charris added a commit that referenced this pull request Oct 23, 2025
BUG: Fix np.strings.slice if stop=None or start and stop >= len (#29944)
IndifferentArea pushed a commit to IndifferentArea/numpy that referenced this pull request Dec 7, 2025
…y#29944)

Python treats `slice(-1)` differently from `slice(-1, None)`:
The first is interpreted as `slice(None, -1, None)`, while the second
becomes `slice(-1, None, None)`, according to the logic in `slice_new`.

However, `np.strings.slice` treats these identically, as it cannot
distinguish unset arguments from arguments set to None. This makes it
impossible to get the last characters of each string, for example:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # should return last two characters
array(['hel', 'wor'], dtype='<U5')
```

This commit fixes that behavior:
```python
>>> a = np.array(['hello', 'world'])
>>> np.strings.slice(a, -2, None)  # returns last characters as expected
array(['lo', 'ld'], dtype='<U5')
>>> np.strings.slice(a, -2)  # original behavior preserved if no stop
array(['hel', 'wor'], dtype='<U5')
```

It does this by adding a `stop=np._NoValue` default argument to
`np.strings.slice`, which can be overridden with `None`.

This commit also adds test conditions to
`numpy/_core/tests/test_strings.py::TestMethods::test_slice`
to verify that the slicing behavior matches Python's `slice`.
Note that 4 newly added test conditions are commented out for now,
as they cause errors with the "T" dtype. To reproduce:
```
>>> np.__version__
'2.3.3'
>>> a = np.array(['hello', 'world'], dtype="T")
>>> np.strings.slice(a, 5, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy-dev/lib/python3.12/site-packages/numpy/_core/strings.py", line 1823, in slice
    return _slice(a, start, stop, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: Failed to allocate string in slice
```
This causes either a MemoryError or kills the process with code 251.

* BUG: Fix np.strings.slice when start and stop >= len

Allows commented test_slice conditions to be uncommented.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants