ENH: add a StringDType scalar type that wraps a UTF-8 string #28165

@mtsokol

Describe the issue:

Hi @ngoldbaum,

I wonder if it would make sense to emit a warning when dtype=str is passed to np.array/np.asarray, saying that dtype=np.str_ is preferred.
np.dtypes.StringDType.type is str, but when str is passed as the dtype to e.g. np.asarray(..., dtype=...), it resolves to the np.str_ dtype rather than StringDType. WDYT?

Reproduce the code example:

import numpy as np
arr1 = np.array([1, 2, 3], dtype=np.dtypes.Int32DType)
assert np.asarray(arr1, dtype=arr1.dtype.type).dtype == arr1.dtype

arr2 = np.array(["foo", "bar"], dtype=np.dtypes.StringDType)
np.asarray(arr2, dtype=arr2.dtype.type)

Error message:

TypeError                                 Traceback (most recent call last)
TypeError: Casting from StringDType to a fixed-width dtype with an unspecified size is not currently supported, specify an explicit size for the output dtype instead.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[30], line 6
      3 assert np.asarray(arr1, dtype=arr1.dtype.type).dtype == arr1.dtype
      5 arr2 = np.array(["foo", "bar"], dtype=np.dtypes.StringDType)
----> 6 np.asarray(arr2, dtype=arr2.dtype.type)

TypeError: cannot cast dtype StringDType() to <class 'numpy.dtypes.StrDType'>.

Python and NumPy Versions:

2.3.0.dev0+git20250115.1e10174
3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]

Runtime Environment:

No response

Context for the issue:

No response
