ENH: add fast path for str(scalar_int) #23746

mattip · 2023-05-10T20:39:38Z

Create a fast path for str(scalar_int). When x is an int scalar, str(x) goes to genint_type_str, which calls PyObject_Str(gentype_generic_method(self, NULL, NULL, "item")). gentype_generic_method creates a 0-d array from the scalar (allocating a PyArrayObject with one data item), copies the item into that array, calls arr.item() to get the int value, and then wraps that in a PyLongObject.

It is faster to avoid all that and use scalar_value to get the void * value, then cast that to the proper int type before wrapping it in a PyLongObject.
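As a rough analogy only (this is not the C code in this PR), the same idea can be sketched with `ctypes`: start from an untyped pointer to the scalar's storage, reinterpret it as the matching C integer type, and turn the result into a Python int before formatting it. The table `_C_INT_TYPES` and the function `int_scalar_to_str` are hypothetical names made up for this illustration.

```python
import ctypes

# Hypothetical table (not NumPy's): integer type name -> C type used to
# reinterpret the raw storage.
_C_INT_TYPES = {
    "int8": ctypes.c_int8,
    "int16": ctypes.c_int16,
    "int32": ctypes.c_int32,
    "int64": ctypes.c_int64,
    "uint8": ctypes.c_uint8,
    "uint16": ctypes.c_uint16,
    "uint32": ctypes.c_uint32,
    "uint64": ctypes.c_uint64,
}


def int_scalar_to_str(value_ptr, type_name):
    """Format the integer stored at value_ptr without building a temporary array."""
    c_type = _C_INT_TYPES[type_name]
    # Reinterpret the untyped pointer as a pointer to the proper integer type.
    typed_ptr = ctypes.cast(value_ptr, ctypes.POINTER(c_type))
    # Read the value as a Python int (the analogue of wrapping it in a PyLongObject).
    return str(typed_ptr.contents.value)


# Stand-ins for a scalar's storage and the void * that points at it.
storage_i16 = ctypes.c_int16(-123)
storage_u64 = ctypes.c_uint64(2**64 - 1)
ptr_i16 = ctypes.c_void_p(ctypes.addressof(storage_i16))
ptr_u64 = ctypes.c_void_p(ctypes.addressof(storage_u64))

print(int_scalar_to_str(ptr_i16, "int16"))
print(int_scalar_to_str(ptr_u64, "uint64"))
```

The sketch only works if the type name and the stored value agree on width and signedness; reading through the wrong C type silently yields a different number.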

I also added a benchmark to prove there is a speedup.

       before           after         ratio
     [181c15b2]       [fcf8a88e]
     <main>           <scalar-str>
-      29.3±0.3μs      27.8±0.09μs     0.95  bench_scalar.ScalarStr.time_addition('complex128')
-      15.9±0.3μs       15.0±0.2μs     0.94  bench_scalar.ScalarStr.time_addition('float64')
-      42.6±0.9μs       11.7±0.1μs     0.27  bench_scalar.ScalarStr.time_addition('int32')
-      42.1±0.9μs      11.2±0.06μs     0.27  bench_scalar.ScalarStr.time_addition('int16')
-      41.7±0.8μs       11.0±0.2μs     0.26  bench_scalar.ScalarStr.time_addition('int64')

This was motivated by a user who noticed that [str(int(x)) for x in numpy.arange(10000)] is much faster than [str(x) for x in numpy.arange(10000)].

mattip · 2023-05-10T20:40:30Z

numpy/core/src/multiarray/scalartypes.c.src

+            break;
+        default:
+            item = gentype_generic_method(self, NULL, NULL, "item");
+            break;


A fallback for some as-yet-unknown user-defined int dtype.

mattip · 2023-05-10T20:41:51Z

numpy/core/tests/test_arrayprint.py



-    def test_structure_format(self):
+    def test_structure_format_mixed(self):


I split this test into 3, since the other test functions were not related to this one. I hit this by accident when I was trying out a different version of the code and printing was failing.

seberg

Not sure I wouldn't prefer a type-specific path (just because it's the pattern used, I think). But this will generate a merge conflict with larger changes anyway, so I don't care.

So, let's put it in for 1.25.x.

seberg · 2023-05-11T06:21:29Z

numpy/core/src/multiarray/scalartypes.c.src

+            break;
+        default:
+            item = gentype_generic_method(self, NULL, NULL, "item");
+            break;


We need to generate a function for each type anyway. I don't think that str(self.item()) is a good default to begin with.

That said, I don't mind just doing this; I will just delete it again on main in the next 2 months, hopefully. (The work should already be in the repr PR, and it's time to push that after branching.)

mattip · 2023-05-11T07:33:08Z

Ahh, type-specific __str__ methods for scalars are already part of the dtype refactor? Sorry, I didn't know that...

ngoldbaum · 2023-05-11T14:57:00Z

> Ahh, type-specific __str__ methods for scalars are already part of the dtype refactor?

You could define a dtype with a scalar that isn’t convertible to a string, but basic things like array printing wouldn’t work. We could probably add a check to the dtype registration that makes sure the scalar has a __str__ and __repr__ and raises an error otherwise.
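A minimal sketch of such a check, written in plain Python and not tied to any existing NumPy registration API: `check_scalar_is_printable`, `PrintableScalar` and `OpaqueScalar` are hypothetical names. Since every Python class inherits `__str__` and `__repr__` from `object`, the sketch assumes that "has a `__str__` and `__repr__`" means "overrides the defaults inherited from `object`".

```python
def check_scalar_is_printable(scalar_type):
    """Raise TypeError unless scalar_type overrides both __str__ and __repr__."""
    missing = [
        name
        for name in ("__str__", "__repr__")
        # Every class inherits these from object; only an override counts here.
        if getattr(scalar_type, name) is getattr(object, name)
    ]
    if missing:
        raise TypeError(
            f"{scalar_type.__name__} must define {', '.join(missing)} "
            "so that arrays of this dtype can be printed"
        )


class PrintableScalar:
    """Hypothetical scalar type that defines both methods."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)

    def __repr__(self):
        return f"PrintableScalar({self.value!r})"


class OpaqueScalar:
    """Hypothetical scalar type that relies on object's defaults."""


check_scalar_is_printable(PrintableScalar)  # passes silently

try:
    check_scalar_is_printable(OpaqueScalar)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This version is deliberately strict: a type that defines only `__repr__` is rejected as well, even though `object.__str__` would fall back to it.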

charris · 2023-05-13T16:44:08Z

Thanks Matti.

numpy#23746 introduced a fast path for scalar int conversions, but the map between Python types and C types was subtly wrong. This fixes tests on at least ppc32 (big-endian). Many thanks to Sebastian Berg for debugging this with me and pointing out what needed to be fixed.

Bug: numpy#24239
Fixes: 81caed6
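This thread does not show which C types were mismatched, so the following is only a generic illustration of why reading a value through a C type of the wrong width can go unnoticed on little-endian machines and show up on big-endian ones. The 32-bit versus 64-bit pairing and the zero bytes after the value are assumptions chosen for the example; in real memory the neighbouring bytes are arbitrary.

```python
import struct

value = 7

# A 32-bit integer followed by four unrelated (here: zero) bytes, laid out
# as it would sit in memory on each kind of machine.
little_endian_memory = struct.pack("<i", value) + b"\x00" * 4
big_endian_memory = struct.pack(">i", value) + b"\x00" * 4

# Matching read: 32 bits written, 32 bits read.
right_le = struct.unpack_from("<i", little_endian_memory)[0]
right_be = struct.unpack_from(">i", big_endian_memory)[0]

# Mismatched read: the same bytes interpreted as a 64-bit integer.
wrong_le = struct.unpack_from("<q", little_endian_memory)[0]
wrong_be = struct.unpack_from(">q", big_endian_memory)[0]

print(right_le, right_be)  # both reads recover 7
print(wrong_le, wrong_be)  # little-endian still yields 7; big-endian yields 7 << 32
```

On the little-endian layout the low-order bytes come first, so the too-wide read happens to return the right number as long as the extra bytes are zero. On the big-endian layout the stored value lands in the high-order half of the wider integer, so the result is wrong even then.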

add fast path for str(scalar_int)

670842b

github-actions bot added the 01 - Enhancement label May 10, 2023

BUG: typo, linting

5e983f2

seberg approved these changes May 11, 2023

seberg added this to the 1.25.0 release milestone May 11, 2023

charris merged commit 81caed6 into numpy:main May 13, 2023

seberg mentioned this pull request Jul 23, 2023

BUG: Unusual test failures (whitespace-related?) on ppc32 with numpy-1.25.1 #24239

Closed

thesamesam mentioned this pull request Jul 23, 2023

BUG: Fix C types in scalartypes #24240

Merged

charris mentioned this pull request Jul 30, 2023

BUG: Fix C types in scalartypes #24293

Merged
