BUG: Provide correct format in Py_buffer for scalars #10564

Merged
eric-wieser merged 10 commits into numpy:master from
vanossj:fix-pep3118-scalar-types
Mar 25, 2018

Conversation

@vanossj
Contributor

@vanossj vanossj commented Feb 9, 2018

Fixes #10265

A memoryview of a scalar now reports the correct format code, as well as ndim == 0 and empty shape/strides/suboffsets.

Previously, a memoryview of any scalar appeared as an array of bytes.

The format code is stored in Py_buffer.internal; this was a simple place to store it that didn't require allocating memory. This does somewhat duplicate code in buffer.c. However, scalars are the subset of dtypes that are always native-size, aligned, and single-element, so the resulting code simplifies to a plain switch statement.

Tests check that a scalar's format code matches the format code of an empty array whose dtype matches the scalar.

The dtypes intp and uintp seem to be analogous to C pointers and, according to PEP 3118, should report format codes 'p' and 'P' respectively. However, arrays of these dtypes use the integer format codes; this patch follows that precedent.
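The behaviour described above can be sketched like this (a minimal illustration of the PR's effect on Python 3, not code from the patch itself):

```python
import numpy as np

x = np.float64(1.5)
mv = memoryview(x)

# With this fix, a scalar buffer looks like a 0-d view of its own dtype,
# not an array of bytes
assert mv.ndim == 0
assert mv.shape == ()
assert mv.itemsize == 8

# The scalar's format code matches that of an array of the same dtype,
# which is what the PR's tests check
assert mv.format == memoryview(np.array(x)).format
```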


# TODO: 'p', 'l', or 'q'?
# (np.intp, 'p'),
# (np.uintp, 'P'),
Member

@eric-wieser eric-wieser Feb 10, 2018

p won't be possible, because intp is just an alias for one of the other integer types - they're not distinguishable
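The aliasing is easy to confirm (an illustrative check, not part of the patch):

```python
import numpy as np

# intp is pointer-sized, but it is only an alias of one of the fixed-width
# integer types, so it is indistinguishable from that type in a buffer
assert np.dtype(np.intp) in (np.dtype(np.int32), np.dtype(np.int64))

# Arrays of intp therefore report an integer format code, never 'p'
fmt = memoryview(np.zeros(1, dtype=np.intp)).format
assert fmt in ('i', 'l', 'q')
```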

Member

@eric-wieser eric-wieser left a comment

This doesn't report the endianness correctly. Note that as soon as you add endianness markers, you also have to use standard length codes rather than the native ones (use the codes based on sizeof, not the type name).

This also seems to reimplement things we already have for ndarray. Can you work out some way to share that code? There's a dtype-to-PEP3118 converter somewhere, probably in _internal.py.

Also, tests for structured void scalars would be good.
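The standard-vs-native size distinction the review raises can be seen with Python's struct mini-language, which PEP 3118 extends (illustrative only):

```python
import struct

# With no byte-order marker ('@' native mode), codes use platform sizes;
# any explicit marker ('=', '<', '>') switches to fixed "standard" sizes,
# so adding an endianness marker changes what a letter like 'l' means
assert struct.calcsize('=l') == 4   # standard long: always 4 bytes
assert struct.calcsize('=q') == 8   # standard long long: always 8 bytes

# native long may be 4 or 8 bytes depending on the platform
assert struct.calcsize('@l') in (4, 8)
```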

@vanossj
Contributor Author

vanossj commented Feb 10, 2018

Is there a way to create a scalar that isn't native endianness?

@eric-wieser
Member

Nice catch, maybe there isn't.


outcode = PyArray_DescrFromScalar(self);

if (flags & PyBUF_FORMAT && 3 <= sizeof view->internal ) {
Member

@eric-wieser eric-wieser Feb 10, 2018

Parens around (a & b) would be nice.

Are you sure sizeof(view->internal) does anything useful here?

Contributor Author

Currently pointless, but the compiler will optimize that away.

I was trying to guard against the scenario where format code strings are 5 bytes long, and then it works on the 64-bit build but not the 32-bit build. Having an explicit call to sizeof was meant as a reminder to the programmer not to go past the limit. That might be unnecessary/ineffective defensive programming.

It would probably be better to do memset(&view->internal, '\0', sizeof view->internal). That might get the same idea across and zero out everything instead of just the first 3 bytes.

(np.float32, 4),
(np.float64, 8),
(np.complex64, 8),
(np.complex128, 16),
Member

I think all of these would be better as np.dtype(type).itemsize, to avoid duplication. You could then use longdouble instead of float128 below, which is always defined.

Contributor Author

Would type().itemsize be better than np.dtype(type).itemsize? Not sure if there is an appreciable difference.
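For what it's worth, the two spellings agree for fixed-width types (a quick check, not code from the PR):

```python
import numpy as np

# A scalar instance reports the same itemsize as its dtype
assert np.float64().itemsize == np.dtype(np.float64).itemsize == 8
assert np.int16().itemsize == np.dtype(np.int16).itemsize == 2
```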

a = np.array([], dtype=scalar)
assert_(x.data.format == a.data.format)

@dec.skipif(sys.version_info.major < 3, "scalars do not implement buffer interface in Python 2")
Member

Not sure this is true - can you use memoryview(x) to get hold of it?

Contributor Author

Oddly, no.

Python 2.7.14 (default, Sep 23 2017, 22:06:14) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.int16(2)
>>> memoryview(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface
>>> x.data
<read-only buffer for 0x7ff23b84d150, size -1, offset 0 at 0x7ff23b778430>
>>> 

Member

Hmm, that's pretty odd, given it works fine on arrays

@vanossj
Contributor Author

vanossj commented Feb 10, 2018

Turns out you can change the endianness of scalars:
np.int16(1).newbyteorder('S')

PEP 3118 format string generation is in buffer.c. It uses an internal struct _tmp_string_t and allocates memory, something I was hoping to avoid.

I'll look into unifying the format string generation now that I know non-native endianness and standard size are possibilities.

@eric-wieser
Member

eric-wieser commented Feb 10, 2018

That's not a correct example:

>>> np.int16(1).dtype.byteorder
'='
>>> np.int16(1).newbyteorder('>').dtype.byteorder
'='

In both cases the scalar has native endianness - you're swapping the bytes, but doing so by changing the memory, not the interpretation. .byteswap() does the same.

Note that on arrays .byteswap() swaps the bytes, but newbyteorder('S') changes the interpretation. On scalars, it seems that both mean byteswap().
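The array-side distinction can be sketched like this (using dtype.newbyteorder with a view, which keeps the example portable across NumPy versions):

```python
import numpy as np

a = np.array([1], dtype='<i2')                # bytes in memory: 01 00

swapped = a.byteswap()                        # changes the memory, keeps '<i2'
reinterp = a.view(a.dtype.newbyteorder('S'))  # keeps memory, dtype becomes '>i2'

assert swapped.dtype == a.dtype               # still little-endian dtype
assert reinterp.dtype.byteorder == '>'        # big-endian interpretation

# Either way the value reads back as 0x0100 == 256
assert int(swapped[0]) == int(reinterp[0]) == 256
```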

@eric-wieser
Member

_buffer_format_string is the function to look at.

Something to watch out for - the layout in memory of np.unicode_ objects on windows is not the same as the layout of arrays of these objects - one is UCS2, the other is UCS4.

@vanossj
Contributor Author

vanossj commented Feb 23, 2018

@eric-wieser This should cover the fixes you requested. Format strings for scalars and ndarrays are now constructed with the same code.

err = _buffer_format_string(descr, &fmt, obj, NULL, NULL);
if(PyArray_IsScalar(obj, Generic)) {
Py_DECREF(descr);
}
Member

Would be tidier to just add Py_INCREF(descr) above when you obtain this, so you can do this unconditionally

if (descr->type_num == NPY_UNICODE) {
elsize >>= 1;
}
#endif
Member

@eric-wieser eric-wieser Feb 27, 2018

Pleased to see that you did this

Member

@eric-wieser eric-wieser left a comment

This looks pretty good - just some minor comments

np.half, np.single, np.double, np.float_, np.longfloat,
np.float16, np.float32, np.float64, np.csingle, np.complex_,
np.clongfloat, np.complex64, np.complex128,
]
Member

This list is longer than it needs to be, as most of these are aliases. I'd recommend testing only:

[
    np.bool,
    np.byte, np.short, np.intc, np.int_, np.longlong,
    np.ubyte, np.ushort, np.uintc, np.uint, np.ulonglong,
    np.half, np.single, np.double, np.longdouble,
    np.csingle, np.cdouble, np.clongdouble,
]

# platform dependent dtypes
for dtype in ('float96', 'float128', 'complex192', 'complex256'):
if hasattr(np, dtype):
scalars.append(getattr(np, dtype))
Member

These are also only aliases, so there's no need to include them over the np.longdouble above.
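The aliasing claim is easy to verify (an illustrative check, not code from the PR):

```python
import numpy as np

# double/float64 are the same type object, and the size-named extended
# floats are aliases of longdouble wherever they exist
assert np.double is np.float64
for name in ('float96', 'float128'):
    if hasattr(np, name):
        assert np.dtype(getattr(np, name)) == np.dtype(np.longdouble)
```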



class TestScalarPEP3118(object):
@dec.skipif(sys.version_info.major < 3, "scalars do not implement buffer interface in Python 2")
Member

I'm curious - can you do

skip_if_no_buffer_interface = dec.skipif(sys.version_info.major < 3, "scalars do not implement buffer interface in Python 2")

and then apply @skip_if_no_buffer_interface to each one? Or does it need a new instance each time?

@dec.skipif(sys.version_info.major < 3, "scalars do not implement buffer interface in Python 2")
def test_void_scalar_structured_data(self):
dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))])
a = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)
Member

You might want to consider making this a 0d array, as np.array(('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0)), dtype=dt)

Contributor Author

Not sure I follow, how would a 0d array help?

I'm trying to get an np.void object to test. Ideally I would directly make an np.void scalar, but I don't see how.

np.array(('Sarah', (8.0, 7.0)), dtype=dt) has ndim 0, but its type is ndarray.

Member

You can convert a 0d array to a scalar with [()] in place of [0].

You could then test that the buffer returned from the array vs the scalar has the same strides, shape, and format.
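A sketch of the [()] trick (np.str_ is used here as the current spelling of np.unicode_):

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
a0 = np.array(('Sarah', (8.0, 7.0)), dtype=dt)  # 0-d structured array

x = a0[()]  # indexing a 0-d array with () yields the scalar
assert a0.ndim == 0
assert isinstance(a0, np.ndarray)
assert isinstance(x, np.void)
```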

Contributor Author

I would prefer to just change to:

x = np.array(('ndarray_scalar', (1.2, 3.0)), dtype=dt)[()]

This way:

  • x is definitely different from a, not just an element of that array.
  • It's as close to directly creating a structured np.void as I can get.
  • ndim/shape/strides/suboffsets are tested against explicit values per PEP 3118, not just that they match the corresponding buffer fields from ndarray a.

As a trade-off, the format string of the scalar is tested against the ndarray format string. It's simpler than testing against the multitude of different, but still valid, PEP 3118 format strings for this structure.


mv_a = memoryview(a)
assert_(mv_x.itemsize == mv_a.itemsize)
assert_(mv_x.format == mv_a.format)
Member

Use assert_equal here and above to give a better error message

for scalar, code in scalars_set_code:
x = scalar()
mv_x = memoryview(x)
assert_(mv_x.format == code)
Member

This test makes me uneasy due to the difference between "standard" and "native" letter codes, but I suppose my concern is equally valid about the memoryview of arrays - so I'm happy to keep it until we find a problem with it in a separate PR.

assert_(isinstance(x, np.void))

mv_x = memoryview(x)
expected_size = 16 * np.unicode_().__array__().itemsize
Member

How about np.dtype((np.unicode, 1)).itemsize? What you have here might change to 0 in future.


mv_x = memoryview(x)
expected_size = 16 * np.unicode_().__array__().itemsize
expected_size += 2 * np.float64().__array__().itemsize
Member

np.dtype(np.float64).itemsize

if (descr->byteorder == '=' &&
_is_natively_aligned_at(descr, arr, *offset)) {
if(!PyArray_IsScalar(obj, Generic)) {
/* only check ndarrays, scalars are always natively aligned */
Member

This would be clearer if you assigned is_natively_aligned = 1 in the else, and moved the comment there (or swapped the if and else).

@eric-wieser
Member

CI failure can be ignored, but will go away if you rebase on master

}
else {
int is_standard_size = 1;
int is_natively_aligned = 1;
Member

No point initializing this any more

np.csingle, np.cdouble, np.clongdouble,
]

scalars_set_code = [
Member

Could do with a comment like

# PEP3118 format strings for native (standard alignment and byteorder) types

{
_buffer_info_t *info = NULL;
PyArray_Descr *descr = NULL;
int elsize = 0;
Member

No point initializing any of these either, IMO

Member

@eric-wieser eric-wieser left a comment

Two nits on which my opinion might be worth ignoring, and a request for a comment in the test.


if (descr->byteorder == '=' &&
_is_natively_aligned_at(descr, arr, *offset)) {
if(PyArray_IsScalar(obj, Generic)) {
Member

super-nit: missing space after if

(np.half, 'e'),
(np.single, 'f'),
(np.double, 'd'),
(np.float_, 'd'),
Member

These two are aliases, so there's no point having both - just use np.double, and ditch np.float_

(np.single, 'f'),
(np.double, 'd'),
(np.float_, 'd'),
(np.longfloat, 'g'),
Member

This would normally be spelt np.longdouble

(np.longfloat, 'g'),
(np.csingle, 'Zf'),
(np.complex_, 'Zd'),
(np.clongfloat, 'Zg'),
Member

np.cdouble and np.clongdouble would be more consistent here.

]

# PEP3118 format strings for native (standard alignment and byteorder) types
scalars_set_code = [
Member

scalars_and_codes would be a better name

from numpy.testing import run_module_suite, assert_, assert_equal, dec

# types
scalars = [
Member

I think you can ditch this, and just use

for scalar, _ in scalars_and_codes:
   ...

in your tests below

@charris
Member

charris commented Mar 19, 2018

Updated.

assert_(isinstance(x, np.void))
mv_x = memoryview(x)
expected_size = 16 * np.dtype((np.unicode_, 1)).itemsize
expected_size += 2 * np.dtype((np.float64, 1)).itemsize
Member

Deliberately not using expected_size = dt.itemsize?

Contributor Author

Yes. I'd like to test against explicit values, rather than trusting that dt.itemsize is correct.
I do use the dtype.itemsize of unicode_ and float64 when calculating expected_size because of the platform-dependent sizes of unicode_. But this is still one step more explicit than testing scalar void.itemsize == dtype void.itemsize.
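The explicit calculation works out like this (np.str_ as the current spelling of np.unicode_; as a dtype, a unicode character is always 4 bytes):

```python
import numpy as np

dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])

# 16 unicode characters plus 2 float64s, built from component itemsizes
expected_size = 16 * np.dtype('U1').itemsize + 2 * np.dtype(np.float64).itemsize
assert expected_size == 16 * 4 + 2 * 8 == 80
assert dt.itemsize == expected_size
```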

Member

The assertion is on the memoryview-of-void itemsize, not the scalar void itemsize - arguably all we care about is that the itemsize of the memoryview and the dtype match.

I don't feel too strongly about this though.

Member

because of the platform-dependent sizes of unicode_

Note that unicode as a dtype is always 4-byte characters - you're thinking of unicode_ scalars, which are sometimes 2-byte, due to subclassing the builtin unicode.
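The dtype side of this is easy to confirm:

```python
import numpy as np

# As a dtype, unicode is always UCS4: 4 bytes per character on every platform
assert np.dtype('U1').itemsize == 4
assert np.dtype('U16').itemsize == 64
```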

@eric-wieser eric-wieser added this to the 1.15.0 release milestone Mar 20, 2018
@eric-wieser eric-wieser merged commit e4d678a into numpy:master Mar 25, 2018
@eric-wieser
Member

Thanks for a great first contribution @vanossj!
