Description
Proposed new feature or change:
Here is a minimal example:
>>> x = np.arange(10)[::2] #Create a non-contiguous array
>>> x.dtype
dtype('int64') # This could be int32, in which case adjust the following lines with float32 and int16 as appropriate
>>> x.view(np.float64)
array([0.e+000, 1.e-323, 2.e-323, 3.e-323, 4.e-323]) # So far so good
# This makes perfect sense:
>>> x.view(np.int32)
ValueError: To change to a dtype of a different size, the array must be C-contiguous
# But this does not
>>> x[:, None].view(np.int32)
ValueError: To change to a dtype of a different size, the array must be C-contiguous
When the last dimension is 1, and the new dtype's itemsize is an even divisor of the original's, there is nothing stopping us from creating the view meaningfully.
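To make the claim concrete, here is a minimal sketch of the view I would expect, built by hand with the `np.ndarray` constructor on top of the contiguous base buffer. It assumes `x.base` is the C-contiguous array that owns the memory, and it pins the dtype to `int64` so the numbers are the same on every platform. The variable names are mine, purely for illustration.

```python
import numpy as np

x = np.arange(10, dtype=np.int64)[::2]   # non-contiguous: strides == (16,)
col = x[:, None]                         # shape (5, 1), strides (16, 8)

base = x.base                            # assumed: contiguous owner of the memory
new_dtype = np.dtype(np.int32)
ratio = col.dtype.itemsize // new_dtype.itemsize   # 8 // 4 == 2

# Byte offset of the view's first element inside the base buffer
offset = (col.__array_interface__['data'][0]
          - base.__array_interface__['data'][0])

# Each int64 element becomes a row of `ratio` int32 values; the row stride
# is unchanged and the new last axis steps by the new itemsize.
manual = np.ndarray(
    shape=col.shape[:-1] + (ratio,),
    dtype=new_dtype,
    buffer=base,
    offset=offset,
    strides=col.strides[:-1] + (new_dtype.itemsize,),
)

# Reinterpreting a contiguous copy as int64 recovers the original values
roundtrip = np.ascontiguousarray(manual).view(np.int64).ravel()
```

Nothing here touches memory outside the bytes that `col` already refers to, which is why the refusal from `view` seems unnecessary in this case.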
I can understand the np.ndarray constructor and np.ndarray.view being user-facing and therefore trying to be a bit conservative. However, something like np.lib.stride_tricks.as_strided certainly should have the option of bypassing the dtype check: if I'm a consenting adult who is able to decide how to mess up my memory layout, I should be allowed to do it with the full force of all the tools available to me. To that end, I made a very naive attempt at adding an offset and dtype parameter to as_strided: madphysicist@e422186. This has the same failure as the example above.
The issue came up because of my work on #20694, where I am trying to add slicing to char arrays. It's actually an ideal example of how switching dtypes can be very useful. Additionally, it shows a workaround: I drill down to the base-most array, and either get a C-contiguous block of numpy-allocated memory, or construct a view of the buffer that's contiguous and large enough for the operations I need. The key is that even if the underlying memory is not contiguous for some reason, I am doing extensive checks to make sure I don't overrun the parts I know for sure I have access to: I just need to jump through hoops to convince np.ndarray that I know what I'm doing, but I very strongly feel that I shouldn't have to.
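A rough sketch of that kind of workaround, not the actual code from #20694: walk `.base` down to the array that owns the memory, compute the byte offset, check that the requested view stays inside the buffer, and only then call `np.ndarray`. The helper names `root_buffer` and `view_smaller_dtype` are hypothetical, and the sketch assumes non-negative strides and a C-contiguous root.

```python
import numpy as np

def root_buffer(arr):
    """Follow .base until reaching the ndarray at the bottom of the chain."""
    root = arr
    while isinstance(root.base, np.ndarray):
        root = root.base
    return root

def view_smaller_dtype(arr, dtype):
    """View an array whose last dimension is 1 as a smaller dtype (sketch)."""
    dtype = np.dtype(dtype)
    if arr.ndim == 0 or arr.shape[-1] != 1:
        raise ValueError("last dimension must have length 1")
    ratio, rem = divmod(arr.dtype.itemsize, dtype.itemsize)
    if ratio < 1 or rem:
        raise ValueError("new itemsize must evenly divide the old one")
    if any(s < 0 for s in arr.strides):
        raise ValueError("negative strides are not handled in this sketch")

    root = root_buffer(arr)
    if not root.flags.c_contiguous:
        raise ValueError("root buffer must be C-contiguous")

    offset = (arr.__array_interface__['data'][0]
              - root.__array_interface__['data'][0])
    shape = arr.shape[:-1] + (ratio,)
    strides = arr.strides[:-1] + (dtype.itemsize,)

    # One past the last byte the new view can touch, relative to the root
    end = offset + sum((n - 1) * s for n, s in zip(shape, strides)) + dtype.itemsize
    if arr.size and (offset < 0 or end > root.nbytes):
        raise ValueError("requested view would overrun the root buffer")

    return np.ndarray(shape=shape, dtype=dtype, buffer=root,
                      offset=offset, strides=strides)

y = np.arange(10, dtype=np.int64)[1::2][:, None]   # shape (5, 1), offset 8 bytes
v = view_smaller_dtype(y, np.int32)                # shape (5, 2), shares memory
```

The result is a genuine view: writing through `v` changes `y`. All the checks exist only to satisfy the constructor, which is the hoop-jumping I would like to avoid.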
The workaround in #20694 is functional, but quite hacky. A better solution would be to allow as_strided, and potentially ndarray, to just do what you ask of them. At the very least, an exception to the dtype check should be made when the last dimension is 1 and the source dtype's itemsize is a multiple of the target dtype's itemsize. But I would go further: if I can call ndarray and set an arbitrary offset, which is way more dangerous, why not allow unchecked dtype reassignments?
I propose a solution like this:
- Split np.ndarray into a "checked" and "unchecked" version (I have no clear idea how the code works right now, so this is just a concept)
- as_strided becomes a wrapper for the unchecked version: user has infinite power
- The actual function np.ndarray and its wrappers/siblings stay exactly the same, using the checked version.
I'd be happy to work on this issue if there is any interest in it.