
ENH? BUG? Unexpected restrictions in behavior of view creation when switching dtypes #20705

@madphysicist

Proposed new feature or change:

Here is a minimal example:

>>> import numpy as np
>>> x = np.arange(10)[::2]  # Create a non-contiguous array
>>> x.dtype
dtype('int64')   # This could be int32, in which case adjust the following lines with float32 and int16 as appropriate
>>> x.view(np.float64)
array([0.e+000, 1.e-323, 2.e-323, 3.e-323, 4.e-323])   # So far so good

# This makes perfect sense:
>>> x.view(np.int32)
ValueError: To change to a dtype of a different size, the array must be C-contiguous

# But this does not:
>>> x[:, None].view(np.int32)
ValueError: To change to a dtype of a different size, the array must be C-contiguous

When the last dimension is 1 and the new dtype's itemsize evenly divides the original's, nothing prevents the view from being created meaningfully: each element can simply expand into several smaller items along the last axis.
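
To make that concrete, here is a minimal sketch of the result I would expect from the failing call, built by hand. It assumes an int64 array and goes through the contiguous base array; the names base, base32, and halves exist only for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(10, dtype=np.int64)   # contiguous array that owns the memory
x = base[::2]                          # non-contiguous, strides == (16,)

# Reinterpreting the contiguous base is always allowed.
base32 = base.view(np.int32)           # 20 int32 items over the same bytes

# Each int64 element of x becomes one row of two int32 items.
ratio = x.itemsize // base32.itemsize  # 8 // 4 == 2
halves = as_strided(
    base32,
    shape=(x.shape[0], ratio),
    strides=(x.strides[0], base32.itemsize),
)

# The bytes seen through halves are exactly the bytes of x.
same_bytes = halves.tobytes() == x.tobytes()
```

This only works so directly because x happens to start at the beginning of base; a general version would also need a byte offset.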

I can understand the np.ndarray constructor and np.ndarray.view being user-facing and therefore trying to be a bit conservative. However, something like np.lib.stride_tricks.as_strided certainly should have the option of bypassing the dtype check: if I'm a consenting adult that's able to decide how to mess up my memory layout, I should be allowed to do it with the full force of all the tools available to me. To that end, I made a very naive attempt at adding an offset and dtype parameter to as_strided: madphysicist@e422186. This has the same failure as the example above.

The issue came up because of my work on #20694, where I am trying to add slicing to char arrays. It's actually an ideal example of how switching dtypes can be very useful. Additionally, it shows a workaround: I drill down to the base-most array, and either get a C-contiguous block of numpy-allocated memory, or construct a view of the buffer that's contiguous and large enough for the operations I need. The key is that even if the underlying memory is not contiguous for some reason, I am doing extensive checks to make sure I don't overrun the parts I know for sure I have access to: I just need to jump through hoops to convince np.ndarray that I know what I'm doing, but I very strongly feel that I shouldn't have to.
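
A simplified sketch of that kind of workaround, not the actual code from #20694: walk .base down to the root array, compute the byte offset of the view within it, and rebuild the array through the np.ndarray constructor. The helper names root_array and split_items are hypothetical, and the sketch assumes the root is a C-contiguous ndarray.

```python
import numpy as np

def root_array(a):
    """Follow .base until reaching the last ndarray in the chain."""
    while isinstance(a.base, np.ndarray):
        a = a.base
    return a

def split_items(a, dtype):
    """Hypothetical helper: view each element of `a` as several `dtype` items.

    The result has one extra trailing axis of length
    a.itemsize // dtype.itemsize and shares memory with `a`.
    """
    dtype = np.dtype(dtype)
    if a.itemsize % dtype.itemsize:
        raise ValueError("new itemsize must evenly divide the old one")
    root = root_array(a)
    if not root.flags.c_contiguous:
        raise ValueError("this sketch only handles a C-contiguous root")
    # Byte offset of a's first element from the start of the root buffer.
    offset = (a.__array_interface__["data"][0]
              - root.__array_interface__["data"][0])
    ratio = a.itemsize // dtype.itemsize
    # np.ndarray validates shape, strides, and offset against the buffer size.
    return np.ndarray(
        shape=a.shape + (ratio,),
        dtype=dtype,
        buffer=root,
        offset=offset,
        strides=a.strides + (dtype.itemsize,),
    )

x = np.arange(10, dtype=np.int64)[::2]
parts = split_items(x, np.int32)   # shape (5, 2), no copy
```

The hoops here are the pointer arithmetic and the detour through the root buffer, all of which exist only to get around the contiguity check.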

The workaround in #20694 is functional, but quite hacky. A better solution would be to allow as_strided, and potentially ndarray, to just do what you ask of them. At the very least, an exception to the dtype check should happen when the last dimension is 1 and the source array's itemsize is a multiple of the target dtype's itemsize. But I would go further: if I can call ndarray and set an arbitrary offset, which is way more dangerous, why not allow unchecked dtype reassignments?

I propose a solution like this:

  • Split np.ndarray into a "checked" and "unchecked" version (I have no clear idea how the code works right now, so this is just a concept)
  • as_strided becomes a wrapper for the unchecked version: user has infinite power
  • The actual function np.ndarray and its wrappers/siblings stay exactly the same, using the checked version.
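
To give a feel for what the "unchecked" path could look like from Python, here is a rough sketch. It is not the patch from my commit and not an existing NumPy API: unchecked_view and _Holder are hypothetical names, and the approach relies on the `__array_interface__` protocol, which as far as I can tell is also how as_strided builds its result. Nothing is validated, so a bad shape, stride, or offset will read or write out of bounds. It is only meant for simple (non-structured) dtypes.

```python
import numpy as np

class _Holder:
    """Carries an edited __array_interface__ and keeps the source alive."""
    def __init__(self, interface, base):
        self.__array_interface__ = interface
        self.base = base

def unchecked_view(x, dtype, shape, strides, offset=0):
    """Hypothetical: reinterpret x's memory with no safety checks at all."""
    dtype = np.dtype(dtype)
    interface = dict(x.__array_interface__)
    pointer, readonly = interface["data"]
    interface["data"] = (pointer + offset, readonly)  # offset is in bytes
    interface["typestr"] = dtype.str                  # e.g. '<i4'
    interface.pop("descr", None)                      # would describe the old dtype
    interface["shape"] = tuple(shape)
    interface["strides"] = tuple(strides)
    return np.asarray(_Holder(interface, x))

x = np.arange(10, dtype=np.int64)[::2]   # non-contiguous, strides == (16,)
v = unchecked_view(x, np.int32, shape=(5, 2), strides=(16, 4))
```

The checked np.ndarray and np.ndarray.view would stay as they are; only callers who opt into something like this take on the responsibility for bounds.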

I'd be happy to work on this issue if there is any interest in it.
