
Conversation

@francois-rozet
Contributor

Closes #21274

@jorenham jorenham changed the title BUG: fix np.vectorize for object otype BUG: fix np.vectorize for object dtype Apr 2, 2025
@seberg
Member

seberg commented Apr 2, 2025

Thanks, but this doesn't seem right. If your element-wise function returns arrays, then yes, it is correct to wrap them into another level of NumPy array and repeat that based on the input shape.

I agree it is awkward if you try to spell this:

arr = np.empty((), dtype=object)
arr[()] = np.arange(3)

f(arr)  # arr is a 0-D array containing an object array

And the fact that this doesn't return a 0-D array itself is even more awkward (I am considering doing something about that, even if just out=... to prevent it for now).

I agree dtype=object is unfortunately awkward in some cases. The reason is a fundamental difficulty, though. NumPy must make a choice about whether np.arange(3) is a scalar or an array input (for casting), and that decision is that it is an array (to be cast).
That won't be right in all contexts, but unless you write a custom dtype that is what you have for dtype=object...

In the end, np.frompyfunc does everything right, I expect. You can also compare with np.positive(x, dtype=object) (of course that makes it harder to return arrays for the element-wise function).
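For reference, a minimal sketch of the np.frompyfunc behavior being compared against (illustrative, using the same lambda as the examples below):

```python
import numpy as np

# np.frompyfunc keeps object outputs as-is: a scalar input yields the
# raw Python object, an array input yields an object array wrapping it.
f = np.frompyfunc(lambda x: [1, 2, 3], 1, 1)

scalar_out = f(None)    # the list itself
array_out = f([None])   # 1-D object array containing the list
```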

If vectorize diverges from those, that would be wrong (I am not quite sure if it does).


If you want non-object-arr input to be viewed as a scalar, you need a bigger hammer somewhere.
That is plausible in principle but it would have to be indicated on the DType. A custom DType could actually do that but it can't be the default for the object dtype.

So basically, it might work with a dtype=np.dtypes.ObjectDType(non_obj_arrays_are_scalars=True), but then the ufunc/frompyfunc would still have to add special logic to use this for the input.
Or a custom DType (or a built-in additional one, maybe a subclass of np.dtypes.ObjectDType); then you would still have to say that the dtype (like otypes=) is not object, but that new dtype.

For all of this we have the machinery available in principle (some bugs may need fixing), but of course it isn't a very small project.

@francois-rozet
Contributor Author

francois-rozet commented Apr 2, 2025

I think there is a misunderstanding here. My issue is not about input arrays. It is about outputs. Let's say my function returns a list (or any valid array-like) and I wish to keep it as a list because it is very large and I don't want to duplicate its content. Then, depending on the shape of the input, the vectorized output can be completely different.

>>> f = lambda x: [1, 2, 3]
>>> v = np.vectorize(f, otypes=[object])
>>> v(None)
array([1, 2, 3], dtype=object)
>>> v([None])
array([list([1, 2, 3])], dtype=object)
>>> v([[None]])
array([[list([1, 2, 3])]], dtype=object)

I believe the current behavior when the input has at-least one axis is correct. But the behavior for "scalar" inputs is inconsistent, with itself and np.frompyfunc.

What my PR does is fix the behavior when the input is a "scalar" by adding a new axis and then removing it.

>>> f = lambda x: [1, 2, 3]
>>> v = np.vectorize(f, otypes=[object])
>>> v(None)
array(list([1, 2, 3]), dtype=object)

An alternative, to match np.frompyfunc more closely, would be to index the new axis instead of squeezing it, which would lead to:

>>> f = lambda x: [1, 2, 3]
>>> v = np.vectorize(f, otypes=[object])
>>> v(None)
[1, 2, 3]
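That indexing alternative can already be emulated by hand on releases without the fix (a sketch; the [0] plays the role of the added-then-removed axis):

```python
import numpy as np

f = lambda x: [1, 2, 3]
v = np.vectorize(f, otypes=[object])

# Wrap the scalar input in a 1-element list, then index the result,
# so the returned list survives as a single object instead of being
# unpacked into an array.
result = v([None])[0]   # the list itself
```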

@seberg
Member

seberg commented Apr 2, 2025

Sorry, yeah, I misjudged! I somehow read it as changing more. This diverges from frompyfunc which I agree is wrong. I suppose we could be OK with always returning an array.

I just merged #28576, maybe you can have a look? That allows using out=... to ensure an array return. That in turn means you don't need the asanyarray() call there at all (you can just call res.astype() instead). Admittedly, a proper output_dtype= would be even nicer but we don't have that yet...

Then we still need to decide whether we want to return arr[()] (i.e. the scalar) at the end, but I suspect you may not even care about that part.
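For context, returning arr[()] from a 0-D object array unwraps the stored Python object rather than keeping the array (a quick illustration):

```python
import numpy as np

# A 0-D object array holding a list.
arr = np.empty((), dtype=object)
arr[()] = [1, 2, 3]

# Indexing with the empty tuple returns the stored object itself.
unwrapped = arr[()]   # the list, not a 0-D array
```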


I do wonder if there may be cases where the old behavior gave something sane, so maybe a release note is good (maybe "change") see doc/release/upcoming_changes/README.md for how to write one.

@francois-rozet
Contributor Author

After giving it more thought, I think there are other issues with the current behavior of np.vectorize. I think array-like outputs should never be implicitly cast to arrays. Why? Because it is buggy and inconsistent.

>>> np.vectorize(lambda x: [1, 2, 3])(None)
array([1, 2, 3])
>>> np.vectorize(lambda x: [1, 2, 3])([None])
...
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'
>>> np.vectorize(lambda x: [1, 2, object()])(None)
array([1, 2, <object object at 0x7f60b67d66d0>], dtype=object)
>>> np.vectorize(lambda x: [1, 2, object()])([None])
array([list([1, 2, <object object at 0x7f60b67d66e0>])], dtype=object)
>>> np.vectorize(lambda x: [1, 2, [3, 4]])(None)
...
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

Instead, I think that np.vectorize should detect types with

>>> def get_dtype(x):
...     return x.dtype if isinstance(x, np.ndarray) else np.dtype(type(x))
>>> get_dtype(1)
dtype('int64')
>>> get_dtype(np.array(1.0))
dtype('float64')
>>> get_dtype([1, 2])
dtype('O')

@seberg
Member

seberg commented Apr 2, 2025

@francois-rozet but we are talking about two distinct problems maybe? In this PR, you are passing otypes=[object] and for that, the behavior seems perfectly well defined. While in the new examples now, you are not passing it.

I agree that if you don't pass otypes things don't make much sense unless the return value is indeed a scalar. That is similar to:

np.array([None, [1, 2, 3]])

which raises an error now, although:

np.array([None, [1, 2, 3]], dtype=object)

is -- at least for now -- allowed. Maybe not the best analogy, but the point is that we can put anything into an object dtype, but for everything else, the result should be a scalar (or at least 0-D).

So, I think it would make sense to error if it isn't 0-D, since, arguably, either the result can be 0-D (the user can easily convert it in the function), or the extra dimensions have a meaning, and the right thing would be to use signature= (even if vectorize with signature= isn't ideal).

I think I would be happy to deprecate your examples, but we should try to deprecate it slowly, I suspect. (It seems easy to end up with a single-element but not 0-D array for some odd reason.)

@francois-rozet
Contributor Author

francois-rozet commented Apr 2, 2025

I just merged #28576, maybe you can have a look? That allows using out=... to ensure an array return.

I am not sure I understand where I can set this. In the call to the ufunc or in its creation?

@seberg
Member

seberg commented Apr 2, 2025

I am not sure I understand where I can set this. In the call to the ufunc or in its creation?

In the call to the ufunc, you can add out=... now.

@francois-rozet
Contributor Author

francois-rozet commented Apr 2, 2025

but we are talking about two distinct problems maybe?

Yes you are right, the type detection and casting are not the same problems. But I figured they are related. Should I remove the fix in the last commit, which ensures that list and tuple in the output are not cast to arrays implicitly?

Note that I do not necessarily agree with you that the output should be a "scalar". When the output is already an array, I think it makes sense for vectorize to stack the outputs into a single array.

@francois-rozet
Contributor Author

In the call to the ufunc, you can add out=... now.

Is my last commit what you meant?

@seberg
Member

seberg commented Apr 2, 2025

I think it makes sense for vectorize to stack the outputs into a single array.

Yeah, fair. For a normal ufunc that is what signature= should indicate (that the output shape is different). But I would agree with not messing with things here unnecessarily.
I suppose the point is whether you get into a confusing mess because object and non-object otypes differ here in obscure ways.

Should I remove the fix in the last commit, which ensures that list and tuple in the output are not cast to arrays implicitly?

Not sure I follow, so please do what you think is best for now. I'll have to think about it a bit more later.

Is my last commit what you meant?

Yes, I think we may be able to change some of the following code a bit with that (things are already arrays), but not sure it is needed.

@francois-rozet
Contributor Author

francois-rozet commented Apr 2, 2025

To summarize the modifications:

  1. When otype=object, np.ndarray outputs are not cast to dtype=object anymore.
  2. When otypes is not set, list and tuple outputs are not cast to arrays anymore.

@seberg
Member

seberg commented Apr 2, 2025

Thanks for the summary, now I see what you meant above. Maybe it is best to split the two, my first gut feeling is that change 1 is quite a bit safer than 2.
(but again, we are iterating quickly, and I am only paying half attention right now)

@francois-rozet
Contributor Author

Sorry, I made a mistake. The two changes are not separable.

The original issue was that for scalar inputs the output of the ufunc was not wrapped into an object array, resulting in inconsistent casts for array-like outputs. This was resolved with out=....

Now, the outputs of the ufunc are always object arrays. If these object arrays contain Python objects, including list and tuple, they cannot be cast to anything but dtype=object.

However, vectorize currently sets the otype of lists and tuples to the type of their content, instead of object. For example, [1, 2, 3] has otype int. Then, when it attempts to cast the output of the ufunc, it crashes.

In summary, without fixing the otype detection, vectorize does not work with any function that returns lists or tuples.
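A simplified sketch of the detection step described above (not the actual vectorize internals, which live in the private _get_ufunc_and_otypes helper):

```python
import numpy as np

# vectorize infers otypes by calling the function on the first input
# elements and inspecting the result with asarray -- roughly like this:
sample = (lambda x: [1, 2, 3])(None)
detected = np.asarray(sample).dtype   # an int dtype, not object
```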

@seberg
Member

seberg commented Apr 3, 2025

In summary, without fixing the otype detection, vectorize does not work with any function that returns lists or tuples.

Right, but to me it seems possibly OK to force users to set otypes=[object] here? We still have two distinct things: if otypes is set, we have a clear fix for scalar results.
If otypes isn't set, you are making certain usage patterns possible that were not before.

I appreciate that you want to make things more useful here, but I am not sure if that is the right way to make it more useful yet. E.g. signature= should already provide a way for list/tuple returns to be incorporated into a numeric array in principle.
(Or I could also say, that I am very comfortable with asking users to rely on otypes= more, because it is safer anyway.)

@francois-rozet
Contributor Author

Or I could also say, that I am very comfortable with asking users to rely on otypes= more, because it is safer anyway.

So should we let the code crash? Or should we add an assert warning the user to use otypes if the output is a list?

@seberg
Member

seberg commented Apr 3, 2025

If we know it's definitely going to crash while figuring out otypes, then raising a more specific error seems good. But maybe it is still easier to split the two?

@francois-rozet
Contributor Author

_get_ufunc_and_otypes will crash if the output is a nested list/tuple with uneven lengths. If the output is a list of NumPy-compatible types (int, float, str, ...), _get_ufunc_and_otypes will detect otypes incorrectly and vectorize will crash when casting later on.
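The second failure mode can be reproduced directly (a sketch; the exact exception type and message vary between versions, so both are caught here):

```python
import numpy as np

# Without otypes, the first call detects an int otype for [1, 2, 3];
# the later cast of list-valued ufunc outputs to int then fails.
v = np.vectorize(lambda x: [1, 2, 3])
try:
    v([None])
    crashed = False
except (TypeError, ValueError):
    crashed = True
```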

@seberg
Member

seberg commented Apr 4, 2025

But by "crash" you mean "raises a hard-to-understand exception". And if I accept that exception (which for now I do), then the only problem is the "hard to understand" part (especially w.r.t. how to work around it).

Which is why I like to split it out, so we can work out there whether we want to improve the error (so it triggers immediately and always, and informs about otypes=[object], for example).
(I may be slow to respond for a bit, but I think the first part just seems much clearer since I think in all cases it fixes what is clearly wrong.)

EDIT: We have a hidden function (that we should unhide in parts) np._core.multiarray_umath._discover_array_parameters, since we never added an ndmax parameter to np.asarray() which may be useful there. We can add it without any thoughts there (it's in C, but it is extremely easy to add there). I am not actually sure how that parameter would behave for a tuple (go to object or raise an error unless dtype=object).

@francois-rozet
Contributor Author

francois-rozet commented Apr 13, 2025

@seberg I have reverted the otypes detection to the old behavior. The PR now only fixes the incorrect casting when the input is a "scalar". All the bugs reported in #28624 (comment) are therefore still present.


@seberg seberg left a comment


Thanks for reverting and the follow-ups! I think this is now indeed always a bug fix, so let's put it in.

We can discuss the other changes on a new PR.

@seberg seberg merged commit 461d1c0 into numpy:main Apr 22, 2025
71 of 72 checks passed
@github-project-automation github-project-automation bot moved this from Pending authors' response to Completed in NumPy first-time contributor PRs Apr 22, 2025
@seberg
Copy link
Member

seberg commented May 26, 2025

Scratch that. Just fix the tests!

You never supported vectorize. Nothing significant there changed at all. The only change is that we stopped calling asarray() rather than asanyarray() (i.e. something you like) and because of that the error changed.

And it only changed if otypes= isn't used anyway.

EDIT: Sorry got the wrong window, this was for astropy



Successfully merging this pull request may close these issues.

BUG: np.vectorize is inconsistent for scalar inputs and array-like outputs
