-
-
Notifications
You must be signed in to change notification settings - Fork 12k
API: Enforce one copy for __array__ when copy=True
#26215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
592ae0a to
4f629bf
Compare
f7ff1b7 to
d654d93
Compare
d654d93 to
8c9f073
Compare
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two comments for now. But yeah, we may need the copy indicator unfortunately (maybe call it was_copied, though? It has to be passed by reference anyway, so it is obvious for the caller that it is an output).
It would be good to have a test that we don't pass copy=True if someone calls np.array([array_like]) (i.e. there is a list).
numpy/_core/tests/test_multiarray.py
Outdated
| else: | ||
| return base_arr | ||
|
|
||
| process = psutil.Process() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just do this without psutils: Just store copy you return, then compare do an identity/base check. Easier, and no need for psutil magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right - I removed psutils and stored copies/base arrays instead.
| coercion_cache_obj ***coercion_cache_tail_ptr, | ||
| PyArray_DTypeMeta *fixed_DType, enum _dtype_discovery_flags *flags, | ||
| int copy) | ||
| int copy, int *copy_indicator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already has the _dtype_discovery_flags I think it makes sense to use those "internally" to this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I merged copy_indicator into _dtype_discovery_flags.
8c9f073 to
973ce28
Compare
@seberg Sure - renamed!
Sure! I changed implementation so that sequences don't pass |
c1a3d15 to
854340f
Compare
854340f to
553c473
Compare
|
I did one more rebase to solve conflicts. |
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One request for clarification inline and one more general suggestion.
I think was_copied and COPY_WAS_CREATED should be renamed was_copied_by__array__ and COPY_WAS_CREATED_BY__ARRAY__ to emphasize that this is specifically to handle subtleties around __array__ implementations. Alternative names or just comments around where they are used explaining that it has something to do with __array__ would help readers trying to understand why we have both an ensure copy flag and a flag tracking if a copy already happened.
|
Does this need a backport? |
No this change is targeted for 2.1 |
These names are clearer and more specific - done! |
seberg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I suppose we could use the argument in PyArray_DiscoverDTypeAndShape for other things in principle, although I am not sure there is anything concrete, so let's use the name, it is much nicer everywhere else (and all other places will go away when the deprecation goes away).
The ENSURE_COPY flag setting logic looks incorrect, though (I admit, I didn't try it).
| /* TODO: As of NumPy 2.0 this path is only reachable by C-API. */ | ||
| Py_SETREF(new, PyArray_NewCopy((PyArrayObject *)new, NPY_KEEPORDER)); | ||
| if (was_copied_by__array__ != NULL && copy == 1 && | ||
| must_copy_but_copy_kwarg_unimplemented == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a comment to remove the was_copied_by__array__ argument again here, but it is probably clear enough (when it happens).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/src/multiarray/ctors.c
Outdated
| assert(cache->converted_obj == op); | ||
| arr = (PyArrayObject *)(cache->arr_or_sequence); | ||
| /* we may need to cast or assert flags (e.g. copy) */ | ||
| if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { | |
| if (was_copied_by__array__ == 1) { |
You don't need to test this, it is OK to unset the flag if it isn't set. (to me that is clearer, but maybe it's me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/src/multiarray/ctors.c
Outdated
| /* we may need to cast or assert flags (e.g. copy) */ | ||
| if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { | ||
| flags = flags & ~NPY_ARRAY_ENSURECOPY; | ||
| flags = flags | NPY_ARRAY_ENSURENOCOPY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, so please add a test:
np.asarray(array_like, copy=True, order="F")
should just make two copies. You probably want to make sure that users honors the dtype that was passed. That might be nice, but if it is not very clean to test for, I am not sure we should worry much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, a copy still needs to be made to convert to order="F". Then I shouldn't add ENSURENOCOPY flag but only remove ENSURECOPY one.
About honoring dtype: I can check if arr in cache has the same dtype as requested dtype variable. I wonder if it's sufficient for comparing dtypes in C-API:
PyArray_TYPE(arr) == dtype->type_num
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, unfortunately checking that is not really trivial (the only way to do it, is the way that the function uses later to see if it has to make a copy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, it is even more complicated, for subarray dtypes (maybe those should just never be passed on). But that is probably not really new, although more pronounced by nudging downstream to actually implement dtype.
Maybe the solution is to just unset the dtype when a copy was made, although unfortunately that doesn't help us in the path when no copy was made and dtype is used by the __array__ implementor.
(Eplenation: if you write dtype=(4,)i you get back an array with an additional dimension and dtype=i. But if __array__ does that, then you end up doing another cast!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's maybe skip checking dtype after __array__ call?
What we currently have is that:
np.array(array_like, dtype=new_dtype, copy=False)errors when a dtype wasn't honored in __array__ implementation (which is nice I think) and
np.arrray(array_like, dtype=new_dtype)get's another copy if dtype wasn't honored, same as order="F" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's maybe skip checking dtype after array call?
For here, that is perfectly fine to me, it is just a mild performance nudge for downstream after all.
However, maybe we should make a new issue. I am slightly worried that downstream until now always just ignored dtype=, and now we will get new bugs with subarray dtypes.
OTOH, I will also say that, I don't expect anyone to explicitly run into it, so in a sense, it is an unrelated bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated flags logic here and added a test with:
np.asarray(array_like, copy=True, order="F")e6a3ce5 to
3549902
Compare
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK to merge now.
I don't think this is going to cause much additional churn over adding the copy keyword in the first place, hopefully downstream did the right thing.
__array__ when copy=True__array__ when copy=True
|
Ok, merging it now. |
Addresses: #26208
Hi @seberg @ngoldbaum,
This PR enables passing
copy=Truetodef __array__(...)and prevents NumPy from making another copy once it can assume thatcopy=Truewas successfully passed and the copy was made by the implementor.Following #26208 (comment) I implemented it with a
copy_indicatorthat is set to1, which later is used to decide whetherNPY_ARRAY_ENSURECOPYflag can be removed.I added a test
test___array__copy_oncethat checks if only one copy is made withnp.array(my_arr, copy=True). Withoutcopy_indicatorthe test fails asnp.array(my_arr, copy=True)doubles delta RSS (I usedpsutilpackage and added it as another test dependency).I assumed that we want to prevent making implicit copies with
__array__. Therefore the user is also responsible for properly implementingdtypeargument.