BUG: Remove unnecessary copying and casting from out array in choose#28206
BUG: Remove unnecessary copying and casting from out array in choose#28206seberg merged 29 commits intonumpy:mainfrom
Conversation
|
Neat! It's a good sign the tests pass. I don't know offhand if we have great test coverage for |
|
@ngoldbaum Added the test, thanks! It fails in |
ngoldbaum
left a comment
There was a problem hiding this comment.
Just one small nitpick. I thought carefully about reference counts and didn't see any issues.
@seberg can you give this a once-over once the failure case for CopyAnyInto gets handled and hit the merge button assuming you don't see any additional issues?
| if (out != NULL && out != obj) { | ||
| Py_INCREF(out); | ||
| PyArray_ResolveWritebackIfCopy(obj); | ||
| PyArray_CopyAnyInto(out, obj); |
There was a problem hiding this comment.
This can fail so you need to check the failure case. The same bug existed in the code you're updating - ResolveWritebackIfCopy also returns -1 on failure.
There was a problem hiding this comment.
I added a simple failure check; hopefully that is sufficient. Thank you!
| if (out != NULL && out != obj) { | ||
| Py_INCREF(out); | ||
| PyArray_ResolveWritebackIfCopy(obj); | ||
| if (PyArray_CopyAnyInto(out, obj) != 0) { |
There was a problem hiding this comment.
| if (PyArray_CopyAnyInto(out, obj) != 0) { | |
| if (PyArray_CopyAnyInto(out, obj) < 0) { |
Slightly more defensive in case another non-error return code is added in the future (almost certainly won't happen but a good defensive programming habit to get into).
There was a problem hiding this comment.
That makes sense; I made this change, thanks.
|
Sorry, it seems some unrelated changes creeped in so I removed them. This new approach should be ready for review now. Thank you |
|
Maybe also another test to make sure the unnecessary copy the old approach triggered doesn't happen? |
|
Not entirely sure if there's a convenient way to test for that in pytest? In either approach, there would probably be no data indicating a copy was made, after the array is overwritten. |
|
Sorry about the accidental changes--I'll be more careful with Git |
seberg
left a comment
There was a problem hiding this comment.
Thanks a lot, yes I think this is correct now! I also think we can simplify things more, by just adding a must_copy flag and using your new branch then unconditionally.
| Py_INCREF(dtype); | ||
| obj = (PyArrayObject *)PyArray_FromArray(out, dtype, flags); | ||
| if (!PyArray_EquivTypes(dtype, PyArray_DESCR(out))) { | ||
| obj = (PyArrayObject *)PyArray_NewFromDescr(Py_TYPE(out), |
There was a problem hiding this comment.
I don't think there is any reason to use Py_TYPE(out) here, just pass NULL and get a base-class array, I think.
(There is a subtle issue here about that, I think, in that ufuncs would ensure they call __array_wrap__ when done and we don't. But this a completely different issue!)
There was a problem hiding this comment.
Makes sense thanks, but looking at the doc for PyArray_NewFromDescr, should we pass &PyArray_Type instead? NULL causes a ValueError.
There was a problem hiding this comment.
Yes, of course. Somehow thought it accepted NULL as PyArray_Type there.
There was a problem hiding this comment.
I made the change but it seems to fail a regression test test_attributes that tests for an identical info string assigned to the out array. Would you like me to push?
There was a problem hiding this comment.
Maybe push, but may not look at it today. I really don't see why it matters since obj is never returned to the user! (And not understanding that worries me more than the change ;)).
There was a problem hiding this comment.
No problem at all, thanks for the feedback. Yes, it is a bit confusing...
There was a problem hiding this comment.
Oops, I think I was editing the wrong line! Things work fine now, pushed. Thanks.
| PyArray_ResolveWritebackIfCopy(obj); | ||
| int success; | ||
| if (is_newarr) { | ||
| success = PyArray_CopyAnyInto(out, obj); |
There was a problem hiding this comment.
In theory, you could set the base-object to your allocation above and set to use WritebackIfCopy even with your custom allocation. But, TBH, I think a direct copy is actually clearer (frankly, I am not sure if we couldn't use it in half the places ResolveWritebackIfCopy is used ;)).
There was a problem hiding this comment.
I see--good to know!
numpy/_core/tests/test_multiarray.py
Outdated
|
|
||
| # gh-28206 test new StringDType casting | ||
| x = np.array(['a', 'b', 'c'], dtype=np.dtypes.StringDType()) | ||
| y = np.array(['abc', 'def', 'ghi'], dtype=np.dtypes.StringDType()) |
There was a problem hiding this comment.
Subtle thing, but make sure to include a long string (otherwise you can't trigger the path which might have crashed).
There was a problem hiding this comment.
Actually that still seems to crash, thanks for noticing. I think PyArray_ConvertToCommonType (it seems to be a backward compatibility function) does not cast properly to a common scenario for StringDType, and EquivTypes passes the dtype unchanged--so, we end up with the dtype only for the first array. Should we do a better promotion of some sort if the types are equivalent? Or perhaps, for now, we can ensure they are identical?
| } | ||
| if (success < 0) { | ||
| Py_DECREF(obj); | ||
| goto fail; |
There was a problem hiding this comment.
Just noticed: this is wrong! At this point the code already did most/all cleanup and doing that again will cause crashes. So you just need to do a straight return NULL.
There was a problem hiding this comment.
Ah yes, the other functions do the cleanup already--fixed this!
seberg
left a comment
There was a problem hiding this comment.
Thanks, but one more iteration, I think. The idea here was to always manually copy out rather than use any auto-setup via PyArray_FromArray and that final writebackifcopy finalization.
So, I think both should be gone at the end, because otherwise it is confusing when what happenes.
| } | ||
| dtype = PyArray_DESCR(mps[0]); | ||
|
|
||
| int is_newarr = 0; |
There was a problem hiding this comment.
Can you find a different name for this? At the end of this, this should have the same meaning as out != NULL && obj != out. (or make it local and keep using that)
I.e. something clarifying that the user passed out array was copied (or must be copied).
There was a problem hiding this comment.
I used copy_existing_out, which hopefully isn't too clumsy!
There was a problem hiding this comment.
Wait, wasn't is_newarr maybe more fitting as it also applies when the array is created from scratch (in out == NULL)? It seems to be a special case of out != NULL && obj != out after out is (re)assigned. I'll do out_newarr maybe?
There was a problem hiding this comment.
Maybe, in my mind it doesn't apply to out == NULL, because to me it tells us if we need to copy back at the end (as well).
There was a problem hiding this comment.
Ah okay, I agree! What about copy_out? Since it's not always user passed...
| NPY_ARRAY_WRITEBACKIFCOPY | | ||
| NPY_ARRAY_FORCECAST; | ||
|
|
||
| obj = (PyArrayObject *)PyArray_FromArray(out, dtype, flags); |
There was a problem hiding this comment.
This branch should now never copy, so the only thing it should be doing is check for PyArray_FailUnlessWriteable. That also means you don't need the writeback-if-copy branch below.
(We can check that earlier also, since it doesn't make sense to wait until the last copy for this to fail in the other branch either.)
There was a problem hiding this comment.
Makes sense, thanks! Addressed this.
| } | ||
| else { | ||
| success = PyArray_ResolveWritebackIfCopy(obj); | ||
| } |
There was a problem hiding this comment.
And thus, you never need this branch here either.
| NULL, NULL, 0, | ||
| (PyObject *)ap); | ||
| out_newarr = 1; | ||
| copy_out = 1; |
There was a problem hiding this comment.
Oh yes, your naming suggestion makes total sense now 😓 I'll change that back too, thanks!
| goto fail; | ||
| } | ||
|
|
||
| PyArray_FailUnlessWriteable(out, "output array"); |
There was a problem hiding this comment.
Can you please add a test for this, because this doesn't work, you need an error return check and deal with that (this is not C++ with exceptions).
There was a problem hiding this comment.
Ah yes sorry, fixed this, adding a test now.
There was a problem hiding this comment.
Test is pushed.
| (PyObject *)out); | ||
| } | ||
| else { | ||
| obj = (PyArrayObject *)PyArray_FromArray(out, dtype, 0); |
There was a problem hiding this comment.
Please only incref here and assign to obj. We rely on that being true now.
There was a problem hiding this comment.
Sorry, just to be clear, this would be the dtype? Made that change.
seberg
left a comment
There was a problem hiding this comment.
LGTM, modulo some refactors (e.g. error refcounting was wrong).
|
Thank you for reviewing. |
Resolves #22237.
This PR does not copy the existing
outarray innp.choose, instead creating the new output array from scratch and copying it back manually.Thank you for reviewing!