-
-
Notifications
You must be signed in to change notification settings - Fork 12k
MAINT: Refactor clearing up of array data #22924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ngoldbaum I don't know if you dare review this, but if you do that would be great and we could also go through together or so. I think the main point is deciding that final API is fine, the rest is tricky code, but should be OK. |
ecd3611 to
59bbf61
Compare
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new API looks fine to me, I think. I can't think of a reason why you'd want to use the auxdata for clearing the array, but it doesn't seem harmful to allow it to be passed in if someone wants it.
Currently I think this only works with embedded references to python objects, but in principle it should work with arbitrary embedded references. Does that just mean expanding the checks to include NPY_ITEM_IS_POINTER or are there any other possible issues?
Presumably user dtypes will also need a way to set the appropriate flags to indicate that they have embedded references?
| PyArrayMethodObject *within_dtype_castingimpl; | ||
| /* | ||
| * Implementation which clears a dtype or NULL. If not given, setting | ||
| * HASREF on the dtype is invalid. We only use the loop getting, not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, HASREF isn't a thing, so this comment might be a bit confusing. It looks like PyDataType_REFCHK checks NPY_ITEM_REFCOUNT, so you could use that? Although maybe NPY_ITEM_IS_POINTER is invalid too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, NPY_ITEM_REFCOUNT is it, except that we may want to rename/alias it to NPY_ITEM_HASREF or something...
To be honest, I am not sure that NPY_ITEM_IS_POINTER is ever used (or useful) except as having the identical meaning to NPY_ITEM_REFCOUNT...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense. I brought up NPY_ITEM_IS_POINTER because it might be nice to indicate that a dtype uses embedded references but doesn't need the GIL.
| The use of this function is strongly discouraged, within NumPy | ||
| use PyArray_Clear, which DECREF's and sets everything to NULL and can | ||
| work with any dtype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should (can?) PyArray_XDECREF be deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, not sure there is a need for it to begin in reasonable code... But seems hard to do, might be a "nuke it with NumPy 2.0" thing...
| * We assume for now that this context can be just passed through in the | ||
| * the future (for structured dtypes). | ||
| */ | ||
| typedef int (simple_loop_function)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe these two new function signatures should be called traverse_loop_function and get_traverse_loop_function to indicate that it's a loop specifically for array traversal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, "simple" is always a bad name anyway ;).
|
Updated based on the comments, thanks! It may also be useful for other things, for example sorting could re-use this pattern (at least partially). Not quite a "traversal" I guess, but needs much the same. |
396f931 to
34440bb
Compare
The incref/decref function shouldn't be able to fail (especially the decref). But right now they can, this will be fixed when we redo clearing (see numpygh-22924)
|
Gentle ping on this clearing of array data PR. |
mattip
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use some documentation about how users are meant to design "clear" functions, how "clear" relates to "dealloc" and "decref". Or is there documentation in one of the dtype NEPS?
True, for users I didn't add any public API yet though, so I was thinking to add that together with that. OTOH, if we want internal docs than should definitely add it here! |
aca13d1 to
479170c
Compare
I would at least like a type of glossary. How does "clearing" relate to "dealloc" or "decref"? |
d30f667 to
10e5c99
Compare
|
I added it to the |
Thanks, things are "clearer" now. There are a number of code coverage warnings. Most of them seem like code that should never be reached. Do we have a way to tell coverage "ignore this"? |
I am not aware, although I suppose there probably is a way. But coverage is only reported as changed so it never bothered me? |
mattip
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly concerned about the lack of coverage, it seems the copy functions are never used? Other than that this looks good.
| if (data->decref_src.func(&data->decref_src.context, | ||
| &orig_src, &N, &src_stride, data->decref_src.auxdata) < 0) { | ||
| if (data->decref_src.func(NULL, data->decref_src.descr, | ||
| orig_src, N, src_stride, data->decref_src.auxdata) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coverage says this whole section is not hit: is that true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, it might be possible to hit this for structured to object (I suspect not, but I will just try to add a test). And we only have object and structured, user dtypes could (in the future) hit it, though.
|
|
||
|
|
||
| /* Same as in dtype_transfer.c */ | ||
| #define NPY_LOWLEVEL_BUFFER_BLOCKSIZE 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is required these be equivalent, then that should be checked or the same constant used. Otherwise the comment should be changed to indicate the value is arbitrary: "Arbitrary value like the one in dtype_transfer.c"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the identical use case, but doesn't need to be the same value.
| get_traverse_loop_function *get_clear = NPY_DT_SLOTS(NPY_DTYPE(dtype))->get_clear_loop; | ||
| if (get_clear == NULL) { | ||
| PyErr_Format(PyExc_RuntimeError, | ||
| "Internal error, tried to fetch decref/clear function for the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Internal error, tried to fetch decref/clear function for the " | |
| "Internal error, 'get_clear_loop' not set for " |
|
|
||
| /* transfer data copy function */ | ||
| static NpyAuxData * | ||
| fields_clear_data_clone(NpyAuxData *data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no test of this functionality, maybe it should be marked /* untested */. Or is it tested in the userdtype repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird internal API that is hard to use. Let me try, it may actually be possible via nditer...
| npy_intp i, field_count = d->field_count; | ||
|
|
||
| /* Do the traversing a block at a time for better memory caching */ | ||
| const npy_intp blocksize = NPY_LOWLEVEL_BUFFER_BLOCKSIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is more performant than a simpler approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the approach we use everywhere. But yes, it should make sense for cache friendliness.
|
|
||
| /* transfer data copy function */ | ||
| static NpyAuxData * | ||
| subarray_clear_data_clone(NpyAuxData *data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is never used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this one actually seems unreachable, added a copy comment. I think I covered most of the others (but its complicated, will have to see what is actually covered).
| * If the DType class does not implement `get_clear_loop` setting | ||
| * NPY_ITEM_REFCOUNT on its dtype instances is invalid. Note that it is | ||
| * acceptable for NPY_ITEM_REFCOUNT to inidicate references that are not | ||
| * Python objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change this to make NPY_ITEM_HASOBJECT the preferred alias, and replace PyDataType_REFCHK with PyDataType_HASOBJECT, but that can be done in a future PR since this is all internal API.
|
close/reopen |
This removes re-using `write` for the same purpose, which seems surprisingly clean, considering that the "clearing" path was already special in any case.
Also remove accidentally auto-inserted include...
Co-authored-by: Nathan Goldbaum <[email protected]>
We simply don't bother releasing the GIL right now. This could make sense in principle for some user dtypes, though. (NumPy only ever uses clearing with Python objects so the GIL is always needed). I see no plausible use of checking floating point errors, so do not bother to add such cruft, if anyone ever needs it, they should add it.
Co-authored-by: Matti Picus <[email protected]>
cd9be08 to
998cbb5
Compare
We only ever use the "clone" mechanism for `nditer.copy()` but even there, it is only used for buffer transfers (not final cleanups). It appears that structured dtypes are always a single cast step which doesn't require the clearing normally (so this is hard to hit).
967b2c8 to
f2be9f7
Compare
|
Let's give this a try. It might take a few iterations to find all the corner cases, maybe the userdtype repo will find some of them. |
|
Thanks @seberg |
|
Thanks! @ngoldbaum do you want to look into making this public? (just fishing, but it should be very easy.) |
Sure, I can try that out. I'll ping you if I run into trouble. |
|
@seberg - This PR is causing failures failures over at astropy - see astropy/astropy#14460 (and maybe pandas-dev/pandas#51589 as well, given link above; cc @lithomas1) |
|
I couldn't reproduce the segfaults on my computer, but the backtrace from our tests is is Maybe this helps a little? I only found this PR from git blame, where it seems like the only function to touch the functions in the backtrace recently. |
|
In our case, the error message "RuntimeError: Internal error, tried to fetch clear function for the user dtype '�' without fields or subarray (legacy support)" is triggered, which was introduced here... I'm trying to see if this dtype actually has objects inside or not (which is what this PR addresses), but am not very familiar with this part of astropy. Anyway, maybe @seberg has an idea just given what path seems to be hit... |
|
p.s. The test that triggers the problem may well have an empty structured array, since it explicitly reads in a table without columns. Given an installation of astropy, a minimal example to reproduce is at astropy/astropy#14460 (comment) |
|
OK, found a minimal pure numpy example - see #23277 |
|
@lithomas1 thanks. Seems like I forgot to add a NULLing of the buffer when trying to simplify that code. Odd that NumPy test suite doesn't notice that. |
|
Triggered the wheel build, so this hopefully should clear up very soon. Thanks for the reports. |
|
Thanks for the quick fix! |
Replacement for gh-22692 and the "other option" here which does not reuse
ArrayMethodas that seems a bit ridiculous for a single array operation.This does a few things:
get_clear_function()slot, similar to the typical ArrayMethod strided loop getting, but a bit reduced.NULLinitial argument, to have space for something like a "context"...PyArray_ClearBufferandPyArray_ClearArrayand uses them.clearfunction (may never be used, but it is actually nice to have it already in case of an error).I suspect that this should be ready (modulo cleaning up the code a bit with review). It does not yet expose this to user DTypes and a few code paths still use
PyArray_Item_XDECREFwhich need fixing (I bet half of them are dubious anyway, but a different issue).