Skip to content

Conversation

@seberg
Copy link
Member

@seberg seberg commented Jan 3, 2023

Replacement for gh-22692 and the "other option" here which does not reuse ArrayMethod as that seems a bit ridiculous for a single array operation.

This does a few things:

  • Creates a new get_clear_function() slot, similar to the typical ArrayMethod strided loop getting, but a bit reduced.
    • Returns a simplified strided loop function (only one operand).
    • I have included a NULL initial argument, to have space for something like a "context"...
  • Moves the current clearing functions that are based on the casting code out into its own file, this copies a little and moves some other things.
    • Recreates some helper infrastructure from the casting one...
  • Introduces a new PyArray_ClearBuffer and PyArray_ClearArray and uses them.
    • Effectively replaces the old DECREF API, but I didn't dare delete it since it is in principle public. We also use the single-item DECREF in a few places still.
    • (The buffer version is almost unused now, after refactoring npyiter, but that seems OK)
  • Fixes up NpyIter to track the clear function (may never be used, but it is actually nice to have it already in case of an error).

I suspect that this should be ready (modulo cleaning up the code a bit with review). It does not yet expose this to user DTypes and a few code paths still use PyArray_Item_XDECREF which need fixing (I bet half of them are dubious anyway, but a different issue).

@seberg
Copy link
Member Author

seberg commented Jan 3, 2023

@ngoldbaum I don't know if you dare review this, but if you do that would be great and we could also go through together or so. I think the main point is deciding that final API is fine, the rest is tricky code, but should be OK.

@seberg seberg force-pushed the refactor-decref-meth-only branch from ecd3611 to 59bbf61 Compare January 4, 2023 10:26
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new API looks fine to me, I think. I can't think of a reason why you'd want to use the auxdata for clearing the array, but it doesn't seem harmful to allow it to be passed in if someone wants it.

Currently I think this only works with embedded references to python objects, but in principle it should work with arbitrary embedded references. Does that just mean expanding the checks to include NPY_ITEM_IS_POINTER or are there any other possible issues?

Presumably user dtypes will also need a way to set the appropriate flags to indicate that they have embedded references?

PyArrayMethodObject *within_dtype_castingimpl;
/*
* Implementation which clears a dtype or NULL. If not given, setting
* HASREF on the dtype is invalid. We only use the loop getting, not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, HASREF isn't a thing, so this comment might be a bit confusing. It looks like PyDataType_REFCHK checks NPY_ITEM_REFCOUNT, so you could use that? Although maybe NPY_ITEM_IS_POINTER is invalid too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, NPY_ITEM_REFCOUNT is it, except that we may want to rename/alias it to NPY_ITEM_HASREF or something...

To be honest, I am not sure that NPY_ITEM_IS_POINTER is ever used (or useful) except as having the identical meaning to NPY_ITEM_REFCOUNT...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense. I brought up NPY_ITEM_IS_POINTER because it might be nice to indicate that a dtype uses embedded references but doesn't need the GIL.

The use of this function is strongly discouraged, within NumPy
use PyArray_Clear, which DECREF's and sets everything to NULL and can
work with any dtype.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should (can?) PyArray_XDECREF be deprecated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, not sure there is a need for it to begin in reasonable code... But seems hard to do, might be a "nuke it with NumPy 2.0" thing...

* We assume for now that this context can be just passed through in the
* the future (for structured dtypes).
*/
typedef int (simple_loop_function)(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe these two new function signatures should be called traverse_loop_function and get_traverse_loop_function to indicate that it's a loop specifically for array traversal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, "simple" is always a bad name anyway ;).

@seberg
Copy link
Member Author

seberg commented Jan 5, 2023

Updated based on the comments, thanks! auxdata is already used for structured dtypes (we need to store how to unpack the struct).

It may also be useful for other things, for example sorting could re-use this pattern (at least partially). Not quite a "traversal" I guess, but needs much the same.

@seberg seberg force-pushed the refactor-decref-meth-only branch from 396f931 to 34440bb Compare January 6, 2023 13:16
seberg added a commit to seberg/numpy that referenced this pull request Jan 11, 2023
The incref/decref function shouldn't be able to fail (especially the
decref).  But right now they can, this will be fixed when we redo
clearing (see numpygh-22924)
@seberg
Copy link
Member Author

seberg commented Feb 9, 2023

Gentle ping on this clearing of array data PR.

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some documentation about how users are meant to design "clear" functions, how "clear" relates to "dealloc" and "decref". Or is there documentation in one of the dtype NEPS?

@seberg
Copy link
Member Author

seberg commented Feb 9, 2023

This could use some documentation about how users are meant to design "clear" functions, how "clear" relates to "dealloc" and "decref". Or is there documentation in one of the dtype NEPS?

True, for users I didn't add any public API yet though, so I was thinking to add that together with that. OTOH, if we want internal docs than should definitely add it here!

@seberg seberg force-pushed the refactor-decref-meth-only branch from aca13d1 to 479170c Compare February 9, 2023 15:03
@mattip
Copy link
Member

mattip commented Feb 15, 2023

if we want internal docs

I would at least like a type of glossary. How does "clearing" relate to "dealloc" or "decref"?

@seberg seberg force-pushed the refactor-decref-meth-only branch from d30f667 to 10e5c99 Compare February 15, 2023 11:59
@seberg
Copy link
Member Author

seberg commented Feb 15, 2023

I added it to the get_clear_loop and mention decref+NULLing in a few more places. (except the last commit, this is just a rebase).

@mattip
Copy link
Member

mattip commented Feb 15, 2023

I added it to the get_clear_loop and mention decref+NULLing

Thanks, things are "clearer" now. There are a number of code coverage warnings. Most of them seem like code that should never be reached. Do we have a way to tell coverage "ignore this"?

@seberg
Copy link
Member Author

seberg commented Feb 15, 2023

Do we have a way to tell coverage "ignore this"?

I am not aware, although I suppose there probably is a way. But coverage is only reported as changed so it never bothered me?

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly concerned about the lack of coverage, it seems the copy functions are never used? Other than that this looks good.

if (data->decref_src.func(&data->decref_src.context,
&orig_src, &N, &src_stride, data->decref_src.auxdata) < 0) {
if (data->decref_src.func(NULL, data->decref_src.descr,
orig_src, N, src_stride, data->decref_src.auxdata) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage says this whole section is not hit: is that true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, it might be possible to hit this for structured to object (I suspect not, but I will just try to add a test). And we only have object and structured, user dtypes could (in the future) hit it, though.



/* Same as in dtype_transfer.c */
#define NPY_LOWLEVEL_BUFFER_BLOCKSIZE 128
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is required these be equivalent, then that should be checked or the same constant used. Otherwise the comment should be changed to indicate the value is arbitrary: "Arbitrary value like the one in dtype_transfer.c"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the identical use case, but doesn't need to be the same value.

get_traverse_loop_function *get_clear = NPY_DT_SLOTS(NPY_DTYPE(dtype))->get_clear_loop;
if (get_clear == NULL) {
PyErr_Format(PyExc_RuntimeError,
"Internal error, tried to fetch decref/clear function for the "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Internal error, tried to fetch decref/clear function for the "
"Internal error, 'get_clear_loop' not set for "


/* transfer data copy function */
static NpyAuxData *
fields_clear_data_clone(NpyAuxData *data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test of this functionality, maybe it should be marked /* untested */. Or is it tested in the userdtype repo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird internal API that is hard to use. Let me try, it may actually be possible via nditer...

npy_intp i, field_count = d->field_count;

/* Do the traversing a block at a time for better memory caching */
const npy_intp blocksize = NPY_LOWLEVEL_BUFFER_BLOCKSIZE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is more performant than a simpler approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the approach we use everywhere. But yes, it should make sense for cache friendliness.


/* transfer data copy function */
static NpyAuxData *
subarray_clear_data_clone(NpyAuxData *data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used?

Copy link
Member Author

@seberg seberg Feb 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one actually seems unreachable, added a copy comment. I think I covered most of the others (but its complicated, will have to see what is actually covered).

* If the DType class does not implement `get_clear_loop` setting
* NPY_ITEM_REFCOUNT on its dtype instances is invalid. Note that it is
* acceptable for NPY_ITEM_REFCOUNT to inidicate references that are not
* Python objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change this to make NPY_ITEM_HASOBJECT the preferred alias, and replace PyDataType_REFCHK with PyDataType_HASOBJECT, but that can be done in a future PR since this is all internal API.

@charris
Copy link
Member

charris commented Feb 19, 2023

close/reopen

@charris charris closed this Feb 19, 2023
@charris charris reopened this Feb 19, 2023
seberg and others added 6 commits February 19, 2023 19:52
Co-authored-by: Nathan Goldbaum <[email protected]>
We simply don't bother releasing the GIL right now.  This could make
sense in principle for some user dtypes, though.
(NumPy only ever uses clearing with Python objects so the GIL is always
needed).

I see no plausible use of checking floating point errors, so do not
bother to add such cruft, if anyone ever needs it, they should add it.
@seberg seberg force-pushed the refactor-decref-meth-only branch from cd9be08 to 998cbb5 Compare February 19, 2023 20:00
We only ever use the "clone" mechanism for `nditer.copy()` but even
there, it is only used for buffer transfers (not final cleanups).

It appears that structured dtypes are always a single cast step
which doesn't require the clearing normally (so this is hard to
hit).
@seberg seberg force-pushed the refactor-decref-meth-only branch from 967b2c8 to f2be9f7 Compare February 20, 2023 11:46
@mattip
Copy link
Member

mattip commented Feb 20, 2023

Let's give this a try. It might take a few iterations to find all the corner cases, maybe the userdtype repo will find some of them.

@mattip mattip merged commit af9f656 into numpy:main Feb 20, 2023
@mattip
Copy link
Member

mattip commented Feb 20, 2023

Thanks @seberg

@seberg seberg deleted the refactor-decref-meth-only branch February 20, 2023 13:19
@seberg
Copy link
Member Author

seberg commented Feb 20, 2023

Thanks! @ngoldbaum do you want to look into making this public? (just fishing, but it should be very easy.)

@ngoldbaum
Copy link
Member

Thanks! @ngoldbaum do you want to look into making this public? (just fishing, but it should be very easy.)

Sure, I can try that out. I'll ping you if I run into trouble.

@mhvk
Copy link
Contributor

mhvk commented Feb 25, 2023

@seberg - This PR is causing failures failures over at astropy - see astropy/astropy#14460 (and maybe pandas-dev/pandas#51589 as well, given link above; cc @lithomas1)

@lithomas1
Copy link
Collaborator

lithomas1 commented Feb 25, 2023

I couldn't reproduce the segfaults on my computer, but the backtrace from our tests is is

0x00007ffff588c71b in OBJECT_copyswapn () from /home/runner/micromamba/envs/test/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
#0  0x00007ffff588c71b in OBJECT_copyswapn () from /home/runner/micromamba/envs/test/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
#1  0x00007ffff5915c72 in _new_argsortlike () from /home/runner/micromamba/envs/test/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
#2  0x00007ffff591a16d in PyArray_ArgSort () from /home/runner/micromamba/envs/test/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
#3  0x00007ffff59690bf in array_argsort () from /home/runner/micromamba/envs/test/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
#4  0x000055555568bd[83](https://github.com/pandas-dev/pandas/actions/runs/4262938751/jobs/7419045241#step:8:85) in cfunction_vectorcall_FASTCALL_KEYWORDS (func=func@entry=<built-in method argsort of numpy.ndarray object at remote 0x7fffb456e550>, args=0x55555ff63278, nargsf=<optimized out>, kwnames=('kind',)) at /usr/local/src/conda/python-3.10.9/Objects/methodobject.c:446

Maybe this helps a little?

I only found this PR from git blame, where it seems like the only function to touch the functions in the backtrace recently.

@mhvk
Copy link
Contributor

mhvk commented Feb 25, 2023

In our case, the error message "RuntimeError: Internal error, tried to fetch clear function for the user dtype '�' without fields or subarray (legacy support)" is triggered, which was introduced here... I'm trying to see if this dtype actually has objects inside or not (which is what this PR addresses), but am not very familiar with this part of astropy. Anyway, maybe @seberg has an idea just given what path seems to be hit...

@mhvk
Copy link
Contributor

mhvk commented Feb 25, 2023

p.s. The test that triggers the problem may well have an empty structured array, since it explicitly reads in a table without columns. Given an installation of astropy, a minimal example to reproduce is at astropy/astropy#14460 (comment)

@mhvk
Copy link
Contributor

mhvk commented Feb 25, 2023

OK, found a minimal pure numpy example - see #23277

@seberg
Copy link
Member Author

seberg commented Feb 25, 2023

@lithomas1 thanks. Seems like I forgot to add a NULLing of the buffer when trying to simplify that code. Odd that NumPy test suite doesn't notice that.

@seberg
Copy link
Member Author

seberg commented Feb 25, 2023

Triggered the wheel build, so this hopefully should clear up very soon. Thanks for the reports.

@mhvk
Copy link
Contributor

mhvk commented Feb 25, 2023

Thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants