Skip to content

Conversation

@seberg
Copy link
Member

@seberg seberg commented Nov 30, 2022

Draft for now, since I want to discuss the approach a bit (I have two).

Basically, this changes it so that we use a clear function for each DType rather than the current "specialized" PyArray_Decref, which cannot be extended to non-objects.

We already have functions to clear buffers internally in the casting machinery, so this wires them in. There are two very unrelated questions:

  • Should we even bother keeping PyArray_Decref() in case someone wants to decref without also clearing? I would still prefer deprecating it, I think. But maybe we should first expose the new API... Unless we can just rename the symbol.

  • I have two different approaches to do this:

    1. This one uses the ArrayMethod which is very simple because it fits with our casting setup. But we don't need resolve_dtypes and need to pass a lot of indirections (there is 1 array to operate, so we only need 1 stride, data pointer, etc.)
    2. We could create a new specialized function. This requires duplicating a bit of the casting setup (for support of structured dtypes) and means a bit more new API. OTOH, such API might be useful for re-use for a TRAVERSE slot, which I presume we need eventually.
      (For some Python implementations, I believe you must have a traverse).

@seberg seberg marked this pull request as draft November 30, 2022 14:25
@seberg seberg changed the title Refactor decref ENH: Refactor deallocation of dtypes with embedded references Nov 30, 2022
…onsense...

... one day I need to figure out why.
@ngoldbaum
Copy link
Member

I think for visibility to dtype authors, it would be nice to have something in the dtype API for clearing memory. That said, documentation would also be sufficient.

@seberg
Copy link
Member Author

seberg commented Jan 3, 2023

I have worked on the alternative option, unfortunately after being practically finished now, I see that NpyIter uses the "write to NULL" logic to clean up buffers with references when buffers are read-only. I guess that can be fixed, although it is a bit tedious (on the up-side the change would clean up gh-22790)

@ngoldbaum the PR simply didn't make this available yet to user DTypes, I thought I would follow up rather than include it. But lets see, it is a tiny addition after all.

@ngoldbaum
Copy link
Member

Darn, well as long as we document that you have to clear buffers using a cast it should be fine. I'm happy with whatever approach you'd prefer for getting this exposed for user dtypes. @peytondmurray was working last week on a dtype implementation that leaks memory that could use this.

@seberg
Copy link
Member Author

seberg commented Jan 3, 2023

It wasn't that bad :), I just had reached the point of calling it a day (and then continued anyway). gh-22924 is the result. For now, I assume we will go with that approach. The first step is a lot more involved, but I think in general I prefer it.

@seberg seberg closed this Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants