
Conversation

@mhvk (Contributor) commented Oct 5, 2025

The overhead of allocating memory for array data becomes large when the inputs are scalars. So for those this PR allocates memory as part of the object using tp_alloc. (This solves the same problem #29876 aimed to solve, but arguably in a much nicer manner.)

EDIT: quite a few of the changes are just changes in indentation, so it may be easier to review while ignoring whitespace, via https://github.com/numpy/numpy/pull/29878/files?w=1

Timings

```
%timeit np.empty(())
112->67 ns  (custom py311: 89->61)

a = 1.6
%timeit np.array(a)
134->89 ns  (custom py311: 105->79)

%timeit np.add(a, a)
527->413 ns  (390 ns for np.float64; custom py311: 459->370)
```

In principle, this could be trivially extended to doing this generally when the required amount of memory is small.

Notes/queries

  • The above requires setting tp_itemsize=1 for the array object, which means that at least 1 extra byte always gets allocated [1]. I verified that with the current struct, this does not increase the size of the object (it remains 96 bytes). Alignment should be OK, since the last item on the current struct is a pointer.
  • Given the change I had to make to __setstate__, it might be an idea to add a flag that tells whether the data is on the object. That avoids comparing pointers. Logically, one might want to use mem_handler == NULL for that, but apparently that is already assumed to mean the data was gotten with malloc. Maybe `mem_handler = Py_None` is reasonable?

[1] https://github.com/python/cpython/blob/37d16f7f620d79af7db1c00c8d638122af889bf9/Objects/typeobject.c#L2490-L2497
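For reference, here is a minimal sketch of the idea, with illustrative names only; the actual PR code additionally sets flags, handles subclasses, and checks the memory handler:

```
/* Sketch: allocate the array object with room for a small element at its
 * tail, so no separate data allocation is needed.  Assumes Python.h,
 * string.h and NumPy's ndarraytypes.h are included. */
static PyArrayObject_fields *
alloc_array_with_inline_data(PyTypeObject *subtype, size_t nbytes)
{
    size_t size = (size_t)subtype->tp_basicsize + nbytes;
    char *alloc = PyObject_Malloc(size);
    if (alloc == NULL) {
        PyErr_NoMemory();
        return NULL;
    }
    memset(alloc, '\0', size);
    PyObject_Init((PyObject *)alloc, subtype);

    PyArrayObject_fields *fa = (PyArrayObject_fields *)alloc;
    /* The element lives right after the struct, inside the same allocation. */
    fa->data = alloc + subtype->tp_basicsize;
    return fa;
}
```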


mhvk added 2 commits October 5, 2025 11:06
The overhead of allocating memory for array data becomes large when the inputs
are scalars. So for those this PR allocates memory as part of the object using
`tp_alloc`.

Timings
```
%timeit np.empty(())
112->67 ns

a = 1.6
%timeit np.array(a)
134->89 ns

%timeit np.add(a, a)
527->413 ns (390 ns for np.float64)
```

In principle, this could be trivially extended to doing this generally when
the required amount of memory is small.

One note: the above requires setting `tp_itemsize=1` for the array object,
which means that always at least 1 extra byte gets allocated [1].  I verified
that with the current struct, this does not increase the size of the object
(it remains 96 bytes).  Alignment should be OK, since the last item on the
current struct is a pointer.

[1] https://github.com/python/cpython/blob/37d16f7f620d79af7db1c00c8d638122af889bf9/Objects/typeobject.c#L2490-L2497
@mhvk mhvk force-pushed the array-allocate-small-via-object branch from 78f87ed to 70d058c Compare October 5, 2025 15:10
@seberg (Member) left a comment

Just a comment, the speedup does look promising. Unfortunately, I think we should check PyDataMem_GetHandler for being the default one (which will add 10ns or so).

Just commenting for now; we may need to iterate a bit, but it's a nice PoC!

```
PyVarObject_HEAD_INIT(NULL, 0)
.tp_name = "numpy.ndarray",
.tp_basicsize = sizeof(PyArrayObject_fields),
.tp_itemsize = 1,
```
Member:

Unfortunately, I think this might break subclasses if they were written in C. But I do suspect if we have a basetype, we can just manually add similar logic.

(Also, I don't think elements have alignment guarantees, so padding may be necessary.)

Contributor Author:

Do you mean that they might have used tp_itemsize differently? (Note that I changed things so that I no longer rely on PyArray_Type.tp_basicsize, but now properly use Py_TYPE(self)->tp_basicsize.)

Member:

Yeah, I suppose a custom tp_itemsize may be the only issue, but it may not be checked, so if we test for no subtype we would also be safe?
It's a bit awkward, because we do not actually have a VarObject (a var object would have an ob_size field). The fact that the above has PyVarObject_HEAD_INIT is just wrong; it is setting the *data field or so to 0, which happens to work out.
But while we can't insert an ob_size field, that field is also irrelevant except at allocation time, I think (i.e. Python will write garbage there at allocation, but we just overwrite it).

We have broken some C classes before when increasing the size of the struct; that might just happen again, I guess.
(Part of me is tempted to just call PyObject_Malloc() directly -- we have enough code that does that -- but I am not sure that is ideal, especially on free-threaded builds, which seem to use different allocation heaps for different types of objects. But it would side-step most of these worries.)

```
if ((fa->flags & NPY_ARRAY_OWNDATA) && fa->data) {
/* Deallocate data, if present and not a small allocation in the object */
if ((fa->flags & NPY_ARRAY_OWNDATA) && fa->data != NULL
    && fa->data != (void *)((char *)fa + PyArray_Type.tp_basicsize)) {
```
Member:

We could add a (private) flag as well, although maybe this check is nice enough.

Contributor Author:

Yes, I realized the same. How about setting mem_handler = Py_None? (cannot use NULL as that is interpreted as meaning that data might have been gotten with malloc... see around the changes to array_dealloc).

Contributor Author:

Hmm, mem_handler = Py_None is probably asking for problems, as surely there is code that just acts on it being non-NULL. One would need a no-op function or so. But that becomes more complicated than just having a new flag...

Contributor Author:

So, right now I went for a semi-private _PyArray_DATA_ON_INSTANCE in-line function in ndarraytypes.h - that way we can always decide later whether or not to define a flag, etc.
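For illustration, such a helper could be as simple as the following sketch; the version discussed later in the thread instead compares against a trailing _storage member:

```
/* Sketch: true if the array's data pointer points into the object itself,
 * i.e. directly after the (sub)type's basic struct. */
static inline int
_PyArray_DATA_ON_INSTANCE(PyArrayObject *arr)
{
    PyArrayObject_fields *fa = (PyArrayObject_fields *)arr;
    return fa->data == (char *)fa + Py_TYPE(arr)->tp_basicsize;
}
```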

@mhvk (Contributor Author) commented Oct 5, 2025

> Unfortunately, I think we should check PyDataMem_GetHandler for being the default one (which will add 10ns or so).

I'm trying to understand why it might be a problem if a scalar is on the object itself (mostly because numpy scalars are stored on their own object anyway...). I was thinking that the allocator would be mostly used for very large arrays. But might it be needed in case someone, e.g., expects all data to be allocated in GPU memory or so? But wouldn't they then want the whole object there anyway?

Shall I perhaps send an e-mail to numpy-dev to ask? It would be silly to guard against something that actually is a benefit also to those with their own memory allocator.

@mhvk (Contributor Author) commented Oct 5, 2025

p.s. I'm confused why the tests all fail here, when spin test works correctly on my machine.

@seberg (Member) commented Oct 5, 2025

> I was thinking that the allocator would be mostly used for very large arrays. But might it be needed in case someone, e.g., expects all data to be allocated in GPU memory or so? But wouldn't they then want the whole object there anyway?

Yes, precisely: someone might be asking for an alignment of 512 bytes or so, and any new array allocated within such a context and returned to the user should be expected to have this.
Arguably, we don't have to care for e.g. ufuncs (or in fact any array created from a scalar where a copy=True isn't done, I guess), but we don't have that information easily available here.

> p.s. I'm confused why the tests all fail here, when spin test works correctly on my machine.

To be fair, a lot of tests are passing here also :). Maybe try running with PYTHONMALLOC=malloc_debug or so, might make the failure obvious.

@mhvk (Contributor Author) commented Oct 5, 2025

Now for the actual issue: OK, at least I can reproduce it; it has to do with being on python 3.11. It is also clear it has something to do with subclassing.

@mhvk (Contributor Author) commented Oct 5, 2025

Hmm, this is trickier than I hoped. Under python 3.11, I can get rid of the crashes by starting PyArrayObject_fields with PyObject_VAR_HEAD, but then multiple inheritance breaks:

```
class A(np.ndarray):
    pass

class B(np.ndarray):
    pass

class C(A, B):
    pass

TypeError: multiple bases have instance lay-out conflict
```

Under python 3.13, this all works fine.

I'll push the version with the change anyway, since it would seem to be correct, but it is unclear if I really can proceed like this...

@mhvk mhvk force-pushed the array-allocate-small-via-object branch from 76f01b1 to 43f1b29 Compare October 5, 2025 22:20
Standard tp_alloc worked on python 3.13, but not python 3.11 for
subclassing.  Could not figure out what the problem was. Possibly
needed a PyObject_VAR_HEAD, but that also gave problems (though
at least no segfaults).
@mhvk mhvk force-pushed the array-allocate-small-via-object branch from 43f1b29 to 93f2c69 Compare October 6, 2025 01:30
@mhvk (Contributor Author) commented Oct 6, 2025

OK, I ended up just doing the allocation directly, instead of via tp_alloc. That works for both python versions, and keeps the basic size of the instances the same. I guess the not-so-great part is that this is hidden; suggestions for improvements welcome.

p.s. Still need to do the mem_handler check, but first want to be sure people are happy with proceeding along these lines...

@seberg (Member) left a comment

I am fine with this as such, but I would like to have slightly more confidence about bare PyObject_malloc use here.

There is also the interesting question of whether this whole thing is just much faster on Python 3.14 or so due to the use of mimalloc (or is that only for free-threading)?

And in the end, I have to also think about arr.resize calls which might be weird but we can likely figure something out there (although need a release note, I wouldn't be surprised if someone is insane enough to replace NumPy array allocations under our feet...).

```
* calculate full nbytes up front and just use that here?
*
* Note: alignment constraint written such that compiler can just
* show it is true.
```
Member:

Makes sense, but since this is a static check, I think it should be a static_assert (otherwise you won't notice if it starts failing). Might also use NPY_ALIGNOF(npy_clongdouble), not sure that we can actually trust sizeof(npy_intp) to be a maximally aligned type.

Contributor Author:

Ah, yes, I was wondering how best to do that, thanks!

Contributor Author:

Actually, not sure about NPY_ALIGNOF(npy_clongdouble) - if that can be more than sizeof(npy_intp), then I'm not sure we can count on PyObject_Malloc to allocate to that level (i.e., our basic size may be 96 and thus divisible by 16, but malloc may only care about alignment of 8?). If that's a problem, it might have to be a check on data itself...

But probably it is not a problem. How about,

```
static_assert((sizeof(PyArrayObject_fields)
               & (sizeof(npy_intp) - 1)) == 0);
static_assert(NPY_ALIGNOF(npy_clongdouble) <= sizeof(npy_intp));
```

Contributor Author:

Hmm, static_assert(NPY_ALIGNOF(npy_clongdouble) <= sizeof(npy_intp)); fails.

I wonder, though, whether this is a problem - we do not seem to allocate our data at that alignment either (or at least I don't see it in PyDataMem_UserNEW, which seems to just use malloc under the hood).

@seberg (Member) commented Oct 7, 2025

Any malloc implementation (including Python's) guarantees at least the maximum alignment of any type on the system (not necessarily the maximum of any instruction, but that is a different matter).
If that assertion fails, that is OK. I should have just written the previous one with NPY_ALIGNOF(npy_clongdouble) instead (might as well use % == 0, since speed is irrelevant).

If that also fails, then that is also OK! Just introduce a helper struct to clarify this. Maybe you prefer that anyway...

```
struct {
    PyArrayObject_fields arr;
    npy_clongdouble data[];
} _sizeof_arr_plus_data_struct;
```

then you can skip the assertions and just use its sizeof.

(If we want even more alignment, I would probably just do the math manually, but I suspect the struct thing is enough. I.e. on most systems the malloc functions probably all give 16-byte alignment, which is probably larger than that of the maximally aligned type -- it might make sense to just honor that.)
EDIT: Sorry, alignof of longdouble is 16, so everything adds up.
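Spelled out, the check suggested above might read as follows (a sketch, assuming C11's static_assert from <assert.h> and NumPy's NPY_ALIGNOF macro):

```
/* Data placed at the tp_basicsize offset is suitably aligned for any element
 * type as long as the struct size is a multiple of the largest alignment. */
static_assert(sizeof(PyArrayObject_fields) % NPY_ALIGNOF(npy_clongdouble) == 0,
              "in-object array data would be misaligned");
```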

Contributor Author:

Hmm, funnily enough, the python documentation suggests something similar for variable-size objects (see the last example in https://docs.python.org/3/c-api/typeobj.html#examples):



```
typedef struct {
    PyObject_VAR_HEAD
    const char *data[1];
} MyObject;

static PyTypeObject MyObject_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "mymod.MyObject",
    .tp_basicsize = sizeof(MyObject) - sizeof(char *),
    .tp_itemsize = sizeof(char *),
};
```

Notice the tp_basicsize calculation. I quite like this. Do you think we could do the same here, i.e., just add npy_clongdouble data[] to the end of the regular PyArrayObject_fields struct, but remove it from tp_basicsize?

I ask in part since if the problems I had before with subclasses giving errors on python 3.11 are solved on python 3.12, then perhaps we can go for allocating extra stuff using the regular allocator -- and either work around python 3.11 or count on support for it being dropped very soon (2025 quarter 4 according to SPEC 0, i.e., basically now).

@seberg (Member) commented Oct 7, 2025

Yes, we could. But there is no need to "remove" anything; the only reason for the subtraction is that the docs example uses data[1] rather than data[]. And the only reason for that is that the data[] notation is C99 (and I am not sure which C++ versions support it).
Hopefully C++17 does, because otherwise we may have to work around things. But we do use data[] a lot in NumPy, I am just not sure that it is in public headers yet (and this has to be public).

EDIT: Actually, not 100% sure about the public, but I suspect it needs to be (although we can say it is private of course).

Member:

Sorry, about Python versions. Without thinking it through: Yes, I can see doing such tricks only on Python versions that properly support it.

Contributor Author:

I may try - the nice thing about having some npy_clongdouble extra[] is that the test for whether the data is stored on the instance becomes a more obvious one: self->data == &self->extra (would need to think of a better name...).

I think it is OK to note in the header file that this field is private and only exists if data points to it (or something like that). A name starting with an underscore would probably be wise...

Contributor Author:

OK, I went ahead and added _storage, with an explanation. Also added a _PyArray_DATA_ON_INSTANCE function for checking whether that is the case. It makes the other code changes more obvious. Not sure it should be in ndarraytypes.h, but didn't know where else to put it (and there are other private ones in there).

```
}
memset(alloc, '\0', size);
PyObject_Init((PyObject *)alloc, subtype);
fa = (PyArrayObject_fields *)alloc;
```
Member:

If we don't use the ndim == 0 path more here, then I wonder if we shouldn't always use this, at least for the shape/dims allocation as well, and potentially for all very small arrays.

@ngoldbaum I wonder if you know how bad it is to use PyObject_Malloc directly. I am sure it works (numpy has a lot of weird code that does it), but it is very unclear to me if there are downsides.

@mhvk (Contributor Author) commented Oct 6, 2025

Yes, I was wondering about storing the strides on the object as well, even if just to keep the actual values near the object in memory.

Note that this code is just a condensed version of PyType_GenericAlloc -- it uses PyObject_Malloc too.

(Note that in an earlier version, I wrote a separate array_alloc. If we go that route, we can of course do what we want, but may need an array_free as well. This seemed less invasive...)

Member:

Hmmm, that may be reasonable if I think about it (a custom alloc). I don't think that means you need a custom free at all. It means it is inherited just fine (the only problem would be if someone added an itemsize, but I think I might gamble on that).


```
fa = (PyArrayObject_fields *) subtype->tp_alloc(subtype, 0);
npy_bool data_on_instance = NPY_FALSE;
if (subtype == &PyArray_Type) {
```
Member:

In principle one could add a check for tp_basicsize == our size and tp_alloc == PyType_GenericAlloc (or, with alignment logic, without the tp_basicsize check). But in the end, this is mainly about scalars, so subclasses do not matter.

Contributor Author:

Let me add a note to the already long comment... We'd probably want tp_itemsize=0 too...

@mhvk (Contributor Author) commented Oct 7, 2025

> I am fine with this as such, but I would like to have slightly more confidence about bare PyObject_malloc use here.

As mentioned inline, that use comes straight from PyType_GenericAlloc, so it is no change in that sense.

> There is also the interesting question of whether this whole thing is just much faster on Python 3.14 or so due to the use of mimalloc (or is that only for free-threading)?

I'll see if I can get 3.14rc3 to install...

> And in the end, I have to also think about arr.resize calls which might be weird but we can likely figure something out there (although need a release note, I wouldn't be surprised if someone is insane enough to replace NumPy array allocations under our feet...).

Oops, I thought I had looked carefully for other uses of the memory handler, but missed it because I was looking for mem_handler rather than PyArray_HANDLER. I fixed the resize (though noticed what I think is a bug in it; see #29887). Turns out there is another one in textreading/rows.c, though there the array is always allocated with ndim>=1, so it is unaffected. It does make me wonder if it wouldn't be better to have a fake'ish memory handler after all, though.

@mhvk (Contributor Author) commented Oct 7, 2025

OK, I now also added the mem_handler == PyDataMem_DefaultHandler check. As you said, it adds about 5 to 10 ns, which is a pity, but it still is a nice improvement on before.

Separately, you mentioned somewhere (I think) that the actual allocation functions might be sped up a bit. Were you thinking about the fact that the handler capsule is opened every time? It would indeed seem worthwhile to check whether it is the default handler and then avoid that pointer chasing...

@seberg (Member) commented Oct 7, 2025

> Were you thinking about the fact that the handler capsule is opened every time?

Right. I had a start here with a slightly heavy-handed approach: adding a custom PyCapsule_GetPointerUnchecked() that assumes the pointer is the first thing stored in the capsule object (feels like plausible Python C-API behavior, although maybe not on GraalPython).
(UFuncs also use a capsule right now for the errstate, so that's one more little thing. But it was <5%, although I did most timings with spin ipython, which does a debug build, which is terrible for these timings.)

It's not much; I was surprised to see that an attempt to avoid any allocation for scalar ufunc inputs made only a <10% difference, but then profiling isn't quite as simple as that. (I should time this; I wonder if I'd barely notice it on the M1 in a simple %timeit.)

@mhvk (Contributor Author) commented Oct 7, 2025

> Right. I had a start here with a slightly heavy-handed approach: adding a custom PyCapsule_GetPointerUnchecked() that assumes the pointer is the first thing stored in the capsule object (feels like plausible Python C-API behavior, although maybe not on GraalPython).

I think one could be faster by just defining a little in-line function that has a shortcut, like

```
static inline PyDataMem_Handler *
_get_handler(PyObject *mem_handler_capsule) {
    if (mem_handler_capsule == PyDataMem_DefaultHandler) {
        return &default_handler;
    }
    else {
        return (PyDataMem_Handler *) PyCapsule_GetPointer(
            mem_handler_capsule, MEM_HANDLER_CAPSULE_NAME);
    }
}
```

and then in all the PyDataMem functions one would have

```
PyDataMem_Handler *handler = _get_handler(mem_handler);
```

which would be nicer regardless.

p.s. Am slightly sad that the handler isn't just supposed to take care of the tracking as well, i.e., that the handler itself contains pointers to UserNEW, etc. One could imagine a scenario where one just wants to dispense with that to reduce overhead, but that is not possible. Anyway, that ship has sailed, presumably.

@seberg (Member) commented Oct 7, 2025

> I think one could be faster by just defining a little in-line function that has a shortcut, like

Yes, that was my first thought too, but then I realized that the GetPointerUnchecked version is probably identically quick and also works for custom handlers and the extobj.
Anyway, either works; my version might not survive if we ever try to go with the limited API only, but I suspect that is so far off that I don't really care.

> p.s. Am slightly sad that the handler isn't just supposed to take care of the tracking as well, i.e., that the handler itself contains pointers to UserNEW

Yeah, but I also think it probably made sense to not make it very hard to support tracing in a custom allocator. I.e. it does seem like an orthogonal option to me, one that maybe we can even do with a global.

@mhvk mhvk force-pushed the array-allocate-small-via-object branch from 3051737 to a3c5541 Compare October 8, 2025 02:19
@mhvk (Contributor Author) commented Oct 8, 2025

Hmm, MSVC does not like the zero-sized array, unfortunately. I went for the size-1 version, where that is not included in tp_basicsize (I do like the marker idea, explicit is better!).
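As a toy analogue of that choice (illustrative names only; the real change would be to PyArrayObject_fields and the ndarray type's tp_basicsize):

```
#include <Python.h>
#include "numpy/ndarraytypes.h"   /* for npy_clongdouble */

/* Size-1 trailing marker, excluded from tp_basicsize -- the same pattern as
 * the CPython docs example quoted above, but MSVC-friendly. */
typedef struct {
    PyObject_HEAD
    char *data;                    /* either external, or points at _storage */
    npy_clongdouble _storage[1];   /* only meaningful if data points here */
} ToyArrayObject;

static PyTypeObject ToyArray_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "toy.ToyArray",
    .tp_basicsize = sizeof(ToyArrayObject) - sizeof(npy_clongdouble),
};
```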

@mhvk (Contributor Author) commented Oct 8, 2025

> Sorry, about Python versions. Without thinking it through: Yes, I can see doing such tricks only on Python versions that properly support it.

I checked, and with python 3.12, defining an itemsize and using tp_alloc(subtype, n_items) works fine. I'm not sure that is necessarily better - it does add 8 bytes to every object for ob_size... And I'm not sure whether it would break all kinds of things if the array layout changed that way.

So, perhaps what's here is better. Do you think I should go for a separate allocfunc? It feels a bit cleaner, but also less explicit.

@mhvk (Contributor Author) commented Oct 8, 2025

> Sorry, about Python versions. Without thinking it through: Yes, I can see doing such tricks only on Python versions that properly support it.

I now tested, and on python 3.12, things work properly using PyObject_VAR_HEAD and then having a non-zero itemsize (and on python 3.11, itemsize=1 and 8 both fail). But would this break things with the different API versions, etc., as this inserts ob_size at the front of the structure?

I'm also still wondering about creating our own allocfunc - it is a bit cleaner, though perhaps having it explicit like now is better. Also, it is not great to have the call interpret n_items differently from what is expected of an allocator (in the sense that it does something with it even though tp_itemsize=0). So, it seems best to do this only if we go with non-zero itemsize, which would then require PyObject_VAR_HEAD as well. (In this case, it does help that we can avoid the overallocation that PyType_GenericAlloc does; it uses n_items+1).

Anyway, let me know how far you think we should pursue this... It is still a very nice performance gain (though for the ufuncs, #29819 remains a much better solution...).

@seberg (Member) commented Oct 8, 2025

> it does add 8 bytes to every object for ob_size... And I'm not sure whether it would break all kinds of things if the array layout changed that way.

Yes, this is effectively impossible, unfortunately. (You would have to force everyone to recompile and fix a lot of downstream projects that don't just use our headers directly. So yeah, that is completely impossible; doing it would need a year-long preparation at least.)

@seberg (Member) commented Oct 8, 2025

> Anyway, let me know how far you think we should pursue this...

I'll try discussing it in the meeting tonight and see if others have opinions. I think I need to let it sink in a bit. It may be ugly enough that I am not sure it's worthwhile unless it is a large speedup, but it did seem pretty large. And I could also see that, if we do this, merging in the shape/strides is actually more worthwhile than targeting scalars.

@mhvk (Contributor Author) commented Oct 8, 2025

OK, I'll abandon any thought about PyObject_VAR_HEAD!

I will think a bit about incorporating strides. It would be trivial if we could guarantee they would never change, i.e., in the absence of the ability to set shape (less so strides itself) and being able to do PyArray_Resize... But given how it goes through npy_free_cache_dim_array(self), one can use a similar trick to check whether it is on the array. Might just work...

Another obvious extension would be for small arrays (say, less than tp_basicsize, so ~96 bytes or 12 doubles). Or should one just always have the data on the instance (in the absence of a memory handler)? I don't really know that there is a downside... At least not when we have completely moved to ndarray being (shallow) immutable, so the size of the data and strides cannot change.

I'm not sure I can make the meeting (not sure I even have the e-mail still), but I think it is worth mentioning that numpy scalar to array conversion happens a lot, and small arrays are common, so speeding up those paths is worthwhile. Arguably, a larger question is whether it makes sense to move towards a future where everything is part of the instance.

@mhvk (Contributor Author) commented Oct 13, 2025

Moving to draft since, while I am sold on the idea of storing the data on the instance, what I have here is a bit too fragile. Also, I would like to rebase once #29925 and #29929 are in.

@mhvk mhvk marked this pull request as draft October 13, 2025 20:32