ENH: define a gufunc for vecdot (with BLAS support)#25416
ENH: define a gufunc for vecdot (with BLAS support)#25416ngoldbaum merged 2 commits intonumpy:mainfrom
Conversation
That seems all OK to me on first sight, nice! I remember that we have Also ping @mtsokol as I think he was maybe considering looking into this. |
|
@seberg - thanks! I checked the flag setting - I don't think the reaons apply to numpy/numpy/_core/src/umath/ufunc_object.c Lines 263 to 275 in 2cc2112 |
You mean to fall back to
You mean move all Also, I saw that numpy/numpy/_core/src/umath/matmul.c.src Lines 442 to 446 in 2cc2112 This seems even more relevant for |
|
Yeah, don't worry about the Using einsum could be nice, because einsum may actually have vectorized versions for these kernels, I somewhat think matmul does that, but I am not quite sure how matmul does so without checking.
I am not 100% sure again, but basically, you will have to make this a "new-style" ufunc, to use the So, it should be doable, but I think its only useful if you have the interest and time to tinker with it. |
seberg
left a comment
There was a problem hiding this comment.
The only larger code block is effectively a copy of other code. So, I think this is almost good to go. Yes, it isn't super nice to reach all over the code for these inner-loops, but it is also not terrible and a gufunc is much nicer.
Some smaller notes to maybe think about:
- IIRC, vecdot was actually defined twice, all namespaces should only have true aliases to the ufunc.
- Do we actually still need the
inner1dtest at all for anything, or can we replace it everywhere? (Maybe better as a follow-up). - We need to make a call on Object, I think either call
.conjugate()or just not define it (for now) I don't have an opinion either way (we can always add it easily after all).
That is now dealt with - there remains a function
Right now the only usage left is for a version with a docstring. That can certainly be changed, but maybe separate from this PR?
I added a new loop which calls Thanks! |
If the only difference is being more limited, lets make it an alias? But thanks. Need another closer look, but I suspect this is fine to go in especially if anyone else has a look and likes it. EDIT: That is just my opinion/tendency, I don't care too much. |
I don't care too much either, but in Who else would be good for review? |
|
I’ll try to take a look at this. |
seberg
left a comment
There was a problem hiding this comment.
Had a bit of a closer look, the object loop needs a bit of cleanup IMO. But otherwise I am comfortable with merging, although if Nathan can have a quick look that would be great.
numpy/_core/src/umath/matmul.c.src
Outdated
| } | ||
| else { | ||
| PyObject *obj1conj = PyObject_CallMethod(*((PyObject **)ip1), | ||
| "conjugate", NULL); |
There was a problem hiding this comment.
Would be nice to make this use an interned string, creating that is quite a bit of overhead. OTOH, tricky here for subinterpreters, so I don't care.
There was a problem hiding this comment.
Decided not to bother...
numpy/_core/src/umath/matmul.c.src
Outdated
| for (i = 0; i < n; i++, ip1 += is1, ip2 += is2) { | ||
| if ((*((PyObject **)ip1) == NULL) || (*((PyObject **)ip2) == NULL)) { | ||
| tmp1 = Py_False; | ||
| Py_INCREF(Py_False); |
There was a problem hiding this comment.
NULL is None not False. This looks stolen, but I tihnk it shold be:
obj1 = *((PyObject **)ip1
if (obj1 == NULL) {
obj1 = Py_None;
}
and the same for obj2. No need for any incref/decref at all.
There was a problem hiding this comment.
This comes straight from the loop in arraytypes - is it even possible for the pointers to be NULL?
Assuming it is, I don't think we can replace by Py_None, since adding None to anything is not possible. Might it then make sense to just error straight away? At least that would be obvious.
There was a problem hiding this comment.
But for matmul we do the same, of just assigning Py_None, so reasonable to do that here too... Will try that.
There was a problem hiding this comment.
Yes, and that doesn't matter: NULL is synonymous for None if it should happen. Yes, it will error, but explicitly erroring is actually also wrong (because the error will at least mention None, while mentioning NULL has no meaning in python).
There was a problem hiding this comment.
Basically, None isn't an assumption, it is the correct definition... maybe we should have a npy_load_object() helper to do it. If arraytypes does it differently, it is simply buggy...
numpy/_core/src/umath/matmul.c.src
Outdated
| } | ||
| else { | ||
| tmp2 = PyNumber_Add(tmp, tmp1); | ||
| Py_XDECREF(tmp); |
There was a problem hiding this comment.
Could use result = NULL and then use Py_SETREF(result, PyNumber_Add(result, ...) also. Just a bit less awkard with tmp = tmp2 below IMO.
numpy/_core/src/umath/matmul.c.src
Outdated
| } | ||
| tmp3 = (PyObject**) op; | ||
| tmp2 = *tmp3; | ||
| *((PyObject **)op) = tmp; |
There was a problem hiding this comment.
Also here: Py_SETREF(*((PyObject **)op), tmp)
numpy/_core/src/umath/matmul.c.src
Outdated
| /* | ||
| * TODO: refactor this out to an inner_loop_selector? | ||
| */ | ||
| @TYPE@_@DOT@(ip1, is1_n, ip2, is2_n, op, dn, NULL); |
There was a problem hiding this comment.
Ah, missed one thing...
This is annoying, because we really should have an error return, for object, the error paths are not particularly well defined (might do more operations).
It needs moving to the new API to fix, so maybe you can add a PyErr_Occured() check here for object (as much as I dislike it)?
EDIT: Pretty much any array filled with nonsense should be a valid test for this (i.e. uninitialized None). But it needs to be more than 1-D, so we get the outer loop.
There was a problem hiding this comment.
OK, done (and test added)
a43334a to
75456d8
Compare
ngoldbaum
left a comment
There was a problem hiding this comment.
Nice, I'm glad we nerdsniped you into writing a BLAS-accelerated gufunc! 😛
Everything looks great to me, I just had two minor comments.
numpy/linalg/_linalg.py
Outdated
| """ | ||
| Computes the vector dot product. | ||
|
|
||
| This function is Array API compatible, contrary to :func:`numpy.vecdot`. |
There was a problem hiding this comment.
Should this wrinkle should be mentioned in the array API compat notes?
There was a problem hiding this comment.
I could, but don't know where those live - here, I just copied what was done for linalg.matmul.
There was a problem hiding this comment.
Looking at that, I don't see special text for matmul -- I think this falls under np.vecdot being a superset of what the array api allows (except for the definition of the axis argument, but that is subject of data-apis/array-api#724
But maybe I should rephrase this part of the docstring to something like
This function is restricted to arguments compatible with the Array API, contrary to :func:`numpy.vecdot`.
or something similar?
There was a problem hiding this comment.
Yeah, that language is clearer I think. No need to touch the table I linked to.
a72b945 to
8456f57
Compare
numpy/_core/src/umath/matmul.c.src
Outdated
| */ | ||
| char *ip1=args[0], *ip2=args[1], *op=args[2]; | ||
| @TYPE@_@DOT@(ip1, is1, ip2, is2, op, n_inner, NULL); | ||
| #if @TYPE@ == OBJECT |
There was a problem hiding this comment.
Hmmm, does this work if OBJECT is undefined, I think? I sometimes use #define IS_@TYPE@
There was a problem hiding this comment.
OK, just goes to show how I don't really grok pre-processing... I just added an extra iteration item...
4c68744 to
87fd5cf
Compare
|
OK, should now be all set (after too many pushes to try to get object be special-cased and add |
Also use it in the tests instead of the old inner1d from the test umath functions.
|
Thanks @mhvk! |
Instead of changing the python definition of
vecdot, as I tried to do in #25411, it seemed better to just define a proper gufunc forvecdot, so that we can have good blas support and just use the standard definition ofaxis. This PR does that in the second commit (the first reverts the addition of a python version from #25155).Note that I don't think I have added a ufunc before (or at least forgotten about it), so please check that my implementation makes sense. In particular, it would seem that the inner loop could be selected more logically, since mostly I'm just wrapping loops defined in
arraytypes.c.src. Hence, ping @seberg.