Skip to content

ENH: define a gufunc for vecdot (with BLAS support)#25416

Merged
ngoldbaum merged 2 commits intonumpy:mainfrom
mhvk:vecdot-ufunc
Dec 21, 2023
Merged

ENH: define a gufunc for vecdot (with BLAS support)#25416
ngoldbaum merged 2 commits intonumpy:mainfrom
mhvk:vecdot-ufunc

Conversation

@mhvk
Copy link
Copy Markdown
Contributor

@mhvk mhvk commented Dec 18, 2023

Instead of changing the python definition of vecdot, as I tried to do in #25411, it seemed better to just define a proper gufunc for vecdot, so that we can have good blas support and just use the standard definition of axis. This PR does that in the second commit (the first reverts the addition of a python version from #25155).

Note that I don't think I have added a ufunc before (or at least forgotten about it), so please check that my implementation makes sense. In particular, it would seem that the inner loop could be selected more logically, since mostly I'm just wrapping loops defined in arraytypes.c.src. Hence, ping @seberg.

@seberg
Copy link
Copy Markdown
Member

seberg commented Dec 18, 2023

so please check that my implementation makes sense. In particular, it would seem that the inner loop could be selected more logically, since mostly I'm just wrapping loops defined in arraytypes.c.src

That seems all OK to me on first sight, nice! I remember that we have set_matmul_flags which might be important for some niche cases also for vecdot.
Matmul sometimes falls back to einsum which might be nicer. Overall, I would also be happy to move the definitions here and maybe even remove them from arraytypes.c.src (leaving the implementation NULL).

Also ping @mtsokol as I think he was maybe considering looking into this.

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Dec 18, 2023

@seberg - thanks! I checked the flag setting - I don't think the reaons apply to vecdot --

/*
* The default output flag NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE allows
* perfectly overlapping input and output (in-place operations). While
* correct for the common mathematical operations, this assumption is
* incorrect in the general case and specifically in the case of matmul.
*
* NPY_ITER_UPDATEIFCOPY is added by default in
* PyUFunc_GeneralizedFunction, which is the variant called for gufuncs
* with a signature
*
* Enabling NPY_ITER_WRITEONLY can prevent a copy in some cases.
*/
((PyUFuncObject *)matmul)->op_flags[2] = (NPY_ITER_WRITEONLY |

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Dec 18, 2023

Matmul sometimes falls back to einsum which might be nicer.

You mean to fall back to einsum when blas is not present?

Overall, I would also be happy to move the definitions here and maybe even remove them from arraytypes.c.src (leaving the implementation NULL).

You mean move all _dot definitions from arraytypes.c.src? I can do that, but perhaps first see whether there are other comments.

Also, I saw that matmul had a comment that one could (I think) avoid the outer loop --

/*
* TODO: refactor this out to a inner_loop_selector, in
* PyUFunc_MatmulLoopSelector. But that call does not have access to
* n, m, p and strides.
*/

This seems even more relevant for vecdot, but is this easy to do?

@seberg
Copy link
Copy Markdown
Member

seberg commented Dec 18, 2023

Yeah, don't worry about the arraytypes.c.src, was just thinking it might be nicer to move towards a future where we rely mostly on the ufunc. I don't think it is necessary here.

Using einsum could be nice, because einsum may actually have vectorized versions for these kernels, I somewhat think matmul does that, but I am not quite sure how matmul does so without checking.

This seems even more relevant for vecdot, but is this easy to do?

I am not 100% sure again, but basically, you will have to make this a "new-style" ufunc, to use the get_loop() function. That function is passed the fixed_strides arrays so it should have the strides available. OTOH, I am not sure I took care that all core dimensions are filled for gufuncs.

So, it should be doable, but I think its only useful if you have the interest and time to tinker with it.

Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only larger code block is effectively a copy of other code. So, I think this is almost good to go. Yes, it isn't super nice to reach all over the code for these inner-loops, but it is also not terrible and a gufunc is much nicer.

Some smaller notes to maybe think about:

  • IIRC, vecdot was actually defined twice, all namespaces should only have true aliases to the ufunc.
  • Do we actually still need the inner1d test at all for anything, or can we replace it everywhere? (Maybe better as a follow-up).
  • We need to make a call on Object, I think either call .conjugate() or just not define it (for now) I don't have an opinion either way (we can always add it easily after all).

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Dec 19, 2023

Some smaller notes to maybe think about:

* IIRC, vecdot was actually defined twice, all namespaces should only have true aliases to the ufunc.

That is now dealt with - there remains a function np.linalg.vecdot that has a more limited, array-API compliant signature.

* Do we actually still need the `inner1d` test at all for anything, or can we replace it everywhere?  (Maybe better as a follow-up).

Right now the only usage left is for a version with a docstring. That can certainly be changed, but maybe separate from this PR?

* We need to make a call on Object, I think either call `.conjugate()` or just not define it (for now) I don't have an opinion either way (we can always add it easily after all).

I added a new loop which calls .conjugate() on the first argument.

Thanks!

@seberg
Copy link
Copy Markdown
Member

seberg commented Dec 19, 2023

np.linalg.vecdot that has a more limited, array-API compliant signature.

If the only difference is being more limited, lets make it an alias? But thanks. Need another closer look, but I suspect this is fine to go in especially if anyone else has a look and likes it.

EDIT: That is just my opinion/tendency, I don't care too much.

@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Dec 19, 2023

EDIT: That is just my opinion/tendency, I don't care too much.

I don't care too much either, but in linalg, we do the same for matmul, so for this PR probably best to just follow precedent...

Who else would be good for review?

@mhvk mhvk added this to the 2.0.0 release milestone Dec 19, 2023
@mhvk mhvk requested a review from mtsokol December 19, 2023 18:58
@ngoldbaum
Copy link
Copy Markdown
Member

I’ll try to take a look at this.

Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a bit of a closer look, the object loop needs a bit of cleanup IMO. But otherwise I am comfortable with merging, although if Nathan can have a quick look that would be great.

}
else {
PyObject *obj1conj = PyObject_CallMethod(*((PyObject **)ip1),
"conjugate", NULL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to make this use an interned string, creating that is quite a bit of overhead. OTOH, tricky here for subinterpreters, so I don't care.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided not to bother...

for (i = 0; i < n; i++, ip1 += is1, ip2 += is2) {
if ((*((PyObject **)ip1) == NULL) || (*((PyObject **)ip2) == NULL)) {
tmp1 = Py_False;
Py_INCREF(Py_False);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL is None not False. This looks stolen, but I tihnk it shold be:

obj1 = *((PyObject **)ip1
if (obj1 == NULL) {
    obj1 = Py_None;
}

and the same for obj2. No need for any incref/decref at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes straight from the loop in arraytypes - is it even possible for the pointers to be NULL?

Assuming it is, I don't think we can replace by Py_None, since adding None to anything is not possible. Might it then make sense to just error straight away? At least that would be obvious.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for matmul we do the same, of just assigning Py_None, so reasonable to do that here too... Will try that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that doesn't matter: NULL is synonymous for None if it should happen. Yes, it will error, but explicitly erroring is actually also wrong (because the error will at least mention None, while mentioning NULL has no meaning in python).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, None isn't an assumption, it is the correct definition... maybe we should have a npy_load_object() helper to do it. If arraytypes does it differently, it is simply buggy...

}
else {
tmp2 = PyNumber_Add(tmp, tmp1);
Py_XDECREF(tmp);
Copy link
Copy Markdown
Member

@seberg seberg Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use result = NULL and then use Py_SETREF(result, PyNumber_Add(result, ...) also. Just a bit less awkard with tmp = tmp2 below IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much clearer!

}
tmp3 = (PyObject**) op;
tmp2 = *tmp3;
*((PyObject **)op) = tmp;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here: Py_SETREF(*((PyObject **)op), tmp)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/*
* TODO: refactor this out to an inner_loop_selector?
*/
@TYPE@_@DOT@(ip1, is1_n, ip2, is2_n, op, dn, NULL);
Copy link
Copy Markdown
Member

@seberg seberg Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed one thing...

This is annoying, because we really should have an error return, for object, the error paths are not particularly well defined (might do more operations).

It needs moving to the new API to fix, so maybe you can add a PyErr_Occured() check here for object (as much as I dislike it)?

EDIT: Pretty much any array filled with nonsense should be a valid test for this (i.e. uninitialized None). But it needs to be more than 1-D, so we get the outer loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done (and test added)

@mhvk mhvk force-pushed the vecdot-ufunc branch 2 times, most recently from a43334a to 75456d8 Compare December 20, 2023 21:11
Copy link
Copy Markdown
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'm glad we nerdsniped you into writing a BLAS-accelerated gufunc! 😛

Everything looks great to me, I just had two minor comments.

"""
Computes the vector dot product.

This function is Array API compatible, contrary to :func:`numpy.vecdot`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this wrinkle should be mentioned in the array API compat notes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but don't know where those live - here, I just copied what was done for linalg.matmul.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of this table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at that, I don't see special text for matmul -- I think this falls under np.vecdot being a superset of what the array api allows (except for the definition of the axis argument, but that is subject of data-apis/array-api#724

But maybe I should rephrase this part of the docstring to something like

This function is restricted to arguments compatible with the Array API, contrary to :func:`numpy.vecdot`.

or something similar?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that language is clearer I think. No need to touch the table I linked to.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@mhvk mhvk force-pushed the vecdot-ufunc branch 2 times, most recently from a72b945 to 8456f57 Compare December 20, 2023 21:48
*/
char *ip1=args[0], *ip2=args[1], *op=args[2];
@TYPE@_@DOT@(ip1, is1, ip2, is2, op, n_inner, NULL);
#if @TYPE@ == OBJECT
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, does this work if OBJECT is undefined, I think? I sometimes use #define IS_@TYPE@

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, just goes to show how I don't really grok pre-processing... I just added an extra iteration item...

@mhvk mhvk force-pushed the vecdot-ufunc branch 3 times, most recently from 4c68744 to 87fd5cf Compare December 20, 2023 22:46
@mhvk
Copy link
Copy Markdown
Contributor Author

mhvk commented Dec 20, 2023

OK, should now be all set (after too many pushes to try to get object be special-cased and add clongdouble in the repeat loop).

Also use it in the tests instead of the old inner1d from
the test umath functions.
@ngoldbaum
Copy link
Copy Markdown
Member

Thanks @mhvk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants