-
-
Notifications
You must be signed in to change notification settings - Fork 12k
ENH: Implement axis for generalized ufuncs. #11018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from goto fail to return here and below is because in fail I now need to decref axes (since it might be a new object created from axis). Fortunately, we have not taken any references here (and below), so it is safe to just return.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not strictly necessary (and checked); signatures like (i),(j)->() are fine for keepdims. Will change after having had further review.
1ba5e99 to
4c98f44
Compare
4c98f44 to
7e8763e
Compare
doc/release/1.15.0-notes.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to clarify (and test!) it this also works if the output has the same dimension. For example, a signature of (i)->(i).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. The problem is that the _umath_test module does not contain any example of a gufunc with such a signature, and I don't feel quite up to writing one... (sadly, linalg also does not have one). Note, though, that the code path does get tested - with keepdims=True, any no-core arguments do get upgraded to having a single core dimension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. Of course this isn't a reason to clarify the documentation! Indeed, the keepdims case itself might be useful for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now added a bit more text to keepdims to clarify this.
doc/source/reference/ufuncs.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: missing broadcast and this incorrectly states the single dimension needs to be shared.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment with the python type would be nice for this and the above line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean something like
# list of tuples of specific axes for each argument
# axis to use for a shared, single core dimension
Am not sure it will help much to have that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like that. Maybe even just /* type: T */ and /* type: List[Tuple[T]] */, adopting PEP484 syntax (and considering @shoyer's remarks about non-int types
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if value is np.false_? How do we handle this in subok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply copied the stanza from subok, but agree it is somewhat odd. I'd prefer to leave it for a quick MAINT PR, though.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use a npy_bool for this and out_subok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the main routine, it is int, though, and I use -1 as a value to indicate "unitialized". Overall, perhaps best done in separate cleanup.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see the test separate from the error message. Something like bool _ufunc_supports_keepdims(ufunc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. I'll do the same for axis (even though it is a trivial one).
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is keepdims being added in the loop? That seems wrong to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does need to be there, so that the iterator can allocate dimensions in the correct order (see comment just above this piece of code)
numpy/core/tests/test_ufunc.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little misleading for someone greping for in1d and looking for np.in1d
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of computing this over and over again, why not something like this at the start:
npy_intp *core_num_dims;
if (keepdims) {
// or could just use a static buffer
core_num_dims = alloca(sizeof(*core_num_dims) * NPY_MAX_DIMS);
for (i = 0; i < nin; ++i) {
core_num_dims[i] = ufunc->core_num_dims[i];
}
for(; i < nop; i++) {
core_num_dims[i] = 1;
}
}
else {
core_num_dims = ufunc->core_num_dims;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, better indeed, especially since all dimensions are just 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems appveyor/windows does not like alloca, so I now just use a static buffer.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to think that we should have a type hierarchy for gufuncs, rather than making methods available or not based on their properties. Not something to deal with in this PR, but something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. For that purpose, the separate _check_..._support functions are also sensible.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was better with the two ternaries line wrapped to match up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now have all arguments on their own line.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just leave it alone then? This doesn't seem to justify doing either at the moment.
Edit: Looks like the real reason we need it is we allocate a new axes below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll edit the comment to state this explicitly.
7e8763e to
1842a95
Compare
|
OK, pushed up changes... |
7f9032e to
83bbb65
Compare
|
I couldn't resist - I now allow |
83bbb65 to
93297e0
Compare
|
Rebased to get rid of the merge conflict. Am fairly happy with this and hope it can go in 1.15. |
|
@eric-wieser @shoyer Be good for the previous reviewers to sign off on this, or not. |
numpy/core/src/umath/override.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this optimization might make more sense inside normalize_signature_keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I put it here is that normalize_signature_keyword gets passed normal_kwds, which by this time may well have gotten an out entry even if there were no original keywords.
numpy/core/src/umath/override.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should axis be converted to axes here, so that __array_ufunc__ only needs to support axes?
Perhaps a normalize_axis_keyword(*normal_kwds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think __array_ufunc__ should just get whatever is passed in, since in principle at least it may act differently than the normal ufunc. In fact, I am somewhat torn about even including the check here; we do just pass on everything else that the user passes in, even if it would give an error message if no __array_ufunc__ is present. But then, we do check sig and signature. But then again, that is a deprecation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the whole point in these normalization function not to pass on whatever was passed? I'd argue that by not normalizing axis, we make it harder for overriders to implement it and axes correctly.
The question really is - is your description of how axis maps to axes an invariant of ufuncs, or just one for the specific case of ndarrays? I think that if a short-hand can mean different things for different subclasses, then it's a dangerous shorthand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I saw it, the normalization was mostly to ensure all but the inputs were in keyword arguments, so that __array_ufunc__ implementations didn't have to worry about that; sig and signature seemed to be there just because sig is deprecated.
But I can see your argument as well that one should just translate axis because we should not encourage __array_ufunc__ implementations that can handle axis but not axes. I'll try to see how it looks if I split out the conversion of axis to axes as a helper function, as you suggest below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding myself slightly stuck: I now have a nice _build_axes_tuple in ufunc_object.c, but to use this in override.c, I'd need to include it in a .h file -- but in umath, those seem to contain only stuff to be exported, which this obviously is not. Most logical might be to start a new ufunc_parser.c or ufunc_helper.c which contains all these helper functions for parsing, but that feels a bit beyond this PR...
An alternative might be just to remove the override commit for now, with the logic that __array_ufunc__ implementations should do whatever they want; they'll get an error anyway if they pass this back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but let's make sure it works for arbitrary Python objects, not just integers.
+1 on this. axis='rows' for a ufunc like (i),(i) -> () should end up as axes=[['rows'], ['rows'], []]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current normalization does work for arbitrary objects. It just puts axis in a single-element tuple, and creates a list with that tuple as entries (or an empty one if an argument has no core dimensions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhvk: Can you just put it in numpy\core\src\umath\ufunc_object.h? Those functions are not exported either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, putting it in ufunc_object.h works (I had mistakenly looked in ufuncobject.h, which does have things for export...), but it has a problem that then we need even more parsing... (I'll move to the main thread, since I think this discussion should not disappear after another change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the fact that both of those filenames exist is super confusing...
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: type: Tuple[np.dtype] would match the mypy-style annotation used below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inconsistent spacing. Could break this over multiple lines too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add this to the ufunc doc signature template? Arguments against: That template is only really applied to non-gufuncs right now, which don;t support this argument. Making me think that #8868 might be worth picking back up.
doc/source/reference/ufuncs.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Not supported for non-generalized ufuncs" might be nice. Does np.add([1], [2], axis=0) fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does say specifically "over which a generalized ufunc should operate. And, yes, an error is raised:
In [2]: np.add(1., 1., axis=0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-578c404b564c> in <module>()
----> 1 np.add(1., 1., axis=0)
TypeError: 'axis' is an invalid keyword to ufunc 'add'
But this was not tested - now added.
|
I agree that we should not add |
0283828 to
93d4b6f
Compare
cf5eaa6 to
580399d
Compare
|
Rebased on top of #11257 - which means |
580399d to
5a1fd86
Compare
|
@charris are you comfortable with this going into 1.15? |
|
It does complement |
|
@mattip I'm letting the |
|
I'll probably kick all this off to 1.16 come Friday otherwise, as I'd like to get 1.15.0rc1 out this weekend. |
shoyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I would still be happier if we had a test case ufunc that included an output core dimension (e.g., with signature='(i)->(i)').
numpy/core/src/umath/ufunc_object.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this RuntimeError here but TypeError above (in override.c)? I would expect both to give the same error (probably TypeError).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was for symmetry with sig and signature, but I must admit a TypeError makes much more sense, so I've changed it.
It is only allowed for gufuncs with a single, shared core dimension.
Both in the general documentation and in the release notes.
Much faster, and much less code duplication than I feared.
5a1fd86 to
b02fa32
Compare
|
@shoyer - now that I've been writing other gufuncs in |
shoyer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Thanks for adding the cumsum tests!
EDIT: this is now for
axisonly - keepdims has been added.Follow-up on #8819, specifically for generalized ufuncs that are like reductions on a single axis.
EDIT: most of the below has become irrelevant.
As is, the implementation is not super-elegant:EDIT: now quite elegant.keepdims,checks are added every time the output core dimensions are accessed; it might make more sense to adjust just once.EDIT: this is now solved.axis,a full(EDIT) currently a separate routine is used from that foraxeslist is constructed rather than directly use the value (an advantage, though, is that an extension to passing in a tuple orNoneforaxiswould be that much easier).axes, which means that any reinterpretation of what values it can have (currently, only int) would have to be done in two places. This would involve only a trivial refactor, though.But I think this is worth having a careful look. It would be nice to have this in 1.15, together with the
axesargument.p.s. I tried to keep the commits adding
keepdimsandaxisseparate, as they are somewhat separate enhancements (and it might help review).EDIT: they certainly helped to find an annoying reference counting bug. All seems OK now.