-
-
Notifications
You must be signed in to change notification settings - Fork 12k
MAINT: cleanup of fast_loop_macros.h #13208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3b5b208 to
ff646bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're touching this - what are your thoughts on renaming it to UNARY_CONSTANT_LOOP? This is describing (arg) -> const, I'd expect OUTPUT_LOOP to describe () -> result, possibly used in something like random if that ever becomes a ufunc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me - are you thinking s/OUTPUT_/UNARY_CONSTANT_/ for everything in these two files?
numpy/core/src/umath/loops.c.src
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious - is it any slower is we just use UNARY_LOOP_FAST(npy_bool, npy_bool, *out=@val@) here? Is the compiler able to remove the unused iteration over the input?
If it is, I'd be inclined to maintain fewer macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug up the benchmark I wrote and quickly tried out your suggestion. I'll need to check my code in the morning but it looks like it causes a substantial regression:
$ asv compare upstream/master HEAD --only-changed
before after ratio
[db5fcc8e] [0c76cb6e]
<maximum_speedup~1> <bool_perf>
+ 177±3μs 234±2μs 1.32 bench_ufunc.IsNan.time_isnan('float16')
+ 2.99±0.06μs 32.8±0.2μs 10.96 bench_ufunc.IsNan.time_isnan('int16')
+ 3.09±0.5μs 33.1±1μs 10.72 bench_ufunc.IsNan.time_isnan('int32')
+ 3.07±0.07μs 33.2±0.3μs 10.80 bench_ufunc.IsNan.time_isnan('int64')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news - there was a typo in what I tested last night and there's no performance impact if we just use UNARY_LOOP_FAST instead. I've pushed a commit that uses that and removes the OUTPUT_LOOP_FAST macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting replacing op; by *out = val so that the call passes @val@ instead of the assignment statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that was in output_loop which is now gone completely
eric-wieser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, although I might need to use a local difftool to check that only indents changed
|
@eric-wieser One thing to note is that |
numpy/core/src/umath/loops.c.src
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing this to (void)in; *out = @val@ do the trick?, without costing performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising - no warnings locally and no performance regression. Just rebased and pushed, so will see what Travis has to say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a comment explaining the (void) - that the macro provides an in variable, and we need to tell the compiler we are deliberately ignoring it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dd3050a to
3107b1f
Compare
3107b1f to
838abd7
Compare
|
Thanks @qwhelan |
This is a followup to #12988 and incorporates comments suggested there:
OUTPUT_LOOP_FASTnow takes thevalto output rather than*out = valloops.c.srchave been updatedtout * outhas been changed totout *outin all casescc @charris