Skip to content

Conversation

@qwhelan
Copy link
Contributor

@qwhelan qwhelan commented Mar 29, 2019

This is a followup to #12988 and incorporates comments suggested there:

  • OUTPUT_LOOP_FAST now takes the val to output rather than *out = val
    • The two call sites in loops.c.src have been updated
  • Whitespace between macros inserted
  • tout * out has been changed to tout *out in all cases
  • All macros have had their indentation fixed

cc @charris

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're touching this - what are your thoughts on renaming it to UNARY_CONSTANT_LOOP? This is describing (arg) -> const, I'd expect OUTPUT_LOOP to describe () -> result, possibly used in something like random if that ever becomes a ufunc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me - are you thinking s/OUTPUT_/UNARY_CONSTANT_/ for everything in these two files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious - is it any slower is we just use UNARY_LOOP_FAST(npy_bool, npy_bool, *out=@val@) here? Is the compiler able to remove the unused iteration over the input?

If it is, I'd be inclined to maintain fewer macros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug up the benchmark I wrote and quickly tried out your suggestion. I'll need to check my code in the morning but it looks like it causes a substantial regression:

$ asv compare upstream/master HEAD --only-changed
       before           after         ratio
     [db5fcc8e]       [0c76cb6e]
     <maximum_speedup~1>       <bool_perf>
+         177±3μs          234±2μs     1.32  bench_ufunc.IsNan.time_isnan('float16')
+     2.99±0.06μs       32.8±0.2μs    10.96  bench_ufunc.IsNan.time_isnan('int16')
+      3.09±0.5μs         33.1±1μs    10.72  bench_ufunc.IsNan.time_isnan('int32')
+     3.07±0.07μs       33.2±0.3μs    10.80  bench_ufunc.IsNan.time_isnan('int64')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news - there was a typo in what I tested last night and there's no performance impact if we just use UNARY_LOOP_FAST instead. I've pushed a commit that uses that and removes the OUTPUT_LOOP_FAST macros.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting replacing op; by *out = val so that the call passes @val@ instead of the assignment statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but that was in output_loop which is now gone completely

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, although I might need to use a local difftool to check that only indents changed

@qwhelan
Copy link
Contributor Author

qwhelan commented Mar 30, 2019

@eric-wieser One thing to note is that UNARY_LOOP_FAST declares in, which is unused in the output mode and thus generates unused variable warnings from gcc. This is the cause of the Travis failure and I'm not sure on the preferred way to suppress or resolve that warning.

Copy link
Member

@eric-wieser eric-wieser Mar 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing this to (void)in; *out = @val@ do the trick?, without costing performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising - no warnings locally and no performance regression. Just rebased and pushed, so will see what Travis has to say

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a comment explaining the (void) - that the macro provides an in variable, and we need to tell the compiler we are deliberately ignoring it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mattip mattip merged commit a8eca5c into numpy:master Mar 31, 2019
@mattip
Copy link
Member

mattip commented Mar 31, 2019

Thanks @qwhelan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants