Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize fmaxf etc. #9689

Merged
merged 5 commits into from
Oct 23, 2019
Merged

Optimize fmaxf etc. #9689

merged 5 commits into from
Oct 23, 2019

Conversation

kripken
Copy link
Member

@kripken kripken commented Oct 22, 2019

The wasm builtins are very similar to the normal libc functions, except that nans are handled differently. Keep the musl nan handling, and otherwise use the builtins. This is ~41 bytes less in each fmaxf etc. function, and is 30% faster on this silly benchmark:

#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more test coverage.

cc @sunfishcode

@kripken kripken requested review from sbc100 and tlively October 22, 2019 19:19
@sunfishcode
Copy link
Collaborator

Clever! I'm surprised by the magnitude of the speedup, but I guess that means those signbit calls aren't easy to optimize.


// fmin etc. are not specced to be sensitive to negative zero, and LLVM does
// depend on that for optimizations, so check only the absolute value there
#define TESTS(name) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay macros that make things easier to read :)

@kripken
Copy link
Member Author

kripken commented Oct 23, 2019

I added more tests for negative zero. Wasm and musl do handle it correctly even if the libc spec doesn't require it, so nice to make sure we don't regress that.

@kripken kripken merged commit 3bd5e10 into incoming Oct 23, 2019
@delete-merged-branch delete-merged-branch bot deleted the fs branch October 23, 2019 22:44
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Oct 25, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Nov 8, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Nov 20, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
belraquib pushed a commit to belraquib/emscripten that referenced this pull request Dec 23, 2020
The wasm builtins are very similar to the normal libc functions, except that
nans are handled differently. Keep the musl nan handling, and otherwise use the
builtins. This is ~41 bytes less in each fmaxf etc. function, and is 30% faster
on this silly benchmark:

#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more
test coverage, including of negative zero which libc is not guaranteed to get
right, but the implementation actually does, and using wasm builtins preserves
that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants