Optimize fmaxf etc. #9689
Clever! I'm surprised by the magnitude of the speedup, but I guess that means those
```c
// fmin etc. are not specced to be sensitive to negative zero, and LLVM does
// depend on that for optimizations, so check only the absolute value there
#define TESTS(name) \
```
yay macros that make things easier to read :)
I added more tests for negative zero. Wasm and musl do handle it correctly even if the libc spec doesn't require it, so nice to make sure we don't regress that.
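For readers unfamiliar with the distinction, here is a rough sketch of the two kinds of check being discussed: one that only looks at the absolute value (so it accepts a zero of either sign), and one that also inspects the sign bit. The helper names `is_zero_any_sign` and `is_negative_zero` are made up for illustration and are not taken from the test suite in this PR.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when v is a zero of either sign.
 * This mirrors "check only the absolute value". */
bool is_zero_any_sign(float v) {
    return fabsf(v) == 0.0f;
}

/* Hypothetical helper: true only for negative zero.
 * -0.0f == 0.0f compares equal, so the sign bit has to be read explicitly. */
bool is_negative_zero(float v) {
    return v == 0.0f && signbit(v);
}
```

A sign-insensitive test would use something like `is_zero_any_sign(fmaxf(-0.0f, 0.0f))`, which holds regardless of which zero the implementation returns; a stricter regression test would additionally look at `signbit` on the result.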
Use wasm's builtin min and max operators to implement libc `fmin`, `fmax`, `fminf`, and `fmaxf`, by handling the NaN cases explicitly. Credit to emscripten-core/emscripten#9689 for spotting this opportunity!
The wasm builtins are very similar to the normal libc functions, except that NaNs are handled differently. Keep the musl NaN handling, and otherwise use the builtins. This is ~41 bytes less in each `fmaxf` etc. function, and is 30% faster on this silly benchmark:

```c
#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}
```

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more test coverage, including of negative zero which libc is not guaranteed to get right, but the implementation actually does, and using wasm builtins preserves that.
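As an illustration of the approach described above (not the exact code in this PR), a minimal sketch of `fmaxf` could look like the following. It assumes clang's `__builtin_wasm_max_f32` is available when targeting wasm. The names `sketch_fmaxf` and `wasm_like_max_f32` are hypothetical, and the non-wasm branch is only an emulation of the wasm `f32.max` behavior so the sketch can be compiled natively.

```c
#include <math.h>

/* Hypothetical stand-in for wasm's f32.max operator.
 * On a wasm target built with clang, use the builtin (assumed available).
 * Elsewhere, emulate its semantics: NaN if either input is NaN,
 * and +0 preferred over -0. */
static float wasm_like_max_f32(float x, float y) {
#if defined(__wasm__) && defined(__clang__)
    return __builtin_wasm_max_f32(x, y);
#else
    if (isnan(x) || isnan(y)) return NAN;
    if (x == y) return signbit(x) ? y : x; /* -0 vs +0: pick +0 */
    return x > y ? x : y;
#endif
}

/* Sketch of libc fmaxf: keep the musl-style NaN handling
 * (a NaN argument is treated as missing data), then defer to the operator. */
float sketch_fmaxf(float x, float y) {
    if (isnan(x)) return y;
    if (isnan(y)) return x;
    return wasm_like_max_f32(x, y);
}
```

The two explicit NaN checks are the only place where the libc contract and the wasm operator differ, so once they are handled up front the remaining comparison can be a single instruction on wasm.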
cc @sunfishcode