Skip to content

Conversation

@mcourteaux
Copy link
Contributor

@mcourteaux mcourteaux commented Mar 17, 2025

This clears up all the preprocessor prologues or wrapper function prologues.

Fixes #8594
Fixes #8569

Update: pow() for WGSL is removed from the preamble now, and its behavior is emulated during codegen in the WGSL_CodeGen backend. There is a potential for simplyfing that again if there are meaningful bounds on expressions inferred. I'd consider that future work. Removing this from the preamble and using Halide's IR to emulate this, enables support for vectorized calls. There is a caveat with this: WGSL says there are no nans. However, their own builtin pow() function returns nan for negative inputs to x and y. Emulating that was tricky, because I think the WGSL simplifier removes those? Not sure why I couldn't simply do make_const(x.type(), std::nanf("")). I had to resort to make_const(x.type(), 1.0f) * Call::make(Float(32), "nan_f32", {}, Call::PureExtern) to get the NaN value in.

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Mar 21, 2025

I seem to have created a few issues. https://buildbot.halide-lang.org/master/#/builders/98/builds/88 (halide-testbranch-main-llvm20-arm-64-osx-cmake) shows useful errors. Will address them.

@mcourteaux
Copy link
Contributor Author

Is this Windows issue fixed by now?

JIT session error: could not register eh-frame: __register_frame function not found
Assertion failed: !G && "InFlight alloc neither abandoned nor finalized", file C:\build_bot\worker\llvm-20-x86-64-windows\llvm-project\llvm\lib\ExecutionEngine\JITLink\JITLinkMemoryManager.cpp, line 251

@steven-johnson Could you request a rebuild for those failed build bots?

@abadams
Copy link
Member

abadams commented Apr 23, 2025

Working on the windows error in #8615

@mcourteaux mcourteaux force-pushed the math_funcs_table_gpu branch from e883584 to 1c730e5 Compare April 25, 2025 20:44
@mcourteaux
Copy link
Contributor Author

@alexreinking Can you fix mac-x86-worker-2: Python package numpy is no found. If that one is not found, I'm assuming it's a fresh install and more stuff might be missing...

@mcourteaux
Copy link
Contributor Author

@abadams can you assess the test failure on arm. There is a hexagon related test that failed. I doubt it is related to my work.
@derek-gerstmann can you assess the test failure on Windows? I don't see any error.

@derek-gerstmann
Copy link
Contributor

@mcourteaux The test failure on Windows was the correctness_math test on the Vulkan backend segfaulting due to a failed compilation for mismatched types for built-in constants for inf_f32, neg_inf_f32, and nan_f32. These are handled as extern calls and they assumed they were always scalars, which caused the pow vector x2 tests to fail. I've pushed fixes to make sure proper vector constants are generated.

@abadams abadams requested a review from derek-gerstmann May 6, 2025 19:38
@abadams
Copy link
Member

abadams commented May 6, 2025

I will defer to Derek for approval

Copy link
Contributor

@derek-gerstmann derek-gerstmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need so many alias macros or is there cleaner way? Otherwise LGTM.

@mcourteaux
Copy link
Contributor Author

Well, so my motivation was:

  1. Halide uses suffixes for floating-point math functions, like _f32, _f64, _f16.
  2. Every graphics API spec declares those math functions, but are tend to have different names, depending on the API (e.g., fabs() vs overloaded abs()).
  3. Not every API provides implementations for f16 or f64 variants of the math functions.

As such, general-purpose Halide IR must use sin_f32 as the function to compute sine of (a vector of) 32-bit floats. Halide's C backend runtime provides those functions explicitly and simply redirects them to libm (sin, sinf), and we pay for the extra function-call overhead. LLVM-based backends automatically scalarize those calls.

Given the above, a general-purpose mapping table in the C codegen base-class seemed like the right place to put all of this shared functionality to rewrite extern function calls to have different function names (e.g., from sin_f32 to sin).

In general, there is a lot less code now, compared to all of the #defines in the shader code.
The alias macros each write 2 or 3 renaming entries into the table. But as I said, automating the generation of those entries is not elegant, as every backend uses different naming schemes, and some don't even provide functionality with the same semantics as we expect (i.e., WebGPU with pow()). In my opinion, I replaced several long tables with several short tables, with increased functionality (i.e., support for vector-arguments).

@mcourteaux mcourteaux added code_cleanup No functional changes. Reformatting, reorganizing, or refactoring existing code. gpu labels May 22, 2025
@mcourteaux mcourteaux force-pushed the math_funcs_table_gpu branch from d675a4b to 067cc26 Compare May 25, 2025 09:34
@mcourteaux
Copy link
Contributor Author

Ping @derek-gerstmann.

Copy link
Contributor

@derek-gerstmann derek-gerstmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was merely commenting on the usage of #define alias which didn't seem necessary since it only replicated 2-3 lines of code and hid the actual mapping declaration.

@mcourteaux mcourteaux merged commit 4a08ef9 into halide:main May 27, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code_cleanup No functional changes. Reformatting, reorganizing, or refactoring existing code. gpu

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WGSL backend does not support vectorized function calls. Correct soname for libOpenCL.so?

3 participants