Fix illegal instruction error on CPUs without SSE4.2 support#993
Conversation
This fixes issue ohler55#989 where oj 3.16.13 crashes with "Illegal instruction" on CPUs that don't support SSE4.2. The problem was that enabling -msse4.2 at compile time based on compiler support caused prebuilt gems to include SSE4.2 instructions even when the target CPU doesn't support them. Solution: - Use runtime CPU detection to select the appropriate SIMD implementation - Detection happens once during library initialization (no per-parse overhead) - Cross-platform support: GCC/Clang (__builtin_cpu_supports) and MSVC (__cpuid) - Function-level target attributes enable SIMD code without global flags - Portable macros (OJ_PREFETCH, OJ_CTZ, OJ_UNLIKELY) for cross-compiler support This is similar to the approach used by ruby/json for SIMD detection. Fixes ohler55#989 Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
ext/oj/oj.c
Outdated
| // Initialize CPU feature detection (required on some systems) | ||
| __builtin_cpu_init(); | ||
|
|
There was a problem hiding this comment.
This isn't necessary unless we're using ifunc's. We resolve the scan_func directly so this can be removed.
| #else | ||
| // Fallback: Use CPUID instruction directly | ||
| unsigned int eax, ebx, ecx, edx; | ||
| if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) { |
There was a problem hiding this comment.
Do we need to #include <cpuid.h> for __get_cpuid?
| #define HAVE_SIMD_SSE2 1 | ||
| #else | ||
| // Try to include headers anyway for target attribute functions | ||
| #if __has_include(<x86intrin.h>) |
There was a problem hiding this comment.
Any reason we don't try to detect these in the extconf.rb?
have_header('x86intrin.h')
have_header('nmmintrin.h') # SSE4.2
have_header('emmintrin.h') # SSE2
|
Addressed the review comments: Comment 1 ( Comment 2 ( Comment 3 ( |
…lude - Remove unnecessary __builtin_cpu_init() call since we're not using IFUNCs - Add #include <cpuid.h> with __has_include guard for __get_cpuid fallback Co-Authored-By: Claude Opus 4.5 <[email protected]>
samyron
left a comment
There was a problem hiding this comment.
I think this looks good.
|
Thanks. I'll merge this evening. |
Summary
This PR fixes issue #989 where oj 3.16.13 crashes with "Illegal instruction" on CPUs that don't support SSE4.2.
The Problem:
-msse4.2at compile time based on compiler supportThe Solution:
scan_funcis a direct function pointer with zero overheadKey Changes:
simd.h: Added cross-platform compatibility macros:
OJ_PREFETCH,OJ_CTZ,OJ_LIKELY/UNLIKELYfor compiler portabilityOJ_TARGET_SSE42,OJ_TARGET_SSE2for function-level SIMD enabling<intrin.h>and_BitScanForwardoj.c: Added
oj_get_simd_implementation()function:__builtin_cpu_supports("sse4.2")__cpuidintrinsic__get_cpuidif builtins unavailableparse.c: Updated SIMD functions:
__attribute__((target("sse4.2"))))oj_scanner_init()extconf.rb: Removed global
-msse4.2flag-msse2is added (baseline for x86_64)Comparison with PR #992:
This PR addresses the same issue as #992 but adds:
__builtin_prefetch,__builtin_ctz,__builtin_expect__builtin_cpu_supportsis unavailableTest plan
ruby -I lib test/tests.rb- 445 runs, 0 failures)Fixes #989
🤖 Generated with Claude Code