Skip to content

Fix illegal instruction error on CPUs without SSE4.2 support#993

Merged
ohler55 merged 3 commits intoohler55:developfrom
sebyx07:fix/runtime-simd-detection-crossplatform
Jan 27, 2026
Merged

Fix illegal instruction error on CPUs without SSE4.2 support#993
ohler55 merged 3 commits intoohler55:developfrom
sebyx07:fix/runtime-simd-detection-crossplatform

Conversation

@sebyx07
Copy link
Copy Markdown
Contributor

@sebyx07 sebyx07 commented Jan 24, 2026

Summary

This PR fixes issue #989 where oj 3.16.13 crashes with "Illegal instruction" on CPUs that don't support SSE4.2.

The Problem:

  • Commit 318bf55 enabled -msse4.2 at compile time based on compiler support
  • This caused prebuilt/distributed gems to include SSE4.2 instructions
  • When these gems run on CPUs without SSE4.2, they crash with "Illegal instruction"

The Solution:

  • Use runtime CPU detection to select the appropriate SIMD implementation
  • Detection happens once during library initialization (no per-parse overhead)
  • After initialization, scan_func is a direct function pointer with zero overhead
  • Cross-platform support for Windows (MSVC), Linux, and macOS (GCC/Clang)

Key Changes:

  1. simd.h: Added cross-platform compatibility macros:

    • OJ_PREFETCH, OJ_CTZ, OJ_LIKELY/UNLIKELY for compiler portability
    • OJ_TARGET_SSE42, OJ_TARGET_SSE2 for function-level SIMD enabling
    • MSVC support via <intrin.h> and _BitScanForward
  2. oj.c: Added oj_get_simd_implementation() function:

    • GCC/Clang: Uses __builtin_cpu_supports("sse4.2")
    • MSVC: Uses __cpuid intrinsic
    • Fallback to __get_cpuid if builtins unavailable
  3. parse.c: Updated SIMD functions:

    • Added target attributes (__attribute__((target("sse4.2"))))
    • Use portable macros instead of GCC-specific builtins
    • Runtime selection in oj_scanner_init()
  4. extconf.rb: Removed global -msse4.2 flag

    • Only -msse2 is added (baseline for x86_64)
    • SSE4.2 is enabled per-function via target attributes

Comparison with PR #992:
This PR addresses the same issue as #992 but adds:

  • Full Windows/MSVC support
  • Portable macros for __builtin_prefetch, __builtin_ctz, __builtin_expect
  • Fallback CPUID detection when __builtin_cpu_supports is unavailable

Test plan

  • All existing tests pass (ruby -I lib test/tests.rb - 445 runs, 0 failures)
  • Long string tests pass (exercises SIMD string scanning)
  • Strict mode tests pass
  • Basic parsing tests with special characters
  • Performance verified (~575k parses/second)

Fixes #989

🤖 Generated with Claude Code

sebyx07 and others added 2 commits January 24, 2026 10:32
This fixes issue ohler55#989 where oj 3.16.13 crashes with "Illegal instruction"
on CPUs that don't support SSE4.2.

The problem was that enabling -msse4.2 at compile time based on compiler
support caused prebuilt gems to include SSE4.2 instructions even when the
target CPU doesn't support them.

Solution:
- Use runtime CPU detection to select the appropriate SIMD implementation
- Detection happens once during library initialization (no per-parse overhead)
- Cross-platform support: GCC/Clang (__builtin_cpu_supports) and MSVC (__cpuid)
- Function-level target attributes enable SIMD code without global flags
- Portable macros (OJ_PREFETCH, OJ_CTZ, OJ_UNLIKELY) for cross-compiler support

This is similar to the approach used by ruby/json for SIMD detection.

Fixes ohler55#989

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
ext/oj/oj.c Outdated
Comment on lines +1818 to +1820
// Initialize CPU feature detection (required on some systems)
__builtin_cpu_init();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary unless we're using ifunc's. We resolve the scan_func directly so this can be removed.

#else
// Fallback: Use CPUID instruction directly
unsigned int eax, ebx, ecx, edx;
if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to #include <cpuid.h> for __get_cpuid?

#define HAVE_SIMD_SSE2 1
#else
// Try to include headers anyway for target attribute functions
#if __has_include(<x86intrin.h>)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we don't try to detect these in the extconf.rb?

have_header('x86intrin.h')
have_header('nmmintrin.h') # SSE4.2
have_header('emmintrin.h') # SSE2

@sebyx07
Copy link
Copy Markdown
Contributor Author

sebyx07 commented Jan 25, 2026

Addressed the review comments:

Comment 1 (__builtin_cpu_init): Agree - removed. You're right that this is only necessary for IFUNCs where the resolver runs before C runtime initialization. Since we call this from regular code, CPU features are already initialized.

Comment 2 (cpuid.h include): Good catch! Added #include <cpuid.h> in simd.h with a __has_include guard for the GCC/Clang path.

Comment 3 (extconf.rb header detection): That's a good suggestion and would be more idiomatic for Ruby extensions. For now I've kept the __has_include approach since it works at the preprocessor level without requiring extconf.rb changes, but this could be refactored to use have_header() in a follow-up if preferred.

…lude

- Remove unnecessary __builtin_cpu_init() call since we're not using IFUNCs
- Add #include <cpuid.h> with __has_include guard for __get_cpuid fallback

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link
Copy Markdown
Contributor

@samyron samyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good.

@ohler55
Copy link
Copy Markdown
Owner

ohler55 commented Jan 26, 2026

Thanks. I'll merge this evening.

@ohler55 ohler55 merged commit 2e92056 into ohler55:develop Jan 27, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

oj gem 3.16.13 causes an illegal instruction on CPUs that do not support SSE4.2

3 participants