Increase fuzzing test coverage #948

Merged
lemire merged 7 commits into master from claude/increase-fuzzing-coverage-IuhFE
Mar 30, 2026

anonrig (Member) commented Mar 23, 2026

I asked Claude to improve the fuzzing test coverage. The same approach helped us find 5 bugs in Ada.

claude added 2 commits March 23, 2026 13:07
Add fuzz/with_replacement.cpp:
  Differentially tests convert_utf16le_to_utf8_with_replacement and
  convert_utf16be_to_utf8_with_replacement (and their length functions).
  These functions handle unpaired surrogates by replacing them with U+FFFD
  and were previously unexercised by any fuzz target.
  Checks: all impls agree, length function matches written bytes, output is
  valid UTF-8, and no-surrogate inputs match regular conversion length.
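The replacement rule described above can be illustrated with a scalar reference. This is a minimal sketch, not simdutf code: the function names are hypothetical, the input is assumed to be native-endian UTF-16, and a fuzz target would compare each real implementation against something of this shape.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: append one Unicode scalar value as UTF-8.
inline void append_utf8(std::string &out, char32_t cp) {
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Hypothetical reference: convert native-endian UTF-16 to UTF-8, replacing
// every unpaired surrogate with U+FFFD.
inline std::string reference_utf16_to_utf8_with_replacement(
    std::u16string_view in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t u = in[i];
    const bool is_high = (u & 0xFC00) == 0xD800;
    const bool is_low = (u & 0xFC00) == 0xDC00;
    if (is_high && i + 1 < in.size() && (in[i + 1] & 0xFC00) == 0xDC00) {
      // Valid surrogate pair: combine both halves into one code point.
      u = 0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (is_high || is_low) {
      u = 0xFFFD; // unpaired surrogate
    }
    append_utf8(out, u);
  }
  return out;
}

// Hypothetical length function: must equal the number of bytes written above.
inline std::size_t reference_utf8_length_with_replacement(
    std::u16string_view in) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char16_t u = in[i];
    if (u < 0x80) {
      n += 1;
    } else if (u < 0x800) {
      n += 2;
    } else if ((u & 0xFC00) == 0xD800 && i + 1 < in.size() &&
               (in[i + 1] & 0xFC00) == 0xDC00) {
      n += 4; // surrogate pair becomes one 4-byte sequence
      ++i;
    } else {
      n += 3; // BMP character, or U+FFFD for an unpaired surrogate
    }
  }
  return n;
}
```

The "length function matches written bytes" check then reduces to comparing `reference_utf8_length_with_replacement(in)` with the size of the converted string for the same input.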

Add fuzz/base64_details.cpp:
  Differentially tests base64_to_binary_details (char and char16_t) which
  returns a full_result carrying both input_count and output_count even on
  errors. Cross-checks with base64_to_binary via the full_result->result
  cast operator, verifies bounds on input_count/output_count, and confirms
  output bytes agree on success.

Add fuzz/find.cpp:
  Differentially tests find(char*,char*,char) and
  find(char16_t*,char16_t*,char16_t) across implementations.
  Checks: result in [start,end], *result==needle when found, no earlier
  occurrence exists, and needle is absent when result==end.
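The four checks listed for `find` can be folded into one predicate. The following is a sketch under assumptions: `find_result_is_consistent` is a hypothetical name, and `std::find` stands in as the trusted reference for the "no earlier occurrence" test.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical checker for a single find() result over [start, end).
// Returns true only if all four properties from the commit message hold.
template <typename Char>
bool find_result_is_consistent(const Char *start, const Char *end, Char needle,
                               const Char *result) {
  // 1. The result must lie in [start, end].
  if (result < start || result > end) {
    return false;
  }
  // 2. No occurrence of the needle may exist before the result. When
  //    result == end this also proves that the needle is absent.
  if (std::find(start, result, needle) != result) {
    return false;
  }
  // 3. When something was found, it must actually be the needle.
  return result == end || *result == needle;
}
```

A differential target would call each implementation on the same buffer, run this predicate on every result, and additionally require that all implementations return the same pointer.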

Update fuzz/CMakeLists.txt to register the three new targets.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
anonrig requested a review from lemire March 23, 2026 13:13
These were the remaining uncovered virtual methods in the implementation API.

case 9 — validate_utf16_as_ascii:
  Differentially tests validate_utf16le_as_ascii and validate_utf16be_as_ascii
  across all implementations. Also verifies that a true result implies the
  input is valid UTF-16 (ASCII is a strict subset).
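The implication checked in case 9 (ASCII implies valid UTF-16) can be stated with two scalar references. This is a sketch only: the function names are hypothetical and the input is assumed to be native-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical reference: every UTF-16 code unit must be below 0x80.
inline bool reference_utf16_is_ascii(std::u16string_view in) {
  for (const char16_t u : in) {
    if (u >= 0x80) {
      return false;
    }
  }
  return true;
}

// Hypothetical reference: every surrogate must be part of a high/low pair.
inline bool reference_utf16_is_valid(std::u16string_view in) {
  for (std::size_t i = 0; i < in.size(); ++i) {
    const char16_t u = in[i];
    if ((u & 0xFC00) == 0xDC00) {
      return false; // low surrogate with no preceding high surrogate
    }
    if ((u & 0xFC00) == 0xD800) {
      if (i + 1 == in.size() || (in[i + 1] & 0xFC00) != 0xDC00) {
        return false; // high surrogate with no following low surrogate
      }
      ++i; // skip the low half of a valid pair
    }
  }
  return true;
}

// The property from case 9: "is ASCII" must never be true for invalid UTF-16.
inline bool ascii_implies_valid_utf16(std::u16string_view in) {
  return !reference_utf16_is_ascii(in) || reference_utf16_is_valid(in);
}
```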

case 10 — to_well_formed_utf16:
  Differentially tests to_well_formed_utf16le and to_well_formed_utf16be.
  Checks: all impls agree on output, output is always valid UTF-16, and
  a valid input is passed through unchanged.
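The case 10 checks can be sketched the same way. One assumption is not stated in the commit message: that the fix-up replaces each unpaired surrogate with U+FFFD, in the manner of JavaScript's `toWellFormed`. All names below are hypothetical and the input is taken as native-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

inline bool is_high_surrogate(char16_t u) { return (u & 0xFC00) == 0xD800; }
inline bool is_low_surrogate(char16_t u) { return (u & 0xFC00) == 0xDC00; }

// Hypothetical reference: copy the input, overwriting each unpaired
// surrogate with U+FFFD (assumed replacement character).
inline std::u16string reference_to_well_formed(std::u16string_view in) {
  std::u16string out(in);
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (is_high_surrogate(out[i]) && i + 1 < out.size() &&
        is_low_surrogate(out[i + 1])) {
      ++i; // valid pair: keep both halves
    } else if (is_high_surrogate(out[i]) || is_low_surrogate(out[i])) {
      out[i] = 0xFFFD;
    }
  }
  return out;
}

// Independent validity check, so the test is not circular.
inline bool is_well_formed(std::u16string_view in) {
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (is_low_surrogate(in[i])) {
      return false;
    }
    if (is_high_surrogate(in[i])) {
      if (i + 1 == in.size() || !is_low_surrogate(in[i + 1])) {
        return false;
      }
      ++i;
    }
  }
  return true;
}

// The properties from case 10: same length, output always valid, and a
// valid input passes through unchanged.
inline bool check_to_well_formed(std::u16string_view in) {
  const std::u16string out = reference_to_well_formed(in);
  if (out.size() != in.size() || !is_well_formed(out)) {
    return false;
  }
  return !is_well_formed(in) || std::u16string_view(out) == in;
}
```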

Ncases bumped from 9 to 11. The actionmask remains 15 (bit_ceil(11)==16)
so the change is backwards-compatible with existing corpus files.
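The backwards-compatibility argument is plain arithmetic: the mask is the next power of two minus one, and both 9 and 11 round up to 16. A small sketch follows; how the fuzz target actually derives its action from the input is an assumption here, and `select_action` and `bit_ceil_sketch` are hypothetical names (the latter mirrors what C++20's `std::bit_ceil` computes).

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n, for small n (hand-written stand-in
// for std::bit_ceil so the sketch also builds before C++20).
constexpr unsigned bit_ceil_sketch(unsigned n) {
  unsigned p = 1;
  while (p < n) {
    p <<= 1;
  }
  return p;
}

// Mask covering every action index below the next power of two.
constexpr unsigned action_mask(unsigned ncases) {
  return bit_ceil_sketch(ncases) - 1;
}

static_assert(action_mask(9) == 15, "old case count");
static_assert(action_mask(11) == 15, "new case count, same mask");
static_assert(action_mask(17) == 31, "a 17th case would change the mask");

// Hypothetical selector: map one input byte to an action index.
constexpr unsigned select_action(std::uint8_t selector, unsigned ncases) {
  return selector & action_mask(ncases);
}
```

Because the mask is unchanged, a corpus file that selected action k before the bump still selects action k afterwards; only selector values 9 and 10 gain a meaning they did not have before.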

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
claude added 4 commits March 27, 2026 23:33
When base64_to_binary_details returns an error (e.g. INVALID_BASE64_CHARACTER),
output_count reflects how many bytes were written before the error was detected.
Different SIMD implementations process input in different chunk widths (64, 32,
16, or 1 characters) so they may flush different amounts of output before
stopping -- this is valid implementation-defined behaviour, not a bug.

Only compare output_count (and output_hash) across implementations when the
result is SUCCESS, where the output is fully determined.  error, input_count,
and padding_error are still compared unconditionally.
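Why `output_count` can differ on error is easier to see with a toy model. The sketch below is not base64 and not simdutf code: it copies input to output in blocks of a chosen width, treats `'!'` as the only invalid character, and stops without writing the block that contains the error. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy analogue of a result that reports both counts even on error.
struct toy_result {
  bool ok;
  std::size_t input_count;  // error position on failure
  std::size_t output_count; // bytes written before stopping
};

inline bool toy_is_valid(char c) { return c != '!'; }

// Toy "decoder": validates and copies the input one block at a time.
// A block containing an invalid character is never written.
inline toy_result toy_decode(std::string_view in, std::size_t chunk,
                             std::string &out) {
  assert(chunk > 0);
  out.clear();
  for (std::size_t pos = 0; pos < in.size(); pos += chunk) {
    const std::string_view block = in.substr(pos, chunk);
    for (std::size_t i = 0; i < block.size(); ++i) {
      if (!toy_is_valid(block[i])) {
        return {false, pos + i, out.size()};
      }
    }
    out.append(block);
  }
  return {true, in.size(), out.size()};
}
```

With the input `"abcdef!gh"`, a width of 1 reports 6 bytes written while a width of 4 reports 4, yet both report the error at position 6. On a fully valid input every width writes the same output, which is why the comparison is restricted to the success case.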

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
With stop_before_partial and trailing ignored whitespace, some
implementations include the trailing whitespace in input_count while
others do not. On SUCCESS, input_count is not part of the base contract
(full_result::operator result() uses output_count on success), so it
legitimately differs. Only compare input_count when the conversion
fails, where it carries the error position and must be deterministic.
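Taken together, the two fixes give the following comparison rule between implementations. This is a sketch with a hypothetical `details_snapshot` struct rather than the real `full_result` type; an `error_code` of 0 is assumed to mean success.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-implementation summary of one details call.
struct details_snapshot {
  int error_code;           // 0 is assumed to mean SUCCESS
  std::size_t input_count;  // meaningful across impls only on failure
  std::size_t output_count; // meaningful across impls only on success
  std::size_t output_hash;  // hash of the bytes written
  bool padding_error;
};

// True when two implementations agree on everything that is required to be
// deterministic for this outcome.
inline bool snapshots_agree(const details_snapshot &a,
                            const details_snapshot &b) {
  // Always compared: the error code and the padding flag.
  if (a.error_code != b.error_code || a.padding_error != b.padding_error) {
    return false;
  }
  if (a.error_code == 0) {
    // Success: the output is fully determined, input_count is not.
    return a.output_count == b.output_count && a.output_hash == b.output_hash;
  }
  // Failure: input_count carries the error position; output may differ.
  return a.input_count == b.input_count;
}
```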

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
mingw-w64-i686-clang is no longer available in MSYS2's package
repository, causing the MINGW32 build jobs to fail at the package
installation step. MINGW32 (i686) with GCC is already covered by
msys2.yml, so remove the broken MINGW32+clang entries from this
workflow.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
anonrig (Member, Author) commented Mar 28, 2026

@lemire would you mind reviewing this?

pauldreik (Collaborator) commented

increasing fuzz coverage is good!

have you seen the newly added https://github.com/simdutf/simdutf/blob/master/AI_USAGE_POLICY.md ?

have you run the fuzzers, and if so on which architectures?

have you evaluated the coverage somehow?

anonrig (Member, Author) commented Mar 28, 2026

> increasing fuzz coverage is good!
>
> have you seen the newly added https://github.com/simdutf/simdutf/blob/master/AI_USAGE_POLICY.md ?
>
> have you run the fuzzers, and if so on which architectures?
>
> have you evaluated the coverage somehow?

Yes, @lemire wrote a similar one for Ada. I've run the fuzzers on x64 Linux only. Unfortunately, other than the OSS-Fuzz code coverage reports, I don't know of a way to generate fuzzing coverage.

lemire (Member) commented Mar 28, 2026

It is on my to-do list to review this. There are nearly 800 lines of code to review, so it will take time.

lemire merged commit 2f1ec65 into master Mar 30, 2026
106 checks passed