Increase fuzzing test coverage by anonrig · Pull Request #948 · simdutf/simdutf

anonrig · 2026-03-23T13:13:21Z

Asked Claude to improve the fuzzing test coverage. In Ada, it helped us find 5 bugs.

Add fuzz/with_replacement.cpp: Differentially tests convert_utf16le_to_utf8_with_replacement and convert_utf16be_to_utf8_with_replacement (and their length functions). These functions handle unpaired surrogates by replacing them with U+FFFD and were previously unexercised by any fuzz target. Checks: all impls agree, length function matches written bytes, output is valid UTF-8, and no-surrogate inputs match regular conversion length.

Add fuzz/base64_details.cpp: Differentially tests base64_to_binary_details (char and char16_t) which returns a full_result carrying both input_count and output_count even on errors. Cross-checks with base64_to_binary via the full_result->result cast operator, verifies bounds on input_count/output_count, and confirms output bytes agree on success.

Add fuzz/find.cpp: Differentially tests find(char*,char*,char) and find(char16_t*,char16_t*,char16_t) across implementations. Checks: result in [start,end], *result==needle when found, no earlier occurrence exists, and needle is absent when result==end.

Update fuzz/CMakeLists.txt to register the three new targets.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
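The four checks listed for fuzz/find.cpp can be sketched as a single predicate. This is a minimal illustration, not the code from the PR: `find_result_is_consistent`, `candidate_find`, and `check_one_input` are hypothetical names, and `std::find` stands in for the simdutf implementations that a real target would call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not taken from the PR): returns true when `result`
// satisfies the four invariants for a search of `needle` in [start, end).
inline bool find_result_is_consistent(const char *start, const char *end,
                                      char needle, const char *result) {
  // 1. The result must lie in [start, end].
  if (result < start || result > end) {
    return false;
  }
  if (result == end) {
    // 4. "Not found" is only correct if the needle is absent everywhere.
    return std::find(start, end, needle) == end;
  }
  // 2. A hit must actually point at the needle.
  if (*result != needle) {
    return false;
  }
  // 3. No earlier occurrence may exist before the reported hit.
  return std::find(start, result, needle) == result;
}

// Stand-in for an implementation under test (assumption: a real fuzz
// target would call each simdutf implementation here instead).
inline const char *candidate_find(const char *start, const char *end,
                                  char needle) {
  return std::find(start, end, needle);
}

// Example driver, shaped like what a fuzz target could do with its input.
inline void check_one_input(const std::string &data, char needle) {
  const char *start = data.data();
  const char *end = start + data.size();
  assert(find_result_is_consistent(start, end, needle,
                                   candidate_find(start, end, needle)));
}
```

Checking the result against a trivially correct scalar search catches an implementation that returns a later match or a false "not found", even if every implementation happens to share the same bug.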

These were the remaining uncovered virtual methods in the implementation API.

case 9 (validate_utf16_as_ascii): Differentially tests validate_utf16le_as_ascii and validate_utf16be_as_ascii across all implementations. Also verifies that a true result implies the input is valid UTF-16 (ASCII is a strict subset).

case 10 (to_well_formed_utf16): Differentially tests to_well_formed_utf16le and to_well_formed_utf16be. Checks: all impls agree on output, output is always valid UTF-16, and a valid input is passed through unchanged.

Ncases bumped from 9 to 11. The actionmask remains 15 (bit_ceil(11)==16) so the change is backwards-compatible with existing corpus files.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
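The backwards-compatibility argument is plain arithmetic: 9 and 11 both round up to 16, so the mask stays at 15. The sketch below works that out. `bit_ceil_u32` is a portable stand-in for C++20 `std::bit_ceil`, and `action_mask` and `select_action` are hypothetical names; how the fuzzer actually derives the action from its input is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for C++20 std::bit_ceil: smallest power of two >= n.
constexpr std::uint32_t bit_ceil_u32(std::uint32_t n) {
  std::uint32_t p = 1;
  while (p < n) {
    p <<= 1;
  }
  return p;
}

// Mask keeping just enough low bits to index `ncases` actions.
constexpr std::uint32_t action_mask(std::uint32_t ncases) {
  return bit_ceil_u32(ncases) - 1;
}

// Hypothetical selector (assumption): the action is taken from the low
// bits of one input byte. The real fuzzer may decode it differently.
inline std::uint32_t select_action(std::uint8_t selector_byte,
                                   std::uint32_t ncases) {
  return selector_byte & action_mask(ncases);
}

// Both the old and the new case count yield the same mask.
static_assert(action_mask(9) == 15, "old case count");
static_assert(action_mask(11) == 15, "new case count");
```

Because the mask is unchanged, a selector byte stored in an existing corpus file maps to the same action number as before. Under this scheme a masked value can still exceed the number of cases; how the fuzzer handles such values is not stated in the PR.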

When base64_to_binary_details returns an error (e.g. INVALID_BASE64_CHARACTER), output_count reflects how many bytes were written before the error was detected. Different SIMD implementations process input in different chunk widths (64, 32, 16, or 1 characters) so they may flush different amounts of output before stopping -- this is valid implementation-defined behaviour, not a bug. Only compare output_count (and output_hash) across implementations when the result is SUCCESS, where the output is fully determined. error, input_count, and padding_error are still compared unconditionally. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
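A toy model can show why the chunk width changes the number of bytes written before an error. This is deliberately not base64 and not simdutf code: `toy_decode` is a hypothetical function that copies input one chunk at a time, validates each whole chunk before flushing it, and treats `'!'` as the only invalid character.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

struct toy_result {
  bool ok;                  // false if an invalid character was seen
  std::size_t output_count; // bytes flushed before stopping
};

// Toy model (assumption, not simdutf): process `in` in chunks of
// `chunk_width` characters; a chunk is flushed only if it is fully valid.
inline toy_result toy_decode(const std::string &in, std::size_t chunk_width) {
  std::size_t written = 0;
  for (std::size_t pos = 0; pos < in.size(); pos += chunk_width) {
    const std::size_t len = std::min(chunk_width, in.size() - pos);
    for (std::size_t i = pos; i < pos + len; ++i) {
      if (in[i] == '!') {
        // Error detected: everything from earlier chunks stays written,
        // nothing from the current chunk is flushed.
        return {false, written};
      }
    }
    written += len; // the whole chunk was valid, flush it
  }
  return {true, written};
}
```

With an invalid character at offset 40, widths 64, 32, 16, and 1 report 0, 32, 32, and 40 bytes written in this model, while a fully valid input yields the same count for every width. That is the reason for comparing output_count only on SUCCESS.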

With stop_before_partial and trailing ignored whitespace, some implementations include the trailing whitespace in input_count while others do not. On SUCCESS, input_count is not part of the base contract (full_result::operator result() uses output_count on success), so it legitimately differs. Only compare input_count when the conversion fails, where it carries the error position and must be deterministic. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
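Taken together with the earlier output_count change, the comparison policy can be sketched as a single predicate. This is an illustration under assumptions, not the code in fuzz/base64_details.cpp: `details_summary` is a hypothetical struct whose fields only mirror the names used in the commit messages, and `0` stands in for SUCCESS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical summary of one implementation's answer (not simdutf's
// full_result type; field names follow the commit messages).
struct details_summary {
  int error;                 // 0 stands for SUCCESS in this sketch
  std::size_t input_count;   // error position when the conversion fails
  std::size_t output_count;  // bytes written
  std::uint64_t output_hash; // hash of the written bytes
  bool padding_error;
};

// Returns true when two implementations agree under the described policy.
inline bool summaries_agree(const details_summary &a,
                            const details_summary &b) {
  // The error code and padding_error are always compared.
  if (a.error != b.error || a.padding_error != b.padding_error) {
    return false;
  }
  if (a.error == 0) {
    // On success the output is fully determined, but input_count may
    // legitimately differ (e.g. trailing ignored whitespace).
    return a.output_count == b.output_count && a.output_hash == b.output_hash;
  }
  // On failure input_count carries the error position and must match;
  // the amount of output flushed before the error may differ.
  return a.input_count == b.input_count;
}
```

Splitting the comparison by outcome keeps the fuzzer strict where the contract is deterministic and tolerant where implementations are allowed to differ.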

mingw-w64-i686-clang is no longer available in MSYS2's package repository, causing the MINGW32 build jobs to fail at the package installation step. MINGW32 (i686) with GCC is already covered by msys2.yml, so remove the broken MINGW32+clang entries from this workflow. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify

anonrig · 2026-03-28T01:43:19Z

@lemire would you mind reviewing this?

pauldreik · 2026-03-28T18:01:40Z

increasing fuzz coverage is good!

have you seen the newly added https://github.com/simdutf/simdutf/blob/master/AI_USAGE_POLICY.md ?

have you run the fuzzers, and if so on which architectures?

have you evaluated the coverage somehow?

anonrig · 2026-03-28T18:18:26Z

> increasing fuzz coverage is good!
>
> have you seen the newly added https://github.com/simdutf/simdutf/blob/master/AI_USAGE_POLICY.md ?
>
> have you run the fuzzers, and if so on which architectures?
>
> have you evaluated the coverage somehow?

Yes, @lemire wrote a similar one for Ada. I've run it on x64 linux only. Unfortunately, other than oss-fuzz code coverage, I don't know a way of generating fuzzing coverage.

lemire · 2026-03-28T23:11:30Z

It is on my todo to review this. There are nearly 800 lines of code to review so it will take time.

claude added 2 commits March 23, 2026 13:07

gitignore: add build-* pattern to cover build-fuzz and similar dirs

d1b4697

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify

anonrig requested a review from lemire March 23, 2026 13:13

github-advanced-security AI found potential problems Mar 23, 2026

Comment thread fuzz/base64_details.cpp Fixed

claude added 4 commits March 27, 2026 23:33

fuzz: apply clang-format to new fuzz files

7325b80

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify

lemire approved these changes Mar 30, 2026

lemire merged commit 2f1ec65 into master Mar 30, 2026
106 checks passed

BrewTestBot mentioned this pull request Apr 22, 2026

simdutf 9.0.0 Homebrew/homebrew-core#278750

Merged

lemire merged 7 commits into master from claude/increase-fuzzing-coverage-IuhFE

5 participants