Conversation
Add fuzz/with_replacement.cpp: Differentially tests convert_utf16le_to_utf8_with_replacement and convert_utf16be_to_utf8_with_replacement (and their length functions). These functions handle unpaired surrogates by replacing them with U+FFFD and were previously unexercised by any fuzz target. Checks: all impls agree, length function matches written bytes, output is valid UTF-8, and no-surrogate inputs match regular conversion length.

Add fuzz/base64_details.cpp: Differentially tests base64_to_binary_details (char and char16_t), which returns a full_result carrying both input_count and output_count even on errors. Cross-checks with base64_to_binary via the full_result->result cast operator, verifies bounds on input_count/output_count, and confirms output bytes agree on success.

Add fuzz/find.cpp: Differentially tests find(char*,char*,char) and find(char16_t*,char16_t*,char16_t) across implementations. Checks: result in [start,end], *result==needle when found, no earlier occurrence exists, and needle is absent when result==end.

Update fuzz/CMakeLists.txt to register the three new targets.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
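The four checks listed for fuzz/find.cpp can be expressed as one small oracle. The following is a minimal sketch, not the code from this PR: check_find_result is a hypothetical helper name, and it only inspects a result pointer that some find implementation has already returned.

```cpp
// Hypothetical oracle (not taken from the PR): returns true when `result`
// satisfies the properties the find fuzz target is described as checking.
template <typename CharT>
bool check_find_result(const CharT* start, const CharT* end,
                       CharT needle, const CharT* result) {
  // 1. The result must lie in [start, end].
  if (result < start || result > end) return false;
  // 2. No occurrence of the needle may exist before the reported position.
  //    When result == end, this loop also proves the needle is absent.
  for (const CharT* p = start; p != result; ++p) {
    if (*p == needle) return false;
  }
  // 3. If something was found, it must actually be the needle.
  if (result != end && *result != needle) return false;
  return true;
}
```

In a differential target, the same oracle would presumably be applied to the pointer returned by each implementation; a reference search such as std::find(start, end, needle) satisfies it by construction.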
These were the remaining uncovered virtual methods in the implementation API.

case 9 (validate_utf16_as_ascii): Differentially tests validate_utf16le_as_ascii and validate_utf16be_as_ascii across all implementations. Also verifies that a true result implies the input is valid UTF-16 (ASCII is a strict subset).

case 10 (to_well_formed_utf16): Differentially tests to_well_formed_utf16le and to_well_formed_utf16be. Checks: all impls agree on output, output is always valid UTF-16, and a valid input is passed through unchanged.

Ncases bumped from 9 to 11. The actionmask remains 15 (bit_ceil(11)==16), so the change is backwards-compatible with existing corpus files.

https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
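The backwards-compatibility argument rests on simple arithmetic: 9 and 11 both round up to 16, so the mask is 15 in both cases. A minimal sketch of that arithmetic follows; bit_ceil_u32 and action_mask are hypothetical names, and the hand-rolled loop stands in for C++20 std::bit_ceil so the block builds with older standards. How the harness maps a masked value to a case is not shown in the commit message and is not modelled here.

```cpp
#include <cstdint>

// Smallest power of two >= n, for 1 <= n <= 2^31.
// Hand-rolled stand-in for C++20 std::bit_ceil.
inline std::uint32_t bit_ceil_u32(std::uint32_t n) {
  std::uint32_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Hypothetical helper: the mask applied to the action selector for a
// given number of cases.
inline std::uint32_t action_mask(std::uint32_t ncases) {
  return bit_ceil_u32(ncases) - 1;
}
```

Because action_mask(9) and action_mask(11) are both 15, an existing corpus file yields the same masked selector value before and after the change. The mask would only widen once the case count exceeds 16.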
When base64_to_binary_details returns an error (e.g. INVALID_BASE64_CHARACTER), output_count reflects how many bytes were written before the error was detected. Different SIMD implementations process input in different chunk widths (64, 32, 16, or 1 characters) so they may flush different amounts of output before stopping -- this is valid implementation-defined behaviour, not a bug. Only compare output_count (and output_hash) across implementations when the result is SUCCESS, where the output is fully determined. error, input_count, and padding_error are still compared unconditionally. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
With stop_before_partial and trailing ignored whitespace, some implementations include the trailing whitespace in input_count while others do not. On SUCCESS, input_count is not part of the base contract (full_result::operator result() uses output_count on success), so it legitimately differs. Only compare input_count when the conversion fails, where it carries the error position and must be deterministic. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
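Taken together, the two commit messages above define which fields must agree across implementations. A minimal sketch of that comparison rule, under stated assumptions: Observation is a hypothetical struct whose field names mirror the commit messages rather than simdutf's actual types, and an error value of 0 stands for SUCCESS.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record of what one implementation reported for one input.
struct Observation {
  int error;                  // 0 means SUCCESS in this sketch
  std::size_t input_count;
  std::size_t output_count;
  std::uint64_t output_hash;  // hash of the bytes written
  bool padding_error;
};

// Returns true when two implementations agree on everything the
// commit messages describe as deterministic.
inline bool observations_agree(const Observation& a, const Observation& b) {
  // Compared unconditionally.
  if (a.error != b.error) return false;
  if (a.padding_error != b.padding_error) return false;
  if (a.error == 0) {
    // Success: the output is fully determined, but input_count may
    // differ (e.g. trailing ignored whitespace).
    return a.output_count == b.output_count &&
           a.output_hash == b.output_hash;
  }
  // Failure: input_count carries the error position and must match;
  // output_count may differ with the chunk width of each implementation.
  return a.input_count == b.input_count;
}
```

The design choice is to compare each count only in the state where it is described as part of the contract, which avoids false positives from implementation-defined differences.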
mingw-w64-i686-clang is no longer available in MSYS2's package repository, causing the MINGW32 build jobs to fail at the package installation step. MINGW32 (i686) with GCC is already covered by msys2.yml, so remove the broken MINGW32+clang entries from this workflow. https://claude.ai/code/session_01Y9aC6ZvR3o1WE8ehFmqify
@lemire would you mind reviewing this?
increasing fuzz coverage is good! have you seen the newly added https://github.com/simdutf/simdutf/blob/master/AI_USAGE_POLICY.md ? have you run the fuzzers, and if so on which architectures? have you evaluated the coverage somehow?
Yes, @lemire wrote a similar one for Ada. I've run it on x64 linux only. Unfortunately, other than oss-fuzz code coverage, I don't know a way of generating fuzzing coverage. |
It is on my todo to review this. There are nearly 800 lines of code to review so it will take time. |
Asked Claude to improve the fuzzing test coverage. In Ada, it helped us find 5 bugs.