converting binary data to base64 with lines#840
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds a new function binary_to_base64_with_lines that outputs base64 encoding with line separators. The implementation provides optimized paths for x64 and ARM architectures with both fast and slow execution paths depending on line length.
- Adds
binary_to_base64_with_linesfunction with configurable line length (default 76 characters) - Implements optimized SIMD versions for various architectures (x64, ARM, etc.)
- Updates all existing base64 function calls to use the new namespaced
simdutf::versions
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/random_fuzzer.cpp | Updated function calls to use simdutf namespace |
| tests/null_safety_tests.cpp | Updated function call to use simdutf namespace |
| tests/base64_tests.cpp | Added comprehensive tests for line-based encoding and updated namespace usage |
| tests/atomic_base64_tests.cpp | Updated function calls to use simdutf namespace |
| src/westmere/sse_base64.cpp | Added SIMD implementation with line support for SSE |
| src/westmere/implementation.cpp | Added wrapper for new line-based function |
| src/scalar/base64.h | Added templated implementation supporting line breaks |
| src/implementation.cpp | Moved base64 length functions to global namespace and added line support |
| include/simdutf/implementation.h | Added new API declarations and global constants |
| benchmarks/base64/benchmark_base64.cpp | Added benchmarks for line-based encoding |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
77bdf70 to
d878e0f
Compare
pauldreik
left a comment
There was a problem hiding this comment.
I took the liberty to push three minor fixes, hope you agree.
|
@pauldreik Always happy. I optimized the code further. It is now pretty faster!!! Generating lines is not free, but it is now pretty decent. |
|
I imagine using the default line length is a much more common usecase than other line lengths. Is there any significant runtime performance to be gained by knowing it at compile time? |
|
I started working on extending the fuzzer but too tired to finish it today. I might have found a problem. |
|
@pauldreik We can wait for the fuzzer prior to a release. |
|
@pauldreik It is green, but let us wait for the fuzzer. |
|
I pushed a very basic extension of the existing fuzzer, just checking the return value and it does not match. for the case I found (almost instantly) with line length 5, I surprisingly got 9 returned from binary_to_base64_with_lines(), 11 from base64_length_from_binary_with_lines() and 10 from base64_length_from_binary(). |
|
@pauldreik What you caught was that base64url was untested (I had written reasonable code but missed cases). |
no need to change the option size, there was one spare byte left
|
I have run the fuzzer for a short while on arm64 and amd64 - seems to work fine. I think this is good to merge now! |
Add a new function that can output base64 outputs with line separators. By default we use lines of length 76. So far, I have added x64 and ARM support. It comes with new tests and benchmarks.
The implementation is not too difficult. We have a fast path for the case where the line length is long enough (say 64 characters) otherwise we fall back on a slow approach (since we assume that short lines are uncommon).