
adding a length computation benchmark #901

Merged
anonrig merged 1 commit into yagiz/add-binary-length-base64 from lemire/add-binary-length-base64-benchmark on Jan 6, 2026

@lemire lemire commented Jan 6, 2026

This is a PR ON TOP of @anonrig's PR #887

The purpose here is to anchor the discussion with respect to performance, so that everyone understands what is happening.

To test this out, do the following:

cmake -B build -D SIMDUTF_BENCHMARKS=ON -D CMAKE_BUILD_TYPE=Release
cmake --build build --target benchmark_base64

Use a test file. Any would do, but you can create one like so:

 base64 -i ./README.md -b 76 > test.base64

Next run the benchmarks, first the decoding benchmark:

./build/benchmarks/base64/benchmark_base64 -d test.base64 -f simdutf

Here -f simdutf applies a filter so we only run the simdutf functions.

Then benchmark the length functions:

 ./build/benchmarks/base64/benchmark_base64 -L test.base64

Here is what I get on my macbook...

./build/benchmarks/base64/benchmark_base64 -d test.base64 -f simdutf

# current system detected as arm64.
# loading files: .
# volume: 182408 bytes
# max length: 182408 bytes
# number of inputs: 1
# decode
# the base64 data contains spaces, so we cannot use straight libbase64::base64_decode directly
simdutf::arm64                                :  15.82 GB/s  7.27 % 
simdutf::arm64 (accept garbage)               :  13.77 GB/s  6.65 % 

 ./build/benchmarks/base64/benchmark_base64 -L test.base64         

# current system detected as arm64.
# loading files: .
# volume: 182409 bytes
# max length: 182409 bytes
# number of inputs: 1
# lengths
# Benchmark only simdutf length functions (maximal and exact)
simdutf::arm64_maximal_binary_length_from_base64 :  10838.88 GB/s  inf % 
simdutf::arm64_binary_length_from_base64      :   8.31 GB/s  3.06 % 

So you see here that maximal_binary_length_from_base64 is effectively free, while binary_length_from_base64 is not.
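To illustrate why one function can be effectively free while the other is not, here is a minimal sketch. It is not simdutf's implementation: the names `decoded_size`, `upper_bound_length` and `exact_length` are hypothetical, and the input is assumed to be valid base64 with optional ASCII whitespace. The upper bound looks only at the input length and the last two characters, so its cost does not depend on the input size. The exact length has to scan every byte to discount whitespace.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of bytes produced by n significant base64
// characters (padding and whitespace already excluded).
std::size_t decoded_size(std::size_t n) {
  std::size_t full = n / 4 * 3; // each complete group of 4 chars yields 3 bytes
  std::size_t rem = n % 4;      // a trailing group of 2 or 3 chars yields 1 or 2 bytes
  return rem <= 1 ? full : full + rem - 1;
}

// Upper bound in constant time: strips at most two trailing '=' and treats
// every remaining character, whitespace included, as significant.
std::size_t upper_bound_length(std::string_view in) {
  std::size_t n = in.size();
  if (n > 0 && in[n - 1] == '=') {
    --n;
    if (n > 0 && in[n - 1] == '=') {
      --n;
    }
  }
  return decoded_size(n);
}

// Exact length in linear time: must visit every byte to skip whitespace
// and padding before applying the same formula.
std::size_t exact_length(std::string_view in) {
  std::size_t n = 0;
  for (char c : in) {
    bool skipped = c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
                   c == '\f' || c == '=';
    if (!skipped) {
      ++n;
    }
  }
  return decoded_size(n);
}
```

For an input with line breaks, such as the 76-column test file above, the upper bound overestimates slightly while the exact version pays for a full pass over the data.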

Suppose you combine the decoding function with the maximal function... then we get 15.82 GB/s (unchanged).

But if you combine it with the new function you get 1/(1/8.31 + 1/15.82), or about 5.45 GB/s. That is, we reduce the speed by a factor of nearly three. It is not a small effect.
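The arithmetic can be checked with a few lines of code. This is a sketch under the assumption that the two passes run one after the other over the same bytes, with no overlap and no cache effects; the function name `combined_throughput` is made up for the illustration.

```cpp
#include <cassert>

// Throughput (GB/s) of two passes run back to back over the same input.
// The time per byte of each pass adds up, so the throughputs combine
// like resistors in parallel.
double combined_throughput(double a_gbps, double b_gbps) {
  assert(a_gbps > 0.0 && b_gbps > 0.0);
  return 1.0 / (1.0 / a_gbps + 1.0 / b_gbps);
}
```

With the figures above, `combined_throughput(8.31, 15.82)` comes out at roughly 5.45 GB/s, which is about 2.9 times slower than decoding alone.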

@lemire lemire marked this pull request as ready for review January 6, 2026 18:15
@lemire lemire requested review from anonrig and erikcorry January 6, 2026 18:15

lemire commented Jan 6, 2026

@anonrig I leave it up to you to merge this.


@anonrig anonrig left a comment


perfect - amazing work


anonrig commented Jan 6, 2026

(now we know, my code is slow!)

@anonrig anonrig merged commit 96a192e into yagiz/add-binary-length-base64 Jan 6, 2026
47 checks passed

erikcorry commented Jan 6, 2026

The AVX2 and AVX-512 versions are much faster though. On a 135090 byte input with my sanitizer-incompatible version:

simdutf::icelake_binary_length_from_base64    :  130.62 GB/s  13.52 %   4.49 GHz   0.03 c/b   0.17 i/b   5.07 i/c 
simdutf::haswell_binary_length_from_base64    :  67.02 GB/s  13.05 %   4.37 GHz   0.07 c/b   0.31 i/b   4.83 i/c 
simdutf::fallback_binary_length_from_base64   :   6.62 GB/s  16.63 %   3.83 GHz   0.58 c/b   3.32 i/b   5.73 i/c 

So on Icelake

On a tiny 17 byte input the SIMD versions are the same speed as the scalar versions, which suggests that a version that is sanitizer-friendly would be just as fast.

Getting the length is thus not a big part of a getting-the-length-then-decoding task.

# volume: 135090 bytes
# max length: 135090 bytes
# number of inputs: 1
# decode
# the base64 data contains spaces, so we cannot use straight libbase64::base64_decode directly
simdutf::icelake                              :  14.59 GB/s  14.69 %   3.93 GHz   0.27 c/b   0.51 i/b   1.88 i/c 
simdutf::icelake (accept garbage)             :  13.05 GB/s  10.99 %   3.84 GHz   0.29 c/b   0.47 i/b   1.59 i/c 
simdutf::haswell                              :  12.36 GB/s  27.33 %   3.90 GHz   0.32 c/b   1.21 i/b   3.83 i/c 
simdutf::haswell (accept garbage)             :  12.05 GB/s  15.39 %   4.14 GHz   0.34 c/b   1.22 i/b   3.55 i/c 
simdutf::westmere                             :   9.93 GB/s  21.90 %   4.00 GHz   0.40 c/b   2.34 i/b   5.81 i/c 
simdutf::westmere (accept garbage)            :   9.24 GB/s  15.21 %   4.19 GHz   0.45 c/b   2.43 i/b   5.36 i/c 
simdutf::fallback                             :   3.82 GB/s  22.04 %   4.02 GHz   1.05 c/b   6.83 i/b   6.49 i/c 
simdutf::fallback (accept garbage)            :   2.41 GB/s  20.23 %   4.05 GHz   1.68 c/b   9.59 i/b   5.70 i/c 
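To put a number on how small the length pass is in a getting-the-length-then-decoding task, one can compute the share of total time it takes. This is a rough sketch that assumes the two passes run sequentially over the same input; the name `length_time_share` is hypothetical.

```cpp
#include <cassert>

// Fraction of the total time spent in the length pass when it is followed
// by a decoding pass over the same bytes. Throughputs are in GB/s.
double length_time_share(double length_gbps, double decode_gbps) {
  assert(length_gbps > 0.0 && decode_gbps > 0.0);
  double t_length = 1.0 / length_gbps; // time per byte, arbitrary units
  double t_decode = 1.0 / decode_gbps;
  return t_length / (t_length + t_decode);
}
```

With the numbers above, the icelake length pass (130.62 GB/s) followed by icelake decoding (14.59 GB/s) accounts for about 10% of the total time, and the haswell pair (67.02 GB/s and 12.36 GB/s) for about 16%.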

anonrig pushed a commit that referenced this pull request Jan 30, 2026