@ThePseudo

The uutils base64 turns out to be much slower than the GNU coreutils base64.

One of the issues is that the input file is read twice. This PR fixes that by reading the file once and iterating over the resulting Vec to encode or decode it.
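
As an illustration of the single-read idea, here is a minimal sketch and not the code from this PR. The helper names `ends_with_padding`, `count_payload_bytes` and `process` are hypothetical; the point is only that the padding check and the later pass both borrow the same in-memory buffer, so the input is read exactly once.

```rust
use std::io::{self, Read};

/// Hypothetical helper: true if the last non-whitespace byte is '='.
fn ends_with_padding(data: &[u8]) -> bool {
    data.iter()
        .rev()
        .find(|b| !b.is_ascii_whitespace())
        .is_some_and(|&b| b == b'=')
}

/// Hypothetical stand-in for the decode pass: walks the same buffer
/// with an iterator and counts the bytes that are not whitespace.
fn count_payload_bytes(data: &[u8]) -> usize {
    data.iter().filter(|b| !b.is_ascii_whitespace()).count()
}

/// Reads the input once, then runs both passes over the buffer.
fn process<R: Read>(mut input: R) -> io::Result<(bool, usize)> {
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?; // the only read of the input
    let padded = ends_with_padding(&buf);
    let payload = count_payload_bytes(&buf);
    Ok((padded, payload))
}

fn main() -> io::Result<()> {
    let (padded, payload) = process(&b"aGVsbG8=\n"[..])?;
    println!("padded: {padded}, payload bytes: {payload}");
    Ok(())
}
```

The trade-off is that the whole input lives in memory, which matters for multi-gigabyte files.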

@sylvestre
Contributor

we are now faster: #8578

@sylvestre
Contributor

but your change might still be relevant
please try with hyperfine with and without your change
thanks

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

GNU test failed: tests/basenc/base64. tests/basenc/base64 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/basenc/basenc. tests/basenc/basenc is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

GNU test failed: tests/basenc/base64. tests/basenc/base64 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/basenc/basenc. tests/basenc/basenc is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@ThePseudo ThePseudo force-pushed the input_reuse_b64 branch 2 times, most recently from 2c9c60a to d8cd01f Compare September 9, 2025 12:03
@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@ThePseudo
Author

Did some testing after rebasing on the current master.

The file I used is 4.9 GB, so the results should be reasonably representative.

For encoding we have:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Before patch: ./coreutils base64 model-00001-of-000163.safetensors > /dev/null | 4.275 ± 0.077 | 4.201 | 4.399 | 1.00 |
| After patch: ./coreutils base64 model-00001-of-000163.safetensors > /dev/null | 4.037 ± 0.049 | 3.980 | 4.110 | 1.00 |

For decoding:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| Before patch: ./coreutils base64 -d base64.txt | 9.945 ± 0.159 | 9.774 | 10.130 | 1.00 |
| After patch: ./coreutils base64 -d base64.txt | 9.037 ± 0.205 | 8.765 | 9.286 | 1.00 |

Since this patch removes a redundant second read anyway, it might still be useful. A possible follow-up (I'm not an expert, so take this with a grain of salt) would be to read the file in chunks and detect any padding only afterwards, if that is possible. Computation time would almost certainly increase, but memory usage would be much lower.
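
To make the chunked idea concrete, here is a rough sketch for the encoding direction only, under stated assumptions; it is not the follow-up patch. The names `encode_block`, `read_full` and `encode_stream` are hypothetical, the encoder is hand-written to keep the example self-contained, and GNU-style line wrapping is omitted. The key assumption is that the chunk size is a multiple of 3, so every chunk except the last encodes without padding and the pieces can simply be concatenated.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes `input` as padded base64 and appends the result to `out`.
fn encode_block(input: &[u8], out: &mut Vec<u8>) {
    for group in input.chunks(3) {
        let b0 = group[0] as u32;
        let b1 = group.get(1).copied().unwrap_or(0) as u32;
        let b2 = group.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63]);
        out.push(ALPHABET[(n >> 12) as usize & 63]);
        out.push(if group.len() > 1 { ALPHABET[(n >> 6) as usize & 63] } else { b'=' });
        out.push(if group.len() > 2 { ALPHABET[n as usize & 63] } else { b'=' });
    }
}

/// Fills `buf` as far as possible; a short count means end of input.
fn read_full<R: Read>(input: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match input.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Encodes `input` chunk by chunk, holding only one chunk in memory.
fn encode_stream<R: Read, W: Write>(
    mut input: R,
    mut output: W,
    chunk_size: usize,
) -> io::Result<()> {
    // A multiple of 3 guarantees padding can only appear in the final chunk.
    assert!(chunk_size > 0 && chunk_size % 3 == 0);
    let mut buf = vec![0u8; chunk_size];
    let mut encoded = Vec::with_capacity(chunk_size / 3 * 4);
    loop {
        let n = read_full(&mut input, &mut buf)?;
        if n == 0 {
            break;
        }
        encoded.clear();
        encode_block(&buf[..n], &mut encoded);
        output.write_all(&encoded)?;
        if n < chunk_size {
            break; // short chunk: end of input reached
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut out = Vec::new();
    encode_stream(&b"hello world"[..], &mut out, 3)?;
    println!("{}", String::from_utf8_lossy(&out)); // prints "aGVsbG8gd29ybGQ="
    Ok(())
}
```

Decoding is the harder direction, because padding and embedded newlines have to be handled across chunk boundaries; this sketch does not attempt that.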

@sylvestre
Contributor

Could you please run hyperfine with GNU, without the patch, and with the patch?
(in a single call)
thanks

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo
Author

ThePseudo commented Sep 9, 2025

Here you go. I put decode and encode together (my bad), but you can still easily get the relative values from it.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| ./coreutils base64 model-00001-of-000163.safetensors | 4.205 ± 0.090 | 4.082 | 4.364 | 1.00 |
| ./coreutils_main base64 model-00001-of-000163.safetensors | 4.276 ± 0.068 | 4.194 | 4.436 | 1.02 ± 0.03 |
| base64 model-00001-of-000163.safetensors | 4.268 ± 0.206 | 4.129 | 4.817 | 1.01 ± 0.05 |
| ./coreutils base64 -d base64.txt | 9.493 ± 1.042 | 8.048 | 11.335 | 2.26 ± 0.25 |
| ./coreutils_main base64 -d base64.txt | 10.040 ± 0.553 | 9.416 | 10.875 | 2.39 ± 0.14 |
| base64 -d base64.txt | 8.399 ± 0.157 | 8.215 | 8.665 | 2.00 ± 0.06 |

PS: I will fix the code quality checks now.

@sylvestre
Contributor

please copy and paste the export, not the table, as the table is harder to read

@ThePseudo ThePseudo force-pushed the input_reuse_b64 branch 3 times, most recently from 3b4318c to ecbe3e7 Compare September 9, 2025 13:18
@ThePseudo
Author

Hi, I may be missing something, but is there a preferred export format? I can only find md, asciidoc, csv, json and orgmode...

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@sylvestre
Contributor

same #8579 as in this PR

@ThePseudo ThePseudo marked this pull request as ready for review September 9, 2025 13:41
@ThePseudo
Author

ThePseudo commented Sep 9, 2025

Ok, then, I have them saved:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.205 s ±  0.090 s    [User: 1.272 s, System: 2.928 s]
  Range (min … max):    4.082 s …  4.364 s    10 runs
 
Benchmark 2: ./coreutils_main base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.276 s ±  0.068 s    [User: 1.007 s, System: 3.262 s]
  Range (min … max):    4.194 s …  4.436 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.268 s ±  0.206 s    [User: 3.295 s, System: 0.971 s]
  Range (min … max):    4.129 s …  4.817 s    10 runs
 
Benchmark 4: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.493 s ±  1.042 s    [User: 4.187 s, System: 5.159 s]
  Range (min … max):    8.048 s … 11.335 s    10 runs
 
Benchmark 5: ./coreutils_main base64 -d base64.txt
  Time (mean ± σ):     10.040 s ±  0.553 s    [User: 5.767 s, System: 4.265 s]
  Range (min … max):    9.416 s … 10.875 s    10 runs
 
Benchmark 6: base64 -d base64.txt
  Time (mean ± σ):      8.399 s ±  0.157 s    [User: 6.799 s, System: 1.598 s]
  Range (min … max):    8.215 s …  8.665 s    10 runs

@sylvestre
Contributor

please add the last line, it is the summary that matters

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

@ThePseudo
Author

Sure:

Summary
  ./coreutils base64 model-00001-of-000163.safetensors ran
    1.01 ± 0.05 times faster than base64 model-00001-of-000163.safetensors
    1.02 ± 0.03 times faster than ./coreutils_main base64 model-00001-of-000163.safetensors
    2.00 ± 0.06 times faster than base64 -d base64.txt
    2.26 ± 0.25 times faster than ./coreutils base64 -d base64.txt
    2.39 ± 0.14 times faster than ./coreutils_main base64 -d base64.txt
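
The relative figures in this summary can be reproduced from the means and standard deviations reported earlier. The sketch below assumes the ratio of the two means with first-order error propagation; that assumption matches the numbers shown here, but the function name `relative_speed` is hypothetical and this is not taken from hyperfine's source.

```rust
/// Ratio `mean / ref_mean` with first-order propagated uncertainty.
fn relative_speed(mean: f64, sd: f64, ref_mean: f64, ref_sd: f64) -> (f64, f64) {
    let ratio = mean / ref_mean;
    // Relative errors of the two measurements add in quadrature.
    let rel_err = ((sd / mean).powi(2) + (ref_sd / ref_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // Reference: patched ./coreutils encode run, 4.205 s ± 0.090 s.
    // Compared: ./coreutils_main encode run, 4.276 s ± 0.068 s.
    let (r, e) = relative_speed(4.276, 0.068, 4.205, 0.090);
    println!("{r:.2} ± {e:.2}"); // prints "1.02 ± 0.03"

    // Compared: patched ./coreutils decode run, 9.493 s ± 1.042 s.
    let (r, e) = relative_speed(9.493, 1.042, 4.205, 0.090);
    println!("{r:.2} ± {e:.2}"); // prints "2.26 ± 0.25"
}
```

Note that a ratio of 1.02 ± 0.03 is within one standard deviation of 1.00, so the encode difference in this run is not clearly distinguishable from noise.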

@sylvestre
Contributor

i guess "base64" is the GNU implementation

I think you should do two different runs, as you are benchmarking two very different things

@ThePseudo
Author

ThePseudo commented Sep 9, 2025

Yes, it is the GNU implementation. I will make another one only for decode (for encoding it's already there).

Edit: sorry for the delay, but this will have to wait a bit. To satisfy some of the coding style constraints, I have introduced a performance regression. I will get back to it tomorrow.

Edit 2: I found the problem; fixing it incidentally also decreased the memory footprint compared with my previous modifications.

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):     10.224 s ±  0.595 s    [User: 6.151 s, System: 4.060 s]
  Range (min … max):    9.592 s … 11.546 s    10 runs
 
Benchmark 2: ./coreutils_main base64 -d base64.txt
  Time (mean ± σ):      9.934 s ±  0.205 s    [User: 5.562 s, System: 4.362 s]
  Range (min … max):    9.747 s … 10.411 s    10 runs
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.484 s ±  0.078 s    [User: 6.919 s, System: 1.554 s]
  Range (min … max):    8.374 s …  8.607 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.17 ± 0.03 times faster than ./coreutils_main base64 -d base64.txt
    1.21 ± 0.07 times faster than ./coreutils base64 -d base64.txt

@github-actions

github-actions bot commented Sep 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

@ThePseudo ThePseudo force-pushed the input_reuse_b64 branch 2 times, most recently from ae85bb7 to e243bd7 Compare September 10, 2025 06:38
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@ThePseudo
Author

ThePseudo commented Sep 10, 2025

It is strange that the tests are not passing now... nothing has changed since yesterday, except for removing an unused import and rebasing onto the current master...

Also, testing locally only shows some errors in tail, which these modifications do not touch. Maybe something is going on on Windows? Sadly, I do not have a Windows machine...

@cakebaker
Contributor

@ThePseudo the errors are unrelated to your PR, they also appear in other PRs.

@ThePseudo
Author

@cakebaker thanks! It makes sense now :')

Andrea Calabrese added 2 commits September 10, 2025 11:19
Base64: the function has_padding reads the file and then discards it.
The functions fast_encode and fast_decode re-read the file, providing
significant delay in larger files.
This commit also reduces the amount of computation done inside the
fast_decode function.
In the next commit there will be the fix for the fast_decode function

Signed-off-by: Andrea Calabrese <[email protected]>

This is the follow-up commit to the improvement of fast_encode and the
reuse of the read file. What is written there is still valid.

Signed-off-by: Andrea Calabrese <[email protected]>
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

@ThePseudo
Author

I don't plan to change this any further, so I think it is ready for review and a possible merge, if everyone is ok with it.

@sylvestre
Contributor

could you please rerun the benchmark on the latest version?

@ThePseudo
Author

Sure!

For encoding:

Benchmark 1: ./coreutils base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.326 s ±  0.189 s    [User: 1.169 s, System: 2.907 s]
  Range (min … max):    3.988 s …  4.598 s    10 runs
 
Benchmark 2: ./coreutils_main base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      4.400 s ±  0.132 s    [User: 0.923 s, System: 3.051 s]
  Range (min … max):    4.230 s …  4.591 s    10 runs
 
Benchmark 3: base64 model-00001-of-000163.safetensors
  Time (mean ± σ):      3.862 s ±  0.016 s    [User: 3.059 s, System: 0.801 s]
  Range (min … max):    3.844 s …  3.899 s    10 runs
 
Summary
  base64 model-00001-of-000163.safetensors ran
    1.12 ± 0.05 times faster than ./coreutils base64 model-00001-of-000163.safetensors
    1.14 ± 0.03 times faster than ./coreutils_main base64 model-00001-of-000163.safetensors

For decoding:

Benchmark 1: ./coreutils base64 -d base64.txt
  Time (mean ± σ):      9.466 s ±  0.113 s    [User: 5.561 s, System: 3.899 s]
  Range (min … max):    9.262 s …  9.581 s    10 runs
 
Benchmark 2: ./coreutils_main base64 -d base64.txt
  Time (mean ± σ):      9.583 s ±  0.097 s    [User: 5.290 s, System: 4.291 s]
  Range (min … max):    9.452 s …  9.742 s    10 runs
 
Benchmark 3: base64 -d base64.txt
  Time (mean ± σ):      8.326 s ±  0.089 s    [User: 6.709 s, System: 1.614 s]
  Range (min … max):    8.270 s …  8.573 s    10 runs
 
Summary
  base64 -d base64.txt ran
    1.14 ± 0.02 times faster than ./coreutils base64 -d base64.txt
    1.15 ± 0.02 times faster than ./coreutils_main base64 -d base64.txt

It might not be significantly faster, but it reads the file only once. I will also prepare another patch that streams the input instead of reading the whole file up front, but that may take a while, so I'll open a separate PR.

@sylvestre
Contributor

@ThePseudo when you share benchmarks, please be a bit more explicit
./coreutils_main isn't a very explicit name :)

@sylvestre
Contributor

nice, on my system:

$ hyperfine "./target/release/base64 -d a.txt"  "./target/release/base64.prev -d a.txt"  "/usr/bin/base64 -d a.txt"
Benchmark 1: ./target/release/base64 -d a.txt
  Time (mean ± σ):      14.5 ms ±   3.8 ms    [User: 9.6 ms, System: 4.8 ms]
  Range (min … max):     8.7 ms …  25.9 ms    173 runs

Benchmark 2: ./target/release/base64.prev -d a.txt
  Time (mean ± σ):      16.2 ms ±   3.3 ms    [User: 10.7 ms, System: 5.2 ms]
  Range (min … max):     9.3 ms …  22.9 ms    179 runs

Benchmark 3: /usr/bin/base64 -d a.txt
  Time (mean ± σ):      18.6 ms ±   4.1 ms    [User: 16.2 ms, System: 2.3 ms]
  Range (min … max):    11.9 ms …  31.3 ms    122 runs

Summary
  ./target/release/base64 -d a.txt ran
    1.11 ± 0.37 times faster than ./target/release/base64.prev -d a.txt
    1.28 ± 0.44 times faster than /usr/bin/base64 -d a.txt

and

$ hyperfine -N "./target/release/base64 shakespeare.txt"  "./target/release/base64.prev shakespeare.txt"  "/usr/bin/base64 shakespeare.txt"
Benchmark 1: ./target/release/base64 shakespeare.txt
  Time (mean ± σ):       7.4 ms ±   1.6 ms    [User: 2.8 ms, System: 4.4 ms]
  Range (min … max):     3.3 ms …  12.6 ms    325 runs

Benchmark 2: ./target/release/base64.prev shakespeare.txt
  Time (mean ± σ):       8.3 ms ±   1.6 ms    [User: 3.5 ms, System: 4.6 ms]
  Range (min … max):     4.1 ms …  12.5 ms    480 runs

Benchmark 3: /usr/bin/base64 shakespeare.txt
  Time (mean ± σ):       9.5 ms ±   1.9 ms    [User: 7.3 ms, System: 1.9 ms]
  Range (min … max):     4.7 ms …  18.0 ms    321 runs

Summary
  ./target/release/base64 shakespeare.txt ran
    1.13 ± 0.32 times faster than ./target/release/base64.prev shakespeare.txt
    1.29 ± 0.38 times faster than /usr/bin/base64 shakespeare.txt

@sylvestre sylvestre merged commit 9e1ba2f into uutils:main Sep 11, 2025
135 of 138 checks passed
@ThePseudo ThePseudo deleted the input_reuse_b64 branch September 11, 2025 19:12