reduce the size of our tools #958

Merged
lemire merged 3 commits into master from lemire/bloat on Apr 14, 2026

@lemire
Member

@lemire lemire commented Apr 9, 2026

On my mac, I get this...

| Tool | Before (full simdutf) | After (feature-specific) | Reduction |
|---|---|---|---|
| fastbase64 | 223 KB (208 KB stripped) | 131 KB (119 KB stripped) | 41% smaller |
| sutf | 223 KB (203 KB stripped) | 150 KB (137 KB stripped) | 32% smaller |
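
The reduction column can be reproduced from the unstripped sizes with integer arithmetic. This is a minimal sketch; the `reduction_pct` helper is a hypothetical name introduced here, and it truncates rather than rounds, which is consistent with the percentages reported above.

```shell
#!/bin/sh
# Integer percentage by which `after` is smaller than `before` (truncated, not rounded).
reduction_pct() {
  before=$1
  after=$2
  echo $(( (before - after) * 100 / before ))
}

# Unstripped sizes in KB, taken from the figures above.
echo "fastbase64: $(reduction_pct 223 131)% smaller"   # 41
echo "sutf:       $(reduction_pct 223 150)% smaller"   # 32
```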

Interestingly, I get

$ which base64
/usr/bin/base64
$ ls -alh /usr/bin/base64
-rwxr-xr-x  5 root  wheel   133K août  16  2025 /usr/bin/base64

vs

$ ls -al build/tools/fastbase64
-rwxr-xr-x@ 1 dlemire  staff  133832 avr.   9 15:20 build/tools/fastbase64

This is possible through the magic of features (thanks to @WojciechMula for the idea).
So, on my system, fastbase64 ends up SMALLER than the system base64.

This makes no sense to me. We should be much fatter.

Update: the trick is that we depend on the C++ standard library, which is a dynamic library under macOS, so its code is not counted in the size of our binary.
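
A quick way to see which shared libraries a binary depends on is `otool -L` on macOS or `ldd` on Linux. The sketch below assumes the binary sits at `build/tools/fastbase64`, as in the listing above; the `deps_command` helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Pick the command that lists a binary's shared-library dependencies.
# Takes the kernel name as reported by `uname -s`.
deps_command() {
  case "$1" in
    Darwin) echo "otool -L" ;;
    *)      echo "ldd" ;;
  esac
}

# Default path taken from the listing above; pass another binary as the first argument.
binary=${1:-build/tools/fastbase64}
if [ -x "$binary" ]; then
  # Unquoted on purpose so that "otool -L" splits into a command and its flag.
  $(deps_command "$(uname -s)") "$binary"
else
  echo "no executable at $binary" >&2
fi
```

On macOS one would expect the output to include the C++ standard library (libc++) as a shared library, which would be consistent with the update above.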

@lemire lemire requested a review from pauldreik April 9, 2026 19:22
@lemire lemire requested a review from Copilot April 9, 2026 19:39
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR reduces the size of the fastbase64 and sutf tools by linking them against feature-specific simdutf library variants and toggling off unused features at compile time.

Changes:

  • Link fastbase64 against a base64-only simdutf static library and disable UTF-related features.
  • Link sutf against a “no base64” simdutf static library and disable encoding detection/base64.
  • Make feature macros in implementation.h overridable via compile definitions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tools/CMakeLists.txt | Switch tools to link against feature-specific simdutf variants and add tool-level feature defines. |
| src/CMakeLists.txt | Add simdutf-base64 and simdutf-nobase64 static library targets built from simdutf.cpp. |
| include/simdutf/implementation.h | Change feature macro defaults to be conditional, allowing overrides from build flags. |

Comment thread src/CMakeLists.txt
Comment on lines +59 to +89
if(SIMDUTF_TOOLS)
  # --- simdutf-base64: only base64 support (for fastbase64) ---
  add_library(simdutf-base64 STATIC simdutf.cpp)
  target_include_directories(simdutf-base64 PRIVATE $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>)
  target_include_directories(simdutf-base64 PUBLIC "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>")
  target_compile_definitions(simdutf-base64 PRIVATE
    SIMDUTF_FEATURE_BASE64=1
    SIMDUTF_FEATURE_UTF8=0
    SIMDUTF_FEATURE_UTF16=0
    SIMDUTF_FEATURE_UTF32=0
    SIMDUTF_FEATURE_ASCII=0
    SIMDUTF_FEATURE_LATIN1=0
    SIMDUTF_FEATURE_DETECT_ENCODING=0
  )
  set_target_properties(simdutf-base64 PROPERTIES POSITION_INDEPENDENT_CODE ON)

  # --- simdutf-nobase64: everything except base64 (for sutf) ---
  add_library(simdutf-nobase64 STATIC simdutf.cpp)
  target_include_directories(simdutf-nobase64 PRIVATE $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>)
  target_include_directories(simdutf-nobase64 PUBLIC "$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>")
  target_compile_definitions(simdutf-nobase64 PRIVATE
    SIMDUTF_FEATURE_BASE64=0
    SIMDUTF_FEATURE_UTF8=1
    SIMDUTF_FEATURE_UTF16=1
    SIMDUTF_FEATURE_UTF32=1
    SIMDUTF_FEATURE_ASCII=1
    SIMDUTF_FEATURE_LATIN1=1
    SIMDUTF_FEATURE_DETECT_ENCODING=0
  )
  set_target_properties(simdutf-nobase64 PROPERTIES POSITION_INDEPENDENT_CODE ON)
endif()
Copilot AI Apr 9, 2026

The new simdutf-base64 / simdutf-nobase64 targets don’t inherit the same platform/compiler-specific options and defines applied to the main simdutf target earlier in this file (e.g., PowerPC -maltivec/-mvsx/-mcpu=power8, fallback-related defines, etc.). This can cause build failures or behavior/perf differences when SIMDUTF_TOOLS is enabled. A concrete fix is to factor common configuration into a helper (e.g., a CMake function) and apply it to all three targets, so only the feature toggles differ.

Comment thread src/CMakeLists.txt
Comment thread src/CMakeLists.txt
Comment thread include/simdutf/implementation.h
Comment thread tools/CMakeLists.txt
@lemire
Member Author

lemire commented Apr 9, 2026

I think that copilot is drunk.

@pauldreik
Collaborator

Interesting experiment!
On my system (Debian trixie, amd64), building with "-O3 -march=native", fastbase64 is 1.2M stripped.

If I build like this:

#!/bin/sh
singleheader/amalgamate.py --with-base64
(cat singleheader/simdutf.h
grep -v '#include "simdutf.h"' singleheader/simdutf.cpp
tail -n +2 tools/fastbase64.cpp ) >main.cpp

/usr/lib/ccache/g++-14 -march=native -O3 -DNDEBUG -std=c++17 main.cpp

the stripped result is 119 KB.

@lemire
Member Author

lemire commented Apr 12, 2026

@pauldreik Really?

Here is what I get on Rocky Linux (x86)...

Before this PR:

465824 build/tools/fastbase64

with this PR...

204184 build/tools/fastbase64

So a dramatic reduction.

On this machine

Rocky Linux release 10.1 (Red Quartz)

This system assumes x86_64-v3, so simdutf just builds the haswell and icelake kernels (as no other kernel is needed).

I think Debian still seeks to support everything all the way back to the Pentium 4, so you are going to have 4 kernels. It might be up to 2x larger. Still, you should see a massive reduction in the size of the binary with this PR.

Can you double check?
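
One way to check which instruction-set baseline a compiler assumes by default is to look at its predefined macros. The sketch below is only an illustration: the `has_macro` helper is a hypothetical name, and whether simdutf's kernel selection keys off exactly these macros is not shown in this thread.

```shell
#!/bin/sh
# Succeeds if the macro named in $1 is defined in the macro dump read from stdin.
has_macro() {
  grep -q "^#define $1 "
}

# GCC and Clang print their predefined macros with -dM -E.
# Nothing is printed if no compiler is found.
CXX=${CXX:-c++}
if command -v "$CXX" >/dev/null 2>&1; then
  for macro in __AVX2__ __AVX512F__; do
    if "$CXX" -dM -E -x c++ /dev/null | has_macro "$macro"; then
      echo "$macro is enabled by default"
    else
      echo "$macro is not enabled by default"
    fi
  done
fi
```

Adding `-march=native` to the compiler invocation would show what the current machine supports instead of the distribution's default baseline.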

@pauldreik
Collaborator

I am sorry, I should have been clearer that I ran on master. I checked the flags used by a default build and repeated the test. Here is my full test script.

#!/bin/sh
singleheader/amalgamate.py --with-base64
(cat singleheader/simdutf.h
grep -v '#include "simdutf.h"' singleheader/simdutf.cpp
tail -n +2 tools/fastbase64.cpp ) >main.cpp

c++ -O3 -DNDEBUG -std=c++17 -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store main.cpp -o fastbase64.onetu
strip fastbase64.onetu
ls -lah fastbase64.onetu # 243K

c++ -O3 -march=native -DNDEBUG -std=c++17 -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store main.cpp -o fastbase64.onetu-native
strip fastbase64.onetu-native
ls -lah fastbase64.onetu-native # 119K

cmake -B /tmp/blaha -S .
cmake --build /tmp/blaha/
strip /tmp/blaha/tools/fastbase64
ls -lah /tmp/blaha/tools/fastbase64 # 1.1 M before PR, 772K after

So this PR indeed reduces the size (from 1.1M to 772K).

On my machine, the single translation unit gets it down to 243K and to 119K if compiling for the current architecture.

@lemire
Member Author

lemire commented Apr 14, 2026

Ok. I am merging this since it is beneficial (less bloat).

@lemire lemire merged commit f9ecacb into master Apr 14, 2026
104 checks passed

3 participants