simpler approach to removing the C++ lib dependency at runtime#962

Merged
lemire merged 23 commits into master from lemire/nostdcxx on Apr 21, 2026

Conversation

@lemire
Member

@lemire lemire commented Apr 17, 2026

This is an alternative to #959 proposed by @mitchellh

It is AI generated with my guidance. Don't worry about the CI tests, this is a proof of concept.

  • It removes std::string from the main library. We simply return std::string_view instead. This is a breaking change: it requires C++17 or later from now on.
  • We add a shim for __cxa_pure_virtual. It is optional, enabled with a macro.
  • When the no-C++-library macro is set, we initialize our implementations statically instead of keeping them as static instances inside a function.
  • We remove toBinaryString and some other legacy functions.

And that's it!!!
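
To illustrate the first bullet, here is a minimal sketch of a kernel descriptor that hands out std::string_view rather than std::string. The names kernel_info, fallback_kernel and owned_name are hypothetical and not taken from the library; the sketch only assumes a C++17 compiler.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for a kernel descriptor; not the library's real class.
class kernel_info {
public:
  constexpr kernel_info(std::string_view name, std::string_view description)
      : name_(name), description_(description) {}

  // Returning a view avoids constructing a std::string inside the library.
  constexpr std::string_view name() const { return name_; }
  constexpr std::string_view description() const { return description_; }

private:
  std::string_view name_;
  std::string_view description_;
};

// The views point at string literals, so the object can be built at compile time.
inline constexpr kernel_info fallback_kernel{"fallback",
                                             "Generic fallback implementation"};

static_assert(fallback_kernel.name() == "fallback",
              "string_view comparison is usable in constant expressions");

// Caller-side code that wants an owning string now has to copy explicitly,
// because the std::string constructor taking a string_view is explicit.
inline std::string owned_name(const kernel_info &k) {
  return std::string(k.name());
}
```

Code that previously wrote something like `std::string n = k.name();` would stop compiling under this kind of change, which is why it counts as breaking even though the fix at each call site is one explicit conversion.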

I get...

> nm -u  build_nolibcxx/src/libsimdutf-nostdlibcxx.a     

simdutf.cpp.o:
___stack_chk_fail
___stack_chk_guard
_getenv
_memcpy
_memmove

Let us try it out by hand. Run the amalgamation script, then do...

> c++ -c simdutf.cpp  -nostdlib++ -fno-rtti -fno-exceptions
> cc amalgamation_demo.c simdutf.o 
> otool -L a.out
a.out:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1356.0.0)

Compare with old way:

> c++ -c simdutf.cpp
> cc -c  ./amalgamation_demo.c
> c++ amalgamation_demo.o simdutf.o
>  otool -L a.out
a.out:
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 2000.67.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1356.0.0)

So @pauldreik, here is my proposal...

  1. In a major release, we switch from std::string to std::string_view. This will break some code, but very little: few of our users ever grab our strings. It requires switching to C++17, but that ought to be fine.
  2. Using a macro, we guard the shim and the static initialization. The shim is tiny and will not get in our way. Static initialization is how I used to do things, but people complained about possible data races and the like, so I would prefer not to change it everywhere. It is a tiny, localized change, though: we only touch this code when adding a new kernel, which is not a common event. So there will now be a macro, but it is not a big deal.

I think that my version has only a localized effect and it does not make the code much more difficult to maintain. It will only be an issue when adding new kernels, but it is already a bit tricky to do and not something we will do often. (New CPU families are not common.)
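
The macro-guarded choice between the two initialization styles can be sketched as follows. This is an illustration only: the macro DEMO_NO_CXX_RUNTIME, the kernel struct and the function names are hypothetical, and the PR's actual macro and types are not shown in this thread. The background is that, on Itanium-ABI platforms, a function-local static that needs runtime initialization is protected by guard calls such as __cxa_guard_acquire, which the C++ runtime library provides; a namespace-scope object does not use such a guard.

```cpp
#include <cstddef>

// Hypothetical macro; define it to select the variant without guard calls.
// #define DEMO_NO_CXX_RUNTIME 1

// Hypothetical kernel descriptor: a name plus one entry point.
struct kernel {
  const char *name;
  std::size_t (*count_ascii)(const char *data, std::size_t len);
};

// Counts the bytes below 0x80 in the buffer.
static std::size_t count_ascii_scalar(const char *data, std::size_t len) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) {
    if (static_cast<unsigned char>(data[i]) < 0x80) {
      n++;
    }
  }
  return n;
}

#if defined(DEMO_NO_CXX_RUNTIME)
// Namespace-scope object: set up before main runs, no guard variable involved.
static const kernel scalar_kernel{"scalar", count_ascii_scalar};
const kernel *get_scalar_kernel() { return &scalar_kernel; }
#else
// Function-local static: initialized on first use.
const kernel *get_scalar_kernel() {
  static const kernel instance{"scalar", count_ascii_scalar};
  return &instance;
}
#endif
```

In this toy the object is simple enough that a compiler may constant-initialize both variants; the distinction matters for types whose construction runs code at runtime. Either way, callers only see get_scalar_kernel(), so the macro stays confined to the place where kernels are registered.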

Update: I added __glibcxx_assert_fail as a weak symbol.
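
As an illustration of the weak-symbol technique, here is a minimal sketch applied to __cxa_pure_virtual, the Itanium C++ ABI hook that the vtable of an abstract class refers to. It assumes GCC or Clang (for the weak attribute); whether the PR's own shim is weak, and the exact form of its __glibcxx_assert_fail definition, are not shown in this thread, and the transcoder classes below are hypothetical.

```cpp
#include <cstdlib>

#if defined(__GNUC__)
// Weak fallback: if a C++ runtime that defines this symbol is linked in,
// its strong definition can be used instead of this one.
// std::abort comes from the C library, not from the C++ runtime.
extern "C" __attribute__((weak)) void __cxa_pure_virtual() { std::abort(); }
#endif

// Hypothetical abstract interface; its vtable slot for the pure virtual
// function is what references __cxa_pure_virtual.
class transcoder {
public:
  virtual int code_unit_bytes() const = 0;

protected:
  // Non-virtual and protected: objects are never deleted through the base.
  ~transcoder() = default;
};

class utf16_transcoder final : public transcoder {
public:
  int code_unit_bytes() const override { return 2; }
};

// Ordinary virtual dispatch; the shim is never called on this path.
int code_unit_bytes_of(const transcoder &t) { return t.code_unit_bytes(); }
```

The shim only runs if a pure virtual function is called by mistake, for instance from a constructor, so aborting is a reasonable default for a sketch like this.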

@lemire
Member Author

lemire commented Apr 17, 2026

@pauldreik We can make this close to neutral as far as lines of code are concerned. Most of the changes you see have to do with the switch from std::string to const char * in our tools/tests.

The test failures so far are due to unused functions; they are not real failures.

@lemire
Member Author

lemire commented Apr 18, 2026

@pauldreik Most of the changes are now due to the C++11 to C++17 upgrade.

@pauldreik
Collaborator

this looks pretty good! looking at it now.

Comment thread src/implementation.cpp Outdated
Comment thread src/CMakeLists.txt Outdated
Comment thread include/simdutf/implementation.h Outdated
@lemire lemire marked this pull request as ready for review April 20, 2026 22:59
@lemire
Member Author

lemire commented Apr 20, 2026

@pauldreik I am now marking this PR as 'ready to review'.

With my latest changes, the 'no C++ lib' part comes essentially for free because we allow anyone to use static initializers.

Note this PR, once merged, would require a MAJOR version as we are bumping up the C++ version to C++17.

@mitchellh
Collaborator

Note that this still requires that the C++ stdlib headers are somewhere, whereas my branch did not. My build environment doesn't have access to C++ headers at all so things like <cstring> need to become <string.h>. I can work around this if this ends up being merged as-is, but just noting there is a small semantic difference in that my PR eliminated the need for libc++ at build and runtime, this only eliminates it at runtime.

Collaborator

@pauldreik pauldreik left a comment

Nice work!

Just had some minor remarks. I think this turned out very well and I am happy we are moving to C++17.
I tested out some examples and it (unsurprisingly) works as promised.

Comment thread README.md Outdated
Comment thread README.md
Comment thread benchmarks/base64/benchmark_base64.cpp Outdated
Comment thread tests/helpers/test.cpp
@pauldreik
Collaborator

my local install of spelling does not work properly, but I think this change would fix the spelling ci error:

commit 065bb32c0c2d364adc33ebb2e137069d25badb64 (HEAD -> nostdcxx)
Author: Paul Dreik <[email protected]>
Date:   Tue Apr 21 19:15:19 2026 +0200

    update spell check word list

diff --git a/scripts/check_typos.sh b/scripts/check_typos.sh
index 1befba01..7ac5de65 100755
--- a/scripts/check_typos.sh
+++ b/scripts/check_typos.sh
@@ -8,6 +8,6 @@ set -eu
 # this exits with nonzero status if it finds something, which terminates the script
 codespell \
     --skip="./benchmarks/competition,./build,./fuzz/work"  \
-    -L vie,persan,fo,ans,larg,indx,shft,carryin
+    -L vie,persan,fo,ans,larg,indx,shft,carryin,statics
 
 echo "no typos detected!"

@lemire lemire changed the title simpler approach to removing the C++ lib dependency. simpler approach to removing the C++ lib dependency at runtime Apr 21, 2026
Comment thread benchmarks/base64/benchmark_base64.cpp Outdated
@lemire
Member Author

lemire commented Apr 21, 2026

@mitchellh Fair comment. I would submit to you that not requiring the C++ library at runtime ought to be the main point. Do you disagree?

The C++ headers are tiny by modern standards and you need the C++ compiler anyhow at build time. Users who build their own software, like Gentoo users, will assuredly have the C++ headers. The standard Zig tar ball can compile C++, and it contains the C++ headers.

For us, the downsides of having to code in C++ without even the C++ standard headers are quite significant in the long term. Reimplementing and maintaining the equivalent of the standard headers is not fun. And risky.

That said, I bet you can just strap on the minimal set of C++ headers required. My estimate is that it should fit in about 500 kB. That is about the size of the HTML page (just the HTML) that we are reading now, or about the size of the wget command.

Importantly, the standard headers are not needed at runtime (ever). Only at compile time.

@lemire
Member Author

lemire commented Apr 21, 2026

@mitchellh I would invite your feedback before I merge this.

Collaborator

@mitchellh mitchellh left a comment

I think it's a fair tradeoff to require the C++ headers. I'm significantly more concerned with the runtime requirements and not the build time ones. Thumbs up from me. Thanks for carrying this through.

@lemire
Member Author

lemire commented Apr 21, 2026

@mitchellh

I am going to release.

Feel free to take credit on your blog. Though I did not go with your PR as-is, this current PR would have been impossible without yours... for obvious reasons. I just worked from the same ideas.

@lemire lemire merged commit 5c0c050 into master Apr 21, 2026
94 checks passed
Comment thread README.md