Skip to content

allow end user to have base64 literals #956

Merged
lemire merged 3 commits intomasterfrom
expose_base64_lit
Apr 5, 2026
Merged

allow end user to have base64 literals #956
lemire merged 3 commits intomasterfrom
expose_base64_lit

Conversation

@lemire
Copy link
Copy Markdown
Member

@lemire lemire commented Apr 2, 2026

It seems generally useful to be able to do "abcd"_base64 and get an std::array.

I am basically copying @pauldreik's code from the test up to implementation.h and allowing users to call it.

@lemire lemire requested review from Copilot and pauldreik April 2, 2026 02:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a public _base64 user-defined literal to decode base64 strings at compile time (C++23), and updates tests/docs to use it.

Changes:

  • Introduces simdutf::literals::operator""_base64 in the public header under C++23 + base64 feature guards.
  • Updates constexpr base64 tests to use the new literal, adding coverage for whitespace and empty input.
  • Documents compile-time base64 decoding usage in the README.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
tests/constexpr_base64_tests.cpp Replaces the local demo literal with the new public _base64 literal and adds more test cases.
include/simdutf/implementation.h Implements the _base64 literal and supporting constexpr decode helpers.
README.md Documents the new compile-time base64 literal and its behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/simdutf/implementation.h
Comment on lines +7104 to +7111
template <std::size_t InputLen>
constexpr auto base64_decode_literal(const char *str) {
base64_decode_result<InputLen> result{};
auto r = scalar::base64::base64_to_binary_details_impl(
str, InputLen, result.buffer.data(), base64_default, loose);
if (r.error != error_code::SUCCESS) {
throw "invalid base64 input in _base64 literal";
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README claims “Invalid base64 input causes a compilation error”, but this implementation can also be evaluated at runtime (and would then throw a const char*, which is a poor exception type for a public API). Consider making the literal consteval to enforce compile-time evaluation, and replace the throw-based failure with a compile-time diagnostic approach (e.g., a dependent static_assert) so users get a clearer error message and there is no runtime exception behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... does not look like valid criticism?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only possible to invoke if you are using the detail namespace, which users are not supposed to do. I clarified that in a commit I pushed.

I upgraded the function to consteval, so it is never evaluated at runtime.

Comment on lines +7098 to +7102
template <std::size_t InputLen> struct base64_decode_result {
static constexpr std::size_t max_out = (InputLen + 3) / 4 * 3;
std::array<char, max_out> buffer{};
std::size_t output_count{};
};
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base64 decoding produces arbitrary bytes, but the literal returns std::array<char, N> (and uses char buffers internally). This can be surprising for non-text payloads and encourages treating decoded data as text. Consider returning std::array<std::uint8_t, N> (or std::byte) to better match “binary data” semantics and avoid signedness/encoding confusion; char can still be produced by users when they explicitly want text.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, it is fine.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree


template <base64_literal_helper a> constexpr auto base64_make_array() {
constexpr auto decoded = base64_decode_literal<a.size()>(a.storage);
std::array<char, decoded.output_count> ret{};
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base64 decoding produces arbitrary bytes, but the literal returns std::array<char, N> (and uses char buffers internally). This can be surprising for non-text payloads and encourages treating decoded data as text. Consider returning std::array<std::uint8_t, N> (or std::byte) to better match “binary data” semantics and avoid signedness/encoding confusion; char can still be produced by users when they explicitly want text.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. It is fine.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

@lemire lemire changed the title allow use to have base64 lit allow end user to have base64 literals Apr 2, 2026
Copy link
Copy Markdown
Collaborator

@pauldreik pauldreik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty to push a commit upgrading to consteval (C++20), clarifying the detail namespace is not part of the public api and that not only spaces are ignored.

the last bit should be a hint for those wanting to embed something that is so long it should be broken into lines for readability. clang format etc does not know that the contents of the string can be line broken for this use case.

Comment on lines +7104 to +7111
template <std::size_t InputLen>
constexpr auto base64_decode_literal(const char *str) {
base64_decode_result<InputLen> result{};
auto r = scalar::base64::base64_to_binary_details_impl(
str, InputLen, result.buffer.data(), base64_default, loose);
if (r.error != error_code::SUCCESS) {
throw "invalid base64 input in _base64 literal";
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only possible to invoke if you are using the detail namespace, which users are not supposed to do. I clarified that in a commit I pushed.

I upgraded the function to consteval, so it is never evaluated at runtime.

Comment on lines +7098 to +7102
template <std::size_t InputLen> struct base64_decode_result {
static constexpr std::size_t max_out = (InputLen + 3) / 4 * 3;
std::array<char, max_out> buffer{};
std::size_t output_count{};
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree


template <base64_literal_helper a> constexpr auto base64_make_array() {
constexpr auto decoded = base64_decode_literal<a.size()>(a.storage);
std::array<char, decoded.output_count> ret{};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

@lemire lemire merged commit 05c9a53 into master Apr 5, 2026
104 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants