Let's take a look at the following code:
```cpp
// Copy `data` into a temporary buffer, replacing unpaired surrogates.
auto wellFormed = kj::heapArray<char16_t>(view.length());
simdutf::to_well_formed_utf16le(data, view.length(), wellFormed.begin());
// Size the output from the now well-formed copy.
utf8_length = simdutf::utf8_length_from_utf16le(wellFormed.begin(), view.length());
backingStore = v8::ArrayBuffer::NewBackingStore(js.v8Isolate, utf8_length);
// Transcode the well-formed copy into the backing store.
[[maybe_unused]] auto written = simdutf::convert_utf16le_to_utf8(
    wellFormed.begin(), wellFormed.size(), reinterpret_cast<char*>(backingStore->Data()));
KJ_DASSERT(written == utf8_length);
```
To convert potentially invalid UTF-16 input (`data`) to UTF-8, we currently need to:
- Allocate a `wellFormed` array with the same length as the input
- Convert the input into the `wellFormed` array
- Create a backing store sized to the UTF-8 length of the `wellFormed` array
- Convert the now valid UTF-16 to UTF-8
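For reference, the well-forming step can be sketched as a scalar loop. This is a minimal illustration, not simdutf's SIMD implementation: `to_well_formed_scalar` is a hypothetical name, and it assumes (as with JavaScript's `String.prototype.toWellFormed()`) that every unpaired surrogate becomes U+FFFD. It operates on code-unit values, so byte order is ignored. Note that the output needs its own buffer of `len` code units, which is the intermediate allocation in question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scalar stand-in for the well-forming step: copies `len`
// UTF-16 code units from `in` to `out`, replacing each unpaired surrogate
// with U+FFFD. Valid surrogate pairs are copied unchanged.
void to_well_formed_scalar(const char16_t* in, std::size_t len, char16_t* out) {
  for (std::size_t i = 0; i < len; ++i) {
    char16_t c = in[i];
    if (c >= 0xD800 && c <= 0xDBFF) {  // high surrogate
      if (i + 1 < len && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        out[i] = c;  // valid pair: copy both code units
        out[i + 1] = in[i + 1];
        ++i;
        continue;
      }
      out[i] = 0xFFFD;  // high surrogate without a following low surrogate
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      out[i] = 0xFFFD;  // low surrogate without a preceding high surrogate
    } else {
      out[i] = c;  // ordinary BMP code unit
    }
  }
}

// Hypothetical helper showing the extra allocation: a second buffer of
// input.size() code units (2 bytes each) just to hold the well-formed copy.
std::vector<char16_t> make_well_formed_copy(const std::u16string& input) {
  std::vector<char16_t> out(input.size());
  to_well_formed_scalar(input.data(), input.size(), out.data());
  return out;
}
```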
But if we had a `utf8_length_from_invalid_utf16` function, we could instead:
- Calculate the UTF-8 length of the invalid UTF-16 input directly
- Create a backing store with that UTF-8 length
- Convert the input into well-formed UTF-16 inside the backing store
- Convert the backing store contents (now well-formed) to UTF-8
This avoids allocating an intermediate array as large as the input.
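A scalar sketch of the proposed length function follows, assuming lone surrogates are counted as U+FFFD (3 UTF-8 bytes) so the result matches what the `to_well_formed_utf16le` + `utf8_length_from_utf16le` pair would report. Both function names are hypothetical, not existing simdutf APIs. As a swapped-in variant of the steps above, the second function fuses the replacement into the transcoding loop and writes UTF-8 straight into a caller-sized buffer (such as the backing store), so no well-formed UTF-16 copy is materialized at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical: UTF-8 byte count for `len` UTF-16 code units, counting each
// unpaired surrogate as U+FFFD (3 bytes in UTF-8).
std::size_t utf8_length_from_invalid_utf16(const char16_t* in, std::size_t len) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < len; ++i) {
    char16_t c = in[i];
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
               in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      bytes += 4;  // valid surrogate pair -> one 4-byte sequence
      ++i;
    } else {
      bytes += 3;  // other BMP character, or lone surrogate -> U+FFFD
    }
  }
  return bytes;
}

// Hypothetical: writes UTF-8 directly into `out`, which must hold at least
// utf8_length_from_invalid_utf16(in, len) bytes. Lone surrogates become U+FFFD.
// Returns the number of bytes written.
std::size_t convert_invalid_utf16_to_utf8(const char16_t* in, std::size_t len, char* out) {
  std::size_t pos = 0;
  for (std::size_t i = 0; i < len; ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);  // decode pair
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // replace unpaired surrogate
    }
    if (cp < 0x80) {
      out[pos++] = static_cast<char>(cp);
    } else if (cp < 0x800) {
      out[pos++] = static_cast<char>(0xC0 | (cp >> 6));
      out[pos++] = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out[pos++] = static_cast<char>(0xE0 | (cp >> 12));
      out[pos++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out[pos++] = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out[pos++] = static_cast<char>(0xF0 | (cp >> 18));
      out[pos++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out[pos++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out[pos++] = static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return pos;
}
```

In a real implementation these loops would be vectorized; the point of the sketch is only that sizing and transcoding can both run on the original input, so the backing store is the only allocation.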
cc @lemire