adding a new method: utf8_length_from_invalid_utf16

Let's take a look at the following code:

```
auto wellFormed = kj::heapArray<char16_t>(view.length());
simdutf::to_well_formed_utf16le(data, view.length(), wellFormed.begin());
utf8_length = simdutf::utf8_length_from_utf16le(wellFormed.begin(), view.length());
backingStore = v8::ArrayBuffer::NewBackingStore(js.v8Isolate, utf8_length);
[[maybe_unused]] auto written = simdutf::convert_utf16le_to_utf8(
    wellFormed.begin(), wellFormed.size(), reinterpret_cast<char*>(backingStore->Data()));
KJ_DASSERT(written == utf8_length);

To convert a potentially invalid UTF-16 input (which is a const char*) to UTF-8, we currently need to:

- Allocate a wellFormed array with the same length as the input
- Convert the input into the wellFormed array
- Create a backing store sized to the UTF-8 length of the wellFormed array
- Convert the now-valid UTF-16 to UTF-8

But if we had a utf8_length_from_invalid_utf16, we could instead:

- Calculate the UTF-8 length of the invalid UTF-16 input
- Create a backing store with that UTF-8 length
- Convert backing store data to well-formed
- Convert backing store (which is well-formed) to UTF-8

This avoids the cost of an intermediate array the size of the input.
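
To make the request concrete, here is a minimal scalar sketch of what such a function might compute. It is not an existing simdutf function: the name, the signature, and the choice of native-endian code units are assumptions for illustration. It counts every unpaired surrogate as 3 bytes, on the assumption that the result should match what `to_well_formed_utf16le` followed by `utf8_length_from_utf16le` reports (a lone surrogate becomes U+FFFD, which is 3 bytes in UTF-8).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, NOT part of simdutf's API.
// Returns the number of UTF-8 bytes needed to encode `input` if every
// unpaired surrogate were replaced by U+FFFD (3 bytes in UTF-8).
// Assumes the code units are already in native byte order.
inline std::size_t utf8_length_from_invalid_utf16(const char16_t* input, std::size_t len) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < len; i++) {
    const std::uint16_t w = input[i];
    if (w < 0x80) {
      bytes += 1;  // ASCII
    } else if (w < 0x800) {
      bytes += 2;  // two-byte sequence
    } else if (w >= 0xD800 && w <= 0xDBFF && i + 1 < len &&
               input[i + 1] >= 0xDC00 && input[i + 1] <= 0xDFFF) {
      bytes += 4;  // valid surrogate pair: one supplementary code point
      i++;         // skip the low surrogate we just consumed
    } else {
      bytes += 3;  // other BMP code point, or a lone surrogate (counted as U+FFFD)
    }
  }
  return bytes;
}
```

A vectorized version inside the library would presumably look quite different; the point is only that the length can be computed in one pass over the original input, without first materializing a well-formed copy.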

cc @lemire 
