adding a new method: utf8_length_from_invalid_utf16 #849

@anonrig

Let's take a look at the following code:

auto wellFormed = kj::heapArray<char16_t>(view.length());
simdutf::to_well_formed_utf16le(data, view.length(), wellFormed.begin());
utf8_length = simdutf::utf8_length_from_utf16le(wellFormed.begin(), view.length());
backingStore = v8::ArrayBuffer::NewBackingStore(js.v8Isolate, utf8_length);
[[maybe_unused]] auto written = simdutf::convert_utf16le_to_utf8(
    wellFormed.begin(), wellFormed.size(), reinterpret_cast<char*>(backingStore->Data()));
KJ_DASSERT(written == utf8_length);

To convert a possibly invalid UTF-16 input (which is a const char*) to UTF-8, we currently need to:

  • Allocate a wellFormed array with the same length as the input
  • Convert the input into the wellFormed array
  • Create a backing store sized to the UTF-8 length of the wellFormed array
  • Convert the now-valid UTF-16 to UTF-8

But if we had a utf8_length_from_invalid_utf16, we could instead:

  • Calculate the UTF-8 length of the invalid UTF-16 input directly
  • Create a backing store with that UTF-8 length
  • Convert the backing store data to well-formed UTF-16
  • Convert the backing store (which is now well-formed) to UTF-8

This avoids the cost of allocating an intermediate array the size of the input.
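
To make the request concrete, here is a minimal scalar sketch of what such a function could compute. It is not simdutf code and not a proposed implementation. It assumes native-endian char16_t input and assumes that each unpaired surrogate is counted as U+FFFD (3 bytes in UTF-8), which I believe matches what to_well_formed_utf16le substitutes. The helper convert_invalid_utf16_to_utf8 is a hypothetical name added only to show that the computed length matches a single-pass conversion with no intermediate array.

```cpp
#include <cstddef>
#include <string>

// Surrogate classification on a single UTF-16 code unit.
inline bool is_high_surrogate(char16_t c) { return (c & 0xFC00) == 0xD800; }
inline bool is_low_surrogate(char16_t c) { return (c & 0xFC00) == 0xDC00; }

// Hypothetical scalar reference for the function requested in this issue.
// Returns the number of UTF-8 bytes needed for `len` UTF-16 code units,
// counting every unpaired surrogate as U+FFFD (assumption: 3 bytes each).
inline size_t utf8_length_from_invalid_utf16(const char16_t* in, size_t len) {
  size_t bytes = 0;
  for (size_t i = 0; i < len; i++) {
    char16_t c = in[i];
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (is_high_surrogate(c) && i + 1 < len && is_low_surrogate(in[i + 1])) {
      bytes += 4;  // valid surrogate pair: one 4-byte sequence
      i++;         // skip the low surrogate
    } else {
      bytes += 3;  // other BMP code unit, or lone surrogate replaced by U+FFFD
    }
  }
  return bytes;
}

// Hypothetical helper: single-pass conversion that replaces lone surrogates
// with U+FFFD while encoding, so no well-formed UTF-16 copy is needed.
inline std::string convert_invalid_utf16_to_utf8(const char16_t* in, size_t len) {
  std::string out;
  out.reserve(utf8_length_from_invalid_utf16(in, len));
  for (size_t i = 0; i < len; i++) {
    char32_t cp = in[i];
    if (is_high_surrogate(in[i]) && i + 1 < len && is_low_surrogate(in[i + 1])) {
      cp = 0x10000 + ((char32_t(in[i]) - 0xD800) << 10) + (char32_t(in[i + 1]) - 0xDC00);
      i++;
    } else if (is_high_surrogate(in[i]) || is_low_surrogate(in[i])) {
      cp = 0xFFFD;  // unpaired surrogate
    }
    if (cp < 0x80) {
      out.push_back(char(cp));
    } else if (cp < 0x800) {
      out.push_back(char(0xC0 | (cp >> 6)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(char(0xE0 | (cp >> 12)));
      out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(char(0xF0 | (cp >> 18)));
      out.push_back(char(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(char(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(char(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

With a length function like this, the backing store could be sized up front from the original input, and the wellFormed allocation in the snippet above would only be needed if the conversion step itself still requires valid UTF-16.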

cc @lemire
