Alternative strategy for UTF-8 length from malformed UTF-16 #857
Merged
lemire merged 2 commits into simdutf:utf16_to_utf8_length_replacement on Nov 18, 2025
Collaborator
Author
It's now faster on my machine.
Member
@erikcorry Thank you. I will review today.
Member
I'll merge, but I will later change this to memcpy.

    uint32_t straddle1 =
        *reinterpret_cast<const uint32_t*>(in + pos + 1 * N - 1);
    uint32_t straddle2 =
        *reinterpret_cast<const uint32_t*>(in + pos + 2 * N - 1);

As far as I can tell, this can lead to unaligned loads, which is UB. It should be safe in practice but will trigger sanitizer warnings. (A memcpy won't affect the perf.)
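For readers following the memcpy remark, here is a minimal sketch of what such a replacement could look like. It assumes `in` is a `const char16_t*`, so that `in + pos + N - 1` may be only 2-byte aligned. The helper names `load_u32_unaligned` and `load_straddle` are hypothetical and do not come from the PR or from simdutf.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): read 32 bits starting at an
// arbitrary byte address without assuming 4-byte alignment.
inline uint32_t load_u32_unaligned(const void *p) {
  uint32_t v;
  // Copying through memcpy is well defined for any alignment;
  // optimizing compilers typically lower it to a single load.
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Assumption: `in` points to UTF-16 code units. Reads the two code
// units in[index] and in[index + 1] as one 32-bit value, even when
// the address is only 2-byte aligned.
inline uint32_t load_straddle(const char16_t *in, std::size_t index) {
  return load_u32_unaligned(in + index);
}
```

Under these assumptions, `straddle1` from the comment above would become `load_straddle(in, pos + 1 * N - 1)`. The order of the two code units inside the returned value depends on the endianness of the target.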
Merged commit b42b794 into simdutf:utf16_to_utf8_length_replacement. 19 checks passed.
lemire added a commit that referenced this pull request on Nov 18, 2025
* init
* adding tests.
* initial impl.
* adding comment.
* format
* haswell and westmere
* implemented icelake
* speeding up icelake
* done with icelake
* better documentation.
* fixing portability issue with Windows
* got the name of the intrinsic wrong.
* saving.
* applying an optimization.
* optimized icelake.
* fixing other missed opportunities
* fixing the cast
* Update scripts/README_ADD_FUNCTION.md
  Co-authored-by: Paul Dreik <[email protected]>
* Update CONTRIBUTING.md
  Co-authored-by: Paul Dreik <[email protected]>
* Update CONTRIBUTING.md
  Co-authored-by: Paul Dreik <[email protected]>
* Update scripts/README_ADD_FUNCTION.md
  Co-authored-by: Paul Dreik <[email protected]>
* correcting feature check.
* fixing big-endian issue
* lint.
* typo
* Alternative strategy for UTF-8 length from malformed UTF-16 (#857)
* Alternative strategy for UTF-8 length from malformed UTF-16
* Don't expect any surrogates, skip work in this case
* correct the memcpy
* lint
* more testing and fixing a bug in generic and arm impl.
* adding alignment (workaround for bug in some versions of gcc).

---------

Co-authored-by: Daniel Lemire <[email protected]>
Co-authored-by: Paul Dreik <[email protected]>
Co-authored-by: Erik Corry <[email protected]>
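As context for the commit list above, the following is a scalar sketch of the quantity being computed: the UTF-8 length of possibly malformed UTF-16 input. It assumes, as the branch name suggests, that each unpaired surrogate is counted as the replacement character U+FFFD (three bytes in UTF-8). The function name is hypothetical, and this is not the simdutf implementation or the SIMD strategy from the PR.

```cpp
#include <cstddef>

// Hypothetical scalar reference (not simdutf code): number of UTF-8
// bytes needed for `len` UTF-16 code units, assuming every unpaired
// surrogate is replaced by U+FFFD.
inline std::size_t utf8_length_with_replacement_sketch(const char16_t *in,
                                                       std::size_t len) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < len; i++) {
    const char16_t c = in[i];
    if (c < 0x80) {
      bytes += 1;  // ASCII
    } else if (c < 0x800) {
      bytes += 2;  // two-byte UTF-8 sequence
    } else if (c < 0xD800 || c > 0xDFFF) {
      bytes += 3;  // rest of the BMP, excluding surrogates
    } else if (c <= 0xDBFF && i + 1 < len && in[i + 1] >= 0xDC00 &&
               in[i + 1] <= 0xDFFF) {
      bytes += 4;  // valid high + low surrogate pair
      i++;         // the low surrogate has been consumed
    } else {
      bytes += 3;  // unpaired surrogate, counted as U+FFFD
    }
  }
  return bytes;
}
```

Only the last two branches deal with surrogates, so input without any surrogates never reaches the pairing logic. That is the case the commit "Don't expect any surrogates, skip work in this case" appears to target.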
I don't know if it's faster yet.