… I want to get it working everywhere.
@WojciechMula Added benchmarks and tests for UTF-16 character counting.
@WojciechMula As a sanity check, I tried the following NEON function to count UTF16 words:

    size_t neon_count_16(const char16_t *input, size_t length) {
      size_t count{0};
      size_t pos{0};
      // Low surrogates (0xDC00..0xDFFF) do not start a code point.
      uint16x8_t low = vmovq_n_u16(0xDC00);
      uint16x8_t high = vmovq_n_u16(0xDFFF);
      while (pos + 8 < length) {
        // Each 16-bit lane grows by at most 1 per iteration, so cap a block at
        // 0xFFFF iterations (8 words each) to keep the lanes from overflowing.
        size_t next_stop = pos + (length - pos > 8 * 0xFFFF ? 8 * 0xFFFF : length - pos);
        uint16x8_t counter = vdupq_n_u16(0);
        for (; pos + 8 < next_stop; pos += 8) {
          uint16x8_t in = vld1q_u16(reinterpret_cast<const uint16_t *>(input + pos));
          // The comparisons yield 0xFFFF (-1) per matching lane; subtracting adds 1.
          counter = vsubq_u16(counter, vorrq_u16(vcgtq_u16(in, high), vcltq_u16(in, low)));
        }
        count += vpaddd_u64(vpaddlq_u32(vpaddlq_u16(counter)));
      }
      return count + scalar::utf16::count_code_points(input + pos, length - pos);
    }

It was slower. In any case, it was enough to convince me that my code is not absolutely terrible from a performance point of view.
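For readers following along, the rule the NEON loop vectorizes can be sketched in scalar form: every 16-bit word that is not a low surrogate (0xDC00..0xDFFF) starts a code point. This is only a minimal sketch, assuming valid UTF-16 in native byte order; the name `count_utf16_code_points` is hypothetical and this is not necessarily how `scalar::utf16::count_code_points` is written.

```cpp
#include <cstddef>

// Hypothetical scalar reference (an assumption, not the library's code).
// Counts the 16-bit words that are NOT low surrogates, which equals the
// number of code points when the input is valid UTF-16.
inline std::size_t count_utf16_code_points(const char16_t *input, std::size_t length) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; i++) {
    // Low surrogates have the top six bits 110111, i.e. (word & 0xFC00) == 0xDC00.
    count += (input[i] & 0xFC00) != 0xDC00;
  }
  return count;
}
```

For example, `u"a\U0001F600b"` is four 16-bit words (the emoji is a surrogate pair) but three code points, so only the low surrogate is skipped.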
WojciechMula
left a comment
Good job! And a great amount of work.
@WojciechMula My expectation is that these functions can be made much faster but that's ok. This is just the foundation.
@lemire I'm of course for merging this PR. You did a great job. Please do not wait for me for any approvals in the future, just merge when you are happy about the code shape. I think at this stage of pre-alpha the review process shouldn't be very strict. It's easier to have everything in master.
As you would expect, you can count UTF-8 code points at high speed:
AMD Rome (GNU GCC 10):
ARM M1 (Apple)
This provides tests and benchmarks for UTF-8 counting, but not yet for UTF-16.
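As a point of reference for what is being counted, here is a minimal scalar sketch, assuming valid UTF-8 input: a code point starts at every byte that is not a continuation byte (0b10xxxxxx). The name `count_utf8_code_points` is hypothetical and this is not claimed to be the code in this PR; a SIMD version would apply the same test to many bytes at once.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (an assumption, not the PR's implementation).
// Counts the bytes that are NOT continuation bytes, which equals the number
// of code points when the input is valid UTF-8.
inline std::size_t count_utf8_code_points(const char *input, std::size_t length) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; i++) {
    // Continuation bytes have the top two bits 10, i.e. (byte & 0xC0) == 0x80.
    count += (static_cast<std::uint8_t>(input[i]) & 0xC0) != 0x80;
  }
  return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") is six bytes but five code points, since the second byte of "é" is a continuation byte.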
Fixes https://github.com/lemire/simdutf/issues/8
Fixes https://github.com/lemire/simdutf/issues/27