faster scalar utf8 to utf16 transcoding #25

Merged
lemire merged 9 commits into master from
dlemire/faster_utf8_to_utf16_transcoding
Mar 3, 2021

Conversation

@lemire
Member

@lemire lemire commented Mar 1, 2021

This doubles the performance of the scalar fallback and uses far less code:

$ ./build/benchmarks/benchmark -P convert -F benchmarks/dataset/wikipedia_mars/chinese.txt
testcases: 1
convert_utf8_to_utf16+fallback, input size: 75146, iterations: 100,
  13.687 ins/byte,    3.396 GHz,    1.033 GB/s,    4.162 ins/cycle, 0.0186969 b.misses/byte, 0 c.mis/byte
convert_utf8_to_utf16+haswell, input size: 75146, iterations: 100,
  13.687 ins/byte,    3.397 GHz,    1.007 GB/s,    4.056 ins/cycle, 0.0221436 b.misses/byte, 0 c.mis/byte
convert_utf8_to_utf16+westmere, input size: 75146, iterations: 100,
  13.687 ins/byte,    3.396 GHz,    1.071 GB/s,    4.314 ins/cycle, 0.0188167 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+fallback, input size: 75146, iterations: 100,
  10.518 ins/byte,    3.396 GHz,    1.174 GB/s,    3.636 ins/cycle, 0.0188167 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+haswell, input size: 75146, iterations: 100,
   2.688 ins/byte,    3.400 GHz,    3.021 GB/s,    2.388 ins/cycle, 0.00203604 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+westmere, input size: 75146, iterations: 100,
   3.239 ins/byte,    3.400 GHz,    2.916 GB/s,    2.778 ins/cycle, 0.00174327 b.misses/byte, 0 c.mis/byte

We do not yet have accelerated "validating" transcoding, but it felt necessary to first have more sensible scalar fallbacks.

Fixes https://github.com/lemire/simdutf/issues/17

Fixes https://github.com/lemire/simdutf/issues/13

Fixes https://github.com/lemire/simdutf/issues/9
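
For context, here is a minimal sketch of what a scalar transcoder for already-valid UTF-8 has to do. It is not the code from this PR: the function name `valid_utf8_to_utf16_sketch` and the use of `std::string`/`std::u16string` are assumptions made for illustration, and the library's real interface may differ. Because the input is assumed valid, the loop only needs to look at the lead byte to decide how many bytes to consume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not the simdutf API): converts input that is ALREADY
// known to be valid UTF-8 into UTF-16 code units in native byte order.
inline std::u16string valid_utf8_to_utf16_sketch(const std::string& in) {
  std::u16string out;
  // UTF-16 never needs more code units than the UTF-8 input has bytes.
  out.reserve(in.size());
  const auto* data = reinterpret_cast<const unsigned char*>(in.data());
  const std::size_t len = in.size();
  std::size_t pos = 0;
  while (pos < len) {
    const std::uint32_t lead = data[pos];
    if (lead < 0x80) {
      // 1-byte sequence (ASCII): copy the value through.
      out.push_back(static_cast<char16_t>(lead));
      pos += 1;
    } else if ((lead & 0xE0) == 0xC0) {
      // 2-byte sequence: 5 payload bits + 6 payload bits.
      assert(pos + 1 < len);  // debug-only check; valid input is a precondition
      const std::uint32_t cp =
          ((lead & 0x1F) << 6) | (std::uint32_t(data[pos + 1]) & 0x3F);
      out.push_back(static_cast<char16_t>(cp));
      pos += 2;
    } else if ((lead & 0xF0) == 0xE0) {
      // 3-byte sequence: 4 + 6 + 6 payload bits, still one UTF-16 code unit.
      assert(pos + 2 < len);
      const std::uint32_t cp = ((lead & 0x0F) << 12) |
                               ((std::uint32_t(data[pos + 1]) & 0x3F) << 6) |
                               (std::uint32_t(data[pos + 2]) & 0x3F);
      out.push_back(static_cast<char16_t>(cp));
      pos += 3;
    } else {
      // 4-byte sequence: 3 + 6 + 6 + 6 payload bits, emitted as a surrogate pair.
      assert((lead & 0xF8) == 0xF0 && pos + 3 < len);
      std::uint32_t cp = ((lead & 0x07) << 18) |
                         ((std::uint32_t(data[pos + 1]) & 0x3F) << 12) |
                         ((std::uint32_t(data[pos + 2]) & 0x3F) << 6) |
                         (std::uint32_t(data[pos + 3]) & 0x3F);
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
      pos += 4;
    }
  }
  return out;
}
```

A validating variant would additionally have to reject overlong encodings, stray continuation bytes, surrogate code points and values above U+10FFFF, which is one plausible reason the `convert_utf8_to_utf16` rows above execute more instructions per byte than the `convert_valid_utf8_to_utf16` rows.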

@lemire
Member Author

lemire commented Mar 1, 2021

@WojciechMula This will get squashed during the merge.

@lemire lemire requested a review from WojciechMula March 1, 2021 23:36
Collaborator

@WojciechMula WojciechMula left a comment

It looks nice, good job.

Comment thread src/arm64/implementation.cpp Outdated
Comment thread src/fallback/implementation.cpp Outdated
Comment thread src/scalar/utf8_to_utf16/utf8_to_utf16.h Outdated
Comment thread src/scalar/utf8_to_utf16/utf8_to_utf16.h Outdated
Comment thread src/scalar/utf8_to_utf16/valid_utf8_to_utf16.h Outdated
@lemire
Member Author

lemire commented Mar 3, 2021

@WojciechMula I have addressed all of your points, tweaked the code some more, and added a few more nice features. I am quite happy with this PR, so I am merging it.

Before I do, here are the current performance results on my AMD Rome box...

$ ./build/benchmarks/benchmark -F benchmarks/dataset/wikipedia_mars/chinese.txt
testcases: 1
convert_utf8_to_utf16+fallback, input size: 75146, iterations: 400,
  13.687 ins/byte,    3.396 GHz,    0.913 GB/s (5.0 %),    3.680 ins/cycle, 0.0227557 b.misses/byte, 0 c.mis/byte
convert_utf8_to_utf16+haswell, input size: 75146, iterations: 400,
  13.687 ins/byte,    3.396 GHz,    0.939 GB/s (0.7 %),    3.785 ins/cycle, 0.0193623 b.misses/byte, 0 c.mis/byte
convert_utf8_to_utf16+westmere, input size: 75146, iterations: 400,
  13.687 ins/byte,    3.397 GHz,    1.077 GB/s (0.6 %),    4.340 ins/cycle, 0.0191893 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+fallback, input size: 75146, iterations: 400,
   9.769 ins/byte,    3.397 GHz,    1.219 GB/s (0.9 %),    3.507 ins/cycle, 0.0177122 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+haswell, input size: 75146, iterations: 400,
   2.687 ins/byte,    3.402 GHz,    3.079 GB/s (0.6 %),    2.432 ins/cycle, 0.00137067 b.misses/byte, 0 c.mis/byte
convert_valid_utf8_to_utf16+westmere, input size: 75146, iterations: 400,
   3.238 ins/byte,    3.403 GHz,    2.943 GB/s (0.9 %),    2.800 ins/cycle, 0.00155697 b.misses/byte, 0 c.mis/byte
validate_utf8+fallback, input size: 75146, iterations: 400,
  12.158 ins/byte,    3.396 GHz,    1.166 GB/s (0.8 %),    4.175 ins/cycle, 0.020866 b.misses/byte, 0 c.mis/byte
validate_utf8+haswell, input size: 75146, iterations: 400,
   0.756 ins/byte,    3.437 GHz,   16.098 GB/s (1.0 %),    3.541 ins/cycle, 0.000133074 b.misses/byte, 0 c.mis/byte
validate_utf8+westmere, input size: 75146, iterations: 400,
   2.075 ins/byte,    3.411 GHz,    6.691 GB/s (0.4 %),    4.070 ins/cycle, 0.000133074 b.misses/byte, 0 c.mis/byte
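
As a reading aid for the columns above: the reported throughput is consistent with GB/s = GHz × (ins/cycle) / (ins/byte). The small helper below is a hypothetical illustration of that arithmetic (the name `throughput_gb_per_s` is not part of the benchmark tool), useful for sanity-checking a row or for estimating what a lower instruction count would buy at the same ins/cycle.

```cpp
#include <cassert>

// Hypothetical helper: derives throughput in GB/s from the other three
// columns of a benchmark row.
//   bytes/second = (cycles/second) * (instructions/cycle) / (instructions/byte)
inline double throughput_gb_per_s(double ghz, double ins_per_cycle,
                                  double ins_per_byte) {
  assert(ins_per_byte > 0.0);  // a row with zero instructions per byte is meaningless
  return ghz * ins_per_cycle / ins_per_byte;
}
```

For instance, plugging in the `validate_utf8+haswell` row (3.437 GHz, 3.541 ins/cycle, 0.756 ins/byte) gives roughly 16.1 GB/s, in line with the reported 16.098 GB/s.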

@lemire lemire merged commit 20ee905 into master Mar 3, 2021
@lemire lemire deleted the dlemire/faster_utf8_to_utf16_transcoding branch July 7, 2021 19:34