
validating utf8 to utf16 converter (x64 and NEON)#26

Merged
lemire merged 6 commits into master from dlemire/validating_utf8_utf16 on Mar 12, 2021


@lemire
Member

@lemire lemire commented Mar 9, 2021

This is the first version of a SIMD-based validating UTF-8 to UTF-16 converter. It works with ARM NEON, SSE and AVX.
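For readers new to the problem, here is a minimal scalar sketch of what a validating UTF-8 to UTF-16 converter has to do, roughly the job of a fallback kernel. It is not the simdutf implementation: `utf8_to_utf16_validating` is a hypothetical name, and on error it simply returns `std::nullopt` rather than reporting where the input went wrong.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical scalar reference: validates UTF-8 and transcodes it to UTF-16
// in one pass. Rejects bad lead bytes, truncated sequences, bad continuation
// bytes, overlong forms, surrogates, and code points above U+10FFFF.
std::optional<std::u16string> utf8_to_utf16_validating(std::string_view in) {
  std::u16string out;
  out.reserve(in.size());
  const size_t n = in.size();
  size_t i = 0;
  while (i < n) {
    const uint8_t b0 = static_cast<uint8_t>(in[i]);
    if (b0 < 0x80) {  // 1-byte sequence (ASCII)
      out.push_back(char16_t(b0));
      i += 1;
      continue;
    }
    size_t len;
    uint32_t cp, min_cp;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
    else return std::nullopt;               // stray continuation byte or 0xF8..0xFF
    if (n - i < len) return std::nullopt;   // truncated sequence
    for (size_t k = 1; k < len; k++) {
      const uint8_t b = static_cast<uint8_t>(in[i + k]);
      if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return std::nullopt;                   // overlong encoding
    if (cp > 0x10FFFF) return std::nullopt;                 // beyond Unicode
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
    if (cp < 0x10000) {
      out.push_back(char16_t(cp));  // one UTF-16 code unit
    } else {
      cp -= 0x10000;                // surrogate pair
      out.push_back(char16_t(0xD800 + (cp >> 10)));
      out.push_back(char16_t(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```

A SIMD kernel must enforce the same rules, but over many bytes per step instead of one character at a time.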

x64 (AMD Rome, GNU GCC 1)

kernel            english      french       arabic       chinese
fallback          3.031 GB/s   1.091 GB/s   0.663 GB/s   1.114 GB/s
SSE (westmere)    8.527 GB/s   1.741 GB/s   2.169 GB/s   2.151 GB/s
AVX (haswell)     9.576 GB/s   2.092 GB/s   2.744 GB/s   2.795 GB/s

ARM M1 (Apple system)

kernel            english       french       arabic       chinese
fallback          5.022 GB/s    2.886 GB/s   1.746 GB/s   2.460 GB/s
NEON (arm64)      15.097 GB/s   3.458 GB/s   2.803 GB/s   2.753 GB/s

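Maybe unsurprisingly, the gains over the scalar fallback are much larger on x64 than on the Apple M1, even though the M1 posts the highest absolute throughput (15.097 GB/s on English with NEON).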
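The speedups behind that comparison are simple ratios of SIMD to fallback throughput on the same machine. The sketch below just does that arithmetic; the figures are copied from the raw outputs below (AVX is the "haswell" kernel, NEON the "arm64" kernel), and the names are illustrative.

```cpp
#include <array>
#include <cstddef>

// Throughputs in GB/s, order: English, French, Arabic, Chinese.
using Row = std::array<double, 4>;
constexpr Row x64_fallback = {3.031, 1.091, 0.663, 1.114};
constexpr Row x64_avx      = {9.576, 2.092, 2.744, 2.795};
constexpr Row m1_fallback  = {5.022, 2.886, 1.746, 2.460};
constexpr Row m1_neon      = {15.097, 3.458, 2.803, 2.753};

// Speedup of a SIMD kernel over the scalar fallback on the same machine.
Row speedup(const Row& simd, const Row& scalar) {
  Row r{};
  for (std::size_t i = 0; i < r.size(); i++) r[i] = simd[i] / scalar[i];
  return r;
}
```

On these figures the AVX speedups come out at roughly 1.9x to 4.1x and the NEON speedups at roughly 1.1x to 3.0x, with AVX ahead on every file.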

Observe how Arabic is "slow", most visibly with the fallback kernel. I am not 100% sure why, but here are some general observations. English seems to have long runs of ASCII characters, with a few exceptions (it is definitely not pure ASCII). French is much the same, with a bit less ASCII. Chinese mixes 3-byte UTF-8 with ASCII. Arabic seems to be all over the map: its letters are 2-byte UTF-8 sequences, interleaved with ASCII spaces.
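One way to put numbers on these observations is to count characters by UTF-8 sequence length and to count how often the length changes from one character to the next, a rough proxy for how unpredictable a per-character branch on the lead byte would be. This is a hypothetical helper (`profile_utf8` is not part of simdutf) and it assumes its input is already valid UTF-8.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical profile of a (presumed valid) UTF-8 buffer.
struct Utf8Profile {
  std::array<size_t, 5> by_length{};  // by_length[k]: number of k-byte characters, k = 1..4
  size_t length_changes = 0;          // characters whose length differs from the previous one
};

Utf8Profile profile_utf8(std::string_view s) {
  Utf8Profile p;
  int prev_len = 0;
  for (char c : s) {
    const uint8_t b = static_cast<uint8_t>(c);
    if ((b & 0xC0) == 0x80) continue;  // continuation byte: not the start of a character
    const int len = b < 0x80 ? 1 : (b >= 0xF0 ? 4 : (b >= 0xE0 ? 3 : 2));
    p.by_length[len]++;
    if (prev_len != 0 && len != prev_len) p.length_changes++;
    prev_len = len;
  }
  return p;
}
```

Text dominated by long ASCII runs yields few length changes per byte; Arabic words separated by ASCII spaces switch between 1-byte and 2-byte sequences at every word boundary.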

Here are the branch mispredictions per byte for the fallback kernel (x64):

  • English 0.00165568
  • French 0.0584812
  • Arabic 0.0843745
  • Chinese 0.0154898

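The branch misprediction rate appears to predict the fallback kernel's performance quite well: ordered by mispredictions per byte, the files go English < Chinese < French < Arabic, which is exactly the reverse of their x64 fallback throughput (3.031, 1.114, 1.091 and 0.663 GB/s).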
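This pattern fits a scalar loop that branches on each lead byte: long ASCII runs keep that branch predictable, frequent switches between sequence lengths do not. A common scalar mitigation is to skip ASCII a word at a time. The sketch below is hypothetical (`ascii_prefix_length` is an illustrative name, not taken from the simdutf fallback) and only shows the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Returns the length of the leading run of ASCII bytes in s,
// testing eight bytes per iteration.
size_t ascii_prefix_length(std::string_view s) {
  size_t i = 0;
  for (; i + 8 <= s.size(); i += 8) {
    uint64_t word;
    std::memcpy(&word, s.data() + i, sizeof(word));  // safe unaligned load
    if (word & 0x8080808080808080ULL) break;         // some byte has its high bit set
  }
  // Finish byte by byte: the tail, or the word that contained a non-ASCII byte.
  while (i < s.size() && static_cast<uint8_t>(s[i]) < 0x80) i++;
  return i;
}
```

The mask test is byte-order independent because every byte of the mask is 0x80, so the same code works on little- and big-endian targets.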

Fixes https://github.com/lemire/simdutf/issues/14

Fixes https://github.com/lemire/simdutf/issues/7


Raw outputs (x64 first, then Apple M1)...

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/chinese.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 75146, iterations: 1000, 
  13.687 ins/byte,    3.396 GHz,    1.114 GB/s (0.8 %),    4.488 ins/cycle, 0.0156895 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 75146, iterations: 1000, 
   3.540 ins/byte,    3.401 GHz,    2.795 GB/s (1.3 %),    2.909 ins/cycle, 0.000958135 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 75146, iterations: 1000, 
   5.231 ins/byte,    3.399 GHz,    2.151 GB/s (0.7 %),    3.310 ins/cycle, 0.00211588 b.misses/byte, 0 c.mis/byte 

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/french.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 245549, iterations: 1000, 
   6.769 ins/byte,    3.394 GHz,    1.091 GB/s (0.7 %),    2.175 ins/cycle, 0.0593079 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 245549, iterations: 1000, 
   3.967 ins/byte,    3.395 GHz,    2.092 GB/s (0.6 %),    2.444 ins/cycle, 0.00296071 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 245549, iterations: 1000, 
   5.648 ins/byte,    3.395 GHz,    1.741 GB/s (0.5 %),    2.897 ins/cycle, 0.0029485 b.misses/byte, 0 c.mis/byte 

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/arabic.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 266929, iterations: 1000, 
  14.540 ins/byte,    3.390 GHz,    0.663 GB/s (2.0 %),    2.843 ins/cycle, 0.08001 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 266929, iterations: 1000, 
   2.965 ins/byte,    3.396 GHz,    2.744 GB/s (0.8 %),    2.396 ins/cycle, 0.00381375 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 266929, iterations: 1000, 
   4.596 ins/byte,    3.395 GHz,    2.169 GB/s (1.0 %),    2.936 ins/cycle, 0.00336044 b.misses/byte, 0 c.mis/byte

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/english.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 181798, iterations: 1000, 
   3.435 ins/byte,    3.397 GHz,    3.031 GB/s (0.7 %),    3.065 ins/cycle, 0.00165018 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 181798, iterations: 1000, 
   0.793 ins/byte,    3.405 GHz,    9.576 GB/s (1.1 %),    2.231 ins/cycle, 0.000275031 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 181798, iterations: 1000, 
   1.220 ins/byte,    3.404 GHz,    8.527 GB/s (1.2 %),    3.057 ins/cycle, 0.000313535 b.misses/byte, 0 c.mis/byte 

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/english.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 991380, iterations: 10000, 
  15.097 GB/s (3.0 %)
convert_utf8_to_utf16+fallback, input size: 991380, iterations: 10000, 
   5.022 GB/s (1.8 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/french.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 1067472, iterations: 10000, 
   3.458 GB/s (1.6 %)
convert_utf8_to_utf16+fallback, input size: 1067472, iterations: 10000, 
   2.886 GB/s (2.4 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/arabic.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 945989, iterations: 10000, 
   2.803 GB/s (1.1 %)
convert_utf8_to_utf16+fallback, input size: 945989, iterations: 10000, 
   1.746 GB/s (1.8 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/chinese.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 378464, iterations: 10000, 
   2.753 GB/s (2.4 %)
convert_utf8_to_utf16+fallback, input size: 378464, iterations: 10000, 
   2.460 GB/s (2.5 %)
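As a sanity check on the x64 counters, throughput follows from the other columns: GHz is cycles per nanosecond, ins/cycle divided by ins/byte is bytes per cycle, and bytes per nanosecond is GB/s. Two examples taken from the outputs above (fallback on chinese.txt, haswell on english.txt):

```math
\begin{aligned}
\text{GB/s} &= \frac{f_{\text{GHz}} \times \text{ins/cycle}}{\text{ins/byte}} \\
\frac{3.396 \times 4.488}{13.687} &\approx 1.114 \quad (\text{reported } 1.114) \\
\frac{3.405 \times 2.231}{0.793} &\approx 9.58 \quad (\text{reported } 9.576)
\end{aligned}
```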

@lemire lemire requested a review from WojciechMula March 9, 2021 21:56
Collaborator

@WojciechMula WojciechMula left a comment


Really nice! Now I see the benefits of having these generic vector types.
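To illustrate the benefit mentioned here: with a thin wrapper around each platform's register type, a kernel is written once and instantiated per backend. This sketch is hypothetical: `simd8x` and `all_ascii` are invented names, and the "registers" are plain arrays so the code runs anywhere, whereas real wrappers hold native register types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical portable stand-in for an N-byte SIMD register. A real backend
// would wrap __m128i (SSE, N = 16), __m256i (AVX2, N = 32) or uint8x16_t
// (NEON, N = 16); only the interface matters here.
template <size_t N>
struct simd8x {
  std::array<uint8_t, N> lanes{};

  static simd8x load(const uint8_t* p) {
    simd8x v;
    for (size_t i = 0; i < N; i++) v.lanes[i] = p[i];
    return v;
  }
  simd8x operator|(const simd8x& o) const {
    simd8x r;
    for (size_t i = 0; i < N; i++) r.lanes[i] = lanes[i] | o.lanes[i];
    return r;
  }
  bool is_ascii() const {  // true if no lane has its high bit set
    for (uint8_t b : lanes)
      if (b & 0x80) return false;
    return true;
  }
};

// Generic kernel written once against the wrapper: is the whole buffer ASCII?
template <size_t N>
bool all_ascii(const uint8_t* data, size_t len) {
  simd8x<N> acc{};
  size_t i = 0;
  for (; i + N <= len; i += N) acc = acc | simd8x<N>::load(data + i);
  bool ok = acc.is_ascii();
  for (; i < len; i++) ok = ok && data[i] < 0x80;  // scalar tail
  return ok;
}
```

Instantiating `all_ascii<16>` and `all_ascii<32>` mirrors how one generic source can serve 128-bit and 256-bit backends.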

Review comment threads on:

  • include/simdutf/arm64/simd.h
  • src/generic/utf8_to_utf16/utf8_to_utf16.h (two threads, outdated)
  • src/westmere/implementation.cpp (outdated)
@WojciechMula
Collaborator

@lemire We already have very good results, despite these Arabic-text quirks. Please merge.

@lemire lemire merged commit 290142a into master Mar 12, 2021
@lemire
Member Author

lemire commented Mar 12, 2021

Merged.

@lemire
Member Author

lemire commented Mar 12, 2021

@WojciechMula We can rename splat; I really, really don't care. I just wanted to point out that it is not an arbitrary term.
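For context on the naming question: "splat" is established SIMD vocabulary for broadcasting one scalar into every lane (LLVM and WebAssembly's `i8x16.splat` use the word). The portable sketch below is hypothetical (`splat` and `count_continuation_bytes` are illustrative names, not the simdutf API) and lists native equivalents in comments.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// "Splat": broadcast one scalar into every lane of a vector.
// Native equivalents (not used here): _mm_set1_epi8 (SSE2),
// _mm256_set1_epi8 (AVX), vdupq_n_u8 (NEON), i8x16.splat (WebAssembly).
template <size_t N>
std::array<uint8_t, N> splat(uint8_t value) {
  std::array<uint8_t, N> v;
  v.fill(value);
  return v;
}

// Typical use: compare every lane against splatted constants. Here we count
// UTF-8 continuation bytes, i.e. lanes where (b & 0xC0) == 0x80.
template <size_t N>
size_t count_continuation_bytes(const std::array<uint8_t, N>& v) {
  const auto mask = splat<N>(0xC0);
  const auto want = splat<N>(0x80);
  size_t count = 0;
  for (size_t i = 0; i < N; i++) count += (v[i] & mask[i]) == want[i];
  return count;
}
```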

@lemire lemire deleted the dlemire/validating_utf8_utf16 branch July 7, 2021 19:34


Development

Successfully merging this pull request may close these issues:

  • The UTF8 => UTF16 transcoder needs to be written using SIMD wrappers
  • All code should rely on simd wrappers

2 participants