validating utf8 to utf16 converter (x64 and NEON) by lemire · Pull Request #26 · simdutf/simdutf

lemire · 2021-03-09T20:51:29Z

This is the first version of a SIMD-based validating UTF-8 to UTF-16 converter. It works under ARM NEON, SSE and AVX.
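SIMD converters tend to excel on ASCII-heavy text (compare the english column in the tables) largely because of an ASCII fast path. The converter checks a whole block of bytes for the high bit at once and, if no byte has it set, widens each byte to a 16-bit code unit. The sketch below is a portable illustration of that idea. It uses 8-byte words rather than SSE/AVX/NEON registers. The function name `ascii_prefix_to_utf16` is hypothetical, and this is not the code in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts the longest ASCII-only prefix of `in` to UTF-16
// and returns the number of bytes consumed. `out` must have room for `len` units.
std::size_t ascii_prefix_to_utf16(const char* in, std::size_t len, char16_t* out) {
  std::size_t i = 0;
  // Fast path: test 8 bytes at a time for any byte with the high bit set.
  while (i + 8 <= len) {
    std::uint64_t word;
    std::memcpy(&word, in + i, sizeof(word));  // safe unaligned load
    if (word & 0x8080808080808080ULL) { break; }  // some byte >= 0x80: not all ASCII
    for (std::size_t j = 0; j < 8; j++) {
      out[i + j] = static_cast<char16_t>(static_cast<unsigned char>(in[i + j]));
    }
    i += 8;
  }
  // Slow path: finish the ASCII run byte by byte (tail, or the word that broke the fast path).
  while (i < len && static_cast<unsigned char>(in[i]) < 0x80) {
    out[i] = static_cast<char16_t>(static_cast<unsigned char>(in[i]));
    i++;
  }
  return i;
}
```

A real converter would hand the remaining non-ASCII bytes to a full decoder and then try the fast path again. Long ASCII runs, as in English, keep it in the cheap loop most of the time.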

x64 (AMD Rome, GNU GCC 1)

kernel	english	french	arabic	chinese
fallback	3.031 GB/s	1.091 GB/s	0.663 GB/s	1.114 GB/s
SSE (westmere)	8.527 GB/s	1.741 GB/s	2.169 GB/s	2.151 GB/s
AVX (haswell)	9.576 GB/s	2.092 GB/s	2.744 GB/s	2.795 GB/s

ARM M1 (Apple system)

kernel	english	french	arabic	chinese
fallback	5.022 GB/s	2.886 GB/s	1.746 GB/s	2.460 GB/s
NEON (arm64)	15.097 GB/s	3.458 GB/s	2.803 GB/s	2.753 GB/s

Maybe unsurprisingly, the gains over the fallback kernel are much larger on x64 than on the Apple M1.
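To make "much larger gains" concrete, one can compute the ratio of the best SIMD kernel to the fallback kernel for each dataset. The sketch below does only that arithmetic on the throughputs printed in the raw outputs. The Apple M1 runs used input files of different sizes, so the two systems were not measured on identical bytes.

```cpp
#include <array>
#include <cstddef>

// Throughputs in GB/s from the raw benchmark outputs in this PR,
// ordered: english, french, arabic, chinese.
constexpr std::array<double, 4> x64_fallback{3.031, 1.091, 0.663, 1.114};
constexpr std::array<double, 4> x64_avx{9.576, 2.092, 2.744, 2.795};  // haswell lines
constexpr std::array<double, 4> m1_fallback{5.022, 2.886, 1.746, 2.460};
constexpr std::array<double, 4> m1_neon{15.097, 3.458, 2.803, 2.753};  // arm64 lines

// Speedup of a SIMD kernel over the fallback kernel, per dataset.
std::array<double, 4> speedups(const std::array<double, 4>& simd,
                               const std::array<double, 4>& fallback) {
  std::array<double, 4> r{};
  for (std::size_t i = 0; i < r.size(); i++) { r[i] = simd[i] / fallback[i]; }
  return r;
}
```

Rounded, `speedups(x64_avx, x64_fallback)` gives about 3.2, 1.9, 4.1 and 2.5. `speedups(m1_neon, m1_fallback)` gives about 3.0, 1.2, 1.6 and 1.1. The x64 ratio is higher on every dataset.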

Observe how Arabic is "slow" with the fallback kernel. I am not 100% sure why, but here are some general observations. English seems to have long runs of ASCII characters, with a few exceptions. (It is definitely not pure ASCII.) French is much the same, with a bit less ASCII. Chinese mixes 3-byte UTF-8 with ASCII. Arabic seems to be all over the map.
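One way to test the "all over the map" hypothesis is to count how often the encoded length changes from one character to the next. A decoder that branches on the lead byte tends to mispredict at such changes. The helper below is a sketch that assumes valid UTF-8 input. `profile_utf8` and `length_profile` are hypothetical names and not part of simdutf.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical diagnostic: counts characters by encoded length (1 to 4 bytes)
// and how often the length changes between consecutive characters.
struct length_profile {
  std::size_t chars_by_length[5] = {0, 0, 0, 0, 0};  // indices 1..4 are used
  std::size_t length_changes = 0;
};

length_profile profile_utf8(std::string_view s) {
  length_profile p;
  int previous = 0;  // length of the previous character, 0 before the first one
  for (unsigned char b : s) {
    int len;
    if (b < 0x80) { len = 1; }
    else if ((b & 0xE0) == 0xC0) { len = 2; }
    else if ((b & 0xF0) == 0xE0) { len = 3; }
    else if ((b & 0xF8) == 0xF0) { len = 4; }
    else { continue; }  // continuation byte (or invalid lead byte): not a character start
    p.chars_by_length[len]++;
    if (previous != 0 && len != previous) { p.length_changes++; }
    previous = len;
  }
  return p;
}
```

Dividing `length_changes` by the input size gives a per-byte figure you could compare against the branch misses per byte below.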

Here are the branch misses per byte for the fallback kernel:

English 0.00165568
French 0.0584812
Arabic 0.0843745
Chinese 0.0154898

The number of branch mispredictions per byte appears to predict the performance of the fallback kernel quite well.
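To see why the mix of character lengths matters for a scalar kernel, here is a minimal byte-at-a-time validating converter. `scalar_utf8_to_utf16` is an illustrative name, and this is not the fallback kernel from this PR. The if/else chain on the lead byte is the data-dependent branch. On mostly-ASCII English it almost always takes the first arm. Text that keeps switching between 1-byte and 2-byte characters makes its outcome hard to predict. As a simplifying assumption, the function returns 0 on invalid input.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar converter. Returns the number of char16_t written,
// or 0 if the input is invalid UTF-8. `out` needs room for `len` units.
std::size_t scalar_utf8_to_utf16(const unsigned char* in, std::size_t len, char16_t* out) {
  static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};  // overlong limits
  std::size_t i = 0, o = 0;
  while (i < len) {
    std::uint32_t b = in[i];
    std::uint32_t cp;
    std::size_t n;  // sequence length
    // Data-dependent branch: its pattern follows the text's mix of character lengths.
    if (b < 0x80) { out[o++] = char16_t(b); i++; continue; }
    else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; }
    else { return 0; }  // stray continuation byte or invalid lead byte
    if (i + n > len) { return 0; }  // truncated sequence
    for (std::size_t k = 1; k < n; k++) {
      std::uint32_t c = in[i + k];
      if ((c & 0xC0) != 0x80) { return 0; }  // expected a continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong encodings, surrogate code points and values above U+10FFFF.
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) { return 0; }
    if (cp < 0x10000) {
      out[o++] = char16_t(cp);
    } else {
      cp -= 0x10000;  // supplementary plane: emit a surrogate pair
      out[o++] = char16_t(0xD800 + (cp >> 10));
      out[o++] = char16_t(0xDC00 + (cp & 0x3FF));
    }
    i += n;
  }
  return o;
}
```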

Fixes https://github.com/lemire/simdutf/issues/14

Fixes https://github.com/lemire/simdutf/issues/7

Raw outputs (x64 first, then Apple M1):

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/chinese.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 75146, iterations: 1000, 
  13.687 ins/byte,    3.396 GHz,    1.114 GB/s (0.8 %),    4.488 ins/cycle, 0.0156895 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 75146, iterations: 1000, 
   3.540 ins/byte,    3.401 GHz,    2.795 GB/s (1.3 %),    2.909 ins/cycle, 0.000958135 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 75146, iterations: 1000, 
   5.231 ins/byte,    3.399 GHz,    2.151 GB/s (0.7 %),    3.310 ins/cycle, 0.00211588 b.misses/byte, 0 c.mis/byte 

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/french.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 245549, iterations: 1000, 
   6.769 ins/byte,    3.394 GHz,    1.091 GB/s (0.7 %),    2.175 ins/cycle, 0.0593079 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 245549, iterations: 1000, 
   3.967 ins/byte,    3.395 GHz,    2.092 GB/s (0.6 %),    2.444 ins/cycle, 0.00296071 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 245549, iterations: 1000, 
   5.648 ins/byte,    3.395 GHz,    1.741 GB/s (0.5 %),    2.897 ins/cycle, 0.0029485 b.misses/byte, 0 c.mis/byte 

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/arabic.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 266929, iterations: 1000, 
  14.540 ins/byte,    3.390 GHz,    0.663 GB/s (2.0 %),    2.843 ins/cycle, 0.08001 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 266929, iterations: 1000, 
   2.965 ins/byte,    3.396 GHz,    2.744 GB/s (0.8 %),    2.396 ins/cycle, 0.00381375 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 266929, iterations: 1000, 
   4.596 ins/byte,    3.395 GHz,    2.169 GB/s (1.0 %),    2.936 ins/cycle, 0.00336044 b.misses/byte, 0 c.mis/byte

$ ./benchmarks/benchmark -P convert_utf8_to_utf16  -F ../benchmarks/dataset/wikipedia_mars/english.txt -I 1000
testcases: 1
convert_utf8_to_utf16+fallback, input size: 181798, iterations: 1000, 
   3.435 ins/byte,    3.397 GHz,    3.031 GB/s (0.7 %),    3.065 ins/cycle, 0.00165018 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+haswell, input size: 181798, iterations: 1000, 
   0.793 ins/byte,    3.405 GHz,    9.576 GB/s (1.1 %),    2.231 ins/cycle, 0.000275031 b.misses/byte, 0 c.mis/byte 
convert_utf8_to_utf16+westmere, input size: 181798, iterations: 1000, 
   1.220 ins/byte,    3.404 GHz,    8.527 GB/s (1.2 %),    3.057 ins/cycle, 0.000313535 b.misses/byte, 0 c.mis/byte

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/english.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 991380, iterations: 10000, 
  15.097 GB/s (3.0 %)
convert_utf8_to_utf16+fallback, input size: 991380, iterations: 10000, 
   5.022 GB/s (1.8 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/french.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 1067472, iterations: 10000, 
   3.458 GB/s (1.6 %)
convert_utf8_to_utf16+fallback, input size: 1067472, iterations: 10000, 
   2.886 GB/s (2.4 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/arabic.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 945989, iterations: 10000, 
   2.803 GB/s (1.1 %)
convert_utf8_to_utf16+fallback, input size: 945989, iterations: 10000, 
   1.746 GB/s (1.8 %)

$ ./benchmarks/benchmark -F ../benchmarks/dataset/wikipedia_mars/chinese.txt -I 10000 -P convert_utf8_to_utf16
testcases: 1
convert_utf8_to_utf16+arm64, input size: 378464, iterations: 10000, 
   2.753 GB/s (2.4 %)
convert_utf8_to_utf16+fallback, input size: 378464, iterations: 10000, 
   2.460 GB/s (2.5 %)

WojciechMula

Really nice! Now I see the benefits of having these generic vector types.

WojciechMula · 2021-03-12T19:10:07Z

@lemire We already have very good results, despite these arabic-text quirks. Please merge.

lemire · 2021-03-12T21:47:36Z

Merged.

lemire · 2021-03-12T21:55:41Z

@WojciechMula We can change splat; I really, really don't care. I just wanted to point out that it is not an arbitrary term.
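In SIMD code, "splat" conventionally means broadcasting one scalar into every lane of a vector. WebAssembly SIMD, for instance, names the operation `i8x16.splat`. The sketch below shows the idea with a toy, portable 16-lane byte vector. The name `toy_simd8` and its methods are hypothetical and not the simdutf API. A real generic vector type would wrap `__m128i` on x64 or `uint8x16_t` on NEON.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a generic 16-lane byte vector (hypothetical, not simdutf's type).
struct toy_simd8 {
  std::array<std::uint8_t, 16> lanes{};

  // "splat": broadcast one value into all lanes, like _mm_set1_epi8 (SSE2)
  // or vdupq_n_u8 (NEON).
  static toy_simd8 splat(std::uint8_t value) {
    toy_simd8 v;
    v.lanes.fill(value);
    return v;
  }

  // Lane-wise unsigned comparison: 0xFF where this lane >= the other lane, else 0x00.
  toy_simd8 greater_or_equal(const toy_simd8& other) const {
    toy_simd8 r;
    for (std::size_t i = 0; i < lanes.size(); i++) {
      r.lanes[i] = (lanes[i] >= other.lanes[i]) ? 0xFF : 0x00;
    }
    return r;
  }
};
```

For example, `input.greater_or_equal(toy_simd8::splat(0x80))` marks every non-ASCII byte of a 16-byte block. Keeping the comparison generic and the constant a splat is what lets the same kernel source target SSE, AVX and NEON.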

lemire added 4 commits March 9, 2021 11:54

This is a validating utf8 to utf16 code.

2962c78

Adding converters.

1e81b89

Tested under x64

b43607c

Moving convert_valid_utf8_to_utf16 to generic files.

57ee9e4

lemire requested a review from WojciechMula March 9, 2021 21:56

WojciechMula reviewed Mar 10, 2021


Comment thread include/simdutf/arm64/simd.h

Comment thread src/generic/utf8_to_utf16/utf8_to_utf16.h Outdated

Comment thread src/westmere/implementation.cpp Outdated

Comment thread src/generic/utf8_to_utf16/utf8_to_utf16.h Outdated

lemire added 2 commits March 10, 2021 16:18

Cleaning comment.

0122209

Removing commented out line.

85f4b68

lemire merged commit 290142a into master Mar 12, 2021

lemire deleted the dlemire/validating_utf8_utf16 branch July 7, 2021 19:34

lemire merged 6 commits into master from dlemire/validating_utf8_utf16

2 participants
