Skip ASCII check until the non-ASCII byte is found #81
If the 8 + 8 bytes ASCII check fails, there is no reason to repeat it until the non-ASCII byte that caused the failure has been reached.
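A minimal C sketch of the idea follows. It assumes a scalar UTF-8 validator whose fast path checks 16 bytes at once with two 8-byte word loads. The names `ascii16` and `validate_utf8_sketch` and the `try_fast` flag are hypothetical and do not come from this PR. Once the fast check fails, the loop consumes ASCII bytes one at a time and turns the check back on only after it has reached and decoded the non-ASCII byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if all 16 bytes at p are ASCII (high bit clear).
   Two 8-byte loads model the "8 + 8 bytes" check; memcpy avoids unaligned-load UB. */
static bool ascii16(const unsigned char *p) {
    uint64_t a, b;
    memcpy(&a, p, 8);
    memcpy(&b, p + 8, 8);
    return ((a | b) & 0x8080808080808080ULL) == 0;
}

/* Hypothetical scalar validator: returns true if buf[0..len) is valid UTF-8. */
bool validate_utf8_sketch(const unsigned char *buf, size_t len) {
    size_t i = 0;
    /* Cleared after a failed 16-byte check: the window is known to contain a
       non-ASCII byte, so re-checking before reaching it would fail again. */
    bool try_fast = true;
    while (i < len) {
        if (try_fast && i + 16 <= len) {
            if (ascii16(buf + i)) { i += 16; continue; }
            try_fast = false;
        }
        unsigned char c = buf[i];
        if (c < 0x80) { i++; continue; } /* ASCII byte before the non-ASCII one */
        try_fast = true; /* reached the non-ASCII byte: fast path allowed again after it */
        size_t n;     /* number of continuation bytes */
        uint32_t cp;  /* decoded code point */
        uint32_t min; /* smallest code point for this length (rejects overlongs) */
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false; /* stray continuation byte or invalid lead byte */
        if (n > len - i - 1) return false; /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += n + 1;
    }
    return true;
}
```

Without the flag, every ASCII byte before the non-ASCII one would trigger another 16-byte check. Each of those checks is certain to fail, because the window still contains that byte.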
918186d to 0cbb098
What are your benchmark numbers supporting this optimization? By how much do you speed up the processing... please include diverse data sources...
(Benchmark table: Master branch vs. This PR; values not preserved.)
The dataset is a 10 MiB file made of …
Will run more benchmarks with a real-world dataset later.
I am concerned about such a synthetic dataset. Would you try again with the files from this repository?
Wikipedia Japanese main page (https://ja.wikipedia.org/wiki/メインページ):
(Benchmark table: Master branch vs. This PR; values not preserved.)
All of them?
Nvm, I did not see the …
(Three further benchmark tables: Master branch vs. This PR; values not preserved.)
Thanks. It does look convincing. Here are my own results (Apple M1, LLVM 12):
(Results not preserved.)
It is a bit surprising at first that it helps even with pure-ASCII files, but it makes sense. Merging.