Commit 012c924
committed
perf(napi/parser, linter/plugins): speed up decoding strings in raw transfer (#21021)
Improve perf of deserializing strings in raw transfer. This PR combines several optimizations, which have been tested and benchmarked in https://github.com/overlookmotel/oxc-raw-str-bench. This PR implements the version "latin-slice-onebyte64" from that repo, which is the current winner.
String deserialization is the main bottleneck in raw transfer, so speeding it up will likely make a large impact on deserialization overall.
This work follows on from #20834 which produced a major speed-up in many files by making files which contain some non-ASCII characters take the fast path of slicing `sourceText` more often.
This PR tackles the remainder - speeding up the fallback path where the fast path can't be taken.
## Optimizations
The optimizations in this PR are:
### Latin1
When source is not 100% ASCII, decode source text from buffer as Latin1.
A Latin1-decoded string represents each UTF-8 byte as a single Latin1 character, so it can be indexed into using UTF-8 offsets.
So when we can't slice the string from `sourceText` because the UTF-8 and UTF-16 offsets differ (after any non-ASCII character), loop through the string's bytes and check if they're all ASCII. If they are, the string can be sliced from `sourceTextLatin` instead, with the original UTF-8 offsets.
This is way faster than calling `textDecoder.decode`, as it avoids a call into C++. [Benchmarks show](https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-latin-vs-latin-source64) speed up of 55% on average, and up to 70% on some files.
### Latin1 decoding method
It turns out that `new TextDecoder("latin1").decode(arr)` doesn't actually decode to Latin1!
Per the WHATWG Encoding Standard, "latin1" is mapped to "windows-1252".
The result is that with `TextDecoder("latin1")`:
1. `decode` is quite complicated, requiring a 2-pass scan of the bytes to determine if they're all ASCII, followed by a 2nd pass to do the actual `windows-1252` decoding. If the string *does* contain any non-ASCII characters (which it always does in our usecase), NodeJS implements the decoding in JS, not native code. Slow.
2. `decode` produces a 2-byte-per-char string (`TWO_BYTE` in V8), which takes more memory, and is slower for all operations on it e.g. string comparison, hashing for use as an object key etc.
Instead, use `Buffer.prototype.latin1Slice` which:
1. Does a pure Latin1 decode, which is just a single `memcpy` call.
2. Produces a 1-byte-per-char string (`ONE_BYTE` in V8).
`latin1Slice` involves a call into C++, but we only do it once per file, so this cost is tiny in context of deserializing the whole AST.
### Latin1 string slicing
In the fast path, slice from the Latin1-decoded string, instead of `sourceText`. In the fast path, we know that all bytes of source comprising the string are ASCII, so no further checks are required.
This makes no difference on benchmarks for `deserializeStr` itself, but it may have beneficial effects downstream for code (e.g. lint rules) which access strings in the AST, e.g. `Identifier` names.
Because Latin1-decoded source text is `ONE_BYTE`-encoded, slices of it are too. In comparison, slices of `sourceText` may be `ONE_BYTE` or `TWO_BYTE`. If a file's source is pure ASCII, it'll be `ONE_BYTE`, if source contains any non-ASCII characters, it'll be `TWO_BYTE`. Files in a repo will likely be a mix of both, which makes strings returned from `deserializeStr` and placed in the AST a mix too. This in turn makes functions (e.g. lint rule visitors) polymorphic. V8 cannot optimize them as aggressively as if they see only `ONE_BYTE` strings.
We cannot make sure that all strings returned by `deserializeStr` are `ONE_BYTE`. Some string may contain non-ASCII characters, and they *have* to be represented in `TWO_BYTE` form. But we can minimize it - now only strings which *themselves* contain non-ASCII characters are `TWO_BYTE`, whereas before they would be if the source text as a whole contains a single non-ASCII byte.
Code which accesses `Identifier` names, for example will exclusively see `ONE_BYTE` strings and will be more heavily optimized, because Unicode `Identifier`s are rarer than hen's teeth in real-world code.
### Remove string-concatenation loop
Previously strings which are outside of source text were assembled byte-by-byte in a loop via concatenation.
Instead, check that all the bytes are ASCII first, copy them into an array and pass that array to `String.fromCharCode` with `fromCharCode.apply(null, array)`.
To avoid allocating a fresh array every time, hold a stock of arrays for all string lengths that this path can require, and reuse them.
This is a variation on the approach that #20883 took, but without the massive switch. This produces much tighter assembly, and avoids regressing the fast path due making `deserializeStr` a very large function.
Despite the complexity, and multiple operations, [this is up to 3x faster](https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-switch30) than the switch approach, and gives an average 30% speed-up.
### Increase native call threshold
The above optimizations make the slow path much faster. This shifts the tipping point at which it's faster to make a native call to `TextDecoder.decode` from 9 bytes to 64 bytes. Most strings now avoid the native call and stay in JS code which is heavily optimized by Turbofan.
The tipping point of 64 is something of a guesstimate. Benchmarking shows its in the right ballpark, but we could finesse it, and probably squeeze out another couple of %.
## Credit
The Latin1 string technique was cooked up by @joshuaisaact in overlookmotel/oxc-raw-str-bench#1. All credit to him for this masterstroke which cracks the whole problem!1 parent 55e1e9b commit 012c924
10 files changed
Lines changed: 394 additions & 256 deletions
File tree
- apps/oxlint/src-js/generated
- napi/parser/src-js/generated/deserialize
- tasks/ast_tools/src/generators
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
8 | 8 | | |
9 | 9 | | |
10 | 10 | | |
| 11 | + | |
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
| |||
16 | 17 | | |
17 | 18 | | |
18 | 19 | | |
19 | | - | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
25 | 28 | | |
26 | | - | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
27 | 32 | | |
28 | 33 | | |
29 | 34 | | |
| |||
41 | 46 | | |
42 | 47 | | |
43 | 48 | | |
44 | | - | |
45 | | - | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
46 | 53 | | |
47 | 54 | | |
48 | 55 | | |
49 | 56 | | |
| 57 | + | |
50 | 58 | | |
51 | 59 | | |
52 | 60 | | |
53 | 61 | | |
54 | 62 | | |
55 | 63 | | |
56 | | - | |
57 | | - | |
| 64 | + | |
| 65 | + | |
58 | 66 | | |
59 | 67 | | |
60 | 68 | | |
| |||
5880 | 5888 | | |
5881 | 5889 | | |
5882 | 5890 | | |
5883 | | - | |
5884 | | - | |
5885 | | - | |
5886 | | - | |
5887 | | - | |
5888 | | - | |
5889 | | - | |
5890 | | - | |
5891 | | - | |
5892 | | - | |
5893 | | - | |
5894 | | - | |
5895 | | - | |
5896 | | - | |
5897 | | - | |
5898 | | - | |
5899 | | - | |
5900 | | - | |
5901 | | - | |
5902 | | - | |
5903 | | - | |
5904 | | - | |
5905 | | - | |
5906 | | - | |
5907 | | - | |
5908 | | - | |
5909 | | - | |
5910 | | - | |
5911 | | - | |
5912 | | - | |
5913 | | - | |
5914 | | - | |
5915 | | - | |
5916 | | - | |
| 5891 | + | |
| 5892 | + | |
| 5893 | + | |
| 5894 | + | |
| 5895 | + | |
| 5896 | + | |
| 5897 | + | |
| 5898 | + | |
| 5899 | + | |
| 5900 | + | |
| 5901 | + | |
| 5902 | + | |
| 5903 | + | |
| 5904 | + | |
| 5905 | + | |
| 5906 | + | |
| 5907 | + | |
| 5908 | + | |
| 5909 | + | |
| 5910 | + | |
| 5911 | + | |
| 5912 | + | |
| 5913 | + | |
| 5914 | + | |
5917 | 5915 | | |
5918 | 5916 | | |
5919 | 5917 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| 8 | + | |
| 9 | + | |
8 | 10 | | |
9 | 11 | | |
10 | 12 | | |
11 | 13 | | |
12 | | - | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
13 | 18 | | |
14 | 19 | | |
| 20 | + | |
15 | 21 | | |
16 | 22 | | |
17 | 23 | | |
| |||
22 | 28 | | |
23 | 29 | | |
24 | 30 | | |
25 | | - | |
26 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
27 | 35 | | |
28 | 36 | | |
29 | 37 | | |
| 38 | + | |
30 | 39 | | |
31 | 40 | | |
32 | 41 | | |
33 | 42 | | |
34 | 43 | | |
35 | | - | |
36 | | - | |
| 44 | + | |
| 45 | + | |
37 | 46 | | |
38 | 47 | | |
39 | 48 | | |
| |||
4547 | 4556 | | |
4548 | 4557 | | |
4549 | 4558 | | |
4550 | | - | |
4551 | | - | |
4552 | | - | |
4553 | | - | |
4554 | | - | |
4555 | | - | |
4556 | | - | |
4557 | | - | |
4558 | | - | |
4559 | | - | |
4560 | | - | |
4561 | | - | |
4562 | | - | |
4563 | | - | |
4564 | | - | |
4565 | | - | |
| 4559 | + | |
| 4560 | + | |
| 4561 | + | |
| 4562 | + | |
| 4563 | + | |
| 4564 | + | |
| 4565 | + | |
| 4566 | + | |
| 4567 | + | |
| 4568 | + | |
| 4569 | + | |
| 4570 | + | |
| 4571 | + | |
| 4572 | + | |
| 4573 | + | |
| 4574 | + | |
| 4575 | + | |
| 4576 | + | |
| 4577 | + | |
| 4578 | + | |
4566 | 4579 | | |
4567 | 4580 | | |
4568 | 4581 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| 8 | + | |
| 9 | + | |
8 | 10 | | |
9 | 11 | | |
10 | 12 | | |
11 | 13 | | |
12 | 14 | | |
13 | | - | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
14 | 19 | | |
15 | 20 | | |
| 21 | + | |
16 | 22 | | |
17 | 23 | | |
18 | 24 | | |
| |||
23 | 29 | | |
24 | 30 | | |
25 | 31 | | |
26 | | - | |
27 | | - | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
28 | 36 | | |
29 | 37 | | |
30 | 38 | | |
| 39 | + | |
31 | 40 | | |
32 | 41 | | |
33 | 42 | | |
34 | 43 | | |
35 | 44 | | |
36 | | - | |
37 | | - | |
| 45 | + | |
| 46 | + | |
38 | 47 | | |
39 | 48 | | |
40 | 49 | | |
| |||
5078 | 5087 | | |
5079 | 5088 | | |
5080 | 5089 | | |
5081 | | - | |
5082 | | - | |
5083 | | - | |
5084 | | - | |
5085 | | - | |
5086 | | - | |
5087 | | - | |
5088 | | - | |
5089 | | - | |
5090 | | - | |
5091 | | - | |
5092 | | - | |
5093 | | - | |
5094 | | - | |
5095 | | - | |
5096 | | - | |
| 5090 | + | |
| 5091 | + | |
| 5092 | + | |
| 5093 | + | |
| 5094 | + | |
| 5095 | + | |
| 5096 | + | |
| 5097 | + | |
| 5098 | + | |
| 5099 | + | |
| 5100 | + | |
| 5101 | + | |
| 5102 | + | |
| 5103 | + | |
| 5104 | + | |
| 5105 | + | |
| 5106 | + | |
| 5107 | + | |
| 5108 | + | |
| 5109 | + | |
5097 | 5110 | | |
5098 | 5111 | | |
5099 | 5112 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
| 8 | + | |
| 9 | + | |
8 | 10 | | |
9 | 11 | | |
10 | 12 | | |
11 | 13 | | |
12 | | - | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
13 | 18 | | |
14 | 19 | | |
| 20 | + | |
15 | 21 | | |
16 | 22 | | |
17 | 23 | | |
| |||
22 | 28 | | |
23 | 29 | | |
24 | 30 | | |
25 | | - | |
26 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
27 | 35 | | |
28 | 36 | | |
29 | 37 | | |
| 38 | + | |
30 | 39 | | |
31 | 40 | | |
32 | 41 | | |
33 | 42 | | |
34 | 43 | | |
35 | | - | |
36 | | - | |
| 44 | + | |
| 45 | + | |
37 | 46 | | |
38 | 47 | | |
39 | 48 | | |
| |||
5089 | 5098 | | |
5090 | 5099 | | |
5091 | 5100 | | |
5092 | | - | |
5093 | | - | |
5094 | | - | |
5095 | | - | |
5096 | | - | |
5097 | | - | |
5098 | | - | |
5099 | | - | |
5100 | | - | |
5101 | | - | |
5102 | | - | |
5103 | | - | |
5104 | | - | |
5105 | | - | |
5106 | | - | |
5107 | | - | |
| 5101 | + | |
| 5102 | + | |
| 5103 | + | |
| 5104 | + | |
| 5105 | + | |
| 5106 | + | |
| 5107 | + | |
| 5108 | + | |
| 5109 | + | |
| 5110 | + | |
| 5111 | + | |
| 5112 | + | |
| 5113 | + | |
| 5114 | + | |
| 5115 | + | |
| 5116 | + | |
| 5117 | + | |
| 5118 | + | |
| 5119 | + | |
| 5120 | + | |
5108 | 5121 | | |
5109 | 5122 | | |
5110 | 5123 | | |
| |||
0 commit comments