fix: improve performance in many places#203
Conversation
|
|
5399622 to
a494cfd
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #203 +/- ##
==========================================
- Coverage 97.79% 97.77% -0.03%
==========================================
Files 25 25
Lines 1908 1932 +24
Branches 600 607 +7
==========================================
+ Hits 1866 1889 +23
- Misses 40 41 +1
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Merging this PR will degrade performance by 98.22%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | cached-source: map() (cached) |
253.1 µs | 41.4 µs | ×6.1 |
| ⚡ | cached-source: new CachedSource() |
367.7 µs | 322.9 µs | +13.86% |
| ⚡ | cached-source: source() (cached) |
666.5 µs | 104 µs | ×6.4 |
| ❌ | prefix-source: getPrefix() |
120.7 µs | 6,649.9 µs | -98.19% |
| ❌ | prefix-source: original() |
121 µs | 6,783.4 µs | -98.22% |
| ⚡ | realistic-source-map-pipeline: warm sourceAndMap() (reuse CachedSource) |
79.8 µs | 51.8 µs | +54.14% |
| ❌ | replace-source: original() |
126.4 µs | 189.4 µs | -33.25% |
Comparing claude/investigate-cached-source-perf-ofvqq (03993c5) with main (adf17ec)1
Footnotes
51282ec to
bab15a7
Compare
The constructors of RawSource, OriginalSource and SourceMapSource preallocated `this._cachedSize = undefined` so that size() has a stable hidden-class slot to write to. That made every construction pay for memoization machinery regardless of whether .size() was ever called, showing up on CodSpeed as a ~12% regression on `cached-source: new CachedSource()` and 5-13% on several `source-map-source: new (*)` benchmarks. The size() overrides already guard on `this._cachedSize !== undefined`, which correctly handles missing properties. Drop the eager assignment and let the field be created on the first size() call. This keeps the size()/streamChunks() improvements from PR #199 (e.g. ~8x on a one-shot `new SMS().size()`) while erasing the construction regression. https://claude.ai/code/session_01LZbaaPrnDTu6y7s4nK4cJz
CachedSource.map(), sourceAndMap() and streamChunks() key their map-entry
cache with `JSON.stringify(options)`. That call is 60-100 ns on the
common `{}` / `{columns: false}` shapes and dominates the entire method
body once the cache is warm.
Both MapOptions and the streamChunks Options type only contain boolean
fields, so we can short-circuit the stringify for the three shapes that
account for essentially all real traffic (undefined, empty object,
columns-only) using interned string constants. Any other shape still
falls through to JSON.stringify, so keys stay byte-identical to the
on-disk BufferedMaps produced by previous CachedSource versions.
Measured on the benchmark fixtures with JIT enabled (best of 5, ns/op):
before after
warmed.map({}) 79.1 4.3 (18x)
warmed.map({columns:false})139.8 9.6 (14x)
warmed.sourceAndMap({}) 89.6 9.8 (9x)
warmed.map() / undefined 10.6 6.7 (1.6x)
warmed.sourceAndMap() 12.4 7.7 (1.6x)
Full --no-opt benchmark run shows +53% on `cached-source: map()
(cached)`, +37.6% on `cached-source: sourceAndMap() (cached)`, +34.5%
on `realistic-source-map-pipeline: warm sourceAndMap()`, and
+17-21% on `replace-source: map()` / `sourceAndMap()`. 89813 tests
still pass.
https://claude.ai/code/session_01LZbaaPrnDTu6y7s4nK4cJz
Follow-up to the _cachedSize deferral: PR #204 added a new `this._cachedBuffers = undefined` preallocation to the CachedSource constructor, which re-introduces the same per-construction cost we previously removed from RawSource/OriginalSource/SourceMapSource. `_cachedBuffers` is only read from `buffer()` and `buffers()`, and both call sites already guard with `!== undefined`, which treats missing properties identically to an undefined slot. Drop the eager assignment so every CachedSource construction skips one write and one hidden-class transition, and instances that never call `buffers()` stay on a tighter shape. https://claude.ai/code/session_01LZbaaPrnDTu6y7s4nK4cJz
`source()` delegated the hot "cache already filled" read to `_getCachedSource()`, which means every warm call pays for a prototype method lookup and a function-call stack frame. Under V8's TurboFan the call gets inlined, but the interpreter (and CodSpeed's simulation mode, which runs without optimization) actually pays the overhead. Inlining the `this._cachedSource !== undefined` short-circuit brings warm `source()` from ~58 ns/op down to ~41 ns/op under --no-opt while keeping the buffer-derived path behind the shared helper. https://claude.ai/code/session_01LZbaaPrnDTu6y7s4nK4cJz
…e ctor Two follow-ups to keep CodSpeed happy: 1. `source()` previously inlined only the warm `_cachedSource` check and still called `_getCachedSource()` on cold reads, which under the interpreter (CodSpeed's simulation mode) pays a function-call stack frame plus a redundant re-check of `_cachedSource`. Inline the whole helper body so both warm and cold paths stay branchless-to-call. Local --no-opt: cold `new CachedSource(RawSource).source()` drops 534 -> 477 ns/op (+12% vs main). 2. The constructor was evaluating `cachedData ? cachedData.X : undefined` once per cached-data field (five separate ternaries). Replace with a single branch on `cachedData` so construction pays one conditional instead of five. Local --no-opt: `new CachedSource(new RawSource(…))` drops 431 -> 375 ns/op (+13% vs main), erasing the -11% regression CodSpeed showed on `cached-source: new CachedSource()`. `_getCachedSource()` stays on the class because `buffer()`, `streamChunks()` and `getCachedData()` still call it; only `source()` had a hot enough caller to justify duplicating the body. https://claude.ai/code/session_01LZbaaPrnDTu6y7s4nK4cJz
38b570e to
03fed1e
Compare
No description provided.