pmalouin commented Oct 9, 2019

Changes

Fixes #161

As discussed in #161, this PR reduces the memory footprint of parsed templates that are held in memory. It does so by "flattening" the in-memory representation of the template strings.

Of the approaches I tried, the only one that works on Node 8.x, 10.x and 12.x is converting the string to a Buffer instance and back to a string.
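For readers unfamiliar with the technique, here is a minimal sketch of the Buffer round trip described above. It is not the code from this PR: the `flatten` helper name and the sample data are made up for illustration. The comments about V8 string representations describe general engine behavior and are the assumed reason the round trip helps.

```javascript
// Hypothetical helper, not the PR's actual code.
// V8 may represent the result of `a + b` as a "cons string" (a tree of
// fragments), and a substring as a "sliced string" that keeps its parent
// string alive. Encoding to bytes and decoding again yields a new string
// that does not reference the original.
function flatten(str) {
  return Buffer.from(str, 'utf8').toString('utf8');
}

// Build a string by repeated concatenation, as a tokenizer might.
let source = '';
for (let i = 0; i < 1000; i++) {
  source += 'chunk' + i + ';';
}

const token = source.slice(10, 60); // may retain `source` internally
const flatToken = flatten(token);   // independent copy with the same content

console.log(flatToken === token); // true: the content is unchanged
```

One caveat of this sketch: a UTF-8 round trip replaces lone surrogate code units with U+FFFD, so it only preserves content for well-formed strings.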

This PR also adds a memory footprint benchmark suite.
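As a rough idea of how retained memory can be measured in Node.js, the sketch below compares `process.memoryUsage().heapUsed` before and after building a number of values that are kept alive. The `retainedBytes` helper is hypothetical and the benchmark added by this PR may work differently.

```javascript
// Hypothetical measurement helper; the PR's actual benchmark may differ.
// For stable numbers, run with `node --expose-gc` so that global.gc exists.
function collectGarbage() {
  if (typeof global.gc === 'function') global.gc();
}

// Builds `count` values, keeps them all alive, and returns the average
// heap growth per value in bytes.
function retainedBytes(build, count) {
  collectGarbage();
  const before = process.memoryUsage().heapUsed;

  const retained = [];
  for (let i = 0; i < count; i++) {
    retained.push(build(i));
  }

  collectGarbage();
  const after = process.memoryUsage().heapUsed;

  // Reading `retained.length` here keeps the array reachable until
  // after the second measurement.
  return (after - before) / retained.length;
}

// Example: average heap growth for 250 concatenated strings.
const perItem = retainedBytes((i) => 'template-' + i + '-' + 'x'.repeat(3000), 250);
console.log(`~${(perItem / 1024).toFixed(2)}KB per item`);
```

Without `--expose-gc` the result is noisy, since the collector may or may not run between the two measurements.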

Performance impacts

Here are the results of running the benchmarks before/after the change:

Node v8.16.1

Before change

--- output ---
literal x 12,549 ops/sec ±5.10% (66 runs sampled)
truncate x 16,438 ops/sec ±6.61% (66 runs sampled)
date x 16,108 ops/sec ±4.62% (69 runs sampled)
escape x 21,418 ops/sec ±4.92% (72 runs sampled)
default x 17,142 ops/sec ±6.03% (71 runs sampled)
--- tag ---
if x 15,015 ops/sec ±6.34% (65 runs sampled)
unless x 11,412 ops/sec ±21.84% (51 runs sampled)
for x 5,211 ops/sec ±10.51% (57 runs sampled)
switch x 4,201 ops/sec ±27.06% (43 runs sampled)
assign x 16,051 ops/sec ±12.60% (61 runs sampled)
capture x 21,869 ops/sec ±4.14% (71 runs sampled)
increment x 28,739 ops/sec ±10.84% (59 runs sampled)
decrement x 32,349 ops/sec ±8.22% (65 runs sampled)
tablerow x 7,899 ops/sec ±6.14% (66 runs sampled)
--- demo ---
demo x 3,117 ops/sec ±6.84% (62 runs sampled)
--- layout ---
cache=false x 2,362 ops/sec ±8.86% (62 runs sampled)
cache=true x 5,918 ops/sec ±14.59% (59 runs sampled)
--- memory ---
retained memory for a 3KB template: 130.337KB (250 samples)

After change

--- output ---
literal x 9,183 ops/sec ±15.14% (55 runs sampled)
truncate x 13,476 ops/sec ±10.56% (63 runs sampled)
date x 14,451 ops/sec ±5.20% (67 runs sampled)
escape x 18,873 ops/sec ±8.34% (67 runs sampled)
default x 16,910 ops/sec ±4.81% (68 runs sampled)
--- tag ---
if x 15,231 ops/sec ±3.24% (71 runs sampled)
unless x 14,695 ops/sec ±3.18% (70 runs sampled)
for x 3,768 ops/sec ±22.81% (43 runs sampled)
switch x 1,250 ops/sec ±30.12% (39 runs sampled)
assign x 14,912 ops/sec ±10.09% (60 runs sampled)
capture x 20,072 ops/sec ±5.62% (71 runs sampled)
increment x 24,952 ops/sec ±17.23% (53 runs sampled)
decrement x 26,009 ops/sec ±9.63% (58 runs sampled)
tablerow x 7,596 ops/sec ±5.29% (68 runs sampled)
--- demo ---
demo x 2,474 ops/sec ±10.26% (58 runs sampled)
--- layout ---
cache=false x 2,066 ops/sec ±11.15% (61 runs sampled)
cache=true x 6,567 ops/sec ±6.14% (68 runs sampled)
--- memory ---
retained memory for a 3KB template: 14.5260625KB (250 samples)

Node v10.16.3

Before change

--- output ---
literal x 23,165 ops/sec ±9.82% (68 runs sampled)
truncate x 27,848 ops/sec ±11.87% (59 runs sampled)
date x 28,459 ops/sec ±8.46% (66 runs sampled)
escape x 35,186 ops/sec ±11.68% (61 runs sampled)
default x 29,286 ops/sec ±11.21% (64 runs sampled)
--- tag ---
if x 24,939 ops/sec ±10.41% (60 runs sampled)
unless x 24,386 ops/sec ±10.31% (60 runs sampled)
for x 8,772 ops/sec ±15.28% (58 runs sampled)
switch x 12,475 ops/sec ±10.18% (63 runs sampled)
assign x 33,224 ops/sec ±9.23% (66 runs sampled)
capture x 36,463 ops/sec ±9.62% (64 runs sampled)
increment x 60,185 ops/sec ±12.16% (60 runs sampled)
decrement x 24,719 ops/sec ±55.25% (27 runs sampled)
tablerow x 8,308 ops/sec ±18.04% (50 runs sampled)
--- demo ---
demo x 3,494 ops/sec ±17.12% (50 runs sampled)
--- layout ---
cache=false x 2,794 ops/sec ±14.90% (55 runs sampled)
cache=true x 9,101 ops/sec ±13.30% (51 runs sampled)
--- memory ---
retained memory for a 3KB template: 98.53553125KB (250 samples)

After change

--- output ---
literal x 21,353 ops/sec ±9.51% (65 runs sampled)
truncate x 29,896 ops/sec ±9.59% (66 runs sampled)
date x 25,673 ops/sec ±10.01% (63 runs sampled)
escape x 35,280 ops/sec ±10.31% (65 runs sampled)
default x 30,202 ops/sec ±8.99% (65 runs sampled)
--- tag ---
if x 26,551 ops/sec ±9.17% (66 runs sampled)
unless x 25,031 ops/sec ±10.32% (65 runs sampled)
for x 9,396 ops/sec ±17.51% (65 runs sampled)
switch x 10,219 ops/sec ±13.70% (57 runs sampled)
assign x 30,277 ops/sec ±10.72% (61 runs sampled)
capture x 32,643 ops/sec ±11.43% (56 runs sampled)
increment x 59,629 ops/sec ±11.25% (63 runs sampled)
decrement x 61,304 ops/sec ±10.31% (66 runs sampled)
tablerow x 11,196 ops/sec ±10.90% (57 runs sampled)
--- demo ---
demo x 3,979 ops/sec ±12.18% (56 runs sampled)
--- layout ---
cache=false x 2,783 ops/sec ±10.69% (56 runs sampled)
cache=true x 10,978 ops/sec ±10.54% (59 runs sampled)
--- memory ---
retained memory for a 3KB template: 18.93296875KB (250 samples)

Node v12.11.1

Before change

--- output ---
literal x 18,560 ops/sec ±15.98% (61 runs sampled)
truncate x 23,535 ops/sec ±15.89% (64 runs sampled)
date x 30,289 ops/sec ±5.90% (70 runs sampled)
escape x 39,775 ops/sec ±7.63% (67 runs sampled)
default x 28,139 ops/sec ±12.73% (62 runs sampled)
--- tag ---
if x 29,444 ops/sec ±8.24% (71 runs sampled)
unless x 27,317 ops/sec ±10.17% (72 runs sampled)
for x 9,733 ops/sec ±15.09% (65 runs sampled)
switch x 11,498 ops/sec ±8.65% (64 runs sampled)
assign x 30,745 ops/sec ±7.74% (65 runs sampled)
capture x 35,279 ops/sec ±15.47% (65 runs sampled)
increment x 59,210 ops/sec ±14.25% (68 runs sampled)
decrement x 66,293 ops/sec ±9.08% (70 runs sampled)
tablerow x 12,352 ops/sec ±8.42% (69 runs sampled)
--- demo ---
demo x 4,997 ops/sec ±13.12% (67 runs sampled)
--- layout ---
cache=false x 2,772 ops/sec ±11.32% (60 runs sampled)
cache=true x 12,084 ops/sec ±12.81% (70 runs sampled)
--- memory ---
retained memory for a 3KB template: 97.79503125KB (250 samples)

After change

--- output ---
literal x 18,877 ops/sec ±10.26% (61 runs sampled)
truncate x 20,667 ops/sec ±22.44% (59 runs sampled)
date x 20,703 ops/sec ±17.39% (57 runs sampled)
escape x 27,398 ops/sec ±14.42% (57 runs sampled)
default x 15,843 ops/sec ±30.52% (49 runs sampled)
--- tag ---
if x 13,795 ops/sec ±18.62% (40 runs sampled)
unless x 11,405 ops/sec ±24.08% (46 runs sampled)
for x 8,654 ops/sec ±13.18% (63 runs sampled)
switch x 11,054 ops/sec ±11.63% (68 runs sampled)
assign x 29,164 ops/sec ±10.31% (63 runs sampled)
capture x 21,827 ops/sec ±23.17% (47 runs sampled)
increment x 55,947 ops/sec ±11.52% (64 runs sampled)
decrement x 52,068 ops/sec ±13.48% (59 runs sampled)
tablerow x 8,370 ops/sec ±29.84% (54 runs sampled)
--- demo ---
demo x 4,147 ops/sec ±11.10% (67 runs sampled)
--- layout ---
cache=false x 1,732 ops/sec ±17.99% (52 runs sampled)
cache=true x 9,773 ops/sec ±10.17% (63 runs sampled)
--- memory ---
retained memory for a 3KB template: 8.50384375KB (250 samples)

Summary

There seems to be a negative impact on throughput (ops/sec) that is most noticeable on Node.js 8.x. On 10.x and 12.x the impact is less clear: some tests perform better, others worse.

The memory footprint improvements are significant for the benchmarked scenario on all three versions of Node.js: retained memory for a 3KB template drops from about 130KB to 14.5KB on 8.x, from 98.5KB to 18.9KB on 10.x, and from 97.8KB to 8.5KB on 12.x.
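To put numbers on the memory improvement, the reduction factor for each Node.js version can be computed directly from the "retained memory" lines in the benchmark output. The snippet below is only arithmetic on those reported values; the variable and function names are illustrative.

```javascript
// Retained memory (KB) for a 3KB template, copied from the benchmark output.
const retainedKB = {
  'v8.16.1': { before: 130.337, after: 14.5260625 },
  'v10.16.3': { before: 98.53553125, after: 18.93296875 },
  'v12.11.1': { before: 97.79503125, after: 8.50384375 },
};

// How many times smaller the retained memory is after the change.
function reductionFactor(sample) {
  return sample.before / sample.after;
}

for (const [version, sample] of Object.entries(retainedKB)) {
  console.log(`${version}: ${reductionFactor(sample).toFixed(1)}x smaller`);
}
```

This works out to roughly 9x on 8.x, 5x on 10.x, and 11.5x on 12.x for this particular scenario.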

Potential improvements

The overhead of converting to a Buffer is probably more significant for smaller templates, while the potential gain on the memory side would be smaller. One idea would be to apply the flattening logic only when a given string is longer than some threshold.
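A minimal sketch of the threshold idea, assuming a hypothetical `flattenIfLong` helper and an arbitrary cutoff of 1024 characters. Neither the name nor the value comes from this PR, and a sensible cutoff would have to be found by benchmarking.

```javascript
// Hypothetical cutoff, in UTF-16 code units. The value is arbitrary.
const FLATTEN_THRESHOLD = 1024;

// Only pays the Buffer round-trip cost for strings long enough that the
// memory saving is likely to matter; short strings are returned as-is.
function flattenIfLong(str, threshold = FLATTEN_THRESHOLD) {
  if (str.length < threshold) {
    return str;
  }
  return Buffer.from(str, 'utf8').toString('utf8');
}

const short = flattenIfLong('{{ name }}');             // returned untouched
const long = flattenIfLong('{{ name }}'.repeat(500));  // goes through Buffer

console.log(short.length, long.length); // 10 5000
```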

Another option would be to make this optimization configurable so that users can opt out. For example, if a template is rendered once and immediately discarded, reducing the memory footprint brings no gain, since the template would be garbage collected quickly anyway. Conversely, for users who expect to render the same template multiple times, the additional latency during parsing is probably less of a problem.
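The opt-out could look something like the sketch below. The `flattenStrings` option name and the `prepareTemplateSource` function are hypothetical; they are not part of the library's API and only illustrate the shape of such a switch.

```javascript
// Hypothetical option; enabled by default so that long-lived templates
// get the smaller memory footprint unless the caller opts out.
const defaultOptions = { flattenStrings: true };

function prepareTemplateSource(source, options) {
  const resolved = Object.assign({}, defaultOptions, options);
  if (!resolved.flattenStrings) {
    // Caller opted out, e.g. for templates that are rendered once and discarded.
    return source;
  }
  return Buffer.from(source, 'utf8').toString('utf8');
}

// Long-lived, cached template: keep the default.
const cached = prepareTemplateSource('Hello {{ name }}');

// One-off template: skip the extra parsing cost.
const oneOff = prepareTemplateSource('Hello {{ name }}', { flattenStrings: false });

console.log(cached === oneOff); // true: content is identical either way
```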

Thanks for sharing any feedback 🙇

harttle merged commit 3ad512c into harttle:master Oct 10, 2019

harttle commented Oct 10, 2019

🎉 This PR is included in version 9.1.1 🎉



Development

Successfully merging this pull request may close these issues.

Performance of the tokenizer and V8 concatenated string
