perf(allocator/vec2): align min amortized cap size with std#9857
Conversation
CodSpeed Instrumentation Performance ReportMerging #9857 will not alter performanceComparing Summary
|
800ce69 to
eafa7ae
Compare
4dc483f to
9d1db26
Compare
9d1db26 to
c94a0f0
Compare
eafa7ae to
00f7adb
Compare
00f7adb to
d354fa1
Compare
07ac0f1 to
43613e1
Compare
d354fa1 to
66bd195
Compare
43613e1 to
e8c0a03
Compare
66bd195 to
01cc973
Compare
8ee03a4 to
9927f60
Compare
df1271b to
b3bd1f4
Compare
9927f60 to
7d5ef8e
Compare
There was a problem hiding this comment.
@Dunqing This is a really interesting discovery that Vecs having a minimum capacity of 4 is a perf regression for oxc_transformer etc (it'd be 4 for all AST types, as they're all between 8 and 1024 bytes).
I suspect that the reason isn't the cost of the cmp::max call in grow_amortized (like your comment says), but is to do with CPU cache usage.
As you say, many Vecs in AST require less than 4 elements e.g. the Vec<FormalParameter> in function foo(x) {}. FormalParameter is 80 bytes, so if there's spare capacity of 3 in the Vec, that results in 240 bytes unused in the middle of the arena. With data spread out across the arena, it'll produce more L1/L2 cache misses when traversing the AST.
That's just a theory, but I doubt one cmp::max call could produce a significant perf regression, especially as it's on a cold path.
Merge activity
|
b3bd1f4 to
b34bd06
Compare
Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by `let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);`. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.
7d5ef8e to
84407cc
Compare
Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by `let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);`. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.
84407cc to
4eaef66
Compare
b34bd06 to
3cd3d23
Compare
I agreed your point! I've tried to pre-allocate 4 or 8 capacity for most of |

Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by
let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.