Skip to content

perf(allocator/vec2): align min amortized cap size with std#9857

Merged
graphite-app[bot] merged 1 commit intomainfrom
03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std
May 3, 2025
Merged

perf(allocator/vec2): align min amortized cap size with std#9857
graphite-app[bot] merged 1 commit intomainfrom
03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std

Conversation

@Dunqing
Copy link
Member

@Dunqing Dunqing commented Mar 18, 2025

Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.

Copy link
Member Author

Dunqing commented Mar 18, 2025

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 18, 2025

CodSpeed Instrumentation Performance Report

Merging #9857 will not alter performance

Comparing 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std (4eaef66) with main (7f2f247)

Summary

✅ 36 untouched benchmarks

@Dunqing Dunqing mentioned this pull request Jan 18, 2026
12 tasks
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 800ce69 to eafa7ae Compare March 18, 2025 08:36
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_replace_self.reserve_1_calls_with_self.grow_one_for_better_efficiency branch from 4dc483f to 9d1db26 Compare March 18, 2025 08:36
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_replace_self.reserve_1_calls_with_self.grow_one_for_better_efficiency branch from 9d1db26 to c94a0f0 Compare March 18, 2025 08:59
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from eafa7ae to 00f7adb Compare March 18, 2025 08:59
@Dunqing Dunqing changed the base branch from 03-18-perf_allocator_vec2_replace_self.reserve_1_calls_with_self.grow_one_for_better_efficiency to graphite-base/9857 March 18, 2025 10:36
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 00f7adb to d354fa1 Compare March 18, 2025 10:58
@Dunqing Dunqing force-pushed the graphite-base/9857 branch 2 times, most recently from 07ac0f1 to 43613e1 Compare March 18, 2025 16:35
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from d354fa1 to 66bd195 Compare March 18, 2025 16:35
@Dunqing Dunqing changed the base branch from graphite-base/9857 to 03-18-perf_allocator_vec2_replace_self.reserve_1_calls_with_self.grow_one_for_better_efficiency March 18, 2025 16:35
@Dunqing Dunqing marked this pull request as draft March 20, 2025 09:48
@Dunqing Dunqing changed the base branch from 03-18-perf_allocator_vec2_replace_self.reserve_1_calls_with_self.grow_one_for_better_efficiency to graphite-base/9857 March 21, 2025 08:44
@Dunqing Dunqing force-pushed the graphite-base/9857 branch from 43613e1 to e8c0a03 Compare March 21, 2025 08:54
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 66bd195 to 01cc973 Compare March 21, 2025 08:54
@Dunqing Dunqing changed the base branch from graphite-base/9857 to 04-29-feat_allocator_vec2_align_rawvec_reserve_with_standard_library_implementation April 30, 2025 07:31
@Dunqing Dunqing force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch 3 times, most recently from 8ee03a4 to 9927f60 Compare April 30, 2025 11:36
@Dunqing Dunqing marked this pull request as ready for review April 30, 2025 11:41
@graphite-app graphite-app bot force-pushed the 04-29-feat_allocator_vec2_align_rawvec_reserve_with_standard_library_implementation branch from df1271b to b3bd1f4 Compare May 3, 2025 13:12
@graphite-app graphite-app bot force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 9927f60 to 7d5ef8e Compare May 3, 2025 13:13
Copy link
Member

@overlookmotel overlookmotel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dunqing This is a really interesting discovery that Vecs having a minimum capacity of 4 is a perf regression for oxc_transformer etc (it'd be 4 for all AST types, as they're all between 8 and 1024 bytes).

I suspect that the reason isn't the cost of the cmp::max call in grow_amortized (like your comment says), but is to do with CPU cache usage.

As you say, many Vecs in AST require less than 4 elements e.g. the Vec<FormalParameter> in function foo(x) {}. FormalParameter is 80 bytes, so if there's spare capacity of 3 in the Vec, that results in 240 bytes unused in the middle of the arena. With data spread out across the arena, it'll produce more L1/L2 cache misses when traversing the AST.

That's just a theory, but I doubt one cmp::max call could produce a significant perf regression, especially as it's on a cold path.

@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label May 3, 2025
Copy link
Member

overlookmotel commented May 3, 2025

Merge activity

@graphite-app graphite-app bot force-pushed the 04-29-feat_allocator_vec2_align_rawvec_reserve_with_standard_library_implementation branch from b3bd1f4 to b34bd06 Compare May 3, 2025 14:54
graphite-app bot pushed a commit that referenced this pull request May 3, 2025
Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by `let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);`. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.
@graphite-app graphite-app bot force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 7d5ef8e to 84407cc Compare May 3, 2025 14:55
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label May 3, 2025
Align with https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#653-656, but unfortunately, we got a performance regression from this optimization, which was caused by `let cap = cmp::max(Self::MIN_NON_ZERO_CAP, cap);`. So I commented it out and added comments to describe why. Although the performance has not changed, keep the implementation the same as the standard library is also nice to have.
@overlookmotel overlookmotel force-pushed the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch from 84407cc to 4eaef66 Compare May 3, 2025 15:04
@overlookmotel overlookmotel force-pushed the 04-29-feat_allocator_vec2_align_rawvec_reserve_with_standard_library_implementation branch from b34bd06 to 3cd3d23 Compare May 3, 2025 15:04
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label May 3, 2025
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label May 3, 2025
Base automatically changed from 04-29-feat_allocator_vec2_align_rawvec_reserve_with_standard_library_implementation to main May 3, 2025 15:10
@graphite-app graphite-app bot merged commit 4eaef66 into main May 3, 2025
29 checks passed
@graphite-app graphite-app bot deleted the 03-18-perf_allocator_vec2_align_min_amortized_cap_size_with_std branch May 3, 2025 15:11
@Dunqing
Copy link
Member Author

Dunqing commented May 16, 2025

@Dunqing This is a really interesting discovery that Vecs having a minimum capacity of 4 is a perf regression for oxc_transformer etc (it'd be 4 for all AST types, as they're all between 8 and 1024 bytes).

I suspect that the reason isn't the cost of the cmp::max call in grow_amortized (like your comment says), but is to do with CPU cache usage.

As you say, many Vecs in AST require less than 4 elements e.g. the Vec<FormalParameter> in function foo(x) {}. FormalParameter is 80 bytes, so if there's spare capacity of 3 in the Vec, that results in 240 bytes unused in the middle of the arena. With data spread out across the arena, it'll produce more L1/L2 cache misses when traversing the AST.

That's just a theory, but I doubt one cmp::max call could produce a significant perf regression, especially as it's on a cold path.

I agreed your point! I've tried to pre-allocate 4 or 8 capacity for most of Vec usages in the parser, the benchmark result surprised me, it doesn't improve but hurts 1%-2% performance. The reason is probably like you said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-performance Category - Solution not expected to change functional behavior, only performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants