Skip to content

perf(prof): pre-reserve function name buffer#3445

Merged
morrisonlevi merged 3 commits intomasterfrom
levi/perf-func-name
Dec 17, 2025
Merged

perf(prof): pre-reserve function name buffer#3445
morrisonlevi merged 3 commits intomasterfrom
levi/perf-func-name

Conversation

@morrisonlevi
Copy link
Copy Markdown
Collaborator

@morrisonlevi morrisonlevi commented Oct 13, 2025

PROF-12543

Description

When a function name isn't cached, we calculate the function name which has up to 5 "segments":

{$module_name}|{$class_name}::{$method_name}

And with how the code was written, we were doing potentially 5 memory allocations per function name. Of course, the Vec's growth amortization prevented some of these, but not all of them.

This now pre-reserves the size of the function name buffer.

Motivation

Here's the before/after with the whole-host profiler:

Screenshot 2025-10-13 at 8 16 17 PM

The "key" bit is shaving off ~8ms of the ~24ms operation, across a minute. Obviously this is a tiny gain but it is also very easy. It's a good practice to right-size your strings/buffers when you know how large they are ahead of time.

The function cache is reset at the start of each request, so at the start of each request the samples tend to have more uncached function names than ones which happen later on. Coupled with the fact that we are walking the stack on-thread, the code is reasonably hot early on when collecting a sample.

Additional Notes

Today I was able to look at the whole-host profiler's data for the PHP profiler running on the enterprise DOE app that Florian has set up, and I noticed the memory being reallocated in the data.

I have used ddprof before on the PHP profiler, even recently. I believed I saw this recently with ddprof, but it had a larger impact when measured by the whole host profiler. I have no idea which is more accurate, or if we are simply dealing with the fact that we're sampling something that by design happens infrequently, and just got different results accordingly.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@morrisonlevi morrisonlevi added cat:performance profiling Relates to the Continuous Profiler labels Oct 13, 2025
@morrisonlevi morrisonlevi marked this pull request as ready for review December 17, 2025 18:40
@morrisonlevi morrisonlevi requested a review from a team as a code owner December 17, 2025 18:40
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.64%. Comparing base (3626b11) to head (79801a8).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3445      +/-   ##
==========================================
- Coverage   61.74%   61.64%   -0.11%     
==========================================
  Files         143      143              
  Lines       13045    13045              
  Branches     1704     1704              
==========================================
- Hits         8055     8041      -14     
- Misses       4228     4241      +13     
- Partials      762      763       +1     

see 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3626b11...79801a8. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Dec 17, 2025

Benchmarks [ profiler ]

Benchmark execution time: 2025-12-17 19:39:17

Comparing candidate commit 79801a8 in PR branch levi/perf-func-name with baseline commit 3626b11 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics.

Comment thread profiling/src/profiling/stack_walking.rs Outdated
Copy link
Copy Markdown
Member

@realFlowControl realFlowControl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the buffer.reserve() vs buffer.reserve_exact() thingy otherwise LGTM!

Comment thread profiling/src/profiling/stack_walking.rs Outdated
Comment on lines +74 to +75
let module_len = has_module as usize * "|".len() + module_name.len();
let class_name_len = has_class as usize * "::".len() + class_name.len();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a nice and cheap (in CPU cycles) way to avoid a branch

Co-authored-by: Florian Engelhardt <[email protected]>
@morrisonlevi morrisonlevi merged commit fde1939 into master Dec 17, 2025
2000 of 2008 checks passed
@morrisonlevi morrisonlevi deleted the levi/perf-func-name branch December 17, 2025 20:24
@github-actions github-actions Bot added this to the 1.16.0 milestone Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:performance profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants