perf(prof): speedup hot path in allocator#3505
Conversation
5049f86 to
be579db
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3505 +/- ##
==========================================
- Coverage 61.75% 61.64% -0.11%
==========================================
Files 143 143
Lines 13008 13008
Branches 1702 1702
==========================================
- Hits 8033 8019 -14
- Misses 4215 4228 +13
- Partials 760 761 +1 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
be579db to
1f5dd79
Compare
Benchmarks [ profiler ]Benchmark execution time: 2025-12-09 16:06:11 Comparing candidate commit eca063f in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 7 unstable metrics. scenario:php-profiler-exceptions-with-profiler-and-timeline
|
Benchmarks [ tracer ]Benchmark execution time: 2025-12-08 08:31:57 Comparing candidate commit 476c702 in PR branch Found 3 performance improvements and 0 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics. scenario:ComposerTelemetryBench/benchTelemetryParsing
scenario:MessagePackSerializationBench/benchMessagePackSerialization
scenario:TraceSerializationBench/benchSerializeTrace
|
morrisonlevi
left a comment
There was a problem hiding this comment.
There is just no way that alloc_prof_realloc would call into alloc_prof_rshutdown and from there, there is no way to call into zend_mm_gc...
I don't think this quite right. The function zend_mm_safe_error is called by various allocation functions such as zend_mm_realloc_huge, and zend_mm_safe_error has zend_bailout which jumps to EG(bailout). It could jump to shutdown, yeah?
I was thinking: we might not need to do all of this get_or_init() dance and use a RefCell at runtime, when we can instead initialize the ALLOCATION_PROFILING_STATS in GINIT and then just "know" that this is initialized and use it.
RefCell is not about being initialized or not, that is the thread-local lazy initialization. The RefCell is about enforcing Rust's borrowing rules at runtime.
I think there's possibly some optimization though. We could use MaybeUninit<RefCell<AllocationProfilingStats>> and make it const { MaybeUninit::uninit() } so it doesn't need lazy-initialization checks at runtime. We initialize it in ginit. The RefCell still is there to enforce borrow rules at runtime (cover our butts, yes, it causes crashes, but that's better than undefined behavior).
c65a5b8 to
d84a013
Compare
|
With b0ce8e9 I've guarded the allocation (and exception) stats we are sending with the metadata JSON behind a feature flag because it looks like we are spending nearly ⅓ of our time just in incrementing these counters:
They are also only needed by us as a measurement to validate the upscaling algorithm. I am gonna measure again once the binary is ready to benchmark. |
You are right, in case ZendMM can't allocate it bails out. Okay, that that part of the stack trace is fine. Just don't know how
We could benchmark the different options against each other and see. There are probably easier and more safe ways to bring down overhead without doing too much unsafe stuff 😉 |
We are okay to use |
|
I had a thought: when an Apache graceful restart is done, the process goes through mshutdown and back into minit. What is the story there with ginit and gshutdown? |
morrisonlevi
left a comment
There was a problem hiding this comment.
I think we should try to encapsulate the access to ALLOCATION_PROFILING_STATS so that other modules are not repeating the unsafe code (and comments that ought to accompany it). I'll work on it.
|
I think this is my only concern before merging:
We need to be sure everything is okay. I am still getting back into the swing of work on a Monday morning, but I think it might be okay as-is, because the allocator hooks are unloaded so nothing should be accessing the un-initialized thread-locals even if gshutdown and such are called. If they aren't called, then they'll be okay from a safety perspective, but we'll have to think about correctness, perhaps? |
`UnsafeCell` and document why it is safe anyway.
Co-authored-by: Florian Engelhardt <[email protected]>
476c702 to
eca063f
Compare
|
I tried finding the code in the PHP extension that handles Apache graceful restarts and wasn't successful. Might be faster to spin up a new module with hooks and see when they are called under Apache, particularly if you already have a docker container that uses Apache set up (I don't currently). |
AFAIK this is handled in Apache itself, the entrypoint from Apache to PHP should be
I run the |

Description
We got some crash reports with stacks like this:
None of these are actually helpful, because this would just look like a crash in the engine's
execute_exfunction somewhere. But now we got a new crash trace adding some stack frames that look like stack corruption, but reveal something interesting nonetheless:There is just no way thatalloc_prof_reallocwould call intoalloc_prof_rshutdownand from there, there is no way to call intozend_mm_gc...The stack trace is still weird, but
alloc_prof_reallocin case ZendMM can't allocate, will bailout (longjmp()toEG(bailout)which usually leads toRSHUTDOWNand as such toalloc_prof_rshutdown. I can't see a path though fromalloc_prof_rshutdowntozend_mm_gc...But there are a few ideas and things to this stack anyway: according to the runtime stack trace, it is in
Composer\Autoload\ClassLoader::findFileWithExtension()line 505 which could trigger a reallocation, which on the way down, could call intozend_mm_gc. So at least this makes sense, not 100% with the stack trace we see, but ...So anyway, I was thinking:
get_or_init()dance with theRefCellinALLOCATION_PROFILING_STATSat runtime, when we know there is no reentrancyALLOCATION_PROFILING_STATSitself when we make it aconstSeqCstmemory ordering for stats and intervals is not needed, we only need atomicity, there is no happen-before relation to other memory, so we can useRelaxedand also move all the stats behind a feature flag because we are the only ones using itReviewer checklist