fix(profiling): stale ProfilingContext cache causing missing trace endpoint labels #7786
…point labels
TracingPlugin.startSpan() calls storage.enterWith({span}) immediately on span
creation, before the plugin calls addRequestTags() to set span.type='web'. The
first enterCh event therefore fires with span.type unset, causing
#getProfilingContext to compute webTags=undefined and cache that result on the
span. When the span is re-activated moments later (with span.type='web' already
set) the stale cache is returned and webTags stays undefined for the entire
request, so no trace endpoint labels appear in the CPU profile.
Fix: skip writing the cache when endpointCollectionEnabled is true, webTags is
undefined, *and* the span's type is not yet set. On the next activation the
context is recomputed; once span.type='web' is known webTags is found and the
result is cached normally.
This affects both the ACF path (pprof.time.setContext per activation) and the
non-ACF path (_currentContext.ref mutation), so the fix lives in
#getProfilingContext which is shared by both.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
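The caching rule described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual dd-trace code: `CACHE`, `findWebTags`, the plain-object span, and the shape of the returned context are all hypothetical stand-ins, and the real `#getProfilingContext` does more (for example, child span inheritance is not modeled here).

```javascript
// Hypothetical, simplified model of the cache-skip rule. None of these names
// are the real dd-trace internals.
const CACHE = Symbol('ProfilingContext')

// Assumption for this sketch: web tags can only be found once the span is
// typed 'web'.
function findWebTags (span) {
  return span.type === 'web' ? span.tags : undefined
}

function getProfilingContext (span, endpointCollectionEnabled) {
  const cached = span[CACHE]
  if (cached !== undefined) return cached

  const webTags = endpointCollectionEnabled ? findWebTags(span) : undefined
  const context = { spanId: span.id, webTags }

  // The rule from the fix: do not cache a result whose webTags may be missing
  // only because the span's type has not been set yet.
  const mayBeIncomplete =
    endpointCollectionEnabled && webTags === undefined && span.type === undefined
  if (!mayBeIncomplete) span[CACHE] = context

  return context
}

// First activation: the span was just created, its type is still unset.
const span = { id: 1, type: undefined, tags: { 'http.route': '/users/:id' } }
const first = getProfilingContext(span, true)

// The plugin sets the type, then the span is activated again.
span.type = 'web'
const second = getProfilingContext(span, true)

// Later activations are served from the cache.
const third = getProfilingContext(span, true)
```

In this model `first.webTags` is `undefined` and nothing is cached, `second.webTags` is the span's tags, and `third` is the same object as `second`. Without the `mayBeIncomplete` check the first result would be cached and every later activation would return it, which is the stale-cache behavior the commit message describes.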
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #7786 +/- ##
==========================================
+ Coverage 80.34% 80.55% +0.21%
==========================================
Files 743 743
Lines 32296 32297 +1
==========================================
+ Hits 25947 26017 +70
+ Misses 6349 6280 -69
Overall package size
Self size: 4.98 MB

Dependency sizes
| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
Benchmarks
Benchmark execution time: 2026-03-16 13:09:24
Comparing candidate commit ed5c375 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 231 metrics, 29 unstable metrics.
I noticed that we don't seem to be getting tracing endpoints associated with profiling samples with ACF anymore. Turns out that:
- `TracingPlugin.startSpan()` calls `storage.enterWith({span})` immediately on span creation, before the plugin calls `addRequestTags()` to set `span.type='web'`. This means the first `enterCh` event fires with `span.type` unset.
- `#getProfilingContext()` was caching `webTags=undefined` from that first activation, so the subsequent activation (with `span.type='web'` already set) incorrectly served the stale cache and never produced `trace endpoint` labels in CPU profiles.

We fix it by skipping the write to the `span[ProfilingContext]` cache when endpoint collection is enabled, `webTags` is `undefined`, and `span.type` is not yet set. This forces recomputation on the next activation, by which time `span.type='web'` will be set. The fix applies to both the ACF path (Node.js 24 CPED) and the non-ACF path (legacy async_hooks), even though the async_hooks path is not really susceptible because it invokes `#enter` very often.

We still miss the endpoints when `setRequestTags` is called in the same activation; fortunately, `web.instrument()` provides a subsequent second activation as well. It would be possible to fix this more thoroughly by adding a DC for observing `setRequestTags`, but I'd like to save that until after some other soon-incoming changes. This fix already re-enables endpoint tracing for virtually all cases.

Test plan
- `wall.spec.js` tests cover the ACF path, the non-ACF path, and child span inheritance
- `cd packages/dd-trace && yarn test test/profiling/profilers/wall.spec.js`
- `trace endpoint` labels appear in profiles collected against a real Express app with `DD_PROFILING_ENDPOINT_COLLECTION_ENABLED=true`

Here's a screenshot profiling a test application after the fix:

Compare to screenshot before the fix with no "For Endpoints" section:

🤖 Generated with Claude Code