chore(telemetry): extend process tracking to spawned processes#16842
Performance SLOs

Comparing candidate munir/send-app-start-closed-on-root (800925e) with baseline main (ecc75ad)

🟡 Near SLO Breach (3 suites)

🟡 djangosimple - 30/30

| Scenario | Time | Time SLO (headroom) | Time vs baseline | Memory | Memory SLO (headroom) | Memory vs baseline |
|---|---|---|---|---|---|---|
| appsec | 19.650ms | <22.300ms (📉 -11.9%) | +0.2% | 68.829MB | <73.500MB (-6.4%) | +4.8% |
| exception-replay-enabled | 1.395ms | <1.450ms (-3.8%) | ~same | 66.798MB | <71.500MB (-6.6%) | +4.8% |
| iast | 19.628ms | <22.250ms (📉 -11.8%) | -0.4% | 68.917MB | <75.000MB (-8.1%) | +4.9% |
| profiler | 15.188ms | <16.550ms (-8.2%) | +0.1% | 60.411MB | <61.000MB (🟡 -1.0%) | +4.8% |
| resource-renaming | 19.592ms | <21.750ms (-9.9%) | ~same | 68.882MB | <73.500MB (-6.3%) | +4.8% |
| span-code-origin | 20.152ms | <28.200ms (📉 -28.5%) | +0.7% | 68.754MB | <75.000MB (-8.3%) | +4.9% |
| tracer | 19.656ms | <21.750ms (-9.6%) | -0.3% | 68.925MB | <75.000MB (-8.1%) | +5.0% |
| tracer-and-profiler | 20.964ms | <23.500ms (📉 -10.8%) | ~same | 70.814MB | <75.000MB (-5.6%) | +4.8% |
| tracer-dont-create-db-spans | 19.705ms | <21.500ms (-8.3%) | +0.2% | 68.940MB | <75.000MB (-8.1%) | +4.8% |
| tracer-minimal | 16.803ms | <17.500ms (-4.0%) | ~same | 68.802MB | <75.000MB (-8.3%) | +4.8% |
| tracer-native | 19.728ms | <21.750ms (-9.3%) | +0.8% | 68.918MB | <72.500MB (-4.9%) | +4.9% |
| tracer-no-caches | 17.579ms | <19.650ms (📉 -10.5%) | -0.2% | 68.877MB | <75.000MB (-8.2%) | +4.9% |
| tracer-no-databases | 19.272ms | <20.100ms (-4.1%) | +0.6% | 68.895MB | <75.000MB (-8.1%) | +4.7% |
| tracer-no-middleware | 19.294ms | <21.500ms (📉 -10.3%) | -0.3% | 68.917MB | <75.000MB (-8.1%) | +4.9% |
| tracer-no-templates | 19.715ms | <22.000ms (📉 -10.4%) | +1.2% | 68.884MB | <73.500MB (-6.3%) | +4.9% |

🟡 flasksimple - 18/18

| Scenario | Time | Time SLO (headroom) | Time vs baseline | Memory | Memory SLO (headroom) | Memory vs baseline |
|---|---|---|---|---|---|---|
| appsec-get | 3.360ms | <4.750ms (📉 -29.3%) | -0.3% | 56.072MB | <66.500MB (📉 -15.7%) | +4.9% |
| appsec-post | 2.868ms | <6.750ms (📉 -57.5%) | +0.1% | 56.118MB | <66.500MB (📉 -15.6%) | +4.9% |
| appsec-telemetry | 3.383ms | <4.750ms (📉 -28.8%) | +1.0% | 56.200MB | <66.500MB (📉 -15.5%) | +5.1% |
| debugger | 1.877ms | <2.000ms (-6.2%) | +0.2% | 49.219MB | <51.500MB (-4.4%) | +4.9% |
| iast-get | 1.865ms | <2.000ms (-6.7%) | -0.2% | 45.919MB | <49.000MB (-6.3%) | +5.0% |
| profiler | 1.915ms | <2.100ms (-8.8%) | ~same | 52.508MB | <53.500MB (🟡 -1.9%) | +4.7% |
| resource-renaming | 3.335ms | <3.650ms (-8.6%) | +0.4% | 56.221MB | <60.000MB (-6.3%) | +4.8% |
| tracer | 3.348ms | <3.650ms (-8.3%) | +0.2% | 56.163MB | <60.000MB (-6.4%) | +4.9% |
| tracer-native | 3.344ms | <3.650ms (-8.4%) | +0.2% | 56.092MB | <60.000MB (-6.5%) | +4.8% |

🟡 recursivecomputation - 8/8

| Scenario | Time | Time SLO (headroom) | Time vs baseline | Memory | Memory SLO (headroom) | Memory vs baseline |
|---|---|---|---|---|---|---|
| deep | 310.849ms | <320.950ms (-3.1%) | +0.1% | 37.513MB | <38.750MB (-3.2%) | +4.8% |
| deep-profiled | 327.649ms | <359.150ms (-8.8%) | ~same | 43.863MB | <46.000MB (-4.6%) | +5.2% |
| medium | 7.277ms | <7.400ms (🟡 -1.7%) | -0.3% | 36.569MB | <38.000MB (-3.8%) | +4.5% |
| shallow | 1.016ms | <1.050ms (-3.3%) | +1.7% | 36.628MB | <38.000MB (-3.6%) | +5.0% |
Co-authored-by: Munir Abdinur <[email protected]>
…start-closed-on-root
## Summary

Implements the [Stable Service Instance Identifier RFC](https://docs.google.com/document/d/1ECKj9_NnwaKYtFqm3p3Rlpicx5d-OQcdj9kI2jvRqVU) for Go instrumentation telemetry.

- **`DD-Session-ID`**: always present on every telemetry request, set to the current `runtime_id`.
- **`DD-Root-Session-ID`**: present only in child processes, inherited via the `_DD_ROOT_GO_SESSION_ID` env var. It is omitted when equal to the session ID; the backend infers root = self when it is absent.
- **Auto-propagation**: `globalconfig.init()` sets `_DD_ROOT_GO_SESSION_ID` in the process environment (so it appears in `os.Environ()`), so child processes spawned via `os/exec` inherit it automatically, without any user-side calls.

## Changes

- `internal/globalconfig/globalconfig.go`: adds a `rootSessionID` field; `init()` reads/sets `_DD_ROOT_GO_SESSION_ID` (an internal env var, not in supported_configurations); adds a `RootSessionID()` getter.
- `internal/telemetry/internal/writer.go`: adds `DD-Session-ID` (always) and `DD-Root-Session-ID` (child processes only) to the pre-baked telemetry headers.
- Tests for both globalconfig (including cross-process propagation) and the writer.

## Related

- System-tests PR: DataDog/system-tests#6510
- Node.js PR: DataDog/dd-trace-js#7821
- dd-trace-py fork tracking: DataDog/dd-trace-py#16839
- dd-trace-py spawn tracking: DataDog/dd-trace-py#16842

Co-authored-by: ayan.khan <[email protected]>
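The header rule in the summary can be sketched in a few lines. This is a minimal Python illustration (the language of the surrounding dd-trace-py PR), not the dd-trace-go code: `resolve_root_session_id` and `session_headers` are hypothetical helpers, and a plain dict stands in for the process environment so the logic can be exercised without touching the real environment.

```python
from typing import Dict, MutableMapping

# Env var name taken from the description above.
ROOT_ENV_VAR = "_DD_ROOT_GO_SESSION_ID"


def resolve_root_session_id(runtime_id: str, environ: MutableMapping[str, str]) -> str:
    # Hypothetical helper modeled on the described globalconfig.init():
    # inherit the root ID from the parent if present, otherwise this process
    # is the root. Write it back so children spawned later inherit it too.
    root = environ.get(ROOT_ENV_VAR) or runtime_id
    environ[ROOT_ENV_VAR] = root
    return root


def session_headers(runtime_id: str, root_session_id: str) -> Dict[str, str]:
    # DD-Session-ID is always sent; DD-Root-Session-ID only when it differs
    # from the session ID, since the backend treats a missing root as "self".
    headers = {"DD-Session-ID": runtime_id}
    if root_session_id != runtime_id:
        headers["DD-Root-Session-ID"] = root_session_id
    return headers
```

For example, a root process starting from an empty environment sends only `DD-Session-ID`, while a child started with a copy of that environment also sends `DD-Root-Session-ID` set to the root's ID.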
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b384eb7bb1
Description
Extends process lineage tracking to exec-based child processes (`subprocess`, `multiprocessing` spawn). Fork support was added in a previous PR; this PR covers the remaining spawning mechanism. `subprocess.Popen.__init__` is now patched unconditionally (independently of ASM) to inject `_DD_ROOT_PY_SESSION_ID` and `_DD_PARENT_PY_SESSION_ID` into the child's environment. The child reads these at module load time to seed `get_ancestor_runtime_id()` and `get_parent_runtime_id()`. Disable via `DD_TRACE_SUBPROCESS_ENABLED=false`.

Key changes:
- `runtime/__init__.py`: adds env var name constants, seeds module state from them at import, and exposes `get_session_env_vars()`
- `subprocess/patch.py`: moves the `Popen.__init__`/`Popen.wait` wrapping before the ASM gate; injects lineage env vars unconditionally in `_traced_subprocess_init`
- `telemetry/writer.py`: replaces `forksafe.is_fork_child()` with `get_parent_runtime_id() is not None`, which now correctly identifies both forked and exec-spawned children

Testing
`test_subprocess_session_lineage_env_vars`: parametrized over `DD_TRACE_SUBPROCESS_ENABLED=true/false/unset`; runs under `ddtrace-run`, spawns a `ddtrace-run` child, and verifies that the child's `get_parent_runtime_id()`/`get_ancestor_runtime_id()` match the parent's runtime ID when enabled and are `None` when disabled.

Risks
`Popen.__init__` is now patched whenever ddtrace is loaded, not only when ASM is enabled. Spawned processes will receive two extra `_DD_`-prefixed env vars. These are ignored by non-ddtrace processes, so the blast radius is minimal.