Skip to content

feat(telemetry): add stable session identifier headers#4574

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 15 commits intomainfrom
ayan.khan/stable-session-id-headers
Mar 20, 2026
Merged

feat(telemetry): add stable session identifier headers#4574
gh-worker-dd-mergequeue-cf854d[bot] merged 15 commits intomainfrom
ayan.khan/stable-session-id-headers

Conversation

@khanayan123
Copy link
Copy Markdown
Contributor

@khanayan123 khanayan123 commented Mar 20, 2026

Summary

Implements the Stable Service Instance Identifier RFC for Go instrumentation telemetry.

  • DD-Session-ID: always present on every telemetry request, set to the current runtime_id
  • DD-Root-Session-ID: present only in child processes, inherited via _DD_ROOT_GO_SESSION_ID env var. Omitted when equal to session ID — backend infers root = self when absent
  • Auto-propagation: globalconfig.init() sets _DD_ROOT_GO_SESSION_ID in os.Environ() so child processes spawned via os/exec inherit it automatically without any user-side calls

Changes

  • internal/globalconfig/globalconfig.go: adds rootSessionID field, init() reads/sets _DD_ROOT_GO_SESSION_ID (internal env var, not in supported_configurations), RootSessionID() getter
  • internal/telemetry/internal/writer.go: adds DD-Session-ID (always) and DD-Root-Session-ID (child processes only) to pre-baked telemetry headers
  • Tests for both globalconfig (including cross-process propagation) and writer

Related

khanayan123 and others added 2 commits March 20, 2026 10:24
Implements the Stable Service Instance Identifier RFC for Go telemetry.

- DD-Session-ID: always present, set to current runtime_id
- DD-Root-Session-ID: present only in child processes, inherited via
  DD_ROOT_GO_SESSION_ID env var (omitted when equal to session_id)
- globalconfig: reads DD_ROOT_GO_SESSION_ID and DD_PARENT_GO_SESSION_ID
  at init, falls back to runtime_id for root processes

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Only root and session IDs are required. Parent session ID
is optional and not needed for the initial implementation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 bot commented Mar 20, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 90.48%
Overall Coverage: 59.44% (+3.57%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 33a27d8 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

khanayan123 and others added 2 commits March 20, 2026 10:30
Re-execs the test binary as a subprocess with DD_ROOT_GO_SESSION_ID
set to verify the child process correctly inherits the root session
ID and generates its own distinct runtime_id.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Provides InjectSessionEnv(cmd) to inject DD_ROOT_GO_SESSION_ID into
child process environments. Required by the Stable Session Identifier
RFC — Go uses os/exec exclusively (raw fork is unsafe).

Users call InjectSessionEnv before cmd.Start()/cmd.Run(). Automatic
injection via Orchestrion aspect is a planned follow-up.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.31%. Comparing base (516cca3) to head (33a27d8).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
internal/telemetry/internal/writer.go 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
internal/globalconfig/globalconfig.go 47.94% <100.00%> (ø)
internal/telemetry/internal/writer.go 76.87% <71.42%> (ø)

... and 436 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Fix import ordering (goimports) and use strings.SplitSeq (modernize linter).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions github-actions bot added the apm:ecosystem contrib/* related feature requests or bugs label Mar 20, 2026
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 20, 2026

Benchmarks

Benchmark execution time: 2026-03-20 18:41:34

Comparing candidate commit 33a27d8 in PR branch ayan.khan/stable-session-id-headers with baseline commit 516cca3 in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 214 metrics, 9 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:BenchmarkPayloadVersions/simple_1spans/v1_0-25

  • 🟥 execution_time [+30.459ns; +40.741ns] or [+2.761%; +3.693%]

khanayan123 and others added 4 commits March 20, 2026 11:47
Remove InjectSessionEnv (contrib/os/exec.go) — the SDK now sets
DD_ROOT_GO_SESSION_ID in the process environment at init time so child
processes spawned via os/exec inherit it automatically without any
user-side calls. Also remove from supported_configurations since it is
an internal env var, not user-facing config.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
This is an internal propagation env var, not user-facing config, so
it uses os.Getenv/Setenv directly instead of internal/env.Get.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Prefix with underscore to signify internal env var. Remove redundant
comments from tests — the test names are self-documenting.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@khanayan123 khanayan123 marked this pull request as ready for review March 20, 2026 16:07
@khanayan123 khanayan123 requested a review from a team as a code owner March 20, 2026 16:07
@khanayan123 khanayan123 requested a review from mabdinur March 20, 2026 16:08
Verify DD-Session-ID equals RuntimeID (not just non-empty), and assert
DD-Root-Session-ID is absent when rootSessionID == runtimeID or present
with the correct value when inherited from a parent process.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown
Contributor

@mtoffl01 mtoffl01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, globalconfig is a legacy package that will be replaced by the new ddtrace/internal/config package. But, you cannot import that package into the telemetry package without hitting a circular import right now. So for now, adding a new field to globalconfig is fine.

rootID = cfg.runtimeID
}
cfg.rootSessionID = rootID
os.Setenv("_DD_ROOT_GO_SESSION_ID", rootID) //nolint:forbidigo
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a pattern in dd-trace-py, and maybe dd-trace-js, but there is no precedence for it in dd-trace-go.

Why do you need to overwrite the env var directly? Couldn't anywhere that wants to query "_DD_ROOT_GO_SESSION_ID" just query globalconfig.RootSessionId()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The os.Setenv is needed for child process propagation. globalconfig.RootSessionID() only works within the current process — child processes spawned via os/exec are separate binaries that can't call Go functions from the parent.

By setting _DD_ROOT_GO_SESSION_ID in the process environment at init time, any child spawned with default env inheritance (cmd.Env == nil) automatically receives it. When the child starts dd-trace-go, newConfig() reads it from the environment and knows it's part of the same session tree.

This is the core propagation mechanism described in the RFC — it's how Node.js, Python, and Ruby SDKs handle it too (each via their language's equivalent of env var injection into child processes). Without this, every user spawning child processes would need to manually set the env var, which defeats the purpose.

Happy to discuss if there's a Go-idiomatic alternative that achieves the same automatic propagation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Let's just put it behind a constant (rootSessionIdEnvVar or similar)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — extracted it into a rootSessionIDEnvVar constant at the top of the file. Both the production code and tests now reference the constant instead of the raw string.

headersAsTags: internal.NewLockMap(map[string]string{}),
}

func init() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init() is very unpopular for go. Can you move this over to some helper function, and call it to set the rootSessionID field on the cfg variable?

e.g, line 21+ becomes:

var cfg = &config{
	analyticsRate: math.NaN(),
	runtimeID:     uuid.New().String(),
	headersAsTags: internal.NewLockMap(map[string]string{}),
	rootSessionID: getRootSessionID()
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — replaced init() with a newConfig() helper that's called inline in the var cfg declaration.

Address review feedback: move rootSessionID initialization from init()
into a newConfig() helper called in the var declaration. Keeps os.Setenv
for child process propagation (see PR comment for rationale).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@khanayan123 khanayan123 requested a review from mtoffl01 March 20, 2026 17:45
Comment on lines +27 to +41
func newConfig() *config {
runtimeID := uuid.New().String()
rootSessionID := env.Get(rootSessionIDEnvVar)
if rootSessionID == "" {
rootSessionID = runtimeID
}
// Set in the process environment so child processes spawned via os/exec
// with default env inheritance (cmd.Env == nil) automatically receive it.
os.Setenv(rootSessionIDEnvVar, rootSessionID) //nolint:forbidigo
return &config{
analyticsRate: math.NaN(),
runtimeID: runtimeID,
rootSessionID: rootSessionID,
headersAsTags: internal.NewLockMap(map[string]string{}),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly 😆

newConfig() is fine but was not required by my previous comment. I meant to abstract away the rootSessionID logic into its own helper, like getRootSessionID.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — extracted the session ID logic into a getRootSessionID(runtimeID) helper. It takes runtimeID as a param since inlining it directly into the struct literal would cause an init cycle (cfggetRootSessionIDcfg). So newConfig() stays as the thin wrapper that generates the runtimeID and passes it through.

@khanayan123 khanayan123 requested a review from mtoffl01 March 20, 2026 18:08
}
// Set in the process environment so child processes spawned via os/exec
// with default env inheritance (cmd.Env == nil) automatically receive it.
os.Setenv(rootSessionIDEnvVar, id) //nolint:forbidigo
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a rule forbidding os.Setenv, only os.Getenv. Is this comment necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — only os.Getenv and os.LookupEnv are in the forbidigo rules, not os.Setenv. Removed the nolint directive.

Only os.Getenv and os.LookupEnv are forbidden by the linter.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@khanayan123 khanayan123 requested a review from mtoffl01 March 20, 2026 18:14
Copy link
Copy Markdown
Contributor

@mtoffl01 mtoffl01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking you with an approval, but overall I think the comments are noisy and some are unnecessary; I'd suggest cutting them down.

@khanayan123
Copy link
Copy Markdown
Contributor Author

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 bot commented Mar 20, 2026

View all feedbacks in Devflow UI.

2026-03-20 18:23:53 UTC ℹ️ Start processing command /merge


2026-03-20 18:24:01 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-03-20 18:44:14 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 30m (p90).


2026-03-20 18:55:33 UTC ℹ️ MergeQueue: This merge request was merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

apm:ecosystem contrib/* related feature requests or bugs mergequeue-status: done

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants