Skip to content

Add _dd.p.ksr propagated tag for Knuth sampling rate#5436

Merged
bm1549 merged 3 commits intomasterfrom
brian.marks/add-ksr-tag
Mar 18, 2026
Merged

Add _dd.p.ksr propagated tag for Knuth sampling rate#5436
bm1549 merged 3 commits intomasterfrom
brian.marks/add-ksr-tag

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented Mar 11, 2026

What does this PR do?

Adds _dd.p.ksr (Knuth Sampling Rate) as a propagated tag set when agent-based or rule-based sampling decisions are made. The tag is stored in span meta (string type) with up to 6 significant digits and no trailing zeros.

Motivation:

To enable consistent sampling across tracers and backend retention filters, the backend needs to know the sampling rate applied by the tracer. Without transmitting the tracer's rate via _dd.p.ksr, backend resampling cannot correctly compute effective rates in multi-stage sampling scenarios.

See RFC: "Transmit Knuth sampling rate to backend"

Change log entry

Add _dd.p.ksr propagated tag for Knuth sampling rate to enable backend resampling.

Additional Notes:

Key files changed:

  • lib/datadog/tracing/metadata/ext.rb — Added TAG_KNUTH_SAMPLING_RATE constant
  • lib/datadog/tracing/transport/trace_formatter.rb — Added tag_knuth_sampling_rate! method
  • sig/datadog/tracing/metadata/ext.rbs — RBS type signature

Related PRs across tracers:

How to test the change?

  • Unit tests: 74 examples, 0 failures in spec/datadog/tracing/transport/trace_formatter_spec.rb
  • System test: test_sampling_knuth_sample_rate_trace_sampling_rule passes in CI

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 11, 2026

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2026-03-11 02:15:18 UTC

@datadog-prod-us1-6
Copy link
Copy Markdown

datadog-prod-us1-6 bot commented Mar 11, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 95.13% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f5a2eda | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 11, 2026

Benchmarks

Benchmark execution time: 2026-03-17 21:32:28

Comparing candidate commit f5a2eda in PR branch brian.marks/add-ksr-tag with baseline commit b42addf in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@bm1549 bm1549 marked this pull request as ready for review March 17, 2026 01:28
@bm1549 bm1549 requested review from a team as code owners March 17, 2026 01:28
@bm1549 bm1549 requested a review from vpellan March 17, 2026 01:28
Copy link
Copy Markdown
Member

@Strech Strech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one minor suggestion (non-blocking)

Address review feedback: use a fixed rule_sample_rate and hardcode
the expected string instead of computing it with format(), so the
test actually validates the formatting behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@bm1549
Copy link
Copy Markdown
Contributor Author

bm1549 commented Mar 17, 2026

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 bot commented Mar 17, 2026

View all feedbacks in Devflow UI.

2026-03-17 15:01:14 UTC ℹ️ Start processing command /merge


2026-03-17 15:01:25 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-03-17 19:03:16 UTC ⚠️ MergeQueue: This merge request was unqueued

devflow unqueued this merge request: It did not become mergeable within the expected time

@bm1549 bm1549 merged commit a49984a into master Mar 18, 2026
630 of 631 checks passed
@bm1549 bm1549 deleted the brian.marks/add-ksr-tag branch March 18, 2026 14:36
@github-actions github-actions bot added this to the 2.30.0 milestone Mar 18, 2026
@Strech Strech mentioned this pull request Mar 19, 2026
bm1549 added a commit to DataDog/dd-trace-rs that referenced this pull request Mar 19, 2026
# What does this PR do?

Adds `_dd.p.ksr` (Knuth Sampling Rate) as a propagated tag set when
agent-based or rule-based sampling decisions are made. The tag is stored
in span `meta` (string type) with up to 6 significant digits and no
trailing zeros.

`format_sampling_rate` now returns `Option<String>` and guards against
invalid inputs (negative, >1.0, NaN, infinity), returning `None` instead
of producing garbage output.

# Motivation

To enable consistent sampling across tracers and backend retention
filters, the backend needs to know the sampling rate applied by the
tracer. Without transmitting the tracer's rate via `_dd.p.ksr`, backend
resampling cannot correctly compute effective rates in multi-stage
sampling scenarios.

See RFC: "Transmit Knuth sampling rate to backend"

# Additional Notes

Key files changed:
- `datadog-opentelemetry/src/core/constants.rs` — Added
`SAMPLING_KNUTH_RATE_TAG_KEY` constant
- `datadog-opentelemetry/src/sampling/datadog_sampler.rs` — Added
`format_sampling_rate()` helper (returns `Option<String>`, defensive
against invalid rates) and set ksr in agent/rule sampling paths
- Updated 2 snapshot JSON files

Related PRs across tracers:
- Java: DataDog/dd-trace-java#10802
- .NET: DataDog/dd-trace-dotnet#8287
- Ruby: DataDog/dd-trace-rb#5436
- Node.js: DataDog/dd-trace-js#7741
- PHP: DataDog/dd-trace-php#3701
- C++: DataDog/dd-trace-cpp#288
- System tests: DataDog/system-tests#6466

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
bm1549 added a commit to DataDog/dd-trace-go that referenced this pull request Mar 19, 2026
### What does this PR do?

Fixes `_dd.p.ksr` (Knuth Sampling Rate) to only be set on spans when the
agent has provided sampling rates via `readRatesJSON()`. Previously, ksr
was unconditionally set in `prioritySampler.apply()`, including when the
rate was the initial client-side default (1.0) before any agent response
arrived.

Also refactors `prioritySampler` to consolidate lock acquisitions:
extracts `getRateLocked()` so `apply()` acquires `ps.mu.RLock` only once
to read both the rate and `agentRatesLoaded`.

### Motivation

Cross-language consistency: Python, Java, PHP, and other tracers only
set ksr when actual agent rates or sampling rules are applied, not for
the default fallback. This aligns Go with that behavior.

See RFC: "Transmit Knuth sampling rate to backend"

### Additional Notes

- Added `agentRatesLoaded` bool field to `prioritySampler`, set to
`true` in `readRatesJSON()`
- `apply()` now gates ksr behind `agentRatesLoaded` check
- Extracted `getRateLocked()` to avoid double lock acquisition in
`apply()`
- Rule-based sampling path (`applyTraceRuleSampling` in span.go)
unchanged — correctly always sets ksr
- Tests added: `ksr-not-set-without-agent-rates` and
`ksr-set-after-agent-rates-received`

Related PRs across tracers:
- Java: DataDog/dd-trace-java#10802
- .NET: DataDog/dd-trace-dotnet#8287
- Ruby: DataDog/dd-trace-rb#5436
- Node.js: DataDog/dd-trace-js#7741
- PHP: DataDog/dd-trace-php#3701
- Rust: DataDog/dd-trace-rs#180
- C++: DataDog/dd-trace-cpp#288
- System tests: DataDog/system-tests#6466

### Reviewer's Checklist

- [x] Changed code has unit tests for its functionality at or near 100%
coverage.
- [x] [System-Tests](https://github.com/DataDog/system-tests/) covering
this feature have been added and enabled with the va.b.c-dev version
tag.
- [ ] There is a benchmark for any new code, or changes to existing
code.
- [x] If this interacts with the agent in a new way, a system test has
been added.
- [x] New code is free of linting errors. You can check this by running
`make lint` locally.
- [x] New code doesn't break existing tests. You can check this by
running `make test` locally.
- [ ] Add an appropriate team label so this PR gets put in the right
place for the release notes.
- [ ] All generated files are up to date. You can check this by running
`make generate` locally.
- [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed
by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to
date by running `make fix-modules` locally.

Unsure? Have a question? Request a review!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Co-authored-by: Dario Castañé <[email protected]>
Co-authored-by: Mikayla Toffler <[email protected]>
bm1549 added a commit to DataDog/dd-trace-dotnet that referenced this pull request Mar 25, 2026
## Summary of changes

Add `_dd.p.ksr` (Knuth Sampling Rate) propagated tag to spans when
sampling is applied via agent rates or trace sampling rules, per the
[Transmit Knuth Sampling Rate to Backend
RFC](https://docs.google.com/document/d/1Po3qtJb6PGheFeKFSUMv2pVY_y-HFAxTzNLuacCbCXY/edit).

## Reason for change

The backend needs to know the exact sampling rate applied by the tracer
to correctly compute effective rates during resampling (e.g., tracer 0.5
× backend 0.5 = effective 0.25). This tag enables that by propagating
the rate via `x-datadog-tags` and W3C `tracestate`.

## Implementation details

- Set `_dd.p.ksr` in `TraceContext.SetSamplingPriority()` for
`AgentRate`, `LocalTraceSamplingRule`, `RemoteAdaptiveSamplingRule`, and
`RemoteUserSamplingRule` mechanisms
- Use `TryAddTag` to preserve the original rate (consistent with
`AppliedSamplingRate ??= rate` semantics)
- Format with `"0.######"` (up to 6 decimal digits, no trailing zeros,
no scientific notation) per RFC spec
- Added `.IsOptional("_dd.p.ksr")` to `SpanTagAssertion.cs` so
integration test tag validators accept the new tag

## Test coverage

- Unit tests in `TraceContextTests_KnuthSamplingRate.cs`:
  - KSR set for agent rate sampling
- KSR set for trace sampling rules (local, remote adaptive, remote user)
- KSR NOT set for manual, AppSec, rate limiter, or single span
mechanisms
  - KSR preserved on subsequent sampling calls (TryAddTag semantics)
- Formatting with up to 6 decimal digits (boundary values including
small rates like 0.00001)
- System tests in [system-tests
#6466](DataDog/system-tests#6466)

## Other details

Related PRs across tracers:
- Java: DataDog/dd-trace-java#10802
- Ruby: DataDog/dd-trace-rb#5436
- Node.js: DataDog/dd-trace-js#7741
- PHP: DataDog/dd-trace-php#3701
- Rust: DataDog/dd-trace-rs#180
- C++: DataDog/dd-trace-cpp#288

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos mergequeue-status: removed tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants