Add _dd.p.ksr propagated tag for Knuth sampling rate#3701
Conversation
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
✨ Fix all issues with BitsAI or with Cursor
|
The default sampling mechanism (DD_MECHANISM_DEFAULT) is not an agent rate. Per the RFC, _dd.p.ksr should only be set for "Agent Sampling rate or Trace Sampling rules". This prevents _dd.p.ksr from appearing on every span when no agent rates are configured. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3701 +/- ##
==========================================
- Coverage 62.40% 62.35% -0.06%
==========================================
Files 142 142
Lines 13586 13586
Branches 1775 1775
==========================================
- Hits 8479 8471 -8
- Misses 4301 4308 +7
- Partials 806 807 +1 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
The inferred proxy sampling rules test uses explicit DD_TRACE_SAMPLING_RULES which triggers rule-based sampling. The ksr tag now appears in span meta. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Two fixes for remaining CI failures after the initial KSR tag implementation: 1. serializer.c: Copy _dd.p.ksr from the root span to the inferred proxy span during serialization (using false to keep it on the root span too), matching the behavior expected by sampling_rules.phpt. 2. SpanChecker.php: Auto-ignore _dd.p.ksr in withExactTags() assertions unless the test explicitly tests for it, consistent with how _dd.p.dm and _dd.p.tid are handled. This fixes ResponseStatusCodeTest and other integration tests that don't test KSR behavior directly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Benchmarks [ tracer ]Benchmark execution time: 2026-03-17 02:32:16 Comparing candidate commit 96e7dc0 in PR branch Found 0 performance improvements and 23 performance regressions! Performance is the same for 170 metrics, 1 unstable metrics. scenario:ComposerTelemetryBench/benchTelemetryParsing-opcache
scenario:ContextPropagationBench/benchInject64Bit
scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache
scenario:SamplingRuleMatchingBench/benchGlobMatching1
scenario:SamplingRuleMatchingBench/benchGlobMatching1-opcache
scenario:SamplingRuleMatchingBench/benchGlobMatching2
scenario:SamplingRuleMatchingBench/benchGlobMatching2-opcache
scenario:SamplingRuleMatchingBench/benchGlobMatching3
scenario:SamplingRuleMatchingBench/benchGlobMatching3-opcache
scenario:SamplingRuleMatchingBench/benchGlobMatching4
scenario:SamplingRuleMatchingBench/benchGlobMatching4-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching1
scenario:SamplingRuleMatchingBench/benchRegexMatching1-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching2
scenario:SamplingRuleMatchingBench/benchRegexMatching2-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching3
scenario:SamplingRuleMatchingBench/benchRegexMatching3-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching4
scenario:SamplingRuleMatchingBench/benchRegexMatching4-opcache
scenario:SpanBench/benchOpenTelemetryAPI
scenario:SpanBench/benchOpenTelemetryAPI-opcache
scenario:SpanBench/benchOpenTelemetryInteroperability
scenario:SpanBench/benchOpenTelemetryInteroperability-opcache
|
In the hot sampling path, avoid re-allocating strings and updating hash tables when _dd.p.ksr already contains the same value. This optimization recovers most of the performance regression seen in SamplingRuleMatchingBench when sampling is re-evaluated with the same rate (the common case for long-running spans). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
| zend_hash_str_del(metrics, ZEND_STRL("_dd.rule_psr")); | ||
| } else { | ||
| zend_hash_str_update(metrics, ZEND_STRL("_dd.rule_psr"), &sample_rate_zv); | ||
| dd_update_knuth_sampling_rate_tag(span, sample_rate); |
There was a problem hiding this comment.
Is it intentional that the _dd.p.ksr is not deleted where the psr is deleted?
| #include <json/json.h> | ||
|
|
||
| #include "../configuration.h" | ||
| #include "../tracer_tag_propagation/tracer_tag_propagation.h" |
There was a problem hiding this comment.
Redundant header inclusion.
- Remove redundant tracer_tag_propagation.h include (already available via span.h) - Delete _dd.p.ksr from meta in manual mechanism path to prevent stale tags - Add static to KNUTH_FACTOR and MAX_TRACE_ID constants (file-local only) - Use --EXPECT-- instead of --EXPECTREGEX-- in test 028 (deterministic output) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Delete the ksr tag in the propagated-mode early return and the DD_MECHANISM_DEFAULT branch, not just DD_MECHANISM_MANUAL. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
bwoebi
left a comment
There was a problem hiding this comment.
Looks good now, Brian, thanks!
# What does this PR do? Adds `_dd.p.ksr` (Knuth Sampling Rate) as a propagated tag set when agent-based or rule-based sampling decisions are made. The tag is stored in span `meta` (string type) with up to 6 significant digits and no trailing zeros. `format_sampling_rate` now returns `Option<String>` and guards against invalid inputs (negative, >1.0, NaN, infinity), returning `None` instead of producing garbage output. # Motivation To enable consistent sampling across tracers and backend retention filters, the backend needs to know the sampling rate applied by the tracer. Without transmitting the tracer's rate via `_dd.p.ksr`, backend resampling cannot correctly compute effective rates in multi-stage sampling scenarios. See RFC: "Transmit Knuth sampling rate to backend" # Additional Notes Key files changed: - `datadog-opentelemetry/src/core/constants.rs` — Added `SAMPLING_KNUTH_RATE_TAG_KEY` constant - `datadog-opentelemetry/src/sampling/datadog_sampler.rs` — Added `format_sampling_rate()` helper (returns `Option<String>`, defensive against invalid rates) and set ksr in agent/rule sampling paths - Updated 2 snapshot JSON files Related PRs across tracers: - Java: DataDog/dd-trace-java#10802 - .NET: DataDog/dd-trace-dotnet#8287 - Ruby: DataDog/dd-trace-rb#5436 - Node.js: DataDog/dd-trace-js#7741 - PHP: DataDog/dd-trace-php#3701 - C++: DataDog/dd-trace-cpp#288 - System tests: DataDog/system-tests#6466 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <[email protected]>
### What does this PR do? Fixes `_dd.p.ksr` (Knuth Sampling Rate) to only be set on spans when the agent has provided sampling rates via `readRatesJSON()`. Previously, ksr was unconditionally set in `prioritySampler.apply()`, including when the rate was the initial client-side default (1.0) before any agent response arrived. Also refactors `prioritySampler` to consolidate lock acquisitions: extracts `getRateLocked()` so `apply()` acquires `ps.mu.RLock` only once to read both the rate and `agentRatesLoaded`. ### Motivation Cross-language consistency: Python, Java, PHP, and other tracers only set ksr when actual agent rates or sampling rules are applied, not for the default fallback. This aligns Go with that behavior. See RFC: "Transmit Knuth sampling rate to backend" ### Additional Notes - Added `agentRatesLoaded` bool field to `prioritySampler`, set to `true` in `readRatesJSON()` - `apply()` now gates ksr behind `agentRatesLoaded` check - Extracted `getRateLocked()` to avoid double lock acquisition in `apply()` - Rule-based sampling path (`applyTraceRuleSampling` in span.go) unchanged — correctly always sets ksr - Tests added: `ksr-not-set-without-agent-rates` and `ksr-set-after-agent-rates-received` Related PRs across tracers: - Java: DataDog/dd-trace-java#10802 - .NET: DataDog/dd-trace-dotnet#8287 - Ruby: DataDog/dd-trace-rb#5436 - Node.js: DataDog/dd-trace-js#7741 - PHP: DataDog/dd-trace-php#3701 - Rust: DataDog/dd-trace-rs#180 - C++: DataDog/dd-trace-cpp#288 - System tests: DataDog/system-tests#6466 ### Reviewer's Checklist - [x] Changed code has unit tests for its functionality at or near 100% coverage. - [x] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag. - [ ] There is a benchmark for any new code, or changes to existing code. - [x] If this interacts with the agent in a new way, a system test has been added. - [x] New code is free of linting errors. You can check this by running `make lint` locally. - [x] New code doesn't break existing tests. You can check this by running `make test` locally. - [ ] Add an appropriate team label so this PR gets put in the right place for the release notes. - [ ] All generated files are up to date. You can check this by running `make generate` locally. - [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally. Unsure? Have a question? Request a review! 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]> Co-authored-by: Dario Castañé <[email protected]> Co-authored-by: Mikayla Toffler <[email protected]>
…dd-update * 'master' of github.com:DataDog/dd-trace-php: feat(sidecar): add thread mode as fallback connection for restricted environments (#3573) Migrate deprecated GitLab runner tags (#3715) Adds process tags to remote config payload (#3658) perf(config): cache sys getenv (#3670) Fixes the tag name for process tags (#3709) Fix debugger ephemerals handling (#3685) Fix #3651: Prevent crash during shutdown in Frankenphp (#3662) Add dynamic instrumentation and exception replay to startup logging (#3667) chore: bump bytes crate from 1.9.0 to 1.11.1 to address CVE-2026-25541 (#3669) Merge pull request #3701 from DataDog/brian.marks/add-ksr-tag ci: fix Windows job flakiness caused by dirty workspace (#3694) Fixup CI owner association (#3704) Add Rust rewrite of the AppSec helper alongside the C++ implementation Remove debug instruction Fix script order debug Fix exploration logic chore(ci): add final_status property on junit XML [APMSP-2610] Fix DD_TRACE_SYMFONY_HTTP_ROUTE=false Optimize Symfony http.route caching with path map approach
## Summary of changes Add `_dd.p.ksr` (Knuth Sampling Rate) propagated tag to spans when sampling is applied via agent rates or trace sampling rules, per the [Transmit Knuth Sampling Rate to Backend RFC](https://docs.google.com/document/d/1Po3qtJb6PGheFeKFSUMv2pVY_y-HFAxTzNLuacCbCXY/edit). ## Reason for change The backend needs to know the exact sampling rate applied by the tracer to correctly compute effective rates during resampling (e.g., tracer 0.5 × backend 0.5 = effective 0.25). This tag enables that by propagating the rate via `x-datadog-tags` and W3C `tracestate`. ## Implementation details - Set `_dd.p.ksr` in `TraceContext.SetSamplingPriority()` for `AgentRate`, `LocalTraceSamplingRule`, `RemoteAdaptiveSamplingRule`, and `RemoteUserSamplingRule` mechanisms - Use `TryAddTag` to preserve the original rate (consistent with `AppliedSamplingRate ??= rate` semantics) - Format with `"0.######"` (up to 6 decimal digits, no trailing zeros, no scientific notation) per RFC spec - Added `.IsOptional("_dd.p.ksr")` to `SpanTagAssertion.cs` so integration test tag validators accept the new tag ## Test coverage - Unit tests in `TraceContextTests_KnuthSamplingRate.cs`: - KSR set for agent rate sampling - KSR set for trace sampling rules (local, remote adaptive, remote user) - KSR NOT set for manual, AppSec, rate limiter, or single span mechanisms - KSR preserved on subsequent sampling calls (TryAddTag semantics) - Formatting with up to 6 decimal digits (boundary values including small rates like 0.00001) - System tests in [system-tests #6466](DataDog/system-tests#6466) ## Other details Related PRs across tracers: - Java: DataDog/dd-trace-java#10802 - Ruby: DataDog/dd-trace-rb#5436 - Node.js: DataDog/dd-trace-js#7741 - PHP: DataDog/dd-trace-php#3701 - Rust: DataDog/dd-trace-rs#180 - C++: DataDog/dd-trace-cpp#288 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <[email protected]>
Description
Adds
_dd.p.ksr(Knuth Sampling Rate) as a propagated tag set when agent-based or rule-based sampling decisions are made. The tag is stored in spanmeta(string type) with up to 6 significant digits and no trailing zeros.To enable consistent sampling across tracers and backend retention filters, the backend needs to know the sampling rate applied by the tracer. Without transmitting the tracer's rate via
_dd.p.ksr, backend resampling cannot correctly compute effective rates in multi-stage sampling scenarios.See RFC: "Transmit Knuth sampling rate to backend"
Key files changed:
ext/priority_sampling/priority_sampling.c— Addeddd_update_knuth_sampling_rate_tag()function; deletes_dd.p.ksrin all code paths that don't set it (manual, propagated-mode early return, default mechanism)Related PRs across tracers:
_dd.p.ksrformatting to use 6 significant digits without trailing zeros dd-trace-cpp#288Reviewer checklist
🤖 Generated with Claude Code