Probability Samplers based on W3C Trace Context Level 2#3910
Probability Samplers based on W3C Trace Context Level 2#3910jmacd wants to merge 6 commits intoopen-telemetry:mainfrom
Conversation
|
@kentquirk @oertl @PeterF778 @kalyanaj Please take a look. |
|
@kentquirk @oertl @PeterF778 @kalyanaj |
| `traceparent` and `tracestate` contexts fields specified in [W3C Trace | ||
| Context Level 2](https://www.w3.org/TR/trace-context-2/). | ||
|
|
||
| When injecting and extracting trace context to ro from a carrier, the |
There was a problem hiding this comment.
| When injecting and extracting trace context to ro from a carrier, the | |
| When injecting and extracting trace context to or from a carrier, the |
| #### TraceIdRatioBased | ||
|
|
||
| * The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the | ||
| The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the |
There was a problem hiding this comment.
| The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the | |
| The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`. To respect the |
| * A rejection threshold is calculated, expressing as an integer how | ||
| many out of 2**56 trace IDs should be sampled. | ||
| * The threshold is encoded as a "T-value", expressing the threshold | ||
| using up to 5 hexadecimal digits of precision. |
There was a problem hiding this comment.
| using up to 5 hexadecimal digits of precision. | |
| using up to 14 hexadecimal digits of precision. |
| Samplers SHOULD unset `p` when the invariant between the `sampled`, | ||
| `p`, and `r` values is violated before using the `tracestate` to make | ||
| a sampling decision or interpret adjusted count. | ||
| Samplers SHOULD unset T-value (by erasing the `tv` field) when the |
There was a problem hiding this comment.
| Samplers SHOULD unset T-value (by erasing the `tv` field) when the | |
| Samplers SHOULD unset T-value (by erasing the `th` field) when the |
| ``` | ||
| base10(r) = 3 | ||
| base10(p) = 2 | ||
| tracestate: ot=tv:c |
There was a problem hiding this comment.
| tracestate: ot=tv:c | |
| tracestate: ot=th:c |
| return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"` | ||
| with `RATIO` the configured probability and `TVALUE` replaced by the |
There was a problem hiding this comment.
| return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"` | |
| with `RATIO` the configured probability and `TVALUE` replaced by the | |
| return a string of the form `"TraceIdRatioBased{RATIO;th:THRESHOLD}"` | |
| with `RATIO` the configured probability and `THRESHOLD` replaced by the |
| Here, R-value ce929d0e0e4736 indicates that a consistent probability | ||
| sampler configured with probability greater than approximately 19.3%, |
There was a problem hiding this comment.
| Here, R-value ce929d0e0e4736 indicates that a consistent probability | |
| sampler configured with probability greater than approximately 19.3%, | |
| The set sampled flag together with the R-value being ce929d0e0e4736 indicate that the parent used a consistent probability sampler configured with probability greater than approximately 19.3%, |
| T-value is not set, consistent with an unsampled context. In this | ||
| case, the parent context could have been sampled at less than 19.3% | ||
| probability or used the `AlwaysOff` sampler. |
There was a problem hiding this comment.
In this case we know that the parent was not consistently sampled. If any other sampling mechanism is used, the probability could be any value. Mentioning 19.3% here, is therefore a little bit confusing.
| To combine a consistent probability Sampler decision with a | ||
| non-probability Sampler decision, T-value should be set according to | ||
| the probability sampler(s). If no probability sampler decides to | ||
| sample, no T-value should be set. |
There was a problem hiding this comment.
| To combine a consistent probability Sampler decision with a | |
| non-probability Sampler decision, T-value should be set according to | |
| the probability sampler(s). If no probability sampler decides to | |
| sample, no T-value should be set. | |
| If the sampling decision is the result of multiple samplers that are not all consistent samplers, no T-value should be set. |
| The `TraceIdRatioBased` and `ParentBased` samplers can be used as | ||
| delegates of another sampler, for conditioning the choice of sampler | ||
| on span and other fixed attributes. However, for adjusted counts to | ||
| be trustworthy, the choice of non-root sampler cannot be conditioned | ||
| on the parent's sampled trace flag or the OpenTelemetry tracestate | ||
| R-value or T-value, as these decisions would lead to incorrect |
There was a problem hiding this comment.
| The `TraceIdRatioBased` and `ParentBased` samplers can be used as | |
| delegates of another sampler, for conditioning the choice of sampler | |
| on span and other fixed attributes. However, for adjusted counts to | |
| be trustworthy, the choice of non-root sampler cannot be conditioned | |
| on the parent's sampled trace flag or the OpenTelemetry tracestate | |
| R-value or T-value, as these decisions would lead to incorrect | |
| The `TraceIdRatioBased` and `ParentBased` samplers can be used as | |
| delegates of another sampler, for conditioning the choice of sampler | |
| on span and other fixed attributes. However, for adjusted counts to | |
| be trustworthy, the choice of non-root sampler cannot be conditioned | |
| on the parent's sampled trace flag or the OpenTelemetry tracestate | |
| R-value, as these decisions would lead to incorrect |
Choosing the T-value based on the parent T-value is fine. This is exactly what is done for parent-based sampling.
| - TraceFlags (8 bits) | ||
| - TraceState (string) | ||
|
|
||
| Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. |
There was a problem hiding this comment.
For clarity, let's say "bits 2-7 of TraceFlags".
|
|
||
| * Ratio values are restricted to the range `2**-56` through 1 | ||
| * A rejection threshold is calculated, expressing as an integer how | ||
| many out of 2**56 trace IDs should be sampled. |
| and r-value are independent settings, each can be meaningfully set | ||
| without the other present. | ||
| The [Sampler API](sdk.md#sampler) is responsible for setting the | ||
| `sampled` and `random` flags and the `tracestate`. |
There was a problem hiding this comment.
This is problematic. A sampler should not change the random flag at all. This flag must be set when the TraceID is created which is ahead of making the sampling decision for the root span.
| bits. | ||
| Head Samplers (i.e., trace SDKs) SHOULD by default generate the 56 | ||
| random bits specified for TraceIDs by the W3C Trace Context Level 2, | ||
| and set the Random trace flag, when generating new contexts. |
There was a problem hiding this comment.
According to the SDK specification, the TraceID is already generated even when sampling ROOT spans. This needs more discussions.
kentquirk
left a comment
There was a problem hiding this comment.
I'm a fan of this. I think you've done a good job here and based on what I see, I don't think we need to be rewriting it from scratch. Our past history leads me to believe that might be a long process. Let's push forward!
| ### W3C Trace Context Requirements | ||
|
|
||
| A W3C Trace Context propagator is expected to implement the | ||
| `traceparent` and `tracestate` contexts fields specified in [W3C Trace |
There was a problem hiding this comment.
| `traceparent` and `tracestate` contexts fields specified in [W3C Trace | |
| `traceparent` and `tracestate` context fields specified in [W3C Trace |
| - TraceFlags (8 bits) | ||
| - TraceState (string) | ||
|
|
||
| Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. |
There was a problem hiding this comment.
| Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero. | |
| Propagators MUST NOT assume that bits 2-7 of TraceFlags (the 6 most significant bits) will be zero. |
| - [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag) | ||
| - [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) |
There was a problem hiding this comment.
| - [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag) | |
| - [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) | |
| - [Sampled (value 0x1)](https://www.w3.org/TR/trace-context/#sampled-flag) | |
| - [Random (value 0x2)](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag) |
I think it would help to know which bits we're talking about.
| * A rejection threshold is calculated, expressing as an integer how | ||
| many out of 2**56 trace IDs should be sampled. |
There was a problem hiding this comment.
| * A rejection threshold is calculated, expressing as an integer how | |
| many out of 2**56 trace IDs should be sampled. | |
| * A rejection threshold is calculated, expressing as an integer the number | |
| out of 2**56 trace IDs that should be dropped. |
| This states that the SDK should fill least significant 7 bytes (i.e., 56 | ||
| bits) of the TraceID are genuinely random or pseudo-random., so they | ||
| can be used for probability sampling. |
There was a problem hiding this comment.
| This states that the SDK should fill least significant 7 bytes (i.e., 56 | |
| bits) of the TraceID are genuinely random or pseudo-random., so they | |
| can be used for probability sampling. | |
| This states that the SDK should fill the least significant 7 bytes (i.e., 56 | |
| bits) of the TraceID with bits that are genuinely random or pseudo-random, | |
| so that they can be used for probability sampling. |
| decimal value is greater than 63 before using the `tracestate` to make | ||
| a sampling decision or interpret adjusted count. | ||
| These points, taken together, means there will be a long delay before | ||
| the Random trace flag functions as intended in ther OpenTelemetry |
There was a problem hiding this comment.
| the Random trace flag functions as intended in ther OpenTelemetry | |
| the Random trace flag functions as intended in the OpenTelemetry |
|
|
||
| ##### Example: Probability sampled context | ||
| Tail Samplers SHOULD ignore the W3C Trace Context Level 2 Random flag | ||
| when interpreting span data. This is a compromise |
There was a problem hiding this comment.
| when interpreting span data. This is a compromise | |
| when interpreting span data. This is a compromise. |
| when interpreting span data. This is a compromise | ||
|
|
||
| Consider a trace context with the following headers: | ||
| When there is an explicit R-value set, the Sampler will anyway |
There was a problem hiding this comment.
| When there is an explicit R-value set, the Sampler will anyway | |
| When there is an explicit R-value set, the Sampler will |
| * Sampler decisions are made by comparing the Trace ID randomness | ||
| against the rejection threshold. | ||
| * When Sampled, T-value is included in the [OpenTelemetry TraceState | ||
| header](./tracestate-handling.md), identified by sub-key `th`, |
There was a problem hiding this comment.
Should we update the tracestate-handling.md to include the examples for the sub-key 'th'? Right now, it seems to be referring to the earlier p values and r value examples.
| using up to 5 hexadecimal digits of precision. | ||
| * Sampler decisions are made by comparing the Trace ID randomness | ||
| against the rejection threshold. | ||
| * When Sampled, T-value is included in the [OpenTelemetry TraceState |
There was a problem hiding this comment.
It will be good to rephrase this to an active voice of what implementations must do. For example, something like: "When sampled, an implementation MUST update the tracestate header to specify the T-value. <+ any additional details>"
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Fixes #1413.
Fixes #3602.
Changes
Updates Trace SDK and Propagator specifications with complete support for consistent probability sampling as described in OTEP 235.
CHANGELOG.mdTODOspec-compliance-matrix.mdTODO