Filing this issue per our discussion in the Sampling SIG today.
What are you trying to achieve?
OpenTelemetry supports Span Links that can be used to model asynchronous scenarios or batched operations (fan-out/fan-in). I am looking to achieve some level of consistent (head-based) sampling of all the linked traces. If the sampling decision happens at an individual trace level, customers cannot understand the whole story of what happened to a request.
Example of links usage: One use case is a producer-consumer scenario where a producer span (say Trace T1 / Span S1) enqueues a job to a queue, and such jobs are processed asynchronously by a consuming service. Since the lifetimes of the producer and consumer are different, the consuming operation is modelled as a separate trace (T2 / S2) that links to T1 / S1 using span links. With consistent sampling across links, whenever T1 is sampled, T2 would be sampled as well.
What did you expect to see?
Guidance / samples / out-of-the-box sampler to help achieve the above. For example, something like:
- if you are using parent-based sampling & want to get consistent sampling across links, this is what you need to do.
- if you are using consistent-probability sampling & want to get consistent sampling across links, this is what you need to do.
Additional context.
- One way to achieve the above scenario is a custom sampler that checks whether any of the linked spans (of the span for which the sampling decision is being made) is sampled and, if so, samples this span as well. This can work when the source of the link is the root span of a new trace.
- On the other hand, if the source of the link is not the root span, it may need to consider its parent's decision or its links' decision to arrive at its decision. Yes, it will be a partial trace, but having a partial trace here might be better than no trace.
- We need to understand the implications for the adjusted count and for anything else derived from the sampling probability.
- There are other trade-offs to consider as well. For example, if a span is sampled whenever any one of its 20 links is sampled, the span is sampled with a much higher probability than any single linked trace: assuming independent decisions, its probability is 1 - (1 - P(link1 sampled)) × ... × (1 - P(link20 sampled)), which is bounded above by P(link1 sampled) + ... + P(link20 sampled). We need to consider whether additional probabilistic measures are needed for link sampling (credit: @pyohannes).
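
To make the ideas above concrete, here is a minimal sketch of the decision logic in plain Python. It does not use the OpenTelemetry SDK: `SpanRef`, `should_sample`, and `combined_probability` are hypothetical names invented for illustration, and the policy (sampled link wins, then parent, then a root-level decision) is only one possible reading of the bullets above. The probability helper assumes the linked traces are sampled independently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpanRef:
    """Hypothetical stand-in for a span context (not an OpenTelemetry type)."""
    trace_id: str
    span_id: str
    sampled: bool


def should_sample(
    parent: Optional[SpanRef],
    links: Sequence[SpanRef],
    root_decision: bool,
) -> bool:
    """Illustrative link-aware head sampling policy.

    1. If any linked span is sampled, sample this span too.
    2. Otherwise, if there is a parent, follow the parent's decision.
    3. Otherwise (root span, no sampled links), fall back to root_decision,
       e.g. the outcome of a probability sampler.
    """
    if any(link.sampled for link in links):
        return True
    if parent is not None:
        return parent.sampled
    return root_decision


def combined_probability(link_probabilities: Sequence[float]) -> float:
    """Probability that at least one link is sampled, assuming independence."""
    none_sampled = 1.0
    for p in link_probabilities:
        none_sampled *= 1.0 - p
    return 1.0 - none_sampled


# Producer span T1 / S1 was sampled; consumer root span T2 / S2 links to it.
s1 = SpanRef(trace_id="T1", span_id="S1", sampled=True)
consumer_root = should_sample(parent=None, links=[s1], root_decision=False)

# Non-root span whose parent was not sampled but whose link was:
# this yields a partial trace, as discussed above.
unsampled_parent = SpanRef(trace_id="T2", span_id="S0", sampled=False)
partial = should_sample(parent=unsampled_parent, links=[s1], root_decision=False)

# 20 links, each sampled independently with probability 0.1.
fan_in = combined_probability([0.1] * 20)
print(consumer_root, partial, round(fan_in, 2))
```

With 20 independent links each sampled at 10%, the linking span ends up sampled roughly 88% of the time under this policy, which illustrates why the adjusted count for such spans would need separate treatment.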