I'm working on Azure SDK instrumentation and we have a problem with double instrumentation on the HTTP layer.
Here's the context:
- We want to support both users who auto-instrument (with a bytecode agent, monkey-patching, or diagnostic sources) and those who don't; which one they choose is not in our control.
- We instrument public API calls such as 'upload blob', so users can see what happens on the public surface of the library and map spans to their code. Each public API call translates into a number of REST calls (retries, auth, complex operations), and the underlying HTTP spans give users insight into what happens inside the SDK too.
- We want to instrument the HTTP layer so we can:
  - add extra properties that are specific to Azure and give users a better experience (e.g. users need them to get official support). We occasionally want to strip some sensitive parameters as well. All of these are per-HTTP-request and have no place on the parent logical span.
  - support legacy context propagation protocols with Azure services.
- We can't rely on auto-instrumentation (even when it's on), because we can't augment the span that it creates at a lower API level with all the details that are important.
So users who don't use auto-instrumentation get spans representing HTTP calls from our client libraries, while users who do use auto-instrumentation get two spans for the same HTTP call.
Generalizing this problem beyond Azure SDKs:
TL;DR:
- Client libraries cannot trace common protocol-level calls if those could also be traced by auto-instrumentation.
- If they do (when auto-instrumentation is on), duplicate spans will be created and context propagation will be broken.
- Libraries could make their programmatic tracing configurable, but then they cannot inject service-specific data into the auto-instrumented RPC-call span.
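To make the duplication concrete, here is a minimal toy model of one way it can play out. It does not use any real tracing library: `Span`, `exported`, and `start_http_span` are hypothetical names invented for this illustration, and the only real-world detail is the W3C `traceparent` header layout.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))


exported = []  # stands in for a span exporter


def start_http_span(layer, trace_id, headers):
    """Create a client span for one HTTP request and inject `traceparent`."""
    span = Span(f"HTTP GET ({layer})", trace_id)
    # Both layers write the same header, so the later one overwrites the earlier.
    headers["traceparent"] = f"00-{trace_id}-{span.span_id}-01"
    exported.append(span)
    return span


trace_id = secrets.token_hex(16)
headers = {}

# The client library instruments the request first...
sdk_span = start_http_span("client library", trace_id, headers)
# ...then auto-instrumentation at a lower layer instruments the same request.
auto_span = start_http_span("auto-instrumentation", trace_id, headers)

propagated_span_id = headers["traceparent"].split("-")[2]
print(len(exported))                            # two spans for one HTTP call
print(propagated_span_id == auto_span.span_id)  # the library's span is not what the server sees
```

In this sketch the span carrying the service-specific attributes is exported, but the remote side is parented to the other one.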
Potential solutions
Back off if the context is already injected. If the context is already injected on the request (HTTP, gRPC, or anything else), there is no good option other than to back off. The options are:
- Re-instrument, i.e. create a new span and inject the header (replacing the existing value or adding another one). This breaks the logic that injected the header and created the previous span, and there is no way to suppress that span.
- Re-instrument, but don't inject the header. Then this instrumentation layer is broken, and there is no reason to export its span.
- Don't instrument. Someone above already instrumented this request and perhaps created a span, so there is nothing else to do. Nothing is broken, and we didn't even create a span.
This approach also means that the user's manual instrumentation always wins, which seems like a good default to have in terms of supportability. This is really a short-term mitigation for a subset of double-instrumentation problems.
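The "don't instrument" option can be sketched roughly as follows. This is an illustration under stated assumptions, not an existing API: `TRACE_HEADERS`, `has_injected_context`, and `instrument_http_request` are hypothetical names, and the check only looks for the W3C `traceparent` header (a real layer would probably also need to check any legacy headers it supports).

```python
import secrets

# Assumption: only W3C Trace Context is checked; legacy headers could be added here.
TRACE_HEADERS = ("traceparent",)


def has_injected_context(headers):
    """Return True if some upstream layer already injected trace context."""
    lowered = {name.lower() for name in headers}  # HTTP header names are case-insensitive
    return any(name in lowered for name in TRACE_HEADERS)


def instrument_http_request(headers, exported):
    """Back off when context is already present; otherwise create a span and inject."""
    if has_injected_context(headers):
        return None  # no span created, existing header left untouched
    span_id = secrets.token_hex(8)
    headers["traceparent"] = f"00-{secrets.token_hex(16)}-{span_id}-01"
    exported.append(span_id)
    return span_id


exported = []

# No context on the request yet: this layer creates the span and injects the header.
fresh_headers = {}
first = instrument_http_request(fresh_headers, exported)

# The user (or a layer above) already injected context: this layer backs off.
upstream_value = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
user_headers = {"Traceparent": upstream_value}
second = instrument_http_request(user_headers, exported)
```

Note that this only removes duplicates if every layer that may see the request follows the same rule; a lower layer that overwrites the header unconditionally would still produce a second span.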
A similar contract might exist for server-side auto-instrumentation: if there is already a current span, back off. But I can imagine this going sideways if requests start from a thread that has a current span by mistake (e.g. one created during start-up).
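A rough sketch of that server-side contract and of the failure mode, using only Python's standard `contextvars` module. The names `current_span` and `handle_request` are hypothetical, and spans are plain strings to keep the example small.

```python
import contextvars

# Hypothetical "current span" slot; real tracing libraries keep something similar.
current_span = contextvars.ContextVar("current_span", default=None)


def handle_request(path, exported):
    """Server-side back-off: start a span only when none is current."""
    if current_span.get() is not None:
        return None  # assume a layer above already traced this request
    span = f"server span for {path}"
    token = current_span.set(span)
    try:
        exported.append(span)
        return span
    finally:
        current_span.reset(token)  # restore the previous (empty) context


exported = []

handle_request("/a", exported)        # traced: no span was current

# A span created during start-up is left current by mistake and never reset.
current_span.set("startup span")

skipped = handle_request("/b", exported)  # backs off, so the request is not traced
print(exported)
print(skipped)
```

Once the stale span is current, every later request handled in that context is silently skipped, which is the "sideways" case described above.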
A more down-to-earth approach that we discussed a while back in the OpenCensus community was Terminal Context (the terminology may be outdated; happy to update it if there is interest). It only works for client-side instrumentation.
Existing discussions
Suppressing instrumentation implementations in OTel
Please note that context suppression is not really possible in client libraries, because it requires a dependency on the instrumentation packages (which export the suppression key). So this is not a viable workaround.
Happy to hear suggestions and drive it.