Recently there has been a lot of debate over how to support adding links after spans start.
I have written up my case against adding AddLink() an in favor of extending the AddEvent() API.
The core of my argument is that the proposal to create AddLink() is based on a missing data model, and that when we fill in the data model we'll see that for every link created at start there is implicitly a link in the reverse direction created in the other span. One span's link is another span's link too, and when the other span receives a new link we ought to call it an event. I am trying to avoid a situation where the user has to add a link to both sides of a span reference.
Part of #1929.
Part of #454.
Part of #3240.
AddLink() is a mistake
Problem statement
AddLink() is a conceptual Span API meant to allow the addition of a Link to a Span after it starts. The AddLink() API was removed from OpenTelemetry Tracing early in its history because the interaction between user-level AddLink() and the SDK-level Sampler API was poorly understood.
Although OpenTelemetry has not written down its data model for Links, we all agree Span Links are special to OpenTelmetry because they form edges in a trace graph. It is important that Links appear as a separate kind of data in our Span data model because edges are a first-class part of tracing data.
Span Events are ill-defined in OpenTelemetry tracing. OpenTelemetry has not defined a data model for Tracing that explains what a Span Event is, instead it has defined an API that stands in as a logging API.
To Summarize this problem, we have not defined a data model for Span Events and we have not defined an API model for Span Links. To solve the API problem, we need to define a data model for Spans.
Proposed data model statements
The following definitions, added to a new data model document, will help us address this problem. These would be retroactive definitions, but there should be no conflicts with existing understanding.
Span Link: Data Model
A Span Link is a reference between two Spans that defines an edge in the trace graph. There are two Spans associated with every link, one called the Referer and one called the Referent. There is an implied or explicit event whenever a Span Link is formed.
In the existing specification, Spans Links are implicitly Referer links and they can only be created when the Span starts (i.e., the Start event creates the link). Note that when a Span Link is presented to the Tracer when the Span starts, that information can be used to make the sampling decision.
In the existing specification, Referent links are implicit features. We know there is a Referent Span for every Referer Span Link, but the only way to record a Referent Span Link is to sample the Referer Span. In other words, to capture the referent link you must sample the referer Span. This explains why we had to remove AddLink() in the early specification, because the Sampler has no mechanism to record referent spans links otherwise.
In proposed data model specification for tracing, Links will have new fields added to support AddLink() functionality. Necessary Sampler API changes are described below.
message Link {
// The existing fields
bytes trace_id = 1;
bytes span_id = 2;
string trace_state = 3;
repeated opentelemetry.proto.common.v1.KeyValue attributes = 4;
uint32 dropped_attributes_count = 5;
// New fields to represent links added after span start. When these
// are set to default values, the span link was created with the span.
string link_event_name = 6; // To add an event after span start, it must have a name.
fixed64 time_unix_nano = 7; // To add an event after span start, it must have a time.
LinkKind link_kind = 8; // New field to distinguish which kind of link event.
}
enum LinkKind {
LINK_KIND_REFERER = 0, // The original span link was a referer link
LINK_KIND_REFERENT = 1, // Referent links are always added after span start
}
Note that whatever APIs are used to create Referer Span Links (see below), the system can be configured to also create Referent Span Links.
For example, if the referent span is alive at the time a link is created, the SDK could be configured to record a referent Link; otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.
Span Event: Data Model
The existing Span Event concept is not well defined in the data model. Here is a proposed definition for the Event that appears in the Span data model.
Span Events are a projection of selected Log events that refer to the Span.
There are no changes to the existing Event structure in this proposal, only a new data model definition.
Note that whereas Span Link data objects contain one explicit and one implicit span reference (i.e., the referer and referent), Span Events contain just one implicit span reference (i.e., the associated span).
Suppose a user has an existing logging API and is capable of including a span context in a log event. The Span Event constructed from the log event is said to be a "projection" because it may be one of several copies. The log event that mentions two span IDs may project an Event into both span data objects.
Note that a log event that mentions two span IDs is not the same as a span Link, because Links are first class parts of the data model. To add a span Link requires a using the Tracer or Span APIs, depending on whether the link is created at the same time as the span or not.
Span Link: API model
In this proposal, there are two APIs for creating links.
Links may be constructed at span start using Tracer.Start(WithLink(context, attributes...)). While this constructs explicit Referer links, it also creates implicit Referent links in the referred-to spans. The SDK can be configured to record Referent links in live spans, as described above, in which case a span Start with N associated links to live spans actually constructs 2*N Link objects, that is N referer links in one span and N referent links in the other spans.
Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.
Span Event: API model
As described above, the span AddEvent API would be used to record either events or links, since after-Start links are identical to events with the addition of a span context.
The docuemntation for AddEvent would be updated to indicate that it is a mechanism for reporting log events associated with the span, including Link events. AddEvent would explain that while it is one way of creating Events and Links in the resulting span data, there are other mechanisms for creating Links and Events. For example, you can call AddEvent() on the Referer span to create a Link in the Referent span, or you can use an OpenTelemetry-recognized logging API to project an Event into the Span.
Sampler fixes
The original reason that AddLink() was removed from the Tracing API was that it interfered with Sampling decisions. While it is unavoidable that Links added after a span starts cannot influence a head-sampling decision, the problem with sampling and links is worse than that.
When a sampler has decided to sampler a span AND that span is the referent span of links created by other spans, there is no way to encode the Referent links in the span unless the other spans are also sampled.
While Referer links in the current specification can only be created when a span starts, Referent links can only be created after a span starts. Since we have no way to encode a link added after a span starts in the span data, it is not possible for the sampled span to encode complete information about itself. We need a way for the Span data to incorporate Referent links to support Sampling in the presence of links, due to cases where the Referer span is not sampled, so we need a data model for links added after a span starts.
Recently there has been a lot of debate over how to support adding links after spans start.
I have written up my case against adding
AddLink()an in favor of extending theAddEvent()API.The core of my argument is that the proposal to create
AddLink()is based on a missing data model, and that when we fill in the data model we'll see that for every link created at start there is implicitly a link in the reverse direction created in the other span. One span's link is another span's link too, and when the other span receives a new link we ought to call it an event. I am trying to avoid a situation where the user has to add a link to both sides of a span reference.Part of #1929.
Part of #454.
Part of #3240.
AddLink() is a mistake
Problem statement
AddLink() is a conceptual Span API meant to allow the addition of a Link to a Span after it starts. The AddLink() API was removed from OpenTelemetry Tracing early in its history because the interaction between user-level AddLink() and the SDK-level Sampler API was poorly understood.
Although OpenTelemetry has not written down its data model for Links, we all agree Span Links are special to OpenTelmetry because they form edges in a trace graph. It is important that Links appear as a separate kind of data in our Span data model because edges are a first-class part of tracing data.
Span Events are ill-defined in OpenTelemetry tracing. OpenTelemetry has not defined a data model for Tracing that explains what a Span Event is, instead it has defined an API that stands in as a logging API.
To Summarize this problem, we have not defined a data model for Span Events and we have not defined an API model for Span Links. To solve the API problem, we need to define a data model for Spans.
Proposed data model statements
The following definitions, added to a new data model document, will help us address this problem. These would be retroactive definitions, but there should be no conflicts with existing understanding.
Span Link: Data Model
A Span Link is a reference between two Spans that defines an edge in the trace graph. There are two Spans associated with every link, one called the Referer and one called the Referent. There is an implied or explicit event whenever a Span Link is formed.
In the existing specification, Spans Links are implicitly Referer links and they can only be created when the Span starts (i.e., the Start event creates the link). Note that when a Span Link is presented to the Tracer when the Span starts, that information can be used to make the sampling decision.
In the existing specification, Referent links are implicit features. We know there is a Referent Span for every Referer Span Link, but the only way to record a Referent Span Link is to sample the Referer Span. In other words, to capture the referent link you must sample the referer Span. This explains why we had to remove AddLink() in the early specification, because the Sampler has no mechanism to record referent spans links otherwise.
In proposed data model specification for tracing, Links will have new fields added to support AddLink() functionality. Necessary Sampler API changes are described below.
Note that whatever APIs are used to create Referer Span Links (see below), the system can be configured to also create Referent Span Links.
For example, if the referent span is alive at the time a link is created, the SDK could be configured to record a referent Link; otherwise, this implicit Referent link can can be established downstream, at the consumer. When a span link is created to a span after the referent span has ended, only the downstream consumer is capable of adding in the Referent link.
Span Event: Data Model
The existing Span Event concept is not well defined in the data model. Here is a proposed definition for the Event that appears in the Span data model.
Span Events are a projection of selected Log events that refer to the Span.
There are no changes to the existing Event structure in this proposal, only a new data model definition.
Note that whereas Span Link data objects contain one explicit and one implicit span reference (i.e., the referer and referent), Span Events contain just one implicit span reference (i.e., the associated span).
Suppose a user has an existing logging API and is capable of including a span context in a log event. The Span Event constructed from the log event is said to be a "projection" because it may be one of several copies. The log event that mentions two span IDs may project an Event into both span data objects.
Note that a log event that mentions two span IDs is not the same as a span Link, because Links are first class parts of the data model. To add a span Link requires a using the Tracer or Span APIs, depending on whether the link is created at the same time as the span or not.
Span Link: API model
In this proposal, there are two APIs for creating links.
Links may be constructed at span start using
Tracer.Start(WithLink(context, attributes...)). While this constructs explicit Referer links, it also creates implicit Referent links in the referred-to spans. The SDK can be configured to record Referent links in live spans, as described above, in which case a span Start with N associated links to live spans actually constructs 2*N Link objects, that is N referer links in one span and N referent links in the other spans.Links may be constructed after span start by recording an event with a span link property using the AddEvent API with an optional link context
Span.AddEvent("name", WithLinkContext(context), WithAttributes(attributes...)). When the AddEvent API receives a Link context it will create a Link object, otherwise it will create an Event object.Span Event: API model
As described above, the span AddEvent API would be used to record either events or links, since after-Start links are identical to events with the addition of a span context.
The docuemntation for AddEvent would be updated to indicate that it is a mechanism for reporting log events associated with the span, including Link events. AddEvent would explain that while it is one way of creating Events and Links in the resulting span data, there are other mechanisms for creating Links and Events. For example, you can call AddEvent() on the Referer span to create a Link in the Referent span, or you can use an OpenTelemetry-recognized logging API to project an Event into the Span.
Sampler fixes
The original reason that AddLink() was removed from the Tracing API was that it interfered with Sampling decisions. While it is unavoidable that Links added after a span starts cannot influence a head-sampling decision, the problem with sampling and links is worse than that.
When a sampler has decided to sampler a span AND that span is the referent span of links created by other spans, there is no way to encode the Referent links in the span unless the other spans are also sampled.
While Referer links in the current specification can only be created when a span starts, Referent links can only be created after a span starts. Since we have no way to encode a link added after a span starts in the span data, it is not possible for the sampled span to encode complete information about itself. We need a way for the Span data to incorporate Referent links to support Sampling in the presence of links, due to cases where the Referer span is not sampled, so we need a data model for links added after a span starts.