Path forward for id vs name and weaver diff #785

@jsuereth

I'd like to propose the following:

  • Move to a single representation of "name" for signals.
    • Here group.name means "metric name", "span type", "event name" and "entity type".
    • This name forms a namespace per signal, so the entity with name host means the same thing everywhere.
    • This name is something we'll make sure is transmitted via OTLP and enforceable in live-check.
  • Clarify what "id" means for groups.
    • group.id denotes an identifier for understanding a definition in weaver/semconv YAML files.
    • The id forms a namespace per group definition.
    • We expect code and documentation generation to interact at this level in weaver.
      • e.g. a java.apache.http.client.duration group id would refer to a metric of name http.client.duration. This may be a specialized instance of http.client.duration used to codegen for the Java Apache HTTP client library, and can be referenced when generating documentation for that library.
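
To make the two namespaces concrete, here is a minimal Python sketch. The `Group` dataclass, its field names, and the `metric.http.client.duration` id for the root definition are hypothetical illustrations, not weaver's actual data model or YAML schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    id: str    # identifies one definition in the YAML files
    type: str  # signal kind: "metric", "span", "event", "entity"
    name: str  # signal identity, e.g. the metric name


groups = [
    Group("metric.http.client.duration", "metric", "http.client.duration"),
    Group("java.apache.http.client.duration", "metric", "http.client.duration"),
]

# The id namespace: exactly one definition per id.
by_id = {g.id: g for g in groups}

# The name namespace is per signal type: several definitions
# may describe the same (type, name) signal.
by_signal = defaultdict(list)
for g in groups:
    by_signal[(g.type, g.name)].append(g.id)

print(by_signal[("metric", "http.client.duration")])
# ['metric.http.client.duration', 'java.apache.http.client.duration']
```

Code and documentation generation would look definitions up through `by_id`, while anything that reasons about what is sent over OTLP would go through `by_signal`.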

Background

The difference between id and actual identity of a signal has been a sore spot in semantic conventions for some time.

For example, today all semantic convention compatibility policies are enforced at the name level. This means there is no enforcement for spans today: an attribute may be dropped with no automated check. Additionally, we have issues in weaver with resolving registries and enforcing "extension" group ids.

We actually already have this issue with attributes. Today, only registries which match the semantic conventions' usage of registry.* attribute groups can use weaver diff.

What's the meaning of id vs. name

We should view name and id the same way we see a type and an instance (or term) in a programming language. For example, in String x, String is the name and x is the id. The id denotes a string in a better-understood context and may have more limitations than the general String.

Within weaver, today, we do not define "type" and "instance" separately. Instead, for attributes, we've promoted a special usage in Semantic Conventions where we define "registry" groups and use ref to refine an attribute within a specific context. Additionally, weaver allows extends on a group to refine a definition or re-use a set of attributes.

Refinement

A key aspect of this proposal is that we understand when a group or attribute is the "root source" of the definition vs. when it is a refinement. I propose adding refinement tracking in weaver with the following rules:

  • An attribute that is a ref is a refinement of what it refers to.
  • A group that extends another group with the same type is a refinement of what it extends.
    • e.g. An attribute group shared.attributes that is extended by my.span group would NOT be the source of truth for my.span, but a java.apache.http.client.duration metric group that extends metric.http.client.duration metric group WOULD be a refinement.
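
The two rules above can be sketched as simple predicates. This is only an illustration under assumptions: the `Group` and `Attribute` classes and the `attribute_group` type label are hypothetical stand-ins, not weaver's internal types.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Group:
    id: str
    type: str                      # e.g. "metric", "span", "attribute_group"
    extends: Optional[str] = None  # id of the extended group, if any


@dataclass(frozen=True)
class Attribute:
    id: Optional[str] = None   # set when this is a root definition
    ref: Optional[str] = None  # set when this refines another attribute


def attribute_is_refinement(attr: Attribute) -> bool:
    # Rule 1: an attribute that is a ref refines what it refers to.
    return attr.ref is not None


def group_is_refinement(group: Group, groups: Dict[str, Group]) -> bool:
    # Rule 2: extending a group of the same type is a refinement;
    # extending a group of a different type is only attribute re-use.
    if group.extends is None:
        return False
    return groups[group.extends].type == group.type


groups = {g.id: g for g in [
    Group("shared.attributes", "attribute_group"),
    Group("my.span", "span", extends="shared.attributes"),
    Group("metric.http.client.duration", "metric"),
    Group("java.apache.http.client.duration", "metric",
          extends="metric.http.client.duration"),
]}

print(group_is_refinement(groups["my.span"], groups))  # False
print(group_is_refinement(groups["java.apache.http.client.duration"], groups))  # True
```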

Implications for weaver resolve

  • We can create signal registries for metrics, events, etc. if desired.
  • We should only allow a shared name between groups IFF one group refines the other.
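
A resolve-time check for the shared-name rule might look like the sketch below. It assumes a literal, pairwise reading of "one group refines the other" and follows `extends` links transitively; both choices are assumptions, as is the hypothetical `vendor.http.client.duration` group. How two sibling refinements of the same source should be treated is not specified here, and this sketch would flag them.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Group:
    id: str
    type: str
    name: str
    extends: Optional[str] = None


def refines(a: Group, b: Group, groups: Dict[str, Group]) -> bool:
    """True if `a` reaches `b` by following same-type `extends` links."""
    seen = set()
    current = a
    while current.extends is not None and current.id not in seen:
        seen.add(current.id)  # guards against cyclic extends chains
        parent = groups.get(current.extends)
        if parent is None or parent.type != current.type:
            return False
        if parent.id == b.id:
            return True
        current = parent
    return False


def name_conflicts(groups: Dict[str, Group]) -> List[Tuple[str, str]]:
    """Pairs of groups sharing a (type, name) where neither refines the other."""
    conflicts = []
    for a, b in combinations(groups.values(), 2):
        same_signal = (a.type, a.name) == (b.type, b.name)
        if same_signal and not (refines(a, b, groups) or refines(b, a, groups)):
            conflicts.append((a.id, b.id))
    return conflicts


groups = {g.id: g for g in [
    Group("metric.http.client.duration", "metric", "http.client.duration"),
    Group("java.apache.http.client.duration", "metric", "http.client.duration",
          extends="metric.http.client.duration"),
    # Hypothetical unrelated definition that re-uses the same metric name.
    Group("vendor.http.client.duration", "metric", "http.client.duration"),
]}

for pair in name_conflicts(groups):
    print("conflict:", pair)
```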

Implications for weaver diff

  • Weaver diff MUST operate at three levels:
    • "global" Attribute differences (which already use 'name')
    • "global" Signal differences (which will use 'name' going forward)
    • Group differences (A new output, to be designed).
  • We will use refinement in diff.
    • Attribute differences only work on "source", not refinements.
    • Signal differences only work on "source", not refinements.
  • We will likely need to provide diff for refinements between versions.
    This is akin to the apply_to_metrics config in the existing Telemetry Schema.
    I think this should be follow-on work.
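
As a rough sketch of the "global" signal level, the snippet below diffs two registry versions by (type, name) while ignoring refinements. The `Group` class, the example group ids, and the returned dictionary shape are all hypothetical; this is not the output format of weaver diff.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Group:
    id: str
    type: str
    name: str
    extends: Optional[str] = None


def source_signals(groups: List[Group]) -> Set[Tuple[str, str]]:
    """(type, name) pairs for groups that are sources, not refinements."""
    by_id = {g.id: g for g in groups}

    def is_refinement(g: Group) -> bool:
        parent = by_id.get(g.extends) if g.extends else None
        return parent is not None and parent.type == g.type

    return {(g.type, g.name) for g in groups if not is_refinement(g)}


def signal_diff(baseline: List[Group], head: List[Group]) -> Dict[str, list]:
    old, new = source_signals(baseline), source_signals(head)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


baseline = [
    Group("metric.http.client.duration", "metric", "http.client.duration"),
    Group("java.apache.http.client.duration", "metric", "http.client.duration",
          extends="metric.http.client.duration"),
]
head = [
    Group("metric.http.client.duration", "metric", "http.client.duration"),
    # Illustrative new source definition; the Java refinement was dropped.
    Group("metric.http.server.duration", "metric", "http.server.duration"),
]

print(signal_diff(baseline, head))
# {'added': [('metric', 'http.server.duration')], 'removed': []}
```

Dropping the Java refinement does not show up at this level, since the source definition of http.client.duration is unchanged; a change like that would belong to the group-level or refinement-level diff.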

Implications for live-check

  • Live check can only enforce at the signal (name) level.
  • It will need the same capability to understand the "source" of a signal vs. a specialized instance (e.g. the raw rules of http.client.duration vs. the Java Apache HTTP client specific http.client.duration).

Implications for emit

Emit should continue to emit all groups independently. The specialized instances are important to demonstrate downstream.
