What are you trying to achieve?
This proposal introduces a new semantic convention attribute, `service.criticality`, to classify services by their operational importance. The attribute will allow observability platforms to implement criticality-aware tracing and sampling strategies.
What did you expect to see?
| Value | Description | Use Cases |
|---|---|---|
| `critical` | Service is business-critical; downtime directly impacts revenue, user experience, or core functionality | Payment processing, authentication, primary user-facing APIs |
| `high` | Service is important but has degradation tolerance or fallback mechanisms | Shopping cart, search, recommendation engines |
| `medium` | Service provides supplementary functionality; degradation has limited user impact | Analytics, reporting, non-essential integrations |
| `low` | Service is non-essential to core operations; used for background tasks or internal tools | Batch processors, cleanup jobs, internal dashboards |
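
As a rough consumer-side illustration, the four proposed values could be modeled as a closed enumeration. The names `Criticality` and `parse_criticality` are hypothetical, and treating a missing or unrecognized value as "unclassified" (`None`) is an assumption rather than part of this proposal.

```python
from enum import Enum
from typing import Mapping, Optional


class Criticality(Enum):
    """The four values proposed for service.criticality."""
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def parse_criticality(resource_attrs: Mapping[str, object]) -> Optional[Criticality]:
    """Return the service's criticality, or None if absent or unrecognized.

    Matching is exact and case-sensitive (an assumption for this sketch).
    """
    raw = resource_attrs.get("service.criticality")
    if not isinstance(raw, str):
        return None
    try:
        return Criticality(raw)
    except ValueError:
        return None
```

For example, `parse_criticality({"service.criticality": "high"})` yields `Criticality.HIGH`, while an unknown value such as `"urgent"` yields `None`.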
Additional context.
By introducing a standardized criticality attribute, one can:
- Implement adaptive sampling rates (e.g., 100% for critical, 10% for low-priority services)
- Optimize telemetry costs by reducing data from non-critical services
- Improve incident response by surfacing critical service traces first
- Enable better capacity planning and resource allocation
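
The adaptive sampling idea in the first bullet can be sketched as a lookup from criticality to a sampling percentage. The percentages here mirror the Collector configuration in this issue (100, 50, 10, 1). The function names, the default for unclassified services, and the trace-ID hashing scheme are assumptions made for illustration; this is not how the Collector's probabilistic policy is implemented.

```python
import hashlib
from typing import Optional

# Percentages mirror the Collector example in this issue; tune as needed.
SAMPLING_PERCENTAGE = {
    "critical": 100.0,
    "high": 50.0,
    "medium": 10.0,
    "low": 1.0,
}
# Assumption: services without a recognized criticality are treated like "medium".
DEFAULT_PERCENTAGE = 10.0


def sampling_percentage(criticality: Optional[str]) -> float:
    """Look up the sampling percentage for a service.criticality value."""
    return SAMPLING_PERCENTAGE.get(criticality, DEFAULT_PERCENTAGE)


def should_sample(trace_id: str, criticality: Optional[str]) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # A percentage of 100 admits every bucket; 1 admits buckets 0..99.
    return bucket < sampling_percentage(criticality) * 100
```

Hashing the trace ID keeps the decision consistent across processes that see the same trace, which is one common way to avoid sampling only part of a trace.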
Below is a sample OTel Collector configuration in which the `tail_sampling` processor uses the proposed semconv attribute as intended:
# OpenTelemetry Collector Configuration
# Tail-based sampling using service.criticality
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: critical-services-policy
        type: string_attribute
        string_attribute:
          key: service.criticality
          values:
            - critical
          enabled_regex_matching: false
          invert_match: false
      - name: high-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-high-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - high
            - name: probabilistic-50
              type: probabilistic
              probabilistic:
                sampling_percentage: 50
      - name: medium-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-medium-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - medium
            - name: probabilistic-10
              type: probabilistic
              probabilistic:
                sampling_percentage: 10
      - name: low-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-low-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - low
            - name: probabilistic-1
              type: probabilistic
              probabilistic:
                sampling_percentage: 1
      - name: error-traces
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-critical-traces
        type: and
        and:
          and_sub_policy:
            - name: is-critical-or-high
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - critical
                  - high
            - name: is-slow
              type: latency
              latency:
                threshold_ms: 5000
exporters:
  otlp:
    endpoint: some-backend:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]