Add guidance and info metric for per cicd pipeline run metrics #2237
kamphaus wants to merge 9 commits into open-telemetry:main from
This PR was marked stale due to lack of activity. It will be closed in 7 days.
  - id: metric.cicd.pipeline.run.info
    type: metric
    metric_name: cicd.pipeline.run.info
    brief: 'This is an info metric linking pipeline runs to any other associated entities or information.'
I don't have a helpful suggestion, but just a thought around how to make this brief more clear (if that's something we think needs to be more clear). Can we link to some reference documentation around what an info metric is?
+1
It'd be helpful to include when it's reported - when the run starts?
Since it results in high cardinality, does it need to be a metric? Is it time to consider using an event instead? It would provide all the correlation too.
> does it need to be a metric?
The cicd.pipeline.run.info metric is meant to cover case 2 (an additional metric that carries both the cicd.pipeline.run entity and a host or k8s.pod entity). We query this info metric to pivot from one attribute (cicd.pipeline.run.id) to the host or pod id; as such, it has to be a metric.
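To make the pivot concrete, here is a minimal sketch of the join a query backend would perform. It is a plain-Python model, not a real query API: the `Point` type, the `k8s.pod.uid` join key, the `k8s.pod.cpu.usage` metric name, and all ids and values are assumptions for illustration.

```python
from dataclasses import dataclass

INFO_METRIC = "cicd.pipeline.run.info"


@dataclass
class Point:
    """One data point: metric name, attributes, value (hypothetical model type)."""
    metric: str
    attrs: dict
    value: float


def pods_for_run(points, run_id):
    """Step 1: use the info metric to pivot from a run id to pod ids."""
    return {
        p.attrs["k8s.pod.uid"]
        for p in points
        if p.metric == INFO_METRIC and p.attrs.get("cicd.pipeline.run.id") == run_id
    }


def metrics_for_run(points, run_id):
    """Step 2: select the non-info points that belong to those pods."""
    pods = pods_for_run(points, run_id)
    return [
        p for p in points
        if p.metric != INFO_METRIC and p.attrs.get("k8s.pod.uid") in pods
    ]


# Illustrative data only: the info metric has a constant value of 1.
points = [
    Point(INFO_METRIC, {"cicd.pipeline.run.id": "run-42", "k8s.pod.uid": "pod-a"}, 1),
    Point(INFO_METRIC, {"cicd.pipeline.run.id": "run-43", "k8s.pod.uid": "pod-b"}, 1),
    Point("k8s.pod.cpu.usage", {"k8s.pod.uid": "pod-a"}, 0.75),
    Point("k8s.pod.cpu.usage", {"k8s.pod.uid": "pod-b"}, 0.20),
]
```

The pod metrics never carry the run id themselves; only the info metric does, and the shared pod attribute is what makes the join possible.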
> It'd be helpful to include when it's reported - when the run starts?
Although it would be nice to have a way to provide more information about pipelines, pipeline runs, and workers, I think that information would be more appropriately added as descriptive attributes on the entities.
It was my understanding from the Entities SIG that at some point the descriptive attributes are meant to be conveyed in an efficient way (like an info metric, maybe?), but I might be mistaken.
Since the cicd.pipeline.run.info metric is primarily meant for linking the different entities, I'd like to start with that and add other information later.
Update: Discussed in SemConv meeting 2025-08-04: Indeed entities to metrics signal transformations are on the Entities SIG roadmap, but could take a long time to arrive.
> Can we link to some reference documentation around what an info metric is?
The closest definition of an "info" metric type that I could find is this paragraph in the OTel specs: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#info
Should I link to it?
### Guidance on per pipeline run metrics

You MAY use metrics that include an attribute identifying the pipeline run by using the [`cicd.pipeline.run`](/docs/resource/cicd.md#cicd-pipeline-run) entity association.
does it apply to all metrics in this file?
I'm not sure it's possible to pick and choose which entity associations a specific metric gets. When you create a meter, it's associated with all entities that are defined on that meter provider. I don't see whether that changes with open-telemetry/opentelemetry-specification#456; checking with @jsuereth to confirm.
I.e. someone who adds the cicd.pipeline.run entity should expect that all metrics emitted during the run would be associated with it, including cicd.pipeline.run.duration.
The idea behind this guidance paragraph and the info metric is to be able to query for other metrics related to the runner: host or pod metrics for CPU, memory, and disk.
To easily show them in a dashboard, we need to be able to select the run and then the agent for that run (in case multiple runners are involved).
There are two ways to do this:
- adding the `cicd.pipeline.run` entity to all the metrics (i.e. to the host metrics and pod metrics),
- or adding an additional metric which has both the `cicd.pipeline.run` entity and a `host` or `k8s.pod` entity. This would be the `cicd.pipeline.run.info` metric.

In this paragraph I meant to cover case 1: the cicd.pipeline.run entity could be added to the host metrics and pod metrics.
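For comparison, case 1 can be sketched as follows: the run id is attached to the host or pod metrics themselves, so selecting a run is a plain attribute filter with no join. This is a plain-Python model under assumptions; the helper names, the `k8s.pod.uid` key, the metric names, and the pod-to-run mapping are all hypothetical.

```python
def tag_with_run(points, pod_to_run):
    """Return copies of the points with the run id attached where the pod is known."""
    tagged = []
    for point in points:
        attrs = dict(point["attributes"])
        run_id = pod_to_run.get(attrs.get("k8s.pod.uid"))
        if run_id is not None:
            attrs["cicd.pipeline.run.id"] = run_id
        tagged.append({**point, "attributes": attrs})
    return tagged


def select_run(points, run_id):
    """With case 1, selecting a run's metrics is a simple attribute filter."""
    return [p for p in points if p["attributes"].get("cicd.pipeline.run.id") == run_id]


# Illustrative pod metrics; names and values are made up for the example.
pod_points = [
    {"name": "k8s.pod.cpu.usage", "value": 0.75, "attributes": {"k8s.pod.uid": "pod-a"}},
    {"name": "k8s.pod.memory.usage", "value": 512.0, "attributes": {"k8s.pod.uid": "pod-a"}},
    {"name": "k8s.pod.cpu.usage", "value": 0.20, "attributes": {"k8s.pod.uid": "pod-b"}},
]
tagged_points = tag_with_run(pod_points, {"pod-a": "run-42", "pod-b": "run-43"})
run_42 = select_run(tagged_points, "run-42")
```

The cost is that every host or pod series now carries the run id, whereas in case 2 only the info metric does.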
As discussed in SemConv meeting 2025-08-04:
In CICD we do not use a single SDK to attach entities to emitted signals.
Instead, we use multiple SDKs where appropriate, for example:
- in the controller, to emit controller-specific signals
- on the workers, to emit signals related to the pipeline run executing on that worker

Alternatively, we construct the OTLP without using an SDK, for example in the otel-collector GitHub receiver.
> should expect that all metrics emitted during the run would have association with it

Since we can choose how a specific signal is emitted (e.g. using which SDK), we may associate only the relevant entities with a given signal. In the example of cicd.pipeline.run.duration, it would most likely be emitted by the CICD controller without a specific pipeline / pipeline run entity association. This allows using cicd.pipeline.name as an attribute for the cicd.pipeline.run.duration metric without having a specific pipeline run as a resource of that metric.
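A minimal sketch of that separation, as a plain-Python model rather than the real SDK API: the `Emitter` class stands in for one SDK instance (or hand-built OTLP), and the resource contents, pipeline name, and values are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Emitter:
    """Stand-in for one SDK instance: every signal inherits its resource."""
    resource: dict
    emitted: list = field(default_factory=list)

    def record(self, name, value, attrs=None):
        point = {
            "name": name,
            "value": value,
            "resource": dict(self.resource),
            "attributes": dict(attrs or {}),
        }
        self.emitted.append(point)
        return point


# Controller: no pipeline run entity on the resource (hypothetical values).
controller = Emitter(resource={"service.name": "cicd-controller"})
# Worker: resource carries the run and the pod it executes on (hypothetical values).
worker = Emitter(resource={"cicd.pipeline.run.id": "run-42", "k8s.pod.uid": "pod-a"})

duration = controller.record(
    "cicd.pipeline.run.duration", 312.0, {"cicd.pipeline.name": "build-and-test"}
)
info = worker.record("cicd.pipeline.run.info", 1)
```

Because the two emitters have different resources, the duration metric stays aggregated by pipeline name while only the worker's signals are tied to a specific run.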
- host
- container
- k8s.namespace
- k8s.pod
If any of these entities happened to be provided on the worker, cicd.pipeline.run.duration would also be associated with them. That makes me wonder what the point of the cicd.pipeline.run.info metric is.
What do you mean?
None of the cicd entities include a duration attribute.
Is there a prototype? Could you please add a link in the PR description? Thanks!
Done, I will update the metric name in the prototype once this PR is merged. In the dashboard you can find examples of querying by the info metric:
Discussed in SemConv meeting 2025-08-04:
- Fine to define info metrics to define entity relationships.
- Best to have consistency with k8s info / state metrics.
- If we block this PR to have a wider discussion among SemConv maintainers, we can still move forward with defining such info metrics as opt-in in other systems (in otel-collector, for example).
- Improve the PR: give examples of use cases and expand on the description.

I'm closing this PR since we won't move forward on the blocking issue in the foreseeable future.
Fixes #1184
Changes
This PR adds guidance on per pipeline run metrics and also adds the `cicd.pipeline.run.info` metric for linking a pipeline run entity to other entities.
Prototype
A precursor metric `ci.podspan.info` (now named `cicd.pipeline.run.info` in this PR) was used in the opentelemetry-agent-metrics-plugin.
An otel-collector uses the span attributes and resource attributes (using their `cicd.pipeline` and `cicd.pipeline.run` entity links) to establish the link to k8s attributes (with `k8sattributesprocessor`) and emits the info metric to allow querying.
https://github.com/jenkinsci/opentelemetry-agent-metrics-plugin/blob/main/collector.yaml
A Grafana dashboard allows querying for pod metrics (cpu, memory, network, disk, ...) when given the pipeline run id.
https://github.com/jenkinsci/opentelemetry-agent-metrics-plugin/blob/main/ci-pod-metrics-dashboard.json
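As a rough illustration of the span-to-info-metric step, the sketch below models it in plain Python. It is not the collector configuration from the prototype: the span shape, the `k8s.pod.uid` attribute, and the function name are assumptions, and in the prototype the k8s attributes are attached by the collector rather than by code like this.

```python
def info_points_from_spans(spans):
    """Emit one info data point (value 1) per distinct (run id, pod) pair seen on spans."""
    seen = set()
    points = []
    for span in spans:
        resource = span["resource"]
        run_id = resource.get("cicd.pipeline.run.id")
        pod_uid = resource.get("k8s.pod.uid")
        if run_id is None or pod_uid is None:
            continue  # nothing to link without both sides
        if (run_id, pod_uid) in seen:
            continue  # one info series per pair is enough
        seen.add((run_id, pod_uid))
        points.append({
            "name": "cicd.pipeline.run.info",
            "value": 1,
            "attributes": {"cicd.pipeline.run.id": run_id, "k8s.pod.uid": pod_uid},
        })
    return points


# Illustrative spans: two from the same run and pod, one without k8s attributes.
spans = [
    {"name": "checkout", "resource": {"cicd.pipeline.run.id": "run-42", "k8s.pod.uid": "pod-a"}},
    {"name": "build", "resource": {"cicd.pipeline.run.id": "run-42", "k8s.pod.uid": "pod-a"}},
    {"name": "notify", "resource": {"cicd.pipeline.run.id": "run-42"}},
]
info_points = info_points_from_spans(spans)
```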
Merge requirement checklist
[chore]