Add guidance and info metric for per cicd pipeline run metrics #2237

Closed
kamphaus wants to merge 9 commits into open-telemetry:main from kamphaus:1184-per-cicd-run-metrics

Conversation

@kamphaus
Contributor

@kamphaus kamphaus commented May 6, 2025

Fixes #1184

Changes

This PR adds guidance on per pipeline run metrics and also adds the cicd.pipeline.run.info metric for linking a pipeline run entity to other entities.

Prototype

A precursor metric ci.podspan.info (now named cicd.pipeline.run.info in this PR) was used in the opentelemetry-agent-metrics-plugin.

An otel-collector uses the span attributes and resource attributes (via their cicd.pipeline and cicd.pipeline.run entity links) to establish the link to k8s attributes (with the k8sattributes processor) and emits the info metric to allow querying.
https://github.com/jenkinsci/opentelemetry-agent-metrics-plugin/blob/main/collector.yaml

A Grafana dashboard allows querying for pod metrics (cpu, memory, network, disk, ...) when given the pipeline run id.
https://github.com/jenkinsci/opentelemetry-agent-metrics-plugin/blob/main/ci-pod-metrics-dashboard.json

Merge requirement checklist

@github-actions github-actions Bot added enhancement New feature or request area:cicd labels May 6, 2025
@lmolkova lmolkova moved this from Untriaged to Awaiting SIG approval in Semantic Conventions Triage May 13, 2025
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label May 24, 2025
@github-actions github-actions Bot removed the Stale label May 26, 2025
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label Jun 10, 2025
@github-actions github-actions Bot removed the Stale label Jun 11, 2025
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions Bot added the Stale label Jul 10, 2025
@github-actions github-actions Bot removed the Stale label Jul 12, 2025
Comment thread model/cicd/metrics.yaml
  - id: metric.cicd.pipeline.run.info
    type: metric
    metric_name: cicd.pipeline.run.info
    brief: 'This is an info metric linking pipeline runs to any other associated entities or information.'
Member

I don't have a helpful suggestion, but just a thought around how to make this brief more clear (if that's something we think needs to be more clear). Can we link to some reference documentation around what an info metric is?

Member

+1

It'd be helpful to include when it's reported - when the run starts?

Since it results in high cardinality, does it need to be a metric? Is it time to consider using an event instead? It would provide all the correlation too.

Contributor Author

@kamphaus kamphaus Aug 3, 2025

> does it need to be a metric?

The cicd.pipeline.run.info metric is meant to cover case 2. We query this info metric in order to pivot from one attribute (cicd.pipeline.run.id) to the host or pod id; as such, it has to be a metric.
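
To illustrate the pivot, here is a minimal Python sketch of joining an info metric against pod metrics. It is not taken from the PR or the prototype dashboard: the sample data, the choice of `k8s.pod.name` as the join key, the use of `k8s.pod.cpu.usage` as the pod metric, and the helper `pod_samples_for_run` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One metric data point: metric name, attributes, and value."""
    name: str
    attributes: dict
    value: float


INFO_METRIC = "cicd.pipeline.run.info"


def pod_samples_for_run(samples, run_id, join_key="k8s.pod.name"):
    """Return the non-info samples whose pod is linked to run_id (hypothetical helper)."""
    # Step 1: use the info metric to find which pods served this run.
    pods = {
        s.attributes[join_key]
        for s in samples
        if s.name == INFO_METRIC
        and s.attributes.get("cicd.pipeline.run.id") == run_id
        and join_key in s.attributes
    }
    # Step 2: keep the other metrics that carry one of those pod names.
    return [
        s for s in samples
        if s.name != INFO_METRIC and s.attributes.get(join_key) in pods
    ]


# Illustrative data only: info samples have a constant value of 1,
# and the attributes carry the actual information.
SAMPLES = [
    Sample(INFO_METRIC, {"cicd.pipeline.run.id": "run-42", "k8s.pod.name": "agent-a"}, 1),
    Sample(INFO_METRIC, {"cicd.pipeline.run.id": "run-43", "k8s.pod.name": "agent-b"}, 1),
    Sample("k8s.pod.cpu.usage", {"k8s.pod.name": "agent-a"}, 0.75),
    Sample("k8s.pod.cpu.usage", {"k8s.pod.name": "agent-b"}, 0.20),
]

RESULT = pod_samples_for_run(SAMPLES, "run-42")
```

In this sketch the pod metrics themselves never carry the run id; only the info metric does, which is what makes the two-step lookup necessary.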

Contributor Author

@kamphaus kamphaus Aug 3, 2025

> It'd be helpful to include when it's reported - when the run starts?

Although it would be nice to have a way to provide more information about pipelines, pipeline runs, and workers, I think that information would be more appropriately added as descriptive attributes on the entities.
It was my understanding from the entities SIG that at some point the descriptive attributes are meant to be conveyed in an efficient way (like an info metric maybe?), but I might be mistaken.

Since the cicd.pipeline.run.info metric is primarily meant for linking the different entities, I'd like to start with that and add other information later.

Update: Discussed in SemConv meeting 2025-08-04: Indeed, entity-to-metric signal transformations are on the Entities SIG roadmap, but could take a long time to arrive.

Contributor Author

> Can we link to some reference documentation around what an info metric is?

The closest definition of an "info" metric type that I could find is this paragraph in the OTel specs: https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#info

Should I link to it?

@joaopgrassi joaopgrassi moved this from Awaiting codeowners approval to Needs More Approval in Semantic Conventions Triage Jul 28, 2025
Comment thread docs/cicd/cicd-metrics.md

### Guidance on per pipeline run metrics

You MAY use metrics that include an attribute identifying the pipeline run by using the [`cicd.pipeline.run`](/docs/resource/cicd.md#cicd-pipeline-run) entity association.
Member

Does it apply to all metrics in this file?

I'm not sure it's possible to pick and choose which entity associations a specific metric gets. When you create a meter, it's associated with all entities that are defined on that meter provider. I don't see if it changes with open-telemetry/opentelemetry-specification#456; checking with @jsuereth to confirm.

I.e. someone who adds the cicd.pipeline.run entity should expect that all metrics emitted during the run would have an association with it, including cicd.pipeline.run.duration.

Contributor Author

@kamphaus kamphaus Aug 3, 2025

The idea behind this guidance paragraph and the info metric is to be able to query for other metrics related to the runner: host or pod metrics for cpu, memory, disk.
To be able to easily show them in a dashboard we need to be able to select the run and then the agent for that run (in case multiple runners are involved).
There are two ways to do this:

  1. adding the cicd.pipeline.run entity to all the metrics (i.e. to the host metrics and pod metrics).
  2. or adding an additional metric which has both the cicd.pipeline.run entity and a host or k8s.pod entity. This would be the cicd.pipeline.run.info metric.

In this paragraph I meant to cover case 1: that the cicd.pipeline.run entity could be added to the host metrics and pod metrics.
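
As a rough illustration of case 1, the following Python sketch stamps the run id onto each pod data point so that a dashboard can filter on it directly. The data shapes, the `k8s.pod.name` lookup key, the `RUN_BY_POD` mapping, and the helper `attach_run_id` are hypothetical and only meant to show the idea, not how any SDK or collector processor actually does it.

```python
def attach_run_id(points, run_by_pod, pod_key="k8s.pod.name"):
    """Return copies of points with cicd.pipeline.run.id added where the pod is known."""
    enriched = []
    for point in points:
        attrs = dict(point["attributes"])  # copy, so the input is not mutated
        run_id = run_by_pod.get(attrs.get(pod_key))
        if run_id is not None:
            attrs["cicd.pipeline.run.id"] = run_id
        enriched.append({**point, "attributes": attrs})
    return enriched


# Illustrative data only.
POD_POINTS = [
    {"name": "k8s.pod.cpu.usage", "attributes": {"k8s.pod.name": "agent-a"}, "value": 0.75},
    {"name": "k8s.pod.cpu.usage", "attributes": {"k8s.pod.name": "agent-z"}, "value": 0.10},
]
RUN_BY_POD = {"agent-a": "run-42"}  # assumed to be known wherever the points are produced

ENRICHED = attach_run_id(POD_POINTS, RUN_BY_POD)
```

With this approach no join is needed at query time, at the cost of an extra attribute on every host or pod data point that belongs to a run.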

Contributor Author

@kamphaus kamphaus Aug 4, 2025

As discussed in SemConv meeting 2025-08-04:
In CICD we do not use a single SDK to attach entities to emitted signals.
Instead we use multiple SDKs where appropriate, for example:

  • in the controller to emit controller-specific signals
  • on the workers to emit signals related to the pipeline run executing on that worker

Alternatively, we construct the OTLP data without using an SDK, for example in the otel-collector GitHub receiver.

> should expect that all metrics emitted during the run would have association with it

Since we can choose how a specific signal is emitted (e.g. using which SDK), we may associate only the relevant entities with a given signal. In the example of cicd.pipeline.run.duration, it would most likely be emitted by the CICD controller without a specific pipeline / pipeline run entity association. This allows using cicd.pipeline.name as an attribute for the cicd.pipeline.run.duration metric without having a specific pipeline run as a resource of that metric.
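
A small Python model of that setup, to make the distinction concrete. The `Emitter` class is a hypothetical stand-in for one SDK instance (it is not an OpenTelemetry API), and the resource values and the `example.worker.metric` name are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Emitter:
    """Stand-in for one SDK instance: every point it records shares its resource."""
    resource: dict
    points: list = field(default_factory=list)

    def record(self, name, value, attributes=None):
        self.points.append({
            "name": name,
            "value": value,
            "resource": dict(self.resource),
            "attributes": dict(attributes or {}),
        })


# The controller has no pipeline run entity on its resource ...
controller = Emitter(resource={"service.name": "cicd-controller"})
# ... while the worker's resource identifies the run it is executing.
worker = Emitter(resource={"cicd.pipeline.run.id": "run-42", "k8s.pod.name": "agent-a"})

# Duration is emitted by the controller, keyed by pipeline name as a plain attribute.
controller.record("cicd.pipeline.run.duration", 93.0, {"cicd.pipeline.name": "build"})
# A run-specific signal is emitted by the worker and inherits the run association.
worker.record("example.worker.metric", 1.0)
```

Because each emitter carries its own resource, the duration point ends up without a run association while the worker's point has one.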

Comment thread model/cicd/metrics.yaml
Comment on lines +53 to +56
- host
- container
- k8s.namespace
- k8s.pod
Member

If any of these entities happened to be provided on the worker, cicd.pipeline.run.duration would also be associated with them. Which makes me wonder: what's the point of the cicd.pipeline.run.info metric?

Contributor Author

What do you mean?
None of the cicd entities include a duration attribute.

@lmolkova
Member

Is there a prototype? Could you please add a link in the PR description? Thanks!

@kamphaus
Contributor Author

kamphaus commented Aug 3, 2025

> Is there a prototype? Could you please add a link in the PR description? Thanks!

Done, I will update the metric name in the prototype once this PR is merged.

In the dashboard you can find examples of querying by the info metric:
https://github.com/jenkinsci/opentelemetry-agent-metrics-plugin/blob/5440c66be18d6563097963bcce37f1b37f21e9be/ci-pod-metrics-dashboard.json#L2558

@kamphaus
Contributor Author

kamphaus commented Aug 4, 2025

Discussed in SemConv meeting 2025-08-04:

Fine to define info metrics to define entity relationships.

Best to have consistency with k8s info / state metrics

If we block this PR to have a wider discussion among the SemConv maintainers, we can still move forward with defining such info metrics as opt-in in other systems (in the otel-collector, for example).

Improve the PR: give examples of use cases. Expand on the description.

@kamphaus
Contributor Author

kamphaus commented Aug 12, 2025

> Improve the PR: give examples of use cases. Expand on the description.

Since this PR is blocked on the discussion of info metrics (#2595) I will split off the "Guidance on per pipeline run metrics" into its own PR to make progress on that.

Done: #2618

@kamphaus kamphaus moved this from Needs More Approval to Blocked in Semantic Conventions Triage Aug 12, 2025
kamphaus added a commit to jenkinsci/opentelemetry-agent-metrics-plugin that referenced this pull request Sep 14, 2025
kamphaus added a commit to jenkinsci/opentelemetry-agent-metrics-plugin that referenced this pull request Sep 14, 2025
kamphaus added a commit to jenkinsci/opentelemetry-agent-metrics-plugin that referenced this pull request Sep 14, 2025
@kamphaus
Contributor Author

kamphaus commented Oct 6, 2025

I'm closing this PR since we won't move forward on the blocking issue in the foreseeable future.
Once that issue is addressed we will open a new PR.

@kamphaus kamphaus closed this Oct 6, 2025

Labels

area:cicd enhancement New feature or request

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[cicd] Define conventions for associating host/pod metrics of a cicd runner with pipeline runs

5 participants