-
Notifications
You must be signed in to change notification settings - Fork 26
Add proposal: Simplify joins with info metrics in PromQL #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
72ce08e
38c2743
88feea2
7fd4887
d42a059
3f3b7f5
32707a8
7fdf82a
e265217
7c8f527
5cfeef3
ceead4a
e2ccdda
01692a6
c54c0a4
612b41b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| # Simplify joins with info metrics in PromQL | ||
|
|
||
| * **Owners:** | ||
| * Arve Knudsen [@aknuds1](https://github.com/aknuds1) [[email protected]](mailto:[email protected]) | ||
|
|
||
| * **Implementation Status:** Partially implemented | ||
|
|
||
| * **Related Issues and PRs:** | ||
| * [WIP: Info PromQL function prototype](https://github.com/grafana/mimir-prometheus/pull/598) | ||
|
|
||
| * **Other docs or links:** | ||
| * [Proper support for OTEL resource attributes](https://docs.google.com/document/d/1FgHxOzCQ1Rom-PjHXsgujK8x5Xx3GTiwyG__U3Gd9Tw/edit#heading=h.unv3m5m27vuc) | ||
| * [Special treatment of info metrics in Prometheus](https://docs.google.com/document/d/1ebhGNLs3uhdeprJCullM-ywA9iMRDg_mmnuFAQCloqY/edit#heading=h.2rmzk7oo6tu8) | ||
| * [Scenarios scratch pad](https://docs.google.com/document/d/1nV6N3pDfvZhmG2658huNbFSkz2rsM6SpkHabp9VVpw0/edit#heading=h.luf3yapzr29e) | ||
|
|
||
| > This proposal collects the requirements and implementation proposals for simplifying joins with info type metrics in PromQL. | ||
|
|
||
| ## Why | ||
|
|
||
| Info metrics are [defined by the OpenMetrics specification](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#info) as "used to expose textual information which SHOULD NOT change during process lifetime". | ||
| Furthermore the OpenMetrics specification states that info metrics ["MUST have the suffix `_info`"](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#info-1). | ||
| Despite the latter OpenMetrics requirement, there are metrics with the info metric usage pattern that don't have the `_info` suffix, e.g. `kube_pod_labels`. | ||
| In this proposal, we shall include the latter in the definition of info metrics. | ||
|
|
||
| Currently, enriching Prometheus query results with corresponding labels from info metrics is challenging. | ||
| More specifically, it requires writing advanced PromQL to join with the info metric in question. | ||
| Take as an example querying HTTP request rates per K8s cluster and status code, while having to join with the `target_info` metric to obtain the `k8s_cluster_name` label: | ||
|
|
||
| ```promql | ||
| sum by (k8s_cluster_name, http_status_code) ( | ||
| rate(http_server_request_duration_seconds_count[2m]) | ||
| * on (job, instance) group_left (k8s_cluster_name) | ||
| target_info | ||
| ) | ||
| ``` | ||
|
|
||
| The `target_info` metric is in fact the motivation for this proposal, as it's how Prometheus encodes OpenTelemetry (OTel for short) [resource attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md). | ||
| As a result, it's a very important info metric for those using Prometheus as an OTel backend. | ||
| OTel resource attributes model metadata about the environment producing metrics received by the backend (e.g. Prometheus), and Prometheus persists them as labels of `target_info`. | ||
| Typically, OTel users want to include some of these attributes (as `target_info` labels) in their query results, to correlate them with entities of theirs (e.g. K8s pods). | ||
bwplotka marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| Based on user demand, it would be preferable if Prometheus were to have better UX for enriching query results with info metrics labels, especially with OTel in mind. | ||
| There are other problems with Prometheus' current method of including info metric labels in queries, beyond just the technical barrier: | ||
| * Explicit knowledge of each info metric's identifying labels must be embedded in join queries for when you wish to enrich queries with data (non-identifying) labels from info metrics. | ||
| * A certain pair of OTel resource attributes (`service.name` and `service.instance.id`) are currently assumed to be the identifying pair and mapped to `target_info`'s `job` and `instance` labels respectively, but this may become a dynamic property of the OTel model. | ||
| * Both attributes are in reality optional, so either of them might be missing (`service.name` is only mandatory for OTel SDK clients). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should improve in the future once OTel SDKs start generating
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even after that PR,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thats correct. SDKs will start generating it by default, but there are ways to get around it, and the collector won't always send it. It will get better, but not be solved entirely There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to resolve this in the entity WG somehow (coordinating with prometheus). Ideally we have a way to give you a grouping (job) and unique identifier (instance) for any piece of telemetry you receive from OTEL. At a minimum, I think we should be able to provide a unique identifier (instance), where "job/instance" matching should always work. As a path forward, we'll take this as a constraint on Entity work to make sure we can fulfill this condition.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jsuereth the As I see it currently at least, the main thing with the Entity model in this respect should be that OTLP write requests always specify identifying Resource attributes in a sensible way, so that Prometheus can map them to corresponding labels. WDYT?
I think at the minimum providing a unique |
||
| * If both identifying attributes are missing, `target_info` isn't generated (there being no identifying labels to join against). | ||
| * If an info metric's data (non-identifying) labels change (a situation that should become more frequent with OTel in the future, as the model will probably start allowing for non-identifying resource attribute mutations), join queries against the info metric (e.g. `target_info`) will temporarily fail due to resolving the join keys to two different metrics, until the old metric is marked stale (by default after five minutes). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "By default" the old metric is marked as stale as soon as the scrape happens that contains the new metric. The problem is that this doesn't work if ingesting via OTLP, which is obviously common in an OTel use case. Only in that case, you run into the problem of five minutes of duplication. (Which is the reason behind a minimal proposal to "fix" the problem (discussed elsewhere): Somehow make staleness handling work with OTLP ingestion.)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You may be interested in the current OTel Entities proposal, which is designed to carry non-identifying resource information: https://github.com/open-telemetry/oteps/blob/6d6febfaf05f130c703e2dd0fa91dfacada82a7d/text/entities/0256-entities-data-model.md. My hope would be that we can potentially map these to OM info metrics in the future. edit: I forgot that I already left the same comment on the previous proposal docs.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dashpole yes, this point in fact refers to the proposed Entity model, since it will allow for non-identifying resource/entity attributes to change. |
||
|
|
||
| If Prometheus could persist info metrics' identifying labels (e.g. `job` and `instance` for `target_info`), human knowledge of the correct identifying labels may become unnecessary when "joining" with info metrics. | ||
| Information about info metric identifying labels is present in at least the OpenMetrics protobuf exposition format (the OpenMetrics text exposition format unfortunately lacks this capability). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And metrics that are "info metrics in spirit only" ( Just saying this here for the record. We don't need to discuss it in this proposal (or maybe just as a footnote).
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also worry about this being limited to OM, as most of the ecosystem seems to still be using prometheus text/protobuf formats today. Did we consider to expanding this to also include gauge metrics with an
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even OM needs an amendment for its text version. So the best outcome here is to "fix" OpenMetrics (including marking identifying labels in the text format and allowing info metrics that do not end on _info, but also a number of other things that have hampered adoption so far) so that it finally OM gets adopted everywhere. The second best outcome is to also amend classic Prometheus exposition formats. (It's a bit weird because I'd rather see the effort of changing the exposition format to lead directly to OM everywhere rather than more versions of the many formats that we already have.) In the meantime, I would go for scrape config options to mark metrics of whatever source and format as info metrics, including the option to mark certain labels as identifying labels. (The heuristic "target labels → identifying labels, other labels → date labels" should be fine in most cases.) A rule to say "all metrics ending on |
||
| It can also easily be deduced when ingesting metrics from OTLP (OTel Protocol). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we could mention here that info metrics with additional identifying labels in the exposition format are rare. Usually, just the target labels added upon ingestion are the identifying labels, so this would be a good first guess. (Works for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm adding the following sentence, does it work for you @beorn7?
|
||
| Most info metrics' identifying labels will be `job` and `instance`, but there are some exceptions (e.g. `kube_pod_labels`). | ||
| Intrinsic knowledge of info metrics' identifying labels could also help in solving temporary conflicts between old and new versions of info metrics, when data (non-identifying) labels change. | ||
| Another possible positive outcome might be dedicated support in UIs (e.g. Grafana) for visualizing the resource attributes of OTel metrics. | ||
|
|
||
| ### Pitfalls of the current solution | ||
|
|
||
| Prometheus currently persists info metrics as if they were normal float samples. | ||
| This means that knowledge of info metrics' identifying labels are lost, and you have to base yourself on convention when querying them (for example that `target_info` should have `job` and `instance` as identifying labels). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, the damage is much bigger -- we essentially use all labels as "identifying" causing massive churn and cardinality bomb. With this feature we essentially increase use of info metric, so that's perhaps relevant to fix. |
||
| There's also no particular support for enriching query results with info metric labels in PromQL. | ||
| The consequence is that you need relatively expert level PromQL knowledge to include info metric labels in your query results; as OTel grows in popularity, this becomes more and more of a problem as users will want to include certain labels from `target_info` (corresponding to OTel resource attributes). | ||
bwplotka marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| Without persisted info metric metadata, one can't build more user friendly abstractions (e.g. a PromQL function) for including OTel resource attributes (or other info metric labels) in query results. | ||
| Neither can you build dedicated UI for OTel resource attributes (or other info metric labels). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps you could mention that there is room for a future iteration addressing these issues in various forms, one could think of info metrics with persistence or some other kind of metadata store.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jesusvazquez should that perhaps be mentioned in the Goals section? Before revising the proposal after @codesome's feedback, solving these problems in Prometheus was part of the goals. My understanding of the "Pitfalls of the current solution" section is that it should be referring to status quo in Prometheus, before the proposed solution. To be honest I'm a bit unsure of what to put in the Pitfalls section currently, until @codesome reviews so the final form of the proposal is clearer. |
||
|
|
||
| ## Goals | ||
|
|
||
| Goals and use cases for the solution as proposed in [How](#how): | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the goal to also select certain metrics based on certain target_info non-identifying label? e.g. would I be able to do something like
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added more specification about the new TSDB API and its use of label matchers. If I understand you correctly, the answer is yes:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! Then I wonder if simpler solution would be worth to discuss as an alternative e.g. special prefixed labels: https://cloud.google.com/stackdriver/docs/managed-prometheus/promql#metadata-labels 🤔
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will have a read, thanks.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After reading the link about metadata labels, as I understand Bartek, the idea is to perhaps use metadata like syntax instead of introducing an
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I understand the Stackdriver documentation correctly, metadata labels are, from the outside API/PromQL perspective equivalent to adding all OTel resource attributes (or version number from Obviously, a backend can be implemented in a way to make that efficient, so that the efficiency concerns can be addressed. However, there are still concerns about UX, like every metric will now have a ginormous amount of labels, which needs additional tooling to get under control (like displaying metadata labels differently). It as a long-held Prometheus best practice to not add the same information (like the version number of the exposing binary) to every metric exposed, but bundle them in an info metric (like My understanding is that this whole proposal is an attempt to solve the problem in the "classic" Prometheus way by making info metrics more powerful and more usable. The proposal to add all of the "metadata" to every metric in form of metadata labels is still on the table, but it's a different proposal.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for sharing those insights @beorn7 :)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense, thanks! |
||
|
|
||
| * Persist info metrics with labels categorized as either identifying or non-identifying (i.e. data labels). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this statements sounds a bit contradictory to the above:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suspect you have a different understanding of the "Pitfalls of the current solution" section than I do. That section has the following purpose (from the template):
|
||
| * Track when info metrics' set of identifying labels changes. This shouldn't be a frequent occurrence, but it should be handled. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar question as the data labels - if the identifying labels change in between let's say t1 (with t0 being a older time and t2 being a future time), do we use the old identifying labels for queries touching t0 to t1, and then new identifying labels for queries after t1? Let's document this.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe a change of identifying labels should really be seen as a completely independent new series, without any staleness marker insertion. As said, this should be rare (while the fallout of a change in the data labels in the absence of staleness markers, e.g. what you get when ingesting OTLP, triggered the whole effort). And it could actually mean the new info series is actually meant as a completely independent new series.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@codesome yes, that is my current thinking at least. My thinking is to track through the info metric's samples, what is the identifying label set at any given point in time. Agree to document it properly, I'll give my proposal a re-read, to see whether I need to clarify further.
@beorn7 there shouldn't be any need for a staleness marker insertion, when the identifying label set changes, so long as the entire series' label set remains the same. My thinking is to just record a new sample for the info metric, that records the new set of identifying labels. Does that make sense to you as well?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It probably works (but again, I would almost see this as an implementation detail). Staleness marker will be inserted anyway if you scrape Prometheus-style, and the problems you have because of staleness markers not happening with OTLP ingestion will also happen anyway. It's not just for info metrics, and I think we don't need a bespoke staleness-marker ingestion solution here. |
||
| * When enriching a query result's labels with data labels from info metrics, it should be considered per timestamp what are each potentially matching info metric's identifying labels (since the identifying label set may change over time). | ||
| * Automatically treat the old version of an info metric as stale for query result enriching purposes, when its data labels change (producing a new time series, but with same identity from an info metric perspective). | ||
| * When enriching a query result's labels with data labels from info metrics, and there are several matches with equally named info metrics (e.g. `target_info`) for a timestamp, the one with the newest sample wins (others are considered stale). | ||
| * Simplify enriching of query results with info metric data (non-identifying) labels in PromQL, e.g. via a new function. | ||
| * Ensure backwards compatibility with current Prometheus usage. | ||
| * Minimize potential conflicts with existing metric labels. | ||
|
|
||
| ### Audience | ||
|
|
||
| Prometheus maintainers. | ||
|
|
||
| ## How | ||
|
|
||
| * Simplify the inclusion of info metric labels in PromQL through a new `info` function: `info(v instant-vector[, ls data-label-selector])`. | ||
| * If no data label matchers are provided, _all_ the data labels of found info metrics are added to the resulting time series. | ||
| * If data label matchers are provided, only info metrics with matching data labels are considered. | ||
| * If data label matchers are provided, _precisely_ the data labels specified by the label matchers are added to the returned time series. | ||
| * If data label matchers are provided, time series are only included in the result if matching data labels from info metrics were found. | ||
| * A data label matcher like `k8s_cluster_name=~".+"` guarantees that each returned time series has a non-empty `k8s_cluster_name` label, implying that time series for which no matching info metrics have a data label named `k8s_cluster_name` (including the case where no matching info metric exists at all) will be excluded from the result. | ||
| * A special case: If a data label matcher allows empty labels (equivalent to missing labels, e.g. `k8s_cluster_name=~".*"`), it will not exclude time series from the result even if there's no matching info metric. | ||
| * A data label matcher like `__name__="target_info"` can be used to restrict the info metrics used. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we want to allow writing it as just
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe the current implementation just recognizes the selecter of it starts with curly braces. Generally, I guess any valid vector matcher, including just the name, should be valid.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current implementation only recognizes a selector starting with curly braces indeed. But it's also a very rough and hacky implementation, as I'm concentrating on getting the underlying design (TSDB API) right primarily.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can, at the end of the day, use a normal vector matcher after all, just that the PromQL engine has to intervene before it is fully evaluated. Yesterday, I discussed with @juliusv his valid concern about using a vector matcher here in a non-standard way, and we found that you could stretch things just a tiny bit to see this vector matcher as a normal vector matcher. It will (potentially) select a lot, but all the info metrics we want will be selected, too, so one might argue the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, the Now on a language typing level, we'll still have the problem that we only have four types: strings, scalars, and instant/range vectors. We currently don't have any way to specify that a function expects not just one of those four types as an input, but a specific AST node type (specifically an instant vector selector, not just any instant-vector-typed result). I guess for a start we could ignore this in parsing and have the function check at run-time whether the parameter is really an instant vector selector and error out otherwise.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| However, the `__name__` label itself will not be copied. | ||
| * In the case of multiple versions of the same info metric being found (with the same identifying labels), the one with the newest sample wins. | ||
| * Label collisions: The input instant vector could already contain labels that are also part of the data labels of a matching info metric. | ||
| Furthermore, since multiple differently named info metrics with matching identifying labels might be found, those might have overlapping data labels. | ||
| In this case, the implementation has to check if the values of the affected labels match or are different. | ||
| The former case is not really a label collision and therefore causes no problem. | ||
| In the latter case, however, an error has to be returned to the user. | ||
| The collision can be resolved by constraining the labels via data label matchers. | ||
| And of course, the user always has the option to go back to the original join syntax (or, even better, avoiding ingesting conflicting info metrics in the first place). | ||
| * Track each info metric's identifying label set over time (in case it changes) - storage model details to be elaborated in separate proposal. | ||
| * This allows determining on a per-timestamp basis, which are an identifying metric's identifying labels. | ||
| * Keep info metric indexes in storage - storage model details to be elaborated in separate proposal. | ||
| * Info metric indexes maintaining per info metric the different identifying label sets it has had over its lifetime. | ||
| * Indexing the different identifying label sets an info metric has had over its lifetime allows determining which are potential matches for a given metric, before considering the time dimension. | ||
|
|
||
| Using the `info` function, we can simplify the previously given PromQL join example as follows: | ||
|
|
||
| ``` | ||
| sum by (k8s_cluster_name, http_status_code) ( | ||
| info( | ||
| rate(http_server_request_duration_seconds_count[2m]), | ||
| {k8s_cluster_name=~".+"} | ||
| ) | ||
| ) | ||
|
Comment on lines
+111
to
+116
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you help me understand this example? I know the current rule for aggregating rates is Rate then sum, never sum then rate. Now, will this turn into Not sure if there is any specific reason to do
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Disclaimer: I just copied this example from Beorn's original document. I'll answer based on my knowledge of
As for whether you could do
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aknuds1 already explained nicely why you have to do the In current PromQL, the order before the The range selector is a construct that only works directly with a metric, i.e. Neither can you apply |
||
| ``` | ||
|
|
||
| ## Alternatives | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This might have been discussed before, but I wonder if we are missing the simplest solution of query rewriting as a considered alternative. Given the example query on line 119, can we rewrite it into a join and let the engine execute it with the existing semantics of PromQL. Information needed for the join can either be implicitly deducted or specified by the user as arguments.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Assuming we were to use query rewriting, how would you deduce which info metric(s) to join against (when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can only see this working well for a static list of well known info metrics like..
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@MichaHoffmann if that is your concern with the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, Sorry I didnt want to propose something differently, I was just concerning out loud. What I want to say is that the name label matcher for info metric probably should be requirement for this to be generally least surprising for the user.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @MichaHoffmann, I can note your wish about the name label matcher being a requirement, so a consensus can be reached.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should note that my original idea was indeed that there is generally a relatively low number of info metrics so that a default matching behavior of "all the info metrics known to the TSDB" is feasible, which is in fact one of the reasons that makes the "joining" easier. Note that with a "general metadata store", you would probably also end up with "give me all the metadata you know for this metric", which is (more or less) equivalent to "give me all the info metrics that have an identifying labels match with this metric". |
||
|
|
||
| ### Add metadata as prefixed labels | ||
|
|
||
| Instead of encoding metadata, e.g. OTel resource attributes, as info metric labels, add them directly as labels to corresponding metrics. | ||
|
|
||
| #### Pros | ||
|
|
||
| * Simplicity, removes need for joining with info metrics | ||
|
|
||
| #### Cons | ||
|
|
||
| * Metrics will have potentially far more labels than what's strictly necessary to identify them | ||
| * Temporary series churn when migrating existing metrics to this new scheme | ||
| * Increased series churn when metadata labels change | ||
| * More labels per metric increases CPU/memory usage | ||
|
|
||
| ### Make the `info` function require specifying the info metric(s) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, me thought was that the But we can still have this alternative here for public consideration (and to show that this thought was considered). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess it would be reasonable if this only works for info metrics that were present in the same scrape maybe?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we should have that restriction. For metrics ingested via a scrape, the target labels will be identifying labels by default anyway, so the whole proposed semantics of the info function will implicitly limit info metric matching to the same scrape in most cases (but that's a behavior that needs to be configurable in a different way, e.g. if scraping KSM, one might want to not have the KSM target labels as identifying labels to enable
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@codesome @beorn7 the proposal suggests the ability to specify certain info metrics by matching on
Did you consider this? Maybe there could also be a configuration parameter to control this behaviour? |
||
|
|
||
| Instead of letting `info` by default join with all matching info metrics, have it require specifying the info metric name(s). | ||
|
|
||
| #### Pros | ||
|
|
||
| * The user won't be confused about data labels being included from info metrics they didn't expect | ||
|
|
||
| #### Cons | ||
|
|
||
| * The UX becomes more complex, as the user is required to specify which info metric(s) to join with | ||
|
|
||
| ## Action Plan | ||
|
|
||
| The tasks to do in order to migrate to the new idea. | ||
|
|
||
| * [ ] [Add experimental PromQL `info` function MVP](https://github.com/prometheus/prometheus/pull/14495) | ||
| * [ ] Extend `info` function MVP with the ability to support `info` metrics in general, with persistence of info metrics metadata | ||
Uh oh!
There was an error while loading. Please reload this page.