Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 37 additions & 0 deletions pkg/exemplar/interface.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// Copyright 2019 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package exemplar

import (
"github.com/prometheus/prometheus/pkg/labels"
)

// Exemplars is an exemplar storage backend.
type Exemplars interface {
// Get is a fast lookup to find an exemplar with an exact match on labels
// and timestamp provided (so this can be O(1) lookup).
Get(l labels.Labels, t int64) (Exemplar, bool, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get(...) should return []Exemplar, multiple exemplars could match l if not many labels are passed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea was it has to be a full equality match on labels. The labels here describe the exact metric including all the labels and the name, so there should only be one. Otherwise the lookup is not that useful. We need to be able to find the exemplar corresponding to the metric.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak for everyone else but I think we should follow the same pattern as existing queries; return all matches for the labels provided. That doesn't exclude users from passing labels that would match only a single exemplar, but is better than error; query for exemplar would return more than one result in the case of the current functions IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.. But in that case the return value will have to change.. It will additionally have to include information of the actual series matched as the exemplar is not useful without the series itself. So returns something like

type SeriesExemplar struct {
    l labels.Labels
    e Exemplar
}

and the method becomes Get(l labels.Labels, t int64) ([]SeriesExemplar, bool, error)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume @shreyassrivatsan 's original idea was to keep things really as simple as possible. With exact matches, we can just have all the exemplars in a simple hash map. With label matching, we'll find ourselves once more in a situation where we have to implement a database (or do full scans of the exemplars in memory). Is that part of the plan?

Could we perhaps quickly think through the typical query pattern for exemplars? Will we commonly need label matching at all?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a Grafana dashboard is going to be a pretty common usecase. If a dashboard panel has a single query for something like http_request_duration_seconds, without enough labels to narrow the query down to a single series, I don't think it's reasonable for Grafana to have to do the work to split all the results into individual label sets and make a query per unique series.

There's also the issue that many histogram queries aggregate away labels that we'd need to uniquely identify which exemplar to select.

@shreyassrivatsan can you elaborate on your use case and what M3 needs from the API, and hopefully we can get some more details from @davkal as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So I did not mean that grafana makes a query and then retrieves the exemplars. I meant a prometheus query will first query the series, get the exemplars and then apply functions.. I am simplifying a lot here, but essentially /api/v1/query and /api/v1/query_range will do

func query(query string) series {
  ls := querySeries() // returns []labels.Labels
  if exemplarsRequested {
    for _, l := range ls {
      queryExemplar(l)
    }
  }
  // apply functions
}

The resulting response will probably contain something maybe like the below

{
   "status" : "success",
   "data" : {
      "resultType" : "vector",
      "result" : [
         {
            "metric" : {
               "__name__" : "up",
               "job" : "prometheus",
               "instance" : "localhost:9090"
            },
            "value": [ 1435781451.781, "1", {<exemplar>} ]
         }
      ]
  }
}

Eventually for full exemplar support , functions would have to be updated to handle exemplars. For example, aggregation functions can have a pick one policy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach makes a lot of sense. However, it requires to intertwine the existing API with exemplar support. My impression from the Prometheus side of things was that the plan is to implement something experimental that is a completely separate API so that consumers of that API can play with it and prove or disprove its usefulness. We had a few chats now with @juliusv and @davkal, wherein we realized that it is actually very hard for a visualization frontend like Grafana to craft dedicated queries for exemplars (for example in case the user clicks on a peak of a latency graph or on a tile of a histogram heatmap).

Returning exemplars alongside normal queries, if so requested, is solving that problem, but it's quite invasive (as the exemplar support has to be piped through the whole query engine, including smart decisions what to pick for aggregations). If we went down that alley, the exact-match method as suggested here would indeed make sense as illustrated by your pseudo code. (We might want to use a time range and not just a single timestamp in the signature then.)

While I like the idea, I'm pretty sure amending everything in the query engine is not feasible for an experimental feature. I'm still thinking about ways how to do something for the MVP that leaves the current query API and engine as is…

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. And yes, this is going more towards what will be the right solution if we decide to pipe this through the query engine. I still do think the exact timestamp makes for a faster implementation as we will have access to that as a part of the results for a time series query.

For an experimental implementation using a time range and partial label match is probably a better approach, but implementing the backing store will be some more work. I made some changes to the interface. Get still does exact matches for a given timestamp. Query does partial label matches, takes a time range and returns a list of SeriesExemplars.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea for how the (MVP) query flow and UI in Grafana could look like:

The Grafana user creating a dashboard would be required to pair each graphing query with an additional "query", which would be required to just be a PromQL label selector without any aggregations or functions, e.g. request_latency_bucket{code=~"2..",job="foo"}. The exemplar API would take exactly that query string (i.e. a naked label matcher in PromQL syntax) and a time range to return exemplars, possibly sampled to avoid too many return values. Grafana could then use the provided string to query for exemplars upon a click event (could include the le label when in a histogram heatmap, but would also work without it when clicking on a latency graph). Further server-side filtering by exemplar or specific downsampling value would not be in the MVP but could be done in the browser for now.

A selection of exemplars with links to logs or traces could then be shown in a pop-up, or exemplars could show up as clickable "stars" in a latency graph.

@davkal does that make sense?


// GetRange is a query to find exemplars with exact matches on labels
// and timestamp ranges provided (this is a range query). Query returns
// a list of Exemplar for the label match.
GetRange(l labels.Labels, start, end int64) ([]Exemplar, error)

// Add an exemplar, it will be retained based on configuration at
// the construction of the exmplar storage.
Add(l labels.Labels, t int64, e Exemplar) error

// Close the storage.
Close() error
}