Exemplars store in-memory storage interface #6309
shreyassrivatsan wants to merge 4 commits into prometheus:main
Conversation
Exemplars aren't going to the TSDB, so they shouldn't affect the appender interface.
[FYI: the build on this is broken due to an unrelated TypeScript error, looking into that]
Is the plan to only store them in-memory in the scraper? It seems like it would be nice to be able to store them in a TSDB or a more persistent store.

Yes. The data volumes make storing on disk impractical.
@brian-brazil what are your thoughts on the updated approach in the PR to propagate exemplars? https://github.com/prometheus/prometheus/pull/6309/files#diff-3bbf5d3205ec4ba63d9de26d3e31c776R24 The implementation itself for now is just a placeholder.
They shouldn't hit the storage interface at all.
@brian-brazil I think the new proposal from @shreyassrivatsan is just for in-memory collection. The existing storage/queryable interfaces are not changed, modified, or touched. The words "Storage" and "Queryable" don't need to be a part of the interfaces proposed; they are just generic names picked for reuse due to their similarity to other parts of the code base. I.e. the proposal is a simple struct that receives exemplars, can cycle them out of memory based on a fixed size (or other policy), and can be used to query/retrieve them for a matching label set. This struct is then provided to the scraper. What are your thoughts on that proposal?
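The "simple struct" described above — receive exemplars, cycle them out of memory based on a fixed size, and retrieve them per label set — could be sketched roughly like this. All type and method names here are illustrative stand-ins, not the PR's actual code:

```go
package main

import "fmt"

// Exemplar is a simplified stand-in for the real exemplar type.
type Exemplar struct {
	TraceID string
	Value   float64
	Ts      int64
}

// ringStore keeps, per series key, a fixed-size ring of recent exemplars,
// cycling the oldest one out when the ring is full.
type ringStore struct {
	size int
	m    map[string][]Exemplar
}

func newRingStore(size int) *ringStore {
	return &ringStore{size: size, m: make(map[string][]Exemplar)}
}

// Add appends an exemplar for the series identified by key, evicting the
// oldest exemplar once the fixed size is reached.
func (s *ringStore) Add(key string, e Exemplar) {
	ring := append(s.m[key], e)
	if len(ring) > s.size {
		ring = ring[1:] // drop the oldest
	}
	s.m[key] = ring
}

// Get returns the stored exemplars for an exact series-key match.
func (s *ringStore) Get(key string) []Exemplar {
	return s.m[key]
}

func main() {
	s := newRingStore(2)
	for i := int64(1); i <= 3; i++ {
		s.Add(`up{job="api"}`, Exemplar{Ts: i})
	}
	fmt.Println(len(s.Get(`up{job="api"}`))) // prints 2: the oldest of the three was cycled out
}
```

A real implementation would key on a canonical serialization of labels.Labels and likely use a proper ring buffer rather than re-slicing, but the fixed-size eviction policy is the point here.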
We should avoid clashing with the names of existing important interfaces.
@shreyassrivatsan could you rebase this now that #6292 is merged?

Which should also fix test errors.
Rebased and pushed just the interface changes.
cstyan left a comment

Looks fine but I'll have to sit and think again about the in-memory storage and get back to you.
Initially the Add function looked off to me as well, as it takes labels but the Exemplar struct already has labels. Can someone clarify, or point me to docs (I haven't been able to find any), what the exact exemplar format is? Why does an exemplar have its own label set when it's associated with an existing time series?
pkg/exemplar/interface.go
Get(...) should return []Exemplar; multiple exemplars could match l if only a few labels are passed.
So the idea was it has to be a full equality match on labels. The labels here describe the exact metric including all the labels and the name, so there should only be one. Otherwise the lookup is not that useful. We need to be able to find the exemplar corresponding to the metric.
I can't speak for everyone else, but I think we should follow the same pattern as existing queries: return all matches for the labels provided. That doesn't exclude users from passing labels that would match only a single exemplar, but it's better than erroring because a query for exemplars would return more than one result, which is what the current functions would do IMO.
Ok. But in that case the return value will have to change: it will additionally have to include information on the actual series matched, as the exemplar is not useful without the series itself. So it returns something like
type SeriesExemplar struct {
	l labels.Labels
	e Exemplar
}
and the method becomes Get(l labels.Labels, t int64) ([]SeriesExemplar, bool, error)
I assume @shreyassrivatsan's original idea was to keep things really as simple as possible. With exact matches, we can just have all the exemplars in a simple hash map. With label matching, we'll find ourselves once more in a situation where we have to implement a database (or do full scans of the exemplars in memory). Is that part of the plan?
Could we perhaps quickly think through the typical query pattern for exemplars? Will we commonly need label matching at all?
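The "simple hash map" point above can be made concrete: with exact-match-only semantics, the whole store reduces to a map keyed by a deterministic serialization of the label set, with O(1) Add and Get and no index or scan machinery. This is an illustrative sketch with simplified stand-in types, not the PR's code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label is a minimal stand-in for a pkg/labels name/value pair.
type Label struct{ Name, Value string }

type Exemplar struct {
	TraceID string
	Ts      int64
}

// key serializes a label set deterministically so it can index a map.
// It sorts in place, which is fine for this sketch; the real code would
// hash a canonical labels.Labels more efficiently.
func key(ls []Label) string {
	sort.Slice(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
	var b strings.Builder
	for _, l := range ls {
		fmt.Fprintf(&b, "%s=%q,", l.Name, l.Value)
	}
	return b.String()
}

// exactStore holds one exemplar per exact label set.
type exactStore struct {
	m map[string]Exemplar
}

func (s *exactStore) Add(ls []Label, e Exemplar) { s.m[key(ls)] = e }

func (s *exactStore) Get(ls []Label) (Exemplar, bool) {
	e, ok := s.m[key(ls)]
	return e, ok
}

func main() {
	s := &exactStore{m: make(map[string]Exemplar)}
	s.Add([]Label{{"__name__", "http_requests_total"}, {"job", "api"}},
		Exemplar{TraceID: "abc123", Ts: 42})
	// Lookup is order-insensitive because key() sorts the labels.
	e, ok := s.Get([]Label{{"job", "api"}, {"__name__", "http_requests_total"}})
	fmt.Println(ok, e.TraceID) // prints: true abc123
}
```

Partial label matching, by contrast, would need either a full scan over this map or a real inverted index — which is exactly the "implement a database" concern raised above.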
I think a Grafana dashboard is going to be a pretty common use case. If a dashboard panel has a single query for something like http_request_duration_seconds, without enough labels to narrow the query down to a single series, I don't think it's reasonable for Grafana to have to do the work to split all the results into individual label sets and make a query per unique series.
There's also the issue that many histogram queries aggregate away labels that we'd need to uniquely identify which exemplar to select.
@shreyassrivatsan can you elaborate on your use case and what M3 needs from the API, and hopefully we can get some more details from @davkal as well.
Ah. So I did not mean that Grafana makes a query and then retrieves the exemplars. I meant a Prometheus query will first query the series, get the exemplars and then apply functions. I am simplifying a lot here, but essentially /api/v1/query and /api/v1/query_range will do
func query(q string) series {
	ls := querySeries(q) // returns []labels.Labels
	if exemplarsRequested {
		for _, l := range ls {
			queryExemplar(l)
		}
	}
	// apply functions
}
The resulting response will probably contain something like the below:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "job": "prometheus",
          "instance": "localhost:9090"
        },
        "value": [ 1435781451.781, "1", {<exemplar>} ]
      }
    ]
  }
}
Eventually, for full exemplar support, functions would have to be updated to handle exemplars. For example, aggregation functions could have a pick-one policy.
I think this approach makes a lot of sense. However, it requires intertwining the existing API with exemplar support. My impression from the Prometheus side of things was that the plan is to implement something experimental as a completely separate API, so that consumers of that API can play with it and prove or disprove its usefulness. We have had a few chats now with @juliusv and @davkal, wherein we realized that it is actually very hard for a visualization frontend like Grafana to craft dedicated queries for exemplars (for example in case the user clicks on a peak of a latency graph or on a tile of a histogram heatmap).
Returning exemplars alongside normal queries, if so requested, is solving that problem, but it's quite invasive (as the exemplar support has to be piped through the whole query engine, including smart decisions what to pick for aggregations). If we went down that alley, the exact-match method as suggested here would indeed make sense as illustrated by your pseudo code. (We might want to use a time range and not just a single timestamp in the signature then.)
While I like the idea, I'm pretty sure amending everything in the query engine is not feasible for an experimental feature. I'm still thinking about ways to do something for the MVP that leaves the current query API and engine as is…
Ok, that makes sense. And yes, this is going more towards what will be the right solution if we decide to pipe this through the query engine. I still do think the exact timestamp makes for a faster implementation as we will have access to that as a part of the results for a time series query.
For an experimental implementation using a time range and partial label match is probably a better approach, but implementing the backing store will be some more work. I made some changes to the interface. Get still does exact matches for a given timestamp. Query does partial label matches, takes a time range and returns a list of SeriesExemplars.
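The split described here — Get doing exact matches at a single timestamp, Query doing partial label matches over a time range — could look roughly like this brute-force in-memory sketch. The types are simplified stand-ins (the real code uses pkg/labels.Labels), and a real store would need an index rather than a linear scan:

```go
package main

import "fmt"

// Labels is a simplified stand-in for labels.Labels.
type Labels map[string]string

type Exemplar struct {
	Ts    int64
	Value float64
}

// SeriesExemplar pairs an exemplar with the labels of its series.
type SeriesExemplar struct {
	L Labels
	E Exemplar
}

// memStore is a brute-force sketch of both lookup styles.
type memStore struct {
	data []SeriesExemplar
}

func (s *memStore) Add(l Labels, e Exemplar) {
	s.data = append(s.data, SeriesExemplar{L: l, E: e})
}

// Get returns the exemplar whose series labels equal l exactly at time t.
func (s *memStore) Get(l Labels, t int64) (Exemplar, bool) {
	for _, se := range s.data {
		if se.E.Ts == t && labelsEqual(se.L, l) {
			return se.E, true
		}
	}
	return Exemplar{}, false
}

// Query returns all exemplars in [start, end] whose series labels contain
// every pair in the (possibly partial) selector sel.
func (s *memStore) Query(sel Labels, start, end int64) []SeriesExemplar {
	var out []SeriesExemplar
	for _, se := range s.data {
		if se.E.Ts < start || se.E.Ts > end {
			continue
		}
		match := true
		for k, v := range sel {
			if se.L[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, se)
		}
	}
	return out
}

func labelsEqual(a, b Labels) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	return true
}

func main() {
	s := &memStore{}
	s.Add(Labels{"__name__": "up", "job": "api"}, Exemplar{Ts: 10, Value: 1})
	s.Add(Labels{"__name__": "up", "job": "db"}, Exemplar{Ts: 20, Value: 0})
	// A partial selector matches both series over the full range.
	fmt.Println(len(s.Query(Labels{"__name__": "up"}, 0, 100))) // prints 2
}
```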
My idea for how the (MVP) query flow and UI in Grafana could look like:
The Grafana user creating a dashboard would be required to pair each graphing query with an additional "query", which would be required to just be a PromQL label selector without any aggregations or functions, e.g. request_latency_bucket{code=~"2..",job="foo"}. The exemplar API would take exactly that query string (i.e. a naked label matcher in PromQL syntax) and a time range to return exemplars, possibly sampled to avoid too many return values. Grafana could then use the provided string to query for exemplars upon a click event (could include the le label when in a histogram heatmap, but would also work without it when clicking on a latency graph). Further server-side filtering by exemplar or specific downsampling value would not be in the MVP but could be done in the browser for now.
A selection of exemplars with links to logs or traces could then be shown in a pop-up, or exemplars could show up as clickable "stars" in a latency graph.
@davkal does that make sense?
It's for things like a trace id for the exemplar.
pkg/exemplar/interface.go
Perhaps this should be a struct then?
type SeriesExemplar struct {
	Labels      labels.Labels
	TsExemplars []TsExemplar
}
In that way, the consumer can actually see what labels matched the labels in the query.
pkg/exemplar/interface.go
To really implement a generic query API, we would need regexp matchers and negative matchers, too.
That's where I'm thinking we should perhaps stay with the Get-only approach, just that it takes a time range, not just one timestamp, and returns a time series of exemplars for that one exact label set. The implementation of the externally visible API would need to hit the normal storage to resolve the label matchers into concrete series first. I think that's fine for an MVP.
Couldn't we do this via Query(start, end int64, matchers ...*labels.Matcher)?
Sure, but then we are really talking about implementing a whole query engine.
By doing this via a call to the existing query engine, we would waste a bit of resources (as we only want to get the matching label sets and would never access the actual time series data), but it would be much simpler.
Also, we then wouldn't need what I commented above (the separate struct to show in the answer which exact label set is attached to each returned series of exemplars).
One more round of changes. I got rid of Query for the time being, as "query" in a way suggests partial matches. Added a GetRange which takes a range and returns a list of exemplars based on exact label matches.
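The resulting interface shape, as described above, might read roughly like the following. The names and signatures are a hypothetical reconstruction, not the contents of pkg/exemplar/interface.go, and the types are simplified stand-ins; a toy slice-backed implementation is included just to show the semantics:

```go
package main

import "fmt"

// Labels is a stand-in for labels.Labels (a serialized label set here).
type Labels string

type Exemplar struct{ Ts int64 }

// ExemplarStore is the sketched interface: Get does an exact label match
// at one timestamp; GetRange does an exact label match over [start, end].
type ExemplarStore interface {
	Add(l Labels, e Exemplar)
	Get(l Labels, t int64) (Exemplar, bool)
	GetRange(l Labels, start, end int64) []Exemplar
}

// sliceStore is a toy implementation backing each series with a slice.
type sliceStore struct {
	m map[Labels][]Exemplar
}

func (s *sliceStore) Add(l Labels, e Exemplar) { s.m[l] = append(s.m[l], e) }

func (s *sliceStore) Get(l Labels, t int64) (Exemplar, bool) {
	for _, e := range s.m[l] {
		if e.Ts == t {
			return e, true
		}
	}
	return Exemplar{}, false
}

func (s *sliceStore) GetRange(l Labels, start, end int64) []Exemplar {
	var out []Exemplar
	for _, e := range s.m[l] {
		if e.Ts >= start && e.Ts <= end {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	var s ExemplarStore = &sliceStore{m: make(map[Labels][]Exemplar)}
	s.Add(`up{job="api"}`, Exemplar{Ts: 10})
	s.Add(`up{job="api"}`, Exemplar{Ts: 20})
	fmt.Println(len(s.GetRange(`up{job="api"}`, 0, 15))) // prints 1: only Ts=10 is in range
}
```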
Signed-off-by: Shreyas Srivatsan <[email protected]>
Based on a couple of chats some of the people involved here had, I have to conclude that there are quite different opinions on how this interface should look. I was aiming for "as simple as possible, to be used as the ultimate underlying storage primitive", but one might as well make the argument that the storage interface should be rich in features, with the implementation then deciding whether to re-use existing indexing and matching functionality or run its own. It might just be wrong to start with the design of the internal interface. Perhaps it's better to get a document together for the big picture of how everything plays together (exposition → ingestion → storage update on the one hand, and then Grafana panel → query for exemplars → storage retrieval).
I'd personally start with the API that Grafana wants, and then build the simplest possible thing to provide that. #6309 (comment) sounds about right to me at first glance. For an MVP implementation I'd suggest having them tied to targets similarly to how we do metadata, and brute-forcing the lookup from there. We won't really know what Grafana wants until we can get an API they can play with, so I'd go for throwing something together over lots of up-front design.
@cstyan Where are we with this review?
@brian-brazil The interface here doesn't really prevent us from implementing the query API that was discussed with the Grafana team, but it makes the implementation of the API not as nice. I think it also limits options for those building long-term storage ecosystem projects that may intend to store exemplars "forever". I think what I've implemented in #6635 makes more sense given the end goals of how we see exemplars being useful to a system like Grafana dashboards.
Superseded by #6635
Initial pass at passing exemplars through the scrape logic.
This is one option we could take, by changing the Appender interface's Add and AddFast methods. The current draft shows the changes involved at a high level. Still need to deal with optimally handling the no-exemplar case.

An alternate approach we could take is to not change the current Appender interface, but add a new interface AppenderWithExemplars and then upcast the appender provided: appender, ok := app.(AppenderWithExemplars), and decide whether to handle exemplars based on what type of appender has been configured. There are a few cases to take care of here.
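The alternate approach — keep the existing Appender interface unchanged and probe for exemplar support with a type assertion — could be sketched like this. All types and method signatures here are simplified stand-ins, not the actual scrape-package code:

```go
package main

import "fmt"

type Exemplar struct{ TraceID string }

// Appender is a simplified stand-in for the existing interface.
type Appender interface {
	Add(metric string, t int64, v float64) error
}

// AppenderWithExemplars is the optional extension interface: an appender
// that also knows how to take an exemplar alongside a sample.
type AppenderWithExemplars interface {
	Appender
	AddWithExemplar(metric string, t int64, v float64, e Exemplar) error
}

type basicAppender struct{}

func (basicAppender) Add(metric string, t int64, v float64) error { return nil }

// exemplarAppender supports both interfaces via embedding.
type exemplarAppender struct{ basicAppender }

func (exemplarAppender) AddWithExemplar(metric string, t int64, v float64, e Exemplar) error {
	return nil
}

// ingest decides at runtime whether the configured appender can take
// exemplars, falling back to the plain Add path otherwise.
func ingest(app Appender, metric string, t int64, v float64, e *Exemplar) error {
	if ea, ok := app.(AppenderWithExemplars); ok && e != nil {
		return ea.AddWithExemplar(metric, t, v, *e)
	}
	return app.Add(metric, t, v)
}

func main() {
	_, ok := interface{}(exemplarAppender{}).(AppenderWithExemplars)
	fmt.Println(ok) // prints true: exemplar-capable appender detected
}
```

The upside of this shape is that existing Appender implementations keep compiling unchanged; the cost is a runtime type assertion on the hot scrape path and a second code path to test.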