Native Summaries #16949
Description
Proposal
This requires a bigger proposal, MVP and motivation, but I wanted to officially start early discussions and potential work here.
The idea is to introduce native summaries -- a replacement for the "classic summaries" we have now, which are built from multiple float series (one per quantile, plus sum and count). A native summary would be contained in a single series, which is more efficient and transactional.
PromQL could be adapted in a similar fashion to the native histogram PromQL syntax for consistency (also reusing the https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_count-and-histogram_sum functions). However, due to the lower, mostly historic use of summaries in the ecosystem (see Considerations), perhaps it would be easier and sufficient to only emulate the classic view while storing native summaries under the hood (similar to the #16948 idea).
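To make the "emulate the classic view" option more concrete, here is a minimal sketch in Go. Everything in it is an assumption for discussion only: `NativeSummary`, `ClassicSample` and `ClassicView` are hypothetical names, not existing Prometheus types, and the real sample layout is still to be designed. It only shows how one composite sample could be expanded back into the familiar `<name>{quantile="..."}`, `<name>_sum` and `<name>_count` series at read time.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// NativeSummary is a HYPOTHETICAL single-sample representation of a summary.
// All parts live in one struct, so they are written and read together.
type NativeSummary struct {
	Count     uint64
	Sum       float64
	Quantiles map[float64]float64 // quantile (0..1) -> observed value
}

// ClassicSample is a HYPOTHETICAL flat float sample, as in the classic view.
type ClassicSample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// ClassicView expands one native summary sample into the classic series:
// one sample per quantile, followed by <name>_sum and <name>_count.
func ClassicView(name string, s NativeSummary) []ClassicSample {
	// Sort quantiles so the output order is deterministic.
	qs := make([]float64, 0, len(s.Quantiles))
	for q := range s.Quantiles {
		qs = append(qs, q)
	}
	sort.Float64s(qs)

	out := make([]ClassicSample, 0, len(qs)+2)
	for _, q := range qs {
		out = append(out, ClassicSample{
			Name:   name,
			Labels: map[string]string{"quantile": strconv.FormatFloat(q, 'g', -1, 64)},
			Value:  s.Quantiles[q],
		})
	}
	out = append(out,
		ClassicSample{Name: name + "_sum", Value: s.Sum},
		ClassicSample{Name: name + "_count", Value: float64(s.Count)},
	)
	return out
}

func main() {
	s := NativeSummary{
		Count:     3,
		Sum:       1.5,
		Quantiles: map[float64]float64{0.5: 0.4, 0.99: 0.9},
	}
	for _, c := range ClassicView("rpc_duration_seconds", s) {
		fmt.Println(c.Name, c.Labels, c.Value)
	}
}
```

Because the expansion happens from a single stored sample, all emulated classic series would come from the same point in time, which is the transactionality property discussed under Motivation.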
Motivation
Native histograms were created for many reasons, but one of them was storage efficiency and transactionality. With the native histogram representation, you have one series instead of potentially 30 for a single histogram, which brings significant benefits for indexing and storage, despite the larger sample size (struct vs float).
As for transactionality, having one series instead of 30 also guarantees that all parts of Prometheus (and the ecosystem) see every part of the histogram (buckets, sum, count) at the same time. This is especially important for distributed systems with remote write and sharding, as well as for drift between querying and scraping (when no isolation is possible, e.g. on Thanos).
The same efficiency and transactionality problems exist for classic summaries as well, and native summaries would solve them.
Considerations
- Summaries are still widely used, although less than other metric types. We tend NOT to recommend them in practice, especially now that the improved native histograms are available. However, they still exist and are not deprecated (and there is no plan to deprecate them as of now).
- Native histograms brought sparseness and exponential bucketing; no such dimension is planned for native summaries, which makes them easier to implement.