Start Timestamp: Opt-in ST auto-generation globally/per scrape job. #14763
Proposal
Start Timestamps (STs) are a relatively new concept. They generally work with (at least):
- client_golang + Prometheus proto exposition and OM text (opt-in, due to _created line conflicts)
- Java and Python clients, optionally, in OM text too.
- OTLP ingestion.
- RW2 ingestion.
Nevertheless, it will take time for all clients to adopt STs and pass them along to Prometheus. Some clients might never adopt them, or only on much longer timeframes: for example, old solutions using the Prometheus text exposition format, or complex exporters (e.g. cadvisor) where you need to think carefully about ST correctness.
Proposal
This proposal allows "auto-generating" STs for all metrics with counter semantics, which unblocks accurate reset counting and ST collection (e.g. with the zero sample injection feature or PROM-60).
This is generally not as easy as it seems. Common attempts, such as faking the ST as e.g. 1ms before the scraped timestamp, can be damaging and are likely to produce inaccurate results, e.g. counter over-adding (assuming resets when there were none). The same applies if we used the timestamp at which the scrape loop/service sees the target for the first time. Taking the process start time is not too bad a solution, but that information is not always present (apps have to expose it), it is not cheap to find (in the worst case it requires fully parsing the scrape payload), and it does not work for exporters/counters that reset mid-process.
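To make the over-adding concern concrete, here is a minimal sketch. It assumes a hypothetical ST-aware consumer (`st_aware_increase`, not an actual Prometheus function) that treats any sample whose ST is later than the previous sample's timestamp as a counter restarted from zero. Under that assumed rule, faking the ST as `ts - 1ms` on every scrape makes each sample look like a reset.

```python
from typing import List, Tuple

# (value, timestamp in ms, start timestamp in ms)
Sample = Tuple[float, int, int]


def st_aware_increase(samples: List[Sample]) -> float:
    """Hypothetical consumer: sums the increase across samples and treats
    an ST newer than the previous sample's timestamp as a reset to zero."""
    total = 0.0
    prev = None  # (value, ts) of the previous sample
    for value, ts, st in samples:
        if prev is not None:
            prev_value, prev_ts = prev
            if st > prev_ts:
                # ST says the counter restarted after the previous scrape,
                # so the whole current value counts as new increase.
                total += value
            else:
                total += value - prev_value
        prev = (value, ts)
    return total


# A counter that never resets: the true increase over the window is 10.
raw = [(10.0, 1000), (15.0, 2000), (20.0, 3000)]

# Naive approach: ST faked as 1ms before each scrape timestamp.
faked = [(v, ts, ts - 1) for v, ts in raw]

# Reference: one stable ST (e.g. process start) shared by all samples.
stable = [(v, ts, 0) for v, ts in raw]

naive_result = st_aware_increase(faked)    # every sample looks like a reset
stable_result = st_aware_increase(stable)  # no resets assumed
```

With these inputs the stable ST yields the true increase of 10, while the faked ST yields 35, because the second and third samples are each counted in full.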
Fortunately, one solid algorithm was released years ago and is actively used (at least) at Google Cloud in the GMP Prometheus fork and in opentelemetry-collector-contrib/googlemanagedprometheusexporter (actual code for this is here).
Algorithm
- If the counter sample has an ST from the instrumentation, use that.
- For the first counter sample, buffer its value and timestamp, but do not append it (let's call those `first.value` and `first.ts`).
- For the next counter sample for the same series (`next.value` and `next.ts`):
  a. If `next.value < first.value`, a reset happened in between. Append `(next.value, next.ts)` with `created_timestamp = next.ts - 1ms`, as we don't know the exact reset time, but we know it's between `first.ts` and `next.ts`.
  b. Otherwise, append `(next.value - first.value, next.ts)` with `created_timestamp = first.ts`.
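The steps above can be sketched as a small per-series state machine. This is an illustration only, not the GMP or Prometheus code: the class and method names are hypothetical, timestamps are assumed to be in milliseconds, and the handling of samples after the second one (detecting resets against the previous raw value, and using a baseline of 0 after a reset) is an assumption that extends the two-sample description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _SeriesState:
    base_value: float  # subtracted from later samples (first.value, or 0 after a reset)
    start_ts_ms: int   # synthesized start timestamp (first.ts, or reset ts - 1ms)
    prev_value: float  # last raw value seen, used to detect resets


class StartTimestampGenerator:
    """Hypothetical helper that synthesizes STs for counter samples."""

    def __init__(self) -> None:
        self._series: Dict[str, _SeriesState] = {}

    def observe(
        self, series: str, value: float, ts_ms: int, st_ms: Optional[int] = None
    ) -> Optional[Tuple[float, int, int]]:
        """Returns (value, ts_ms, st_ms) to append, or None if the sample is buffered."""
        # Step 1: an ST provided by the instrumentation always wins.
        if st_ms is not None:
            return (value, ts_ms, st_ms)

        state = self._series.get(series)
        if state is None:
            # Step 2: buffer the first sample and do not append it.
            self._series[series] = _SeriesState(value, ts_ms, value)
            return None

        if value < state.prev_value:
            # Step 3a: reset detected. The exact reset time is unknown, so
            # place the ST 1ms before this sample and keep the raw value.
            state.base_value = 0.0
            state.start_ts_ms = ts_ms - 1
        state.prev_value = value

        # Step 3b (and samples after a reset): value relative to the baseline.
        return (value - state.base_value, ts_ms, state.start_ts_ms)
```

For example, feeding one series the raw values 10, 15, 3 at timestamps 1000, 2000, 3000 buffers the first sample, appends `(5.0, 2000, 1000)` for the second, and appends `(3.0, 3000, 2999)` for the third, since the drop from 15 to 3 is treated as a reset.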
Consequences
- Valid StartTime from the perspective of the appended counter-like metric.
- Rate/increase/resets functions give (mathematically) accurate results.
Trade-offs:
- (!) The correct absolute counter value is lost on a collector/scraper/Prometheus restart (unless we cache things) or when the target is down for a longer time.
- The first observed sample is lost too in the above scenarios (e.g. if a target is scraped only once, you don't get any sample).
- We have to buffer a few floats/ints (3?) for every counter-like metric for the duration of its life (some overhead).
Part of: #14217