TSDB: Provide a commit interface that does not write OOO samples #11730
Description
Prometheus relies on the rejection of OOO (out-of-order) samples in two scenarios:
The first is when a rule is moved from one file to another and the configuration is reloaded. The new rule writes its metric, but a couple of evaluation intervals later the original rule writes a staleness marker. In the past, that second write was rejected as out of order, so the staleness marker was not written.
Lines 392 to 399 in 30b31ca
    // Wait for 2 intervals to give the opportunity to renamed rules
    // to insert new series in the tsdb. At this point if there is a
    // renamed rule, it should already be started.
    select {
    case <-g.managerDone:
    case <-time.After(2 * g.interval):
    	g.cleanupStaleSeries(ctx, now)
    }
The second scenario is when a target is stopped. Here too we wait a bit and then try to insert a staleness marker. That marker is rejected as out of order if, in the meantime, newer samples for the same series have been ingested from a restarted target (e.g. because of a config change).
Lines 1409 to 1447 in 30b31ca
    // Wait for when the next scrape would have been, if the target was recreated
    // samples should have been ingested by now.
    select {
    case <-sl.parentCtx.Done():
    	return
    case <-ticker.C:
    }
    // Wait for an extra 10% of the interval, just to be safe.
    select {
    case <-sl.parentCtx.Done():
    	return
    case <-time.After(interval / 10):
    }
    // Call sl.append again with an empty scrape to trigger stale markers.
    // If the target has since been recreated and scraped, the
    // stale markers will be out of order and ignored.
    // sl.context would have been cancelled, hence using sl.appenderCtx.
    app := sl.appender(sl.appenderCtx)
    var err error
    defer func() {
    	if err != nil {
    		app.Rollback()
    		return
    	}
    	err = app.Commit()
    	if err != nil {
    		level.Warn(sl.l).Log("msg", "Stale commit failed", "err", err)
    	}
    }()
    if _, _, _, err = sl.append(app, []byte{}, "", staleTime); err != nil {
    	app.Rollback()
    	app = sl.appender(sl.appenderCtx)
    	level.Warn(sl.l).Log("msg", "Stale append failed", "err", err)
    }
    if err = sl.reportStale(app, staleTime); err != nil {
    	level.Warn(sl.l).Log("msg", "Stale report failed", "err", err)
    }
Therefore we should have an interface that guarantees these samples are rejected if they are OOO, even when out-of-order ingestion is otherwise accepted.
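
To make the request concrete, here is a minimal sketch of how such an interface could behave. It uses a toy in-memory store, not the Prometheus TSDB, and every name in it (`toyDB`, `Appender(rejectOOO)`, `oooWindow`) is hypothetical and only for illustration. It assumes the store normally accepts out-of-order samples within some time window, and that a caller can opt into a strict appender that drops them at commit time.

```go
package main

import "fmt"

// sample is one timestamped value; stale marks a staleness marker.
type sample struct {
	ts    int64 // milliseconds
	val   float64
	stale bool
}

// toyDB is a hypothetical in-memory store, NOT the Prometheus TSDB.
type toyDB struct {
	oooWindow int64               // how far behind the newest sample OOO writes are accepted
	samples   map[string][]sample // accepted samples per series
	newest    map[string]int64    // newest timestamp seen per series
}

func newToyDB(oooWindow int64) *toyDB {
	return &toyDB{
		oooWindow: oooWindow,
		samples:   map[string][]sample{},
		newest:    map[string]int64{},
	}
}

type pending struct {
	series string
	s      sample
}

// appender buffers samples until Commit. With rejectOOO set, any sample
// older than the newest sample of its series is dropped on Commit.
type appender struct {
	db        *toyDB
	rejectOOO bool
	buf       []pending
}

// Appender returns a new appender; rejectOOO is the hypothetical strict mode.
func (db *toyDB) Appender(rejectOOO bool) *appender {
	return &appender{db: db, rejectOOO: rejectOOO}
}

func (a *appender) Append(series string, ts int64, val float64, stale bool) {
	a.buf = append(a.buf, pending{series, sample{ts, val, stale}})
}

// Commit writes the buffered samples and returns how many were rejected.
func (a *appender) Commit() (rejected int) {
	for _, p := range a.buf {
		newest, seen := a.db.newest[p.series]
		ooo := seen && p.s.ts < newest
		switch {
		case ooo && a.rejectOOO:
			rejected++ // strict mode: never write OOO samples
			continue
		case ooo && newest-p.s.ts > a.db.oooWindow:
			rejected++ // too old even for the OOO window
			continue
		}
		a.db.samples[p.series] = append(a.db.samples[p.series], p.s)
		if !seen || p.s.ts > newest {
			a.db.newest[p.series] = p.s.ts
		}
	}
	a.buf = nil
	return rejected
}

func main() {
	db := newToyDB(60_000) // accept OOO samples up to 60s old

	// The moved rule (or restarted target) writes a fresh sample at t=30s.
	fresh := db.Appender(false)
	fresh.Append("my_rule", 30_000, 1, false)
	fresh.Commit()

	// A late staleness marker at t=10s is accepted by the default appender.
	late := db.Appender(false)
	late.Append("my_rule", 10_000, 0, true)
	fmt.Println("default appender rejected:", late.Commit())

	// The strict appender rejects the same marker.
	strict := db.Appender(true)
	strict.Append("my_rule", 10_000, 0, true)
	fmt.Println("strict appender rejected:", strict.Commit())
}
```

In this sketch the rule manager and the scrape loop would use the strict appender for their delayed staleness markers, while regular ingestion keeps the default one. Whether the real fix belongs on the appender, on `Commit`, or elsewhere is left open here.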