TSDB: Provide a commit interface that does not write OOO samples #11730

@roidelapluie

In two scenarios, Prometheus relies on the rejection of out-of-order (OOO) samples:

The first scenario is when a rule is moved from one file to another and the configuration is reloaded. The new rule writes its metric, but two evaluation intervals later the old rule writes a staleness marker. In the past, that second append was rejected as out of order, so the staleness marker was never written.

prometheus/rules/manager.go

Lines 392 to 399 in 30b31ca

// Wait for 2 intervals to give the opportunity to renamed rules
// to insert new series in the tsdb. At this point if there is a
// renamed rule, it should already be started.
select {
case <-g.managerDone:
case <-time.After(2 * g.interval):
	g.cleanupStaleSeries(ctx, now)
}

The second scenario is when a target is stopped. We also wait a bit and then try to insert a staleness marker, which is rejected as OOO if, for example, the target has been recreated in the meantime (e.g. because of a config change) and its metrics have already been ingested.

prometheus/scrape/scrape.go

Lines 1409 to 1447 in 30b31ca

// Wait for when the next scrape would have been, if the target was recreated
// samples should have been ingested by now.
select {
case <-sl.parentCtx.Done():
	return
case <-ticker.C:
}
// Wait for an extra 10% of the interval, just to be safe.
select {
case <-sl.parentCtx.Done():
	return
case <-time.After(interval / 10):
}
// Call sl.append again with an empty scrape to trigger stale markers.
// If the target has since been recreated and scraped, the
// stale markers will be out of order and ignored.
// sl.context would have been cancelled, hence using sl.appenderCtx.
app := sl.appender(sl.appenderCtx)
var err error
defer func() {
	if err != nil {
		app.Rollback()
		return
	}
	err = app.Commit()
	if err != nil {
		level.Warn(sl.l).Log("msg", "Stale commit failed", "err", err)
	}
}()
if _, _, _, err = sl.append(app, []byte{}, "", staleTime); err != nil {
	app.Rollback()
	app = sl.appender(sl.appenderCtx)
	level.Warn(sl.l).Log("msg", "Stale append failed", "err", err)
}
if err = sl.reportStale(app, staleTime); err != nil {
	level.Warn(sl.l).Log("msg", "Stale report failed", "err", err)
}

Therefore, we should have an interface that ensures these samples are rejected if they are OOO.
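One possible shape for such an interface, sketched under heavy assumptions: the toy `store` below always accepts out-of-order samples, and its appender offers a second commit method that skips anything not newer than the series' latest timestamp. `store`, `appender`, and `CommitInOrderOnly` are hypothetical names invented for this illustration; none of them exist in the Prometheus `storage` package, and the real change might look quite different.

```go
package main

import "fmt"

// sample is a simplified sample; stale marks a staleness marker.
type sample struct {
	series string
	ts     int64 // milliseconds
	value  float64
	stale  bool
}

// store is a toy stand-in for a TSDB that accepts out-of-order samples.
type store struct {
	data  map[string][]sample
	maxTS map[string]int64
}

func newStore() *store {
	return &store{data: map[string][]sample{}, maxTS: map[string]int64{}}
}

// appender buffers samples until one of the commit methods is called.
type appender struct {
	st      *store
	pending []sample
}

func (s *store) appender() *appender { return &appender{st: s} }

// Append buffers a sample; nothing is written until commit.
func (a *appender) Append(series string, ts int64, v float64, stale bool) {
	a.pending = append(a.pending, sample{series: series, ts: ts, value: v, stale: stale})
}

// commit writes the buffered samples. With inOrderOnly set, a sample
// whose timestamp is not newer than the series' latest one is dropped.
// It returns the number of dropped samples.
func (a *appender) commit(inOrderOnly bool) int {
	dropped := 0
	for _, smp := range a.pending {
		last, seen := a.st.maxTS[smp.series]
		if inOrderOnly && seen && smp.ts <= last {
			dropped++
			continue
		}
		a.st.data[smp.series] = append(a.st.data[smp.series], smp)
		if !seen || smp.ts > last {
			a.st.maxTS[smp.series] = smp.ts
		}
	}
	a.pending = nil
	return dropped
}

// Commit writes everything, including out-of-order samples.
func (a *appender) Commit() { a.commit(false) }

// CommitInOrderOnly is the hypothetical "does not write OOO samples"
// variant; it reports how many samples were skipped.
func (a *appender) CommitInOrderOnly() int { return a.commit(true) }

func main() {
	const name = "job:requests:rate5m" // illustrative series name
	st := newStore()

	// The moved rule (or recreated target) has already written samples.
	app := st.appender()
	app.Append(name, 75_000, 1, false)
	app.Append(name, 135_000, 1, false)
	app.Commit()

	// The late staleness marker goes through the in-order-only commit.
	late := st.appender()
	late.Append(name, 70_000, 0, true)
	dropped := late.CommitInOrderOnly()

	fmt.Printf("dropped=%d stored=%d\n", dropped, len(st.data[name]))
}
```

In this sketch the decision is made at commit time, matching the issue title. Rejecting at append time instead would let callers see the error per sample, which is closer to how the scrape loop above already handles append failures.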
