
promtest: load directive with mixed float/histogram series will silently discard all histogram samples before the first float sample #17025

@ringerc

Description

What did you do?

When writing a new test case to extend the tests for count_over_time and last_over_time, I added a mixed histogram/float series where some float samples arrive after some histogram samples.

TL;DR: see the test case below.

Consider this promtest, which I will assume is to be placed at path promql/promqltest/testdata/mix_histogram_over_time.test:

load 10s
    data_histogram{type="mix_histfirst"} {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}} 0 1

eval instant at 1m count_over_time(data_histogram[2m])
    expect no_info
    expect no_warn
    {type="mix_histfirst"} 4

clear

and run as

go test -v -run 'TestEvaluations/testdata/mix_histogram_over_time.test' ./promql

What did you expect to see?

I expected the histogram samples to be retained irrespective of their order relative to float samples in the same series, so count_over_time would return 4 for the above series as the test case asserts.

What did you see instead? Under which circumstances?

Surprisingly, the histogram samples in the series were completely ignored. count_over_time returns 2 for the series, as it counts only the float samples.

Not only that, but no warning or info message was reported by promtest (because the warning was raised at ingestion time, in the load directive, not at runtime on the eval directive), so the samples were discarded silently.

Running go test -v -run 'TestEvaluations/testdata/mix_histogram_over_time.test' ./promql will yield

=== RUN   TestEvaluations
=== RUN   TestEvaluations/testdata/mix_histogram_over_time.test
=== RUN   TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m])
    test.go:1321: 
        	Error Trace:	/home/craig/projects/EDB/UPM/upm-stack/prometheus/promql/promqltest/test.go:1321
        	Error:      	Received unexpected error:
        	            	error in eval count_over_time(data_histogram[2m]) (line 5): expected 4 for {type="mix_histfirst"} but got 2
        	Test:       	TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m])
--- FAIL: TestEvaluations (0.03s)
    --- FAIL: TestEvaluations/testdata/mix_histogram_over_time.test (0.03s)
        --- FAIL: TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m]) (0.00s)
FAIL
FAIL	github.com/prometheus/prometheus/promql	0.045s
FAIL

despite there clearly being 4 samples in the series.

If the order of the series is changed to

	data_histogram{type="mix_histfirst"} 0 1 {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}}

then the test will pass.

Details

In promtest, histogram samples are silently dropped in a load directive containing mixed float and histogram samples. The dropped samples are not detected by expect no_info or expect no_warn, resulting in confusing test run results.

This might also affect scrapes of real-world endpoints, if Prometheus permits a client to emit multiple samples per series in one scrape so long as those samples carry client-defined timestamps; I haven't verified that. It seems like a very niche case, though.

Analysis

Prometheus's head appender commits all float samples before all histogram samples within a single append transaction. Any histogram sample in the same transaction that is older than the newest float sample for its series is therefore rejected as out-of-order.

prometheus/tsdb/head_append.go

Lines 1497 to 1499 in 8f6a71e

a.commitSamples(acc)
a.commitHistograms(acc)
a.commitFloatHistograms(acc)

This is usually harmless, because it wouldn't make sense to mix histogram and float samples in a single scrape.

But it does affect the test suite, since github.com/prometheus/prometheus/promql/promqltest.(*loadCmd).append() appends all samples for a series in one transaction:

app := t.storage.Appender(t.context)
if err := cmd.append(app); err != nil {
	app.Rollback()
	return err
}
if err := app.Commit(); err != nil {
	return err
}

As a result, here:

> github.com/prometheus/prometheus/tsdb.(*headAppender).Commit() ./tsdb/head_append.go:1499 (PC: 0x11b453f)
Values returned:

  1494:			}
  1495:		}()
  1496:	
  1497:		a.commitSamples(acc)
  1498:		a.commitHistograms(acc)
=>1499:		a.commitFloatHistograms(acc)
  1500:		a.commitMetadata()
  1501:		// Unmark all series as pending commit after all samples have been committed.
  1502:		a.unmarkCreatedSeriesAsPendingCommit()
  1503:	
  1504:		a.head.metrics.outOfOrderSamples.WithLabelValues(sampleMetricTypeFloat).Add(float64(acc.floatOOORejected))

if float samples have already been committed (line 1497), then any float histogram samples older than the newest committed float sample will be discarded with an out of order sample error:

> github.com/prometheus/prometheus/tsdb.(*headAppender).commitFloatHistograms() ./tsdb/head_append.go:1349 (PC: 0x11b2aac)
  1344:			series = a.floatHistogramSeries[i]
  1345:			series.Lock()
  1346:	
  1347:			oooSample, _, err := series.appendableFloatHistogram(s.T, s.FH, a.headMaxt, a.minValidTime, a.oooTimeWindow)
  1348:			if err != nil {
=>1349:				handleAppendableError(err, &acc.histogramsAppended, &acc.histoOOORejected, &acc.histoOOBRejected, &acc.histoTooOldRejected)
  1350:			}
  1351:	
  1352:			switch {
  1353:			case err != nil:
  1354:				// Do nothing here.
(dlv) p err
error(*errors.errorString) *{
	s: "out of order sample",}

This results in calls to funcCountOverTime receiving a matrixVals argument that contains only the two float samples; the histogram samples are entirely missing from the matrix:

➜  prometheus git:(test-count-over-time-histogram-first) ✗ dlv test ./promql -- -test.run 'TestEvaluations/testdata/mix_histogram_over_time.test'
(dlv) break funcCountOverTime
Breakpoint 1 set at 0xdec00e for github.com/prometheus/prometheus/promql.funcCountOverTime() ./promql/functions.go:764
(dlv) c
> [Breakpoint 1] github.com/prometheus/prometheus/promql.funcCountOverTime() ./promql/functions.go:764 (hits goroutine(87):1 total:1) (PC: 0xdec00e)
   759:			return sum/count + kahanC/count
   760:		}), nil
   761:	}
   762:	
   763:	// === count_over_time(Matrix parser.ValueTypeMatrix) (Vector, Notes)  ===
=> 764:	func funcCountOverTime(_ []Vector, matrixVals Matrix, _ parser.Expressions, enh *EvalNodeHelper) (Vector, annotations.Annotations) {
   765:		return aggrOverTime(matrixVals, enh, func(s Series) float64 {
   766:			return float64(len(s.Floats) + len(s.Histograms))
   767:		}), nil
   768:	}
   769:	
(dlv) p matrixVals
github.com/prometheus/prometheus/promql.Matrix len: 1, cap: 1, [
	{
		Metric: (*"github.com/prometheus/prometheus/model/labels.Labels")(0xc0002da6e0),
		Floats: []github.com/prometheus/prometheus/promql.FPoint len: 2, cap: 16, [
			(*"github.com/prometheus/prometheus/promql.FPoint")(0xc00000e900),
			(*"github.com/prometheus/prometheus/promql.FPoint")(0xc00000e910),
		],
		Histograms: []github.com/prometheus/prometheus/promql.HPoint len: 0, cap: 0, nil,
		DropName: false,},
]
(dlv) p matrixVals[0].Floats
[]github.com/prometheus/prometheus/promql.FPoint len: 2, cap: 16, [
	{T: 20000, F: 0},
	{T: 30000, F: 1},
]
(dlv) p matrixVals[0].Histograms
[]github.com/prometheus/prometheus/promql.HPoint len: 0, cap: 0, nil

This diagnosis may be confirmed by the following test case, which splits the load into two phases to force an intermediate commit:

# To work around https://github.com/prometheus/prometheus/issues/17025, load histogram samples
# alone as the first samples in the series
load 10s
	data_histogram{type="mix_histfirst"} {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}}

# then append some float samples in a separate transaction. This must be a separate load directive
# to ensure the appender is committed, so the histogram samples are not falsely rejected as
# out-of-order vis-a-vis the float samples.
load 10s
	data_histogram{type="mix_histfirst"} _ _ 0 1

eval instant at 1m count_over_time(data_histogram[2m])
	expect no_warn
	expect no_info
	expect ordered
	{type="mix_histfirst"} 4

clear

Fix?

I'm not sure how to sensibly fix this. The appender would probably need to check sample ordering for all samples in strict time order, independent of sample type. That would cost time and hardly seems worthwhile.

Perhaps promtest's loader should commit each sample (or each timestamp) in its own transaction, rather than appending a whole series at once. That would enforce correct ordering, and would also make promtest more closely reflect the behaviour of a real scrape sequence: in practice a single scrape does not populate a series with multiple values.

It'd also help if promtest's loader checked for out-of-order sample errors and other warnings from the appender (possibly with an optional directive like expect out_of_order to permit this where desired).

Even with go test -v, the appender does not log any out-of-order sample errors, which makes this harder to diagnose. The WBL does not appear to be enabled for the tests, so https://github.com/prometheus/prometheus/blob/main/tsdb/head_append.go?plain=1#L1517-L1523 will not be hit. The only other way to detect the rejections would be the out-of-order counter prometheus_tsdb_out_of_order_samples_total, but that does not appear to be emitted in test runs either:

# alternately, if we got 2 samples in the above, then this query should report the out-of-order samples
# discarded during append
eval instant at 1m sum by (type) (increase(prometheus_tsdb_out_of_order_samples_total[2m]))
	{type="float"} 0
	{type="histogram"} 2

... presumably because there's nothing to capture these Prometheus self-metrics and ingest them into the test TSDB (nor does it really make sense to do so).

Anyway, logging and exposing errors relating to out-of-order samples in the test output would help.

Aside

As an aside, this was made much harder to track down by promtest's silent addition of various @ modifier invariants and range-query versions of each query, which causes everything to run several times in a way that is not at all apparent from backtraces. It would have helped a lot if the invariant checks ran as part of a differently named function.

System information

N/A, applies to platform independent test suite

Prometheus version


Prometheus configuration file

N/A, test case

Alertmanager version

N/A

Alertmanager configuration file

N/A

Logs

N/A, test case
