promtest: load directive with mixed float/histogram series will silently discard all histogram samples before the first float sample #17025
Description
What did you do?
When writing a new test case to extend the tests for count_over_time and last_over_time, I added a mixed histogram/float series where some float samples arrive after some histogram samples.
TL;DR: see the following test case.
Consider this promtest, which I will assume is to be placed at path promql/promqltest/testdata/mix_histogram_over_time.test:
load 10s
data_histogram{type="mix_histfirst"} {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}} 0 1
eval instant at 1m count_over_time(data_histogram[2m])
expect no_info
expect no_warn
{type="mix_histfirst"} 4
clear
and run as
go test -v -run 'TestEvaluations/testdata/mix_histogram_over_time.test' ./promql
What did you expect to see?
I expected the histogram samples to be retained irrespective of their order relative to float samples in the same series, so count_over_time would return 4 for the above series as the test case asserts.
What did you see instead? Under which circumstances?
Surprisingly, the histogram samples in the series were completely ignored. count_over_time returns 2 for the series, as it counts only the float samples.
Not only that, but no warning or info message was reported by promtest (because the warning was at ingestion time in the load directive, not at runtime on the eval directive) so the failure occurred silently.
Running go test -v -run 'TestEvaluations/testdata/mix_histogram_over_time.test' ./promql will yield
=== RUN TestEvaluations
=== RUN TestEvaluations/testdata/mix_histogram_over_time.test
=== RUN TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m])
test.go:1321:
Error Trace: /home/craig/projects/EDB/UPM/upm-stack/prometheus/promql/promqltest/test.go:1321
Error: Received unexpected error:
error in eval count_over_time(data_histogram[2m]) (line 5): expected 4 for {type="mix_histfirst"} but got 2
Test: TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m])
--- FAIL: TestEvaluations (0.03s)
--- FAIL: TestEvaluations/testdata/mix_histogram_over_time.test (0.03s)
--- FAIL: TestEvaluations/testdata/mix_histogram_over_time.test/line_5/count_over_time(data_histogram[2m]) (0.00s)
FAIL
FAIL github.com/prometheus/prometheus/promql 0.045s
FAIL
despite there clearly being 4 samples in the series.
If the order of the series is changed to
data_histogram{type="mix_histfirst"} 0 1 {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}}
then the test will pass.
Details
In promtest, histogram samples are silently dropped in a load directive containing mixed float and histogram samples. The dropped samples are not detected by expect no_info or expect no_warn, resulting in confusing test run results.
This might also affect scrapes of real-world endpoints, if Prometheus permits a client to emit multiple samples for one series in a single scrape so long as those samples carry client-defined timestamps; I haven't verified that, and it seems like a very niche case.
Analysis
Prometheus's head appender commits all float samples before any histogram samples within the same append transaction. Any histogram sample that is older than the newest float sample already committed for the series is then rejected as out-of-order, even if all samples were appended in time order.
prometheus/tsdb/head_append.go
Lines 1497 to 1499 in 8f6a71e
a.commitSamples(acc)
a.commitHistograms(acc)
a.commitFloatHistograms(acc)
This is usually harmless, because it wouldn't make sense to mix histogram and float samples in a single scrape.
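The effect of this commit order can be sketched with a small, self-contained Go simulation. This is a simplified model of what headAppender.Commit does, not the real implementation; the sample type and commitByType function are invented for illustration:

```go
package main

import "fmt"

// sample is a minimal stand-in for a TSDB sample; hist marks histogram samples.
type sample struct {
	t    int64
	hist bool
}

// commitByType mimics the per-type commit order in headAppender.Commit:
// all float samples are committed first, then all histogram samples.
// Any histogram sample older than the newest float already committed is
// rejected as out-of-order, even though the input was appended in time order.
func commitByType(samples []sample) (accepted, rejected int) {
	var floats, hists []sample
	for _, s := range samples {
		if s.hist {
			hists = append(hists, s)
		} else {
			floats = append(floats, s)
		}
	}
	maxt := int64(-1)
	commit := func(batch []sample) {
		for _, s := range batch {
			if s.t <= maxt {
				rejected++ // "out of order sample"
				continue
			}
			maxt = s.t
			accepted++
		}
	}
	commit(floats) // a.commitSamples
	commit(hists)  // a.commitHistograms / a.commitFloatHistograms
	return accepted, rejected
}

func main() {
	// load 10s: two histograms at 0s and 10s, then floats at 20s and 30s.
	in := []sample{{0, true}, {10000, true}, {20000, false}, {30000, false}}
	acc, rej := commitByType(in)
	fmt.Printf("accepted=%d rejected=%d\n", acc, rej) // accepted=2 rejected=2
}
```

With the series order reversed (floats first, then histograms), the same model accepts all four samples, matching the observed test behaviour.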
But it does affect the test suite, since github.com/prometheus/prometheus/promql/promqltest.(*loadCmd).append() appends all samples for a series in one transaction:
prometheus/promql/promqltest/test.go
Lines 1287 to 1295 in 8f6a71e
app := t.storage.Appender(t.context)
if err := cmd.append(app); err != nil {
	app.Rollback()
	return err
}
if err := app.Commit(); err != nil {
	return err
}
As a result, at this point in Commit():
> github.com/prometheus/prometheus/tsdb.(*headAppender).Commit() ./tsdb/head_append.go:1499 (PC: 0x11b453f)
Values returned:
1494: }
1495: }()
1496:
1497: a.commitSamples(acc)
1498: a.commitHistograms(acc)
=>1499: a.commitFloatHistograms(acc)
1500: a.commitMetadata()
1501: // Unmark all series as pending commit after all samples have been committed.
1502: a.unmarkCreatedSeriesAsPendingCommit()
1503:
1504: a.head.metrics.outOfOrderSamples.WithLabelValues(sampleMetricTypeFloat).Add(float64(acc.floatOOORejected))
once the float samples have been committed (line 1497), any float histogram samples older than the newest committed float sample are discarded with an out of order sample error:
> github.com/prometheus/prometheus/tsdb.(*headAppender).commitFloatHistograms() ./tsdb/head_append.go:1349 (PC: 0x11b2aac)
1344: series = a.floatHistogramSeries[i]
1345: series.Lock()
1346:
1347: oooSample, _, err := series.appendableFloatHistogram(s.T, s.FH, a.headMaxt, a.minValidTime, a.oooTimeWindow)
1348: if err != nil {
=>1349: handleAppendableError(err, &acc.histogramsAppended, &acc.histoOOORejected, &acc.histoOOBRejected, &acc.histoTooOldRejected)
1350: }
1351:
1352: switch {
1353: case err != nil:
1354: // Do nothing here.
(dlv) p err
error(*errors.errorString) *{
s: "out of order sample",}
this results in funcCountOverTime receiving a matrixVals argument containing only the two float samples; the histogram samples are missing from the matrix entirely.
➜ prometheus git:(test-count-over-time-histogram-first) ✗ dlv test ./promql -- -test.run 'TestEvaluations/testdata/mix_histogram_over_time.test'
(dlv) break funcCountOverTime
Breakpoint 1 set at 0xdec00e for github.com/prometheus/prometheus/promql.funcCountOverTime() ./promql/functions.go:764
(dlv) c
> [Breakpoint 1] github.com/prometheus/prometheus/promql.funcCountOverTime() ./promql/functions.go:764 (hits goroutine(87):1 total:1) (PC: 0xdec00e)
759: return sum/count + kahanC/count
760: }), nil
761: }
762:
763: // === count_over_time(Matrix parser.ValueTypeMatrix) (Vector, Notes) ===
=> 764: func funcCountOverTime(_ []Vector, matrixVals Matrix, _ parser.Expressions, enh *EvalNodeHelper) (Vector, annotations.Annotations) {
765: return aggrOverTime(matrixVals, enh, func(s Series) float64 {
766: return float64(len(s.Floats) + len(s.Histograms))
767: }), nil
768: }
769:
(dlv) p matrixVals
github.com/prometheus/prometheus/promql.Matrix len: 1, cap: 1, [
{
Metric: (*"github.com/prometheus/prometheus/model/labels.Labels")(0xc0002da6e0),
Floats: []github.com/prometheus/prometheus/promql.FPoint len: 2, cap: 16, [
(*"github.com/prometheus/prometheus/promql.FPoint")(0xc00000e900),
(*"github.com/prometheus/prometheus/promql.FPoint")(0xc00000e910),
],
Histograms: []github.com/prometheus/prometheus/promql.HPoint len: 0, cap: 0, nil,
DropName: false,},
]
(dlv) p matrixVals[0].Floats
[]github.com/prometheus/prometheus/promql.FPoint len: 2, cap: 16, [
{T: 20000, F: 0},
{T: 30000, F: 1},
]
(dlv) p matrixVals[0].Histograms
[]github.com/prometheus/prometheus/promql.HPoint len: 0, cap: 0, nil
This diagnosis may be confirmed by the following test case, which splits the load into two phases to force an intermediate commit:
# To work around https://github.com/prometheus/prometheus/issues/17025, load histogram samples
# alone as the first samples in the series
load 10s
data_histogram{type="mix_histfirst"} {{schema:0 sum:1 count:2}} {{schema:0 sum:2 count:3}}
# then append some float samples in a separate transaction. This must be a separate load directive
# to ensure the appender is committed, so the histogram samples are not falsely rejected as
# out-of-order vis-a-vis the float samples.
load 10s
data_histogram{type="mix_histfirst"} _ _ 0 1
eval instant at 1m count_over_time(data_histogram[2m])
expect no_warn
expect no_info
expect ordered
{type="mix_histfirst"} 4
clear
Fix?
I'm not sure how to sensibly fix this. The appender would probably need to check sample ordering for all samples in strict time order, independent of sample type. That would cost time and hardly seems worthwhile.
Perhaps promtest's loader should instead commit each sample in its own transaction rather than appending the whole series at once. That would preserve time ordering, and would also make promtest reflect the behaviour of real scrapes more faithfully; in reality a series is not normally populated with multiple values by a single scrape.
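The per-sample-commit idea can be sketched with the same kind of simplified simulation (again a model, not the real appender; commitPerSample is a hypothetical name):

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a minimal stand-in for a TSDB sample; hist marks histogram samples.
type sample struct {
	t    int64
	hist bool
}

// commitPerSample models the proposed loader fix: append samples in strict
// time order and commit each one in its own transaction, as a sequence of
// real scrapes would. Nothing is rejected as long as timestamps ascend,
// regardless of how float and histogram samples are interleaved.
func commitPerSample(samples []sample) (accepted, rejected int) {
	sort.SliceStable(samples, func(i, j int) bool { return samples[i].t < samples[j].t })
	maxt := int64(-1)
	for _, s := range samples {
		// Each iteration stands for one Appender transaction: append + Commit.
		if s.t <= maxt {
			rejected++
			continue
		}
		maxt = s.t
		accepted++
	}
	return accepted, rejected
}

func main() {
	// The failing series from this report: histograms at 0s/10s, floats at 20s/30s.
	in := []sample{{0, true}, {10000, true}, {20000, false}, {30000, false}}
	acc, rej := commitPerSample(in)
	fmt.Printf("accepted=%d rejected=%d\n", acc, rej) // accepted=4 rejected=0
}
```

Because each commit advances the series' max time by exactly one sample, the per-type grouping inside a single transaction never comes into play.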
It'd also help if promtest's loader checked for out-of-order sample errors and other warnings from the appender (possibly with an optional directive like expect out_of_order to permit this where desired).
Even with go test -v, the appender does not log any out-of-order sample errors, which makes this harder to diagnose. It appears the WBL is not enabled for the tests, so https://github.com/prometheus/prometheus/blob/main/tsdb/head_append.go?plain=1#L1517-L1523 will not be hit. The only remaining way to detect the drop is probably the out-of-order counter prometheus_tsdb_out_of_order_samples_total, but that does not appear to be emitted in test runs either:
# alternately, if we got 2 samples in the above, then this query should report the out-of-order samples
# discarded during append
eval instant at 1m sum by (type) (increase(prometheus_tsdb_out_of_order_samples_total[2m]))
{type="float"} 0
{type="histogram"} 2
... presumably because nothing captures these Prometheus self-metrics and ingests them into the test TSDB (nor would it really make sense to do so).
Anyway, logging and exposing errors relating to out-of-order samples in the test output would help.
Aside
As an aside, this was made much harder to track down by promtest's silent addition of @ modifier invariants and range-query variants for each query, which causes everything to run several times in a way that is not at all apparent from backtraces. It would have helped a lot if the invariant checks ran as part of a differently named function.
System information
N/A; applies to the platform-independent test suite
Prometheus version
Prometheus configuration file
N/A, test case
Alertmanager version
N/A
Alertmanager configuration file
N/A
Logs
N/A, test case