Conversation

@zenador (Contributor) commented Mar 19, 2024

Addresses #13508

Currently histogram_quantile is failing on the newly converted histograms, but the other basic histogram functions work as expected. (fixed now)

Needed to update the behaviour of histogram_stddev and histogram_stdvar to handle negative bounds properly (previously it was already bugged even for exponential histograms with negative observations). (now in #13824)

@zenador zenador marked this pull request as draft March 19, 2024 16:07
@krajorama krajorama self-requested a review March 20, 2024 11:52
promql/test.go Outdated
vec = append(vec, Sample{Metric: series.Metric, T: point.T, H: point.H})
break
}
if strings.Contains(iq.expr, "_bucket") {
Member

I'd rather we didn't do this, we should make the copy by hand to be explicit. Unit tests should be very clear, explicit and define expectations.

Contributor Author
@zenador Mar 20, 2024

Can we discuss this more at the next sync? We are already implicitly loading all these classic histograms as nhcb, so it feels inconsistent to load them implicitly but run the testing explicitly as separate extra tests. I thought the idea was to automatically extend test coverage for classic histograms to the converted nhcb, to ensure that, as far as our tests are concerned, the conversion introduces no regressions and classic/nhcb return exactly the same results. If we make the testing explicit, we will have to duplicate a lot of the unit test code, and it will be very easy to miss test cases: for example, if someone adds new test cases for classic histograms but overlooks adding the nhcb version, there may be cases where they yield different results but we won't know about it. Doing it automatically ensures we don't miss out on any tests that we have.

I agree though that it may come as a surprise, so if we decide to keep this, we should clearly document it somewhere. Alternatively, we could introduce a keyword in the tests to mark that this test should be repeated on the nhcb version, and add it to all applicable tests, so it will be explicit and people adding new tests will be less likely to miss it.

There is a con of doing this implicitly though, as there are rare cases where the classic histograms and nhcb yield different results, as seen in the tests removed in the latest commit:

  • The More realistic with rates. segment: this is a bug and I'll fix it as soon as I figure out what's going on (fixed, will elaborate in follow up comment)
  • nonmonotonic_bucket series can't be converted into a valid nhcb at all, so we will not be able to test on both (and this non-monotonic problem should be gone with nhcb). The keyword approach I mentioned above should be ideal for this. Also I'm not sure why we test this, naturally we can't guarantee correctness if there are missing scrapes and inconsistent buckets.
  • mixed_bucket series in the test is currently giving the wrong result. To check this, remove the series with the redefined buckets in the loading segment and run it. It will return 0.2 instead of 0.15, and 0.2 is what the nhcb version returns. This bug should probably be fixed separately (completely out of scope for nhcb), though I'm not sure if it's worth supporting this edge case - we could just say we don't guarantee correctness if there are duplicate buckets, unless there's a very good reason why we need to support it, e.g. it happens often for reasons that are hard to fix

Member

Technically we have 5 choices as far as I see:

  • explicit tests (easy, but error prone)
  • assert that all tests have both versions
  • able to mark tests to include in checks with nhcb
  • able to mark tests to exclude in checks with nhcb
  • do what we do now and fix all cases (not possible)

Do we have access to all eval statements? We could avoid having to mark exceptions by just defining the nhcb version explicitly.

Also, technically you'd have to remove the le label from the nhcb expressions, as that's the official conversion to a native query.

Contributor Author
@zenador Mar 21, 2024

The 2nd one seems a bit complicated to implement. The 3rd is slightly more troublesome than the 4th (we'd have to update all the test cases), but it is more explicit; I don't really mind either the 3rd or the 4th.

We could avoid having to mark exceptions by just defining the nhcb version explicitly.

Sorry, what do you mean by this?

Also technically you'd have to remove le label from the nhcb expressions as that's the official conversion to native query.

Yes it works well even without doing it, but I've updated the last commit to remove it anyway.

Contributor Author
@zenador Mar 22, 2024

# More realistic with rates.
eval instant at 50m histogram_quantile(0.2, rate(testhistogram_bucket[5m]))
	{start="positive"} 0.048
	{start="negative"} -0.2

eval instant at 50m histogram_quantile(0.5, rate(testhistogram_bucket[5m]))
	{start="positive"} 0.15
	{start="negative"} -0.15

eval instant at 50m histogram_quantile(0.8, rate(testhistogram_bucket[5m]))
	{start="positive"} 0.72
	{start="negative"} 0.3

The above (originally included) tests were initially failing only for the nhcb version, but the bug is now fixed. Note that even without the fix, the following tests were working for both classic and nhcb:

# More realistic with rates.
eval instant at 50m histogram_quantile(0.2, rate(testhistogram_bucket{start="positive"}[5m]))
	{start="positive"} 0.048

eval instant at 50m histogram_quantile(0.2, rate(testhistogram_bucket{start="negative"}[5m]))
	{start="negative"} -0.2

eval instant at 50m histogram_quantile(0.5, rate(testhistogram_bucket{start="positive"}[5m]))
	{start="positive"} 0.15

eval instant at 50m histogram_quantile(0.5, rate(testhistogram_bucket{start="negative"}[5m]))
	{start="negative"} -0.15

eval instant at 50m histogram_quantile(0.8, rate(testhistogram_bucket{start="positive"}[5m]))
	{start="positive"} 0.72

eval instant at 50m histogram_quantile(0.8, rate(testhistogram_bucket{start="negative"}[5m]))
	{start="negative"} 0.3

Meaning that when you mix the positive and negative series (which have different bucket bounds) together, something gets corrupted somewhere. This isn't a problem for the tests on other metrics, as their series have consistent bucket bounds, and it isn't a problem for the other tests on this metric, as those have no rate (which might involve histograms being copied between series).

Since this also isn't a problem for classic histograms or exponential native histograms, the problem is with our new customValues field, which we shallow copied when we added it to the model instead of doing a deep copy. I added the fix commit to the earlier chunks PR because I'd rather fix this newly introduced bug before we merge nhcb to main. We can revisit these optimisations later, once the nhcb implementation is more complete and we have more tests.

Evidence of the corruption (when printing the result of rate after it is passed into histogram_quantile):

the correct bounds with separate queries:

s: {start="positive"} => {count:0.04, sum:0, [-Inf,0.1]:0.016666666666666666, (0.1,0.2]:0.006666666666666667, (0.2,1]:0.013333333333333334, (1,+Inf]:0.0033333333333333335} @[3000000]
s: {start="negative"} => {count:0.01, sum:0, [-Inf,-0.2]:0.0033333333333333335, (-0.2,-0.1]:0.0033333333333333335, (0.3,+Inf]:0.0033333333333333335} @[3000000]

the wrongly duplicated bounds with combined query:

s: {start="negative"} => {count:0.01, sum:0, [-Inf,0.1]:0.0033333333333333335, (0.1,0.2]:0.0033333333333333335, (1,+Inf]:0.0033333333333333335} @[3000000] <- copied wrongly from other series
s: {start="positive"} => {count:0.04, sum:0, [-Inf,0.1]:0.016666666666666666, (0.1,0.2]:0.006666666666666667, (0.2,1]:0.013333333333333334, (1,+Inf]:0.0033333333333333335} @[3000000]

the wrongly duplicated bounds with combined query, alternate version (the order seems non-deterministic, so sometimes you get the earlier one and sometimes it's this one):

s: {start="positive"} => {count:0.04, sum:0, [-Inf,-0.2]:0.016666666666666666, (-0.2,-0.1]:0.006666666666666667, (-0.1,0.3]:0.013333333333333334, (0.3,+Inf]:0.0033333333333333335} @[3000000] <- copied wrongly from other series
s: {start="negative"} => {count:0.01, sum:0, [-Inf,-0.2]:0.0033333333333333335, (-0.2,-0.1]:0.0033333333333333335, (0.3,+Inf]:0.0033333333333333335} @[3000000]
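The bug class described above, where a shallow copy leaves two series sharing one custom-bounds slice, can be sketched like this. The `histogram` type and `customValues` field here are simplified stand-ins for illustration, not Prometheus's actual model types:

```go
package main

import "fmt"

// histogram is a simplified stand-in for a native histogram model type
// that carries custom bucket upper bounds.
type histogram struct {
	customValues []float64 // custom bucket upper bounds
}

// shallowCopy shares the underlying slice with the original: mutating
// the copy's bounds corrupts the original series too.
func (h *histogram) shallowCopy() *histogram {
	return &histogram{customValues: h.customValues}
}

// deepCopy allocates fresh backing storage for the bounds, so each
// series owns its own slice.
func (h *histogram) deepCopy() *histogram {
	bounds := make([]float64, len(h.customValues))
	copy(bounds, h.customValues)
	return &histogram{customValues: bounds}
}

func main() {
	orig := &histogram{customValues: []float64{0.1, 0.2, 1}}
	s := orig.shallowCopy()
	s.customValues[0] = -0.2          // meant for another series...
	fmt.Println(orig.customValues[0]) // ...but orig is corrupted: -0.2

	orig = &histogram{customValues: []float64{0.1, 0.2, 1}}
	d := orig.deepCopy()
	d.customValues[0] = -0.2
	fmt.Println(orig.customValues[0]) // unaffected: 0.1
}
```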

Contributor Author

Moved this part to #13827

Member

About the mixed_bucket test case: This needs to be studied carefully. I think the test is actually not very suitable, because all the buckets are the same, so we are essentially exercising edge cases of the interpolation algorithm (and 0.15 and 0.2 might both be mathematically correct answers). We should have a test that has different buckets defined in different formats, and then maybe even one where the same buckets are defined with different formats (like now) but have different values. (I'm not even sure what the behavior should be then. This "float as a string" thing in classic histograms is really a big mess. Maybe it is supposed to sum them all up? That would even be relevant to how we convert classic histograms into NHCB.)

About the nonmonotonic_bucket test: This tests how PromQL deals with this case of invalid input. It's important to have those tests.

In both cases, I see two ways we could deal with them for NHCB:

  1. We could state that we only convert classic histograms into NHCB if they don't have those shenanigans, i.e. they must not have duplicate buckets in different formats, and they must be monotonic; otherwise we'll just decline to convert them. I think this is a reasonable requirement, as the conversion is meant to happen "close to the source" of an instrumented target, where those problems shouldn't occur.
  2. We somehow emulate the work-arounds done for classic histograms during the conversion. For one, fixing non-monotonicity (but how to issue a warning then?). And then mirroring the behavior for differently formatted le labels, in particular if the values collide (for which I don't even know the intended behavior off the top of my head).

I would currently tend towards (1).
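Option (1) boils down to a validity check before conversion. A minimal sketch of the monotonicity part, with a hypothetical `isMonotonic` helper (illustration only, not Prometheus's actual conversion code):

```go
package main

import "fmt"

// isMonotonic reports whether cumulative classic-histogram bucket counts
// never decrease as the upper bounds increase. Under option (1), a
// classic histogram failing this check would simply not be converted
// to NHCB.
func isMonotonic(cumulativeCounts []float64) bool {
	for i := 1; i < len(cumulativeCounts); i++ {
		if cumulativeCounts[i] < cumulativeCounts[i-1] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isMonotonic([]float64{1, 2, 2, 5})) // valid cumulative counts
	fmt.Println(isMonotonic([]float64{1, 3, 2}))    // non-monotonic: decline to convert
}
```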

Member

Thinking more about the test setup, I'm more and more inclined to make everything explicit.

For one, that means that loading a classic histogram needs some keyword to activate conversion into NHCB, mostly because there are cases where the test deliberately contains data that cannot be converted to NHCB (see above), unless we decide to tweak the conversion algorithm so that it can deal with those cases (which maybe we should not, see also above). (I think this is in agreement with @zenador's assessment so far.)

However, I also think we should give up on running the classic histogram tests automatically and implicitly with converted NHCB. Here are the reasons I see:

  • There are queries that deliberately create a different result, e.g. rate(histo_bucket) creates a classic histogram as output, while rate(histo) creates a native histogram. There are more subtle cases like the counter reset detection, which is deliberately broken in classic histograms once negative observations are in the game but works correctly after NHCB conversion. (I don't know if we have a test for that right now, but we could add one some day, or maybe even should.)
  • Even for queries that are supposed to create the same output, the conversion isn't trivial. histogram_quantile(0.5, histo_bucket) is easy. histogram_quantile(0.5, sum by (le) (histo_bucket)) is a bit harder. You also have to take into account the equally valid histogram_quantile(0.5, sum(histo_bucket) by (le)) (and there might be more labels in any order, and you have to find the le label and remove it). Then there are queries involving foo_count and foo_sum, which would need to be converted to histogram_count(foo) and histogram_sum(foo). Etc. In short, I think implementing a correct test converter is a very deep rabbit hole.
  • And now add the problem that there are cases where the conversion to NHCB doesn't work by design (see above, where we concluded we need to mark the conversion explicitly anyway).

In summary, I think we would need to invest a lot of effort chasing an ultimately impossible goal if we want to automatically run all classic histogram tests with NHCB, too. In many cases, we need a human anyway to decide whether the test should run with NHCB at all, and if so, what the query should look like.

Contributor Author

maybe even one where the same buckets are defined with different formats (like now) but they have different values. (I'm not even sure what the behavior should be then.

Yeah, it seems like in that case we could arbitrarily select one and the behaviour would be non-deterministic, so perhaps it's better not to test that and just say we provide no guarantees in such edge cases.

I would currently tend towards (1).

I agree, it is better to only convert valid histograms. If they are problematic, it is better to flag it earlier with an error, so the user can look into fixing it, than to automatically correct it on conversion based on assumptions that may be wrong.

Anyway, both loading and evaluating with native histograms are now opt-in: you need the keyword suffix _with_nhcb for both. So we don't add those suffixes for nonmonotonic_bucket; for mixed_bucket we load it, use the keyword for tests with the same results, skip the keyword for the failing test, and add a separate test for the different result of the nhcb version, so that when this disparity gets looked into in future, the tests can be updated to reflect the change in behaviour.

The conversion in the eval query is very simple, and it works well for the current tests that we have, but for more complicated cases like the one you mentioned, it's best that we don't use the keyword and define a separate test instead. I agree that making the test converter completely correct would be more trouble than it's worth, so we'll just have a simple one for convenience to cover most of the simple cases, and fall back to manually defining it when necessary.
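As a rough illustration of what such a simple converter could look like (a hypothetical sketch, not the PR's actual implementation), the easy cases reduce to dropping the _bucket suffix so the query targets the converted native histogram; aggregations over le, or _count/_sum queries, would fall back to manually defined tests:

```go
package main

import (
	"fmt"
	"strings"
)

// convertQuery is a deliberately naive sketch of the _with_nhcb query
// rewrite: it only strips the _bucket suffix from metric names, which
// covers simple expressions like histogram_quantile over a rate.
// Anything involving sum by (le), _count, or _sum is out of scope here.
func convertQuery(expr string) string {
	return strings.ReplaceAll(expr, "_bucket", "")
}

func main() {
	classic := `histogram_quantile(0.2, rate(testhistogram_bucket[5m]))`
	fmt.Println(convertQuery(classic))
	// histogram_quantile(0.2, rate(testhistogram[5m]))
}
```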

@zenador zenador force-pushed the nhcb-tests branch 8 times, most recently from c19836b to 2e78f2b Compare March 22, 2024 12:25
@zenador zenador force-pushed the nhcb-tests branch 3 times, most recently from 0c6e102 to fad5a66 Compare March 22, 2024 13:59
@zenador zenador changed the title Run unit tests for native histograms with custom buckets converted from classic histograms Add basic unit tests for native histograms with custom buckets converted from classic histograms Mar 22, 2024
@zenador zenador marked this pull request as ready for review March 22, 2024 14:20
@bboreham bboreham changed the title Add basic unit tests for native histograms with custom buckets converted from classic histograms [nhcb branch] Add basic unit tests for native histograms with custom buckets converted from classic histograms Mar 25, 2024
promql/test.go Outdated
for _, histWrapper := range histMap {
sort.Float64s(histWrapper.upperBounds)
upperBounds := make([]float64, 0, len(histWrapper.upperBounds))
prevLe := -372.4869 // assuming that the lowest bound is never this value
Member

Suggested change
prevLe := -372.4869 // assuming that the lowest bound is never this value
prevLe := math.NaN()

Contributor Author

I ended up making this -Inf instead, since that should also not be a valid boundary.
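The resulting pattern, sorting the bounds and de-duplicating with -Inf as the initial sentinel (since -Inf can never be a valid upper bound), might look roughly like this (a sketch of the pattern under discussion, not the exact PR code):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dedupeBounds sorts bucket upper bounds and drops duplicates. The
// "previous le" starts at -Inf, which is guaranteed not to be a valid
// boundary, avoiding the arbitrary -372.4869 sentinel.
func dedupeBounds(bounds []float64) []float64 {
	sort.Float64s(bounds)
	out := make([]float64, 0, len(bounds))
	prevLe := math.Inf(-1)
	for _, le := range bounds {
		if le != prevLe {
			out = append(out, le)
			prevLe = le
		}
	}
	return out
}

func main() {
	fmt.Println(dedupeBounds([]float64{0.2, 0.1, 0.2, 1})) // [0.1 0.2 1]
}
```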

Signed-off-by: George Krajcsovits <[email protected]>