-
Notifications
You must be signed in to change notification settings - Fork 10.1k
Fix stddev/stdvar when aggregating histograms, NaNs, and infinities #14941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks, will have a look ASAP. Note that the tracking bug for this is #13934 . /cc @NeerajGartia21 |
|
Actually, in #13934, we assumed the implemented behavior is already correct, but you found (and fixed here) that that's not entirely true. |
Native histograms are ignored when calculating stddev or stdvar. However, for the first series that is loaded a groupedAggregation would always be created. If the first series that was loaded is a histogram then it acts as the equivalent of a 0 point. This change does not create a groupedAggregation if the point is a histogram, thus ignoring it like the rest of the aggregation function does. (The added tests fail without this change). Signed-off-by: Joshua Hesketh <[email protected]>
Signed-off-by: Joshua Hesketh <[email protected]>
Signed-off-by: Joshua Hesketh <[email protected]>
Signed-off-by: Joshua Hesketh <[email protected]>
1209405 to
9372bb6
Compare
Signed-off-by: Joshua Hesketh <[email protected]>
|
I've also added a fix to infinity to be consistent. However, how infinities are handled should likely be visited in more detail. (please see updated topic). |
promql/engine.go
Outdated
| } | ||
| case parser.STDVAR, parser.STDDEV: | ||
| group.floatValue = 0 | ||
| if h == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of initialising the groupedAggregation for each case, can we just set the group.seen=false when histogram is encountered for the first time in the group? If YES, then it will reduce a lot of code repeatations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Thanks for having a look @NeerajGartia21 . And apologies for my continued delay. I'll try to get to this next week. |
Signed-off-by: beorn7 <[email protected]>
|
Thank you again @jhesketh . I think this works fine, but I have played with @NeerajGartia21's idea, and it seems to work. (But I'm definitely not an export when it comes to the PromQL engine, so I might have missed something.) |
|
Also resolved merge conflict. Now let's see if the CI tests will run this time. |
- Use canonical formatting in PromQL expressions. - Add tests `by (label)`. - Consistent indentation. Signed-off-by: beorn7 <[email protected]>
|
Added more test nuances. |
|
@jhesketh do you have any thoughts on my pushed additions? @NeerajGartia21 maybe you could review my additions. If you agree they are correct, we can merge this. |
e8d3ed5 to
8ed444d
Compare
|
I solved merge conflicts once more. |
|
Sorry for the delay on this (been on PTO). Looks good to me, thank you for the improvements and solving merge conflicts :-) |
|
It looks good to me as well! I think it’s ready to merge. |
|
Thank you all. I'll squash this all into one and merge. |
Native histograms are ignored when calculating stddev or stdvar.
However, for the first series that is loaded a groupedAggregation would always be created. If the first series that was loaded is a histogram then it acts as the equivalent of a 0 point.
This change does not create a groupedAggregation if the point is a histogram, thus ignoring it like the rest of the aggregation function does.
Additionally, when doing stddev or stdvar on a single series with a NaN, 0 was returned. Whereas any aggregation group with more than one series would return NaN. Make this consistent by always returning NaN.
Similarly, stddev and stdvar would return NaN when infinity exists in one series. However if infinity was the only point loaded it would return
0. Make this consistent by always treating infinity as NaN.** Note that this likely isn't the correct way to handle infinity in a standard deviation. For example, given two points
1andinf, I would expect thestddevto beinf. For now, however, this change keeps the functionality ofstddevas it was and makes it consistent for when handling the case of just infinity.(The added tests fail without this change).