fix(om2): histograms and negative observed values #2627
This is only true for native histograms, but not for classic histograms. (FTR: I proposed to improve the counter reset handling for summaries and classic histograms at KubeCon Berlin in 2017. My proposal was ultimately rejected, so I guess we should not change course now and instead encourage native histograms including NHCB.)
I've reworded the PR description and I'll copy the final text into the commit message once we agree on it.
OM1.0 required that the Sum of Histograms is not represented when there are negative observations in a histogram. This PR removes this requirement in OM2.0, for the following reasons:
The requirement was never implemented by the Go and Java instrumentation libraries. Enforcing it now would be breaking.
The requirement makes it impossible to implement the use case where the user wants to measure the Sum anyway.
We already warned users in the documentation about the possibility of Sum decreasing and not being usable for rate() 10 years ago: #43.
Native histograms will not take Sum into account when calculating counter resets during rate(), thus this problem won't come up.
Note: this PR does not make Sum mandatory, that is a different question.
Signed-off-by: György Krajcsovits <[email protected]>
I think the only way of solving this problem properly (beyond getting rid of classic histograms and summaries altogether) is to require PromQL to detect a counter reset in the sum via different means (historically by looking at the count, but nowadays we could also look at the CT). I don't know how to solve this given that the Prometheus community has decided to not do that. Maybe just leaving it as is in practice (which is arguably what this PR proposes) is the least bad way, but I don't feel I should make this call about OMv2.
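For illustration only, the idea of detecting a reset in the sum "by looking at the count" could be sketched as below. This is a simplified model, not how PromQL behaves: it ignores extrapolation and staleness, and the function names `increase_naive` and `increase_with_count` are hypothetical. It assumes the usual counter convention that a drop in value is read as a reset to zero.

```python
def increase_naive(sums):
    """Counter-style increase over the sum alone: any drop is read as a reset."""
    total = 0.0
    for prev, cur in zip(sums, sums[1:]):
        # After a presumed reset, the current value is the whole increase.
        total += cur if cur < prev else cur - prev
    return total


def increase_with_count(sums, counts):
    """Same idea, but a reset is only assumed when the count itself drops."""
    total = 0.0
    for i in range(1, len(sums)):
        reset = counts[i] < counts[i - 1]
        total += sums[i] if reset else sums[i] - sums[i - 1]
    return total


# Scrapes of one classic histogram after observing 5, -3, -4, 2 (no real reset).
sums = [5.0, 2.0, -2.0, 0.0]
counts = [1, 2, 3, 4]

naive = increase_naive(sums)                      # misreads two drops as resets
count_based = increase_with_count(sums, counts)   # true change of the sum
```

In this made-up series the sum really changed by -5.0, which only the count-based variant reports; the naive variant sees two spurious resets.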
I agree that the solution is native histograms and this PR does not want to actually solve the problem of negative values in Sum. This PR is just about getting rid of a requirement that's not implemented by anyone and just makes things more complicated.
Signed-off-by: György Krajcsovits <[email protected]>
csmarchbanks
left a comment
Generally ok with this, as the above paragraph only has the sum as a SHOULD. I didn't realize it was a MUST NOT for OM 1.0; I guess that means Java is not OpenMetrics compliant today.
Also, just to note the above comment - the requirement to not expose |
noted
Related issue about Sum allowing NaN or not: prometheus/client_golang#1275 (comment)
We agreed to just have good PR descriptions. Signed-off-by: György Krajcsovits <[email protected]>
In #2627 and #2634 we've made the Sum and Count mandatory fields in histograms. They became mandatory in Summary due to the ABNF as that has precedence. This PR changes the data model to follow the ABNF and be consistent with histograms. Note: this is probably the last chance to reverse course and make Sum and/or Count optional without a breaking change. Signed-off-by: György Krajcsovits <[email protected]>
OM1.0 required that the Sum of a Histogram is not represented when there are negative observations in the histogram.
This PR removes this requirement in OM2.0, for the following reasons:
The requirement was never implemented by the Go and Java instrumentation libraries. Enforcing it now would be breaking.
The requirement makes it impossible to implement the use case where the user wants to measure the Sum anyway. For example, you would not be able to calculate the average from Sum/Count.
The PromQL engine does not take the Sum into account when doing counter reset detection, thus it does not matter that it can decrease.
We already warned users in the documentation about the possibility of Sum decreasing and not being usable for rate() 10 years ago: PR.
Native histograms will not take Sum into account when calculating counter resets during rate(), thus this problem won't come up.
Note 1: the Python reference implementation did follow the requirement.
Note 2: this PR does not make Sum mandatory, that is a different question.
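A minimal sketch of the Sum/Count use case from the description, assuming a histogram that simply accumulates a running count and sum. `observe_all` is a hypothetical helper written for this example, not part of any Prometheus client library. It shows that with negative observations the Sum can decrease between scrapes, while the average is still computable from Sum/Count.

```python
def observe_all(observations):
    """Return the running (count, sum) pairs a histogram would expose."""
    count, total = 0, 0.0
    samples = []
    for value in observations:
        count += 1
        total += value
        samples.append((count, total))
    return samples


# Illustrative observations, two of them negative.
samples = observe_all([5.0, -3.0, -4.0, 2.0])
sums = [s for _, s in samples]

# The Sum is not monotonic once negative values are observed.
sum_decreased = any(later < earlier for earlier, later in zip(sums, sums[1:]))

# The average is still well defined if the Sum is exposed.
final_count, final_sum = samples[-1]
average = final_sum / final_count
```

If the Sum were dropped as soon as a negative value is observed, `average` could not be derived from the exposed data at all.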