-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
promql: make lookback and matrix selections left-open and right-closed #13904
Conversation
This Issue/PR is under discussion and is just an example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this. This might be the only change needed. Best to see by adding test cases (or observing which existing tests change their result in which way), which should mostly affect the framework tests in https://github.com/prometheus/prometheus/tree/main/promql/testdata .
I am working on unit tests. I ran all the unit tests and tracked 26 errors that shall be fixed: go test github.com/prometheus/prometheus/... |
Sorry for the delay. I'm back from vacation and will try to get to this ASAP. If anyone qualified beats me to it, even better. |
In the meantime, @KofClubs could you fix the DCO? It should be easy. See the details of the failed DCO test. |
Does this also adjust the lookback behavior? If I understand correctly, with lookback set to |
@vpranckaitis that's a good point. The lookback should be "left-open", too. Maybe it is already. We have to check. |
I still haven't found time for a detailed review. I will try later this week or early next week. My apologies. |
OK. I shall fixup all commits to one after review and sign it off. Sorry, I've been extremely busy recently. I shall work on this PR now, to resolve conflicts firstly. |
52ac3ea
to
e940058
Compare
With conflicts resolved and commits signed off, I think this PR is ready to be reviewed:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general. A few subtleties in the comments.
I will look into the question of the lookback delta being inclusive ASAP.
About the lookback delta: There is even a test for it, see prometheus/promql/promqltest/testdata/staleness.test Lines 38 to 51 in 806073a
Which means that the lookback is "inclusive" right now, but it needs to be "exclusive", so essentially, in that test, the |
And yet another question: How to merge this? To merge this into main before the v3 release, we need to hide the behavior behind a feature flag. However, that doesn't play well with tests. So we could have the behavior changes in a separate PR, where they are behind a feature flag, and not tested at all. And then we have this PR with all the tests, which will go into a v3 branch, which doesn't exist yet, but maybe we should create one soon. Another option would be to only launch this feature with v3 and don't have it behind a feature flag in v2 at all. WDYT? |
Get it. I think there are some aspects worth exploring. MotivationApart from making behavior consistent, are there any other motivations for making lookback into From Storage PerspectiveI don't know if there is any bias in my understanding. If lookback becomes left-open and right-closed, then from the perspective of storage (which can be Prometheus itself, or another TSDB replaced by users), when retrieving data, the time interval shall become left-open and right-closed: If there are any misunderstandings above, please feel free to point them out:) |
Both OK. While, I think this change has a large impact. Considering the change in user habits and behaviors, adding a flag seems to be necessary. |
WRT lookback delta: I think the one and only argument is the one made by @vpranckaitis already. "Correct" is whatever behavior we define as such. The implementation details in the storage shouldn't matter much. The easiest way for now is probably to simply reduce the timestamp by 1ms. And it should be included in this PR to keep things consistent. |
About flags and configurability: I think it is uncontroversial that v3 should only support the new behavior (the one implemented in this PR), with no way of bringing back the old behavior. So far, everyone that thought about it has agreed that the "left closed" behavior was a mistake and should have never happened in the first place. The only question is how to deal with it in v2. And I see two feasible options:
The problem with (2) is that it is hard to cover in tests (unless we add feature flag configuration to the testing framework), but that's maybe not a big problem because it will be tested in the v3 branch. On the other hand, the change in behavior is almost irrelevant in practice and mostly matters it tests (where you have perfectly aligned artificial data), so maybe a feature flag isn't even necessary and (1) is the way to go. |
I unresolved two conversation above where you went into the opposite direction than intended. |
Thanks. Both resolved. Review them please:) |
Merging this update to v3 directly sounds good:) As there are significant functional differences between v2 and v3, are there plans to continue maintaining v2 after v3 is released? When shall the life cycle of v2 end? Sorry for not knowing about the planning of Prometheus 3.0 fully, so I would like to ask these questions sincerely:) |
e19a42d
to
381f8d5
Compare
Several unit tests to be changed. I shall continue to work on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three unresolved comments, with my now more detailed suggestions how to resolve them.
promql/engine_test.go
Outdated
offset := len(c.histograms) | ||
newTs := ts + int64(time.Minute/time.Millisecond)*int64(offset-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still unresolved.
@@ -77,7 +71,7 @@ load 5m | |||
# Tests for increase(). | |||
eval instant at 50m increase(http_requests[50m]) | |||
{path="/foo"} 100 | |||
{path="/bar"} 90 | |||
{path="/bar"} 88.88888888888889 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I played with it. I still think we should change the test so that it yields round numbers, but that requires some more changes. Here is my suggestion:
# Tests for increase().
load 5m
http_requests{path="/foo"} 0+10x11
http_requests{path="/bar"} 0+10x5 0+10x5
http_requests{path="/dings"} 10+10x11
http_requests{path="/bumms"} 1+10x11
eval instant at 55m increase(http_requests[55m])
{path="/foo"} 110
{path="/bar"} 99
{path="/dings"} 110
{path="/bumms"} 110
@@ -115,11 +109,10 @@ load 5m | |||
|
|||
# Counter resets at in the middle of range are handled correctly by rate(). | |||
eval instant at 50m rate(testcounter_reset_middle[50m]) | |||
{} 0.03 | |||
{} 0.02962962962962963 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we need to change a few more things. Here my suggestion:
# Tests for rate().
load 5m
testcounter_reset_middle 10+10x4 0+10x5
testcounter_reset_end 10+10x9 0 10
# Counter resets at in the middle of range are handled correctly by rate().
eval instant at 50m rate(testcounter_reset_middle[55m])
{} 0.03
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments about the documentation.
Thank you very much. We are almost done here. Just the few comments above, all with suggestions what to change. I have proposed to the v3 coordinators to start a release branch soon. Let's see when it happens. |
4ac3fb9
to
f91ae4e
Compare
@beorn7 All resolved. Thank you! |
Looks good. Thank you very much. (I'm not formally approving this PR right now, because we don't want to merge it into main. We need the v3 release branch to exist before merging.) |
We have a v3 branch: https://github.com/prometheus/prometheus/tree/release-3.0 I have already changed the merge target for this PR. However, there are merge conflicts again (have been there before I changed the merge target). @KofClubs could you rebase on top of the current state of the new release-3.0 branch and address the merge conflicts? |
Signed-off-by: Zhang Zhanpeng <[email protected]> Signed-off-by: beorn7 <[email protected]> Co-authored-by: beorn7 <[email protected]>
f91ae4e
to
debbdb8
Compare
@beorn7 Done. Thank you:) |
Great stuff. Will merge on green. |
Conflicts: promql/engine_test.go Resolved by picking main changes but adjusting total_samples for query "max_over_time(metricWith1HistogramEvery10Seconds[60s])[20s:5s]" to 312. Via prometheus#13662 this histogram now stores 13 values per timestamp, but via prometheus#13904 the range query is now left-open. promql/promqltest/testdata/functions.test Resolved by picking changes in main. See also prometheus#13662. promql/promqltest/testdata/histograms.test Resolved by picking changes in main. See also prometheus#13662, but adjust some range selectors (`s/5m/10m/`) to account for prometheus#13904
Conflicts: promql/engine_test.go Resolved by picking main changes but adjusting total_samples for query "max_over_time(metricWith1HistogramEvery10Seconds[60s])[20s:5s]" to 312. Via prometheus#13662 this histogram now stores 13 values per timestamp, but via prometheus#13904 the range query is now left-open. promql/promqltest/testdata/functions.test Resolved by picking changes in main. See also prometheus#13662, but adjust some range selectors (`s/1m/2m/`) to account for prometheus#13904. promql/promqltest/testdata/histograms.test Resolved by picking changes in main. See also prometheus#13662, but adjust some range selectors (`s/5m/10m/`) to account for prometheus#13904. Signed-off-by: Jan Fajerski <[email protected]>
Signed-off-by: Jan Fajerski <[email protected]> Conflicts: VERSION pick 3.0.0 promql/promqltest/testdata/histograms.test pick changes from c39776c, but adjust 5m range selectors to 10m to account for prometheus#13904. Fixes: promql/promqltest/testdata/functions.test promql/promqltest/testdata/staleness.test Tests added in prometheus#9138 need to be adjusted to account for prometheus#13904.
Some test queries need their interval adjusted to account for prometheus#13904. Otherwise the queries don't return enough samples. Signed-off-by: Jan Fajerski <[email protected]>
Some test queries need their interval adjusted to account for prometheus#13904. Otherwise the queries don't return enough samples. promql/engine_test.go:TestHistogramCopyFromIteratorRegression needed the same, but also the result needed a fix since `increase` interpolates over the full range. Signed-off-by: Jan Fajerski <[email protected]>
In order to fix new tests for changes added in prometheus#13904. Signed-off-by: Jan Fajerski <[email protected]>
Fixes #13213