Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

promql: Make range selections left-open and right-closed #13213

Closed
beorn7 opened this issue Nov 29, 2023 · 10 comments · Fixed by #13904
Closed

promql: Make range selections left-open and right-closed #13213

beorn7 opened this issue Nov 29, 2023 · 10 comments · Fixed by #13904

Comments

@beorn7
Copy link
Member

beorn7 commented Nov 29, 2023

Proposal

For historical reasons, a range selector selects a closed interval, i.e. samples perfectly coinciding with the boundaries of the range are included in the selection.

For various reasons, it is more consistent and more helpful in practice to make the selection "left open" and "right closed", i.e. a sample coinciding with the "left" boundary (the one further in the past) is excluded from the selection, while at the "right" boundary (the one further towards the future) stays inclusive.

For example, with samples perfectly spaced every 1m (very common in practice), a range over 5m might currently select 5 or 6 samples, depending on the exact alignment. In practice, it almost always selects 5 samples, because perfect alignment rarely happens in real-world scenarios. However, in test scenarios, perfect alignment happens easily, even without intending to do so. So test cases can easily represent a case that is very rare in practice. With the proposed change, the range will always select 5 samples in the case of perfectly spaced samples.

This is, however, a breaking change, although the impact is mostly academic. Therefore, we should implement this change with the upcoming 3.0.0 release.

@KofClubs
Copy link
Contributor

Hello! I am following this change.
May I ask why the range selector is defined as left-open and right-closed instead of left-closed and right-open?
I think the latter is better because it is consistent with the behavior of range selectors for lists in multiple programming languages.
For example, in Go, use [lo:hi] to operate on array or slice arr, the result is:

arr[lo], arr[lo+1], arr[lo+2],..., arr[hi-1]

It includes left bound and excludes right bound.
Thanks.

@beorn7
Copy link
Member Author

beorn7 commented Dec 13, 2023

Good question. I should have mentioned the rationale for that, too. The short answer is that a range vector should always select the samples that the corresponding instant vector would have selected, too.

In more detail:

An instant vector will select a sample that is precisely at the evaluation timestamp (if there is one, otherwise it will search for the next sample in the past). So let's say the instant vector foo returns a sample that happens to be precisely at the evaluation timestamp. If you change that instant vector to a range vector foo[1m], you should still see that same sample selected as part of the range. That's accomplished with a "right closed" range. However, with a "right open" range, you would not select that sample.

@KofClubs
Copy link
Contributor

When the range selection interval becomes smaller and smaller, the result shall converge to the corresponding instant query result - Well designed, thanks😀

@pawarpranav83
Copy link
Contributor

pawarpranav83 commented Jan 19, 2024

Hey @beorn7, I would like to work on this issue.
So, in which function are we supposed to make this change?
The query data flows like this ig - NewRangeQuery -> Exec -> exec -> execEvalStmt -> InitStepTracking -> Eval -> eval in promql/engine.go.

@beorn7
Copy link
Member Author

beorn7 commented Jan 19, 2024

Hey @beorn7, I would like to work on this issue.

I have assigned this issue to you.

So, in which function are we supposed to make this change?

I don't know the PromQL codebase well enough to answer this question from the top of my head. You have to investigate the codebase yourself (as you have already started to do so). #prometheus-dev on the CNCF slack might be a good forum to fish for more experienced PromQL coders than me or general help.

Note that these changes have to be behind a feature flag in v2.x and will only become default in v3.

@eolalde
Copy link

eolalde commented Mar 20, 2024

Hey @beorn7 👋 I just stumbled upon this issue since I was looking at the same artifact of including more datapoints than I intend to when using <agg_function>_over_time for the purpose of down-sampling a gauge metric.

What I was thinking wrt this proposed solution is to do the opposite selection of which side is open and which one is closed. When down-sampling using an aggregation function over time, you would expect the left limit of the range interval to be included in the aggregation but not the right limit, i.e. left-closed and right-open.

Doing the opposite would mean that in a range between t1 and t2 timestamps, the t1 aggregate doesn't include t1 but includes t2 and is tracked under t2 in the resulting timeseries, which is not accurate to describe which time-bucket a given aggregate represents. In other words, with left-open and right-closed, what you get is the aggregate of the (t1, t2] interval timestamped on the right-side limit (t2) when the semantic (or intuitive) behavior of down-sampling is to calculate the aggregate over the [t1, t2) interval and timestamp it on t1.

Maybe I'm missing something, but "left-closed and right-open" is how I've seen all re-sampling/down-sampling intervals working across multiple time-series tools (database query languages, stream processing, data frame libraries, etc.). Maybe the fundamental behavior here is also based on the range interval in prometheus always looking back as opposed to creating discrete [t1, t2) intervals over the overall query time range, since today I can easily achieve the "left-open and right-closed" range interval by using a simple offset 1s on my range vectors (assuming stored data sampling rate is > 1s), but that looks just odd for example on the last timestamp aggregate when you don't have the full interval and if you're using sum aggregation you already see a complete interval total 🤔

I don't know if this behavior also relates to @KofClubs's comment or if that one was purely based on array slicing semantics 😅

@beorn7
Copy link
Member Author

beorn7 commented Mar 20, 2024

I guess the one important argument would be that an instant selector selects the most recent sample at or before the evaluation timestamp. If a range selector was right-open, it would not select a sample exactly at the evaluation timestamp, so it would be possible that an instant selector selects a sample that a range selector at the same timestamp would not select, which I believe is highly undesirable.

@pawarpranav83 pawarpranav83 removed their assignment Apr 8, 2024
@KofClubs
Copy link
Contributor

KofClubs commented Apr 8, 2024

@beorn7 Assign this issue to me, s'il vous plaît.

@KofClubs
Copy link
Contributor

KofClubs commented Apr 8, 2024

#13904 fixes this issue.

@beorn7
Copy link
Member Author

beorn7 commented Jun 20, 2024

This is done. It will be released in v3.0.0.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants