promql: fix histogram_fraction issue when lower falls within the first bucket#17424
Conversation
|
@MichaHoffmann could you have a look? |
|
I think the change makes sense, but can we convert the tests into acceptance tests in promqltest? That way we define the behavior for all prometheus compatible engines too. |
I converted the unit tests into acceptance tests in promqltest. |
MichaHoffmann
left a comment
There was a problem hiding this comment.
LGTM, good catch that this is an issue for NHCB too
|
Since I cannot merge and dont know what should happen next; cc @beorn7 - if that ping was not needed im sorry 🙈 |
|
Thanks, both of you. I'll give it a final look ASAP. |
beorn7
left a comment
There was a problem hiding this comment.
First of all, apologies for the review delay. I was incredibly busy with other things last week.
Thank you very much, in particular for the nice explanations and the comprehensive testing.
I have only two things:
- My usual capitalization and punctuation nits.
- More importantly, I think we should get this into the upcoming v3.8 release. An RC is already out, and the release-3.8 branch already exists. Thus, could you rebase this PR on top of the current state of the release-3.8 branch?
Thanks again.
promql/quantile.go
Outdated
|
|
||
| // If the upper bound of the first bucket is greater than 0, we assume | ||
| // we are dealing with positive buckets only and lowerBound for the | ||
| // first bucket is set to 0; otherwise it is set to -Inf |
There was a problem hiding this comment.
Nit: End sentence with a period.
There was a problem hiding this comment.
I fixed punctuation in all comments.
| {start="positive"} 0.6363636363636364 | ||
| {start="negative"} 0 | ||
|
|
||
| # positive buckets, lower falls in the first bucket |
There was a problem hiding this comment.
Nit: We generally try to use normal capitalization and punctuation in comments for easier readability, so this should be
| # positive buckets, lower falls in the first bucket | |
| # Positive buckets, lower falls in the first bucket. |
Could you go through the other comments as well and adjust?
There was a problem hiding this comment.
I fixed punctuation in all comments.
|
/cc @jan--f as the release shepherd of v3.8. |
|
I branched |
|
My words exactly. 😁 |
Signed-off-by: Mohammad Alavi <[email protected]>
Signed-off-by: Mohammad Alavi <[email protected]>
Signed-off-by: Mohammad Alavi <[email protected]>
8b2c71b to
a7ddf54
Compare
|
I fixed the punctuation in the comments and rebased my branch onto release-3.8. There’s a failing test that doesn’t seem related to my changes. |
|
Note: A adjusted the release-note section in the PR description. |
* promql: fix histogram_fraction issue when lower falls within the first bucket (prometheus#17424) Signed-off-by: Mohammad Alavi <[email protected]> * prepare release 3.8.0-rc.0 Signed-off-by: Jan Fajerski <[email protected]> * test: skip TestRemoteWrite_ReshardingWithoutDeadlock temporarily as flaky (prometheus#17534) (prometheus#17543) (cherry picked from commit 35c3232) Signed-off-by: machine424 <[email protected]> Signed-off-by: Jan Fajerski <[email protected]> Co-authored-by: Ayoub Mrini <[email protected]> * chore(deps): bump prometheus/promci from 0.4.7 to 0.5.0 Signed-off-by: Jan Fajerski <[email protected]> * chore(deps): bump prometheus/promci from 0.5.0 to 0.5.1 Signed-off-by: Jan Fajerski <[email protected]> * chore(deps): bump prometheus/promci from 0.5.1 to 0.5.2 Signed-off-by: Jan Fajerski <[email protected]> * chore(deps): bump prometheus/promci from 0.5.2 to 0.5.3 Signed-off-by: Jan Fajerski <[email protected]> * prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (prometheus#17411) Relates to prometheus#16944 (comment) Signed-off-by: bwplotka <[email protected]> (cherry picked from commit cefefc6) * chore: prepare 3.8.0-rc.1 entry Signed-off-by: bwplotka <[email protected]> * [chore]: bump common dep to support RFC7523 3.1 Signed-off-by: Jorge Turrado <[email protected]> * Update Prometheus Agent doc (prometheus#17591) * Add a nav title to fix docs website generator. * Make it more clear that "Prometheus Agent" is a mode, not a seaparate service. * Add to index. * Cleanup some wording. * Add a downsides section. Signed-off-by: SuperQ <[email protected]> (cherry picked from commit d0d2699) * chore(deps): bump github.com/prometheus/common from 0.67.3 to 0.67.4 (prometheus#17594) Signed-off-by: Jan Fajerski <[email protected]> * prepare release v3.8.0-rc.1 Signed-off-by: Jan Fajerski <[email protected]> * prepare release v3.8.0 Signed-off-by: Jan Fajerski <[email protected]> * chore: Fix function name typo in createBatchSpan comment Signed-off-by: zjumathcode <[email protected]> * feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file (prometheus#17435) * Delay compactions until Thanos uploads all blocks Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled. The reason to have this is that disabling compactions have very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exists in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work then if we had a single compacted block. Signed-off-by: Lukasz Mierzwa <[email protected]> * Rename structs and functions to make this more generic Signed-off-by: Lukasz Mierzwa <[email protected]> * Address review comments Signed-off-by: Lukasz Mierzwa <[email protected]> * Cache UploadMeta for 1 minute Signed-off-by: Lukasz Mierzwa <[email protected]> --------- Signed-off-by: Lukasz Mierzwa <[email protected]> * RW2: Allow custom scope in azuread (prometheus#17483) Signed-off-by: Ben Edmunds <[email protected]> * docs: Describe how time() is set to start at 0 in unit tests The return value of functions relating to the current time, e.g. time(), is set by promtool to start at timestamp 0 at the start of a test's evaluation. This has the very nice consequence that tests can run reliably without depending on when they are run. It does, however, mean that tests will give out results that can be unexpected by users. If this behaviour is documented, then users will be empowered to write tests for their rules that use time-dependent functions. (Closes: prometheus/docs#1464) Signed-off-by: Gabriel Filion <[email protected]> * refactor(tsdb): use one test newTestDB constructor (prometheus#17638) For tests only, we had various ways of opening DB. Reduced to one instead of: * Open * newTestDB * newTestDBOpts * openTestDB This so prometheus#17629 is smaller and bit easier. Also for test maintainability and consistency. Signed-off-by: bwplotka <[email protected]> * Add start_timestamp field for unit tests. This commit adds support for configuring a custom start timestamp for Prometheus unit tests, allowing tests to use realistic timestamps instead of starting at Unix epoch 0. Signed-off-by: Julien Pivotto <[email protected]> * Fix serialization for empty `ignoring()` in combination with `group_x()` Currently both the backend and frontend printers/formatters/serializers incorrectly transform the following expression: ``` up * ignoring() group_left(__name__) node_boot_time_seconds ``` ...into: ``` up * node_boot_time_seconds ``` ...which yields a different result (including the metric name in the result vs. no metric name). We need to keep empty `ignoring()` modifiers if there is a grouping modifier present. Signed-off-by: Julius Volz <[email protected]> * Simplify StartTime assignment in unit test setup. Remove redundant IsZero check since promqltest.LazyLoader already handles zero StartTime by defaulting to Unix epoch. Signed-off-by: Julien Pivotto <[email protected]> * Update golangci-lint and add modernize check (prometheus#17640) * add modernize check Signed-off-by: dongjiang1989 <[email protected]> * fix golangci lint Signed-off-by: dongjiang1989 <[email protected]> --------- Signed-off-by: dongjiang1989 <[email protected]> * fix lint --------- Signed-off-by: Mohammad Alavi <[email protected]> Signed-off-by: Jan Fajerski <[email protected]> Signed-off-by: machine424 <[email protected]> Signed-off-by: bwplotka <[email protected]> Signed-off-by: Jorge Turrado <[email protected]> Signed-off-by: SuperQ <[email protected]> Signed-off-by: zjumathcode <[email protected]> Signed-off-by: Lukasz Mierzwa <[email protected]> Signed-off-by: Ben Edmunds <[email protected]> Signed-off-by: Gabriel Filion <[email protected]> Signed-off-by: Julien Pivotto <[email protected]> Signed-off-by: Julius Volz <[email protected]> Signed-off-by: dongjiang1989 <[email protected]> Co-authored-by: Mohammad Alavi <[email protected]> Co-authored-by: Jan Fajerski <[email protected]> Co-authored-by: Jan Fajerski <[email protected]> Co-authored-by: Ayoub Mrini <[email protected]> Co-authored-by: Bartlomiej Plotka <[email protected]> Co-authored-by: Jorge Turrado <[email protected]> Co-authored-by: Ben Kochie <[email protected]> Co-authored-by: zjumathcode <[email protected]> Co-authored-by: Łukasz Mierzwa <[email protected]> Co-authored-by: Ben Edmunds <[email protected]> Co-authored-by: Julien <[email protected]> Co-authored-by: Gabriel Filion <[email protected]> Co-authored-by: Julius Volz <[email protected]> Co-authored-by: dongjiang <[email protected]> Co-authored-by: Jeanette Tan <[email protected]>
Fixes
histogram_fractionreturningNaNwhenlowerfalls within the first bucket for classic histograms and NHCB. The issue is described here.Which issue(s) does the PR fix:
Fixes #17367
Does this PR introduce a user-facing change?