tsdb: Optimize LabelValues API performance (#14551) #18069
bboreham merged 1 commit into prometheus:main
bboreham
left a comment
Thanks for this.
If it's not in the first postings, I would expect seek() to examine further postings until it finds one with the target or reaches the end.
I expect it still works in your implementation because you're calling it from a loop that will call it again. But this is not the standard behaviour of seek(). Maybe calling it something else would be fine.
I'll rename it to seekHead or updateHead to make it clear it's only operating on the top element. I guess seekHead sounds better?
Yes, ok with |
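To illustrate the distinction raised in this thread, here is a minimal Go sketch of a heap of sorted-ID iterators. It is not the code from this PR: `cursor`, `cursorHeap`, `newCursorHeap` and `seekAll` are hypothetical names, and only the name `seekHead` comes from the discussion. `seekHead` moves only the top element, while `seekAll` obtains the conventional seek() behaviour by calling it in a loop.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// cursor is a hypothetical iterator over sorted IDs, standing in for a postings list.
type cursor struct {
	ids []uint64
	pos int
}

func (c *cursor) at() uint64 { return c.ids[c.pos] }

// seek moves to the first ID >= target; false means the cursor is exhausted.
func (c *cursor) seek(target uint64) bool {
	rest := c.ids[c.pos:]
	c.pos += sort.Search(len(rest), func(i int) bool { return rest[i] >= target })
	return c.pos < len(c.ids)
}

// cursorHeap is a min-heap ordered by each cursor's current ID.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].at() < h[j].at() }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// newCursorHeap builds a heap from non-empty sorted ID lists.
func newCursorHeap(lists ...[]uint64) *cursorHeap {
	h := &cursorHeap{}
	for _, ids := range lists {
		*h = append(*h, &cursor{ids: ids})
	}
	heap.Init(h)
	return h
}

// seekHead advances only the top cursor to target and restores heap order.
// A cursor that runs out of IDs is dropped from the heap.
func (h *cursorHeap) seekHead(target uint64) {
	if (*h)[0].seek(target) {
		heap.Fix(h, 0)
		return
	}
	heap.Pop(h)
}

// seekAll loops seekHead until the smallest current ID is >= target
// or every cursor is exhausted, which is what a conventional seek() promises.
func (h *cursorHeap) seekAll(target uint64) bool {
	for h.Len() > 0 && (*h)[0].at() < target {
		h.seekHead(target)
	}
	return h.Len() > 0
}

func main() {
	h := newCursorHeap([]uint64{1, 2, 3, 50}, []uint64{4, 5, 60}, []uint64{6, 7})
	h.seekHead(40)            // only the first list moves
	fmt.Println((*h)[0].at()) // the new head is still below 40
	fmt.Println(h.seekAll(40), (*h)[0].at())
}
```

After the single `seekHead(40)` call the head of the heap can still be below the target, which is why a caller has to keep looping, as the review comment points out.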
BTW there is a tool |
Why not use If you have a good reason to need a new benchmark, please include this in your PR. |
the existing one covers the high-overlap case (like 90% match) where |
I have now studied the actual algorithmic change you made; it's a great find. Thanks again. |
Happy to know you liked the changes. I would like to make the modifications on my own this time, although it might take 3-4 hours as I'm currently engaged with something else, but I will surely do it.
613665b to 27983fa
Hey @bboreham, I renamed seek to seekHead so it's less confusing, and I removed the unused next() method entirely (plus updated the tests to use seekHead instead).
27983fa to 42a86d6
…#14551) Signed-off-by: Divyansh Mishra <[email protected]>
42a86d6 to fcb6806
Hey @bboreham, just a reminder to check out this PR.
I get a pretty good speedup on the existing benchmark: But yours is a bigger percentage: |
bboreham
left a comment
Great, code change is very clean now.
I didn't study the new benchmark closely.
##### [`v3.10.0`](https://github.com/prometheus/prometheus/releases/tag/v3.10.0)

Prometheus now offers a distroless Docker image variant alongside the default busybox image. The distroless variant provides enhanced security with a minimal base image, uses UID/GID 65532 (nonroot) instead of nobody, and removes the VOLUME declaration. Both variants are available with `-busybox` and `-distroless` tag suffixes (e.g., `prom/prometheus:latest-busybox`, `prom/prometheus:latest-distroless`). The busybox image remains the default with no suffix for backwards compatibility (e.g., `prom/prometheus:latest` points to the busybox variant).

For users migrating existing **named** volumes from the busybox image to the distroless variant, the ownership can be adjusted with:

```
docker run --rm -v prometheus-data:/prometheus alpine chown -R 65532:65532 /prometheus
```

Then, the container can be started with the old volume with:

```
docker run -v prometheus-data:/prometheus prom/prometheus:latest-distroless
```

Users migrating from bind mounts might need to adjust permissions too, depending on their setup.

- [CHANGE] Alerting: Add `alertmanager` dimension to following metrics: `prometheus_notifications_dropped_total`, `prometheus_notifications_queue_capacity`, `prometheus_notifications_queue_length`. [#16355](prometheus/prometheus#16355)
- [CHANGE] UI: Hide expanded alert annotations by default, enabling more information density on the `/alerts` page. [#17611](prometheus/prometheus#17611)
- [FEATURE] AWS SD: Add MSK Role. [#17600](prometheus/prometheus#17600)
- [FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series. [#17644](prometheus/prometheus#17644)
- [FEATURE] Web: Add OpenAPI 3.2 specification for the HTTP API at `/api/v1/openapi.yaml`. [#17825](prometheus/prometheus#17825)
- [FEATURE] Dockerfile: Add distroless image variant using UID/GID 65532 and no VOLUME declaration. Busybox image remains default. [#17876](prometheus/prometheus#17876)
- [FEATURE] Web: Add on-demand wall time profiling under `<URL>/debug/pprof/fgprof`. [#18027](prometheus/prometheus#18027)
- [ENHANCEMENT] PromQL: Add more detail to histogram quantile monotonicity info annotations. [#15578](prometheus/prometheus#15578)
- [ENHANCEMENT] Alerting: Independent alertmanager sendloops. [#16355](prometheus/prometheus#16355)
- [ENHANCEMENT] TSDB: Experimental support for early compaction of stale series in the memory with configurable threshold `stale_series_compaction_threshold` in the config file. [#16929](prometheus/prometheus#16929)
- [ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size. [#17736](prometheus/prometheus#17736)
- [ENHANCEMENT] Promtool: Support promql syntax features `promql-duration-expr` and `promql-extended-range-selectors`. [#17926](prometheus/prometheus#17926)
- [PERF] PromQL: Avoid unnecessary label extraction in PromQL functions. [#17676](prometheus/prometheus#17676)
- [PERF] PromQL: Improve performance of regex matchers like `.*-.*-.*`. [#17707](prometheus/prometheus#17707)
- [PERF] OTLP: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency. [#17860](prometheus/prometheus#17860)
- [PERF] API: Compute `/api/v1/targets/relabel_steps` in a single pass instead of re-running relabeling for each prefix. [#17969](prometheus/prometheus#17969)
- [PERF] tsdb: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- [BUGFIX] PromQL: Prevent query strings containing only UTF-8 continuation bytes from crashing Prometheus. [#17735](prometheus/prometheus#17735)
- [BUGFIX] Web: Fix missing `X-Prometheus-Stopping` header for `/-/ready` endpoint in `NotReady` state. [#17795](prometheus/prometheus#17795)
- [BUGFIX] PromQL: Fix PromQL `info()` function returning empty results when filtering by a label that exists on both the input metric and `target_info`. [#17817](prometheus/prometheus#17817)
- [BUGFIX] TSDB: Fix a bug during exemplar buffer grow/shrink that could cause exemplars to be incorrectly discarded. [#17863](prometheus/prometheus#17863)
- [BUGFIX] UI: Fix broken graph display after page reload, due to broken Y axis min encoding/decoding. [#17869](prometheus/prometheus#17869)
- [BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields (Labels, Histogram pointers, metadata strings) before returning buffers to pools. [#17879](prometheus/prometheus#17879)
- [BUGFIX] PromQL: info function: fix series without identifying labels not being returned. [#17898](prometheus/prometheus#17898)
- [BUGFIX] OTLP: Filter `__name__` from OTLP attributes to prevent duplicate labels. [#17917](prometheus/prometheus#17917)
- [BUGFIX] TSDB: Fix division by zero when computing stale series ratio with empty head. [#17952](prometheus/prometheus#17952)
- [BUGFIX] OTLP: Fix potential silent data loss for sum metrics. [#17954](prometheus/prometheus#17954)
- [BUGFIX] PromQL: Fix smoothed interpolation across counter resets. [#17988](prometheus/prometheus#17988)
- [BUGFIX] PromQL: Fix panic with `@` modifier on empty ranges. [#18020](prometheus/prometheus#18020)
- [BUGFIX] PromQL: Fix `avg_over_time` for a single native histogram. [#18058](prometheus/prometheus#18058)
tsdb: Optimize LabelValues for sparse intersections (Fixes #14551)
Description
This PR optimizes the `FindIntersectingPostings` function to improve the performance of `LabelValues` queries when matchers are involved.

The Problem:
In scenarios where a matcher selects a range of IDs that is disjoint from or far ahead of the current candidate postings (e.g., the `matcher` is at ID 1,000,000 but the `candidate` is at ID 0), the previous implementation would call `Next()` on the candidate roughly 1,000,000 times to catch up. This resulted in O(N) complexity, where N is the number of non-matching IDs, causing significant performance degradation and even timeouts for large datasets.

The Solution:
I've updated `FindIntersectingPostings` to use `Seek()` on the candidate instead of `Next()`. This allows the iterator to skip non-matching ranges efficiently (O(log N)), aligning the performance of `LabelValues` with the much faster `Series` API for equivalent queries.

Benchmarks
I created a reproduction benchmark, `BenchmarkLabelValues_SlowPath`, to simulate the issue (dense candidates, sparse matcher).

Results:

Verification
- `go test ./tsdb/index/...` to ensure no regressions in the index package.
- `LabelValues` returns correct results with the optimization applied.

Which issue(s) does the PR fix:
Fixes #14551

Does this PR introduce a user-facing change?
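For readers who want to see the idea from "The Solution" in isolation, the following is a minimal, self-contained Go sketch and not the code changed by this PR. `listPostings`, `newListPostings` and `intersects` are hypothetical names, and the iterator is a simplified stand-in that assumes sorted in-memory IDs and a binary-search `Seek`. It checks whether a dense candidate and a sparse matcher share an ID by seeking instead of stepping with `Next()`, and reports how many `Seek` calls were needed.

```go
package main

import (
	"fmt"
	"sort"
)

// listPostings is a simplified stand-in for a sorted postings list
// (hypothetical type, not the Prometheus implementation).
type listPostings struct {
	ids []uint64
	pos int // index of the current element; -1 before the first Next or Seek
}

func newListPostings(ids []uint64) *listPostings {
	return &listPostings{ids: ids, pos: -1}
}

// Next advances by exactly one element.
func (p *listPostings) Next() bool {
	p.pos++
	return p.pos < len(p.ids)
}

// Seek advances to the first element >= target using binary search
// over the remaining elements. It never moves backwards.
func (p *listPostings) Seek(target uint64) bool {
	start := p.pos
	if start < 0 {
		start = 0
	}
	if start > len(p.ids) {
		start = len(p.ids)
	}
	rest := p.ids[start:]
	p.pos = start + sort.Search(len(rest), func(i int) bool { return rest[i] >= target })
	return p.pos < len(p.ids)
}

func (p *listPostings) At() uint64 { return p.ids[p.pos] }

// intersects reports whether a and b share at least one ID, and how many
// Seek calls it took. Whichever side is behind seeks to the other's position.
func intersects(a, b *listPostings) (found bool, seeks int) {
	if !a.Next() || !b.Next() {
		return false, 0
	}
	for {
		switch {
		case a.At() == b.At():
			return true, seeks
		case a.At() < b.At():
			seeks++
			if !a.Seek(b.At()) {
				return false, seeks
			}
		default:
			seeks++
			if !b.Seek(a.At()) {
				return false, seeks
			}
		}
	}
}

func main() {
	// Dense candidate: IDs 0..999,999. Sparse matcher: one far-away ID.
	dense := make([]uint64, 1_000_000)
	for i := range dense {
		dense[i] = uint64(i)
	}
	found, seeks := intersects(newListPostings(dense), newListPostings([]uint64{999_999}))
	fmt.Println(found, seeks)
}
```

In this sketch the candidate reaches the matcher's ID with a single `Seek` call, where a `Next()`-only loop would have stepped through every intervening ID.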