
tsdb: Optimize LabelValues API performance (#14551) #18069

Merged
bboreham merged 1 commit into prometheus:main from mishraa-G:optimize-label-api
Feb 16, 2026



@mishraa-G mishraa-G commented Feb 12, 2026

tsdb: Optimize LabelValues for sparse intersections (Fixes #14551)

Description

This PR optimizes the FindIntersectingPostings function to improve the performance of LabelValues queries when matchers are involved.

The Problem:
In scenarios where a matcher selects a range of IDs that is disjoint from or far ahead of the current candidate postings (e.g., matcher is at ID 1,000,000 but the candidate is at ID 0), the previous implementation would call Next() on the candidate roughly 1,000,000 times to catch up. This resulted in O(N) complexity where N is the number of non-matching IDs, causing significant performance degradation and even timeouts for large datasets.

The Solution:
I've updated FindIntersectingPostings to use Seek() on the candidate instead of Next(). This allows the iterator to skip non-matching ranges efficiently (O(log N)), aligning the performance of LabelValues with the much faster Series API for equivalent queries.
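
The difference between catching up with Next() and jumping with Seek() can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `listPostings` iterator over a sorted slice and a single candidate; it is not the actual `FindIntersectingPostings` code from `tsdb/index`, and the `steps` counter exists only to make the cost visible.

```go
package main

import (
	"fmt"
	"sort"
)

// listPostings is a hypothetical stand-in for a sorted postings iterator.
// It is not the tsdb/index implementation; steps counts calls for illustration.
type listPostings struct {
	ids   []uint64
	pos   int
	steps int
}

func newListPostings(ids []uint64) *listPostings {
	return &listPostings{ids: ids, pos: -1}
}

// At returns the current ID; valid only after Next or Seek returned true.
func (p *listPostings) At() uint64 { return p.ids[p.pos] }

// Next advances by exactly one ID.
func (p *listPostings) Next() bool {
	p.steps++
	if p.pos < len(p.ids) {
		p.pos++
	}
	return p.pos < len(p.ids)
}

// Seek advances to the first ID >= target using binary search.
// It never moves backwards.
func (p *listPostings) Seek(target uint64) bool {
	p.steps++
	start := p.pos
	if start < 0 {
		start = 0
	}
	rest := p.ids[start:]
	p.pos = start + sort.Search(len(rest), func(i int) bool { return rest[i] >= target })
	return p.pos < len(p.ids)
}

// intersects reports whether candidate and matcher share at least one ID.
// With useSeek=false the candidate catches up one ID at a time (Next);
// with useSeek=true it jumps straight to the matcher's position (Seek).
func intersects(candidate, matcher *listPostings, useSeek bool) bool {
	if !candidate.Next() {
		return false
	}
	for {
		if !matcher.Seek(candidate.At()) {
			return false
		}
		if matcher.At() == candidate.At() {
			return true
		}
		// The matcher is now ahead of the candidate.
		var ok bool
		if useSeek {
			ok = candidate.Seek(matcher.At())
		} else {
			ok = candidate.Next()
		}
		if !ok {
			return false
		}
	}
}

// denseIDs returns the IDs 0..n-1.
func denseIDs(n int) []uint64 {
	ids := make([]uint64, n)
	for i := range ids {
		ids[i] = uint64(i)
	}
	return ids
}

func main() {
	for _, useSeek := range []bool{false, true} {
		candidate := newListPostings(denseIDs(100000))
		matcher := newListPostings([]uint64{1000000})
		found := intersects(candidate, matcher, useSeek)
		fmt.Printf("useSeek=%v found=%v candidate calls=%d\n", useSeek, found, candidate.steps)
	}
}
```

In this toy setup the Next-based loop touches every one of the 100,000 candidate IDs before giving up, while the Seek-based loop needs only a couple of calls on the candidate.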

Benchmarks

I created a reproduction benchmark BenchmarkLabelValues_SlowPath to simulate the issue (dense candidates, sparse matcher).

Results:

  • Before Optimization: The benchmark would time out or take >1ms per operation.
  • After Optimization: The benchmark completes in ~1290 ns/op (verified locally).

Verification

  • Ran go test ./tsdb/index/... to ensure no regressions in the index package.
  • Verified manually that LabelValues returns correct results with the optimization applied.

Which issue(s) does the PR fix:

Fixes #14551

Does this PR introduce a user-facing change?

[PERF] tsdb: Optimize LabelValues intersection performance for matchers

Member

@bboreham bboreham left a comment


Thanks for this.

If it's not in the first postings, I would expect seek() to examine further postings until it finds one with the target or reaches the end.

I expect it still works in your implementation because you're calling it from a loop that will call it again. But this is not the standard behaviour of seek(). Maybe calling it something else would be fine.

@mishraa-G
Author

> Thanks for this.
>
> If it's not in the first postings, I would expect seek() to examine further postings until it finds one with the target or reaches the end.
>
> I expect it still works in your implementation because you're calling it from a loop that will call it again. But this is not the standard behaviour of seek(). Maybe calling it something else would be fine.

I'll rename it to seekHead or updateHead to make it clear it's only operating on the top element. I guess seekHead sounds better?

@bboreham
Member

> I'll rename it to seekHead or updateHead to make it clear it's only operating on the top element. I guess seekHead sounds better?

Yes, ok with seekHead.
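
The naming concern can be made concrete with a small sketch. It assumes a hypothetical min-heap of `cursor` iterators (the names `cursor`, `cursorHeap`, `newCursorHeap`, and `head` are invented for illustration and are not taken from the PR); only the idea that `seekHead` moves the top element alone, with the caller looping, comes from the discussion above.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// cursor is a hypothetical sorted-ID iterator used only for this sketch.
type cursor struct {
	ids []uint64
	pos int
}

func (c *cursor) at() uint64 { return c.ids[c.pos] }

// seek moves to the first ID >= target and reports whether one exists.
func (c *cursor) seek(target uint64) bool {
	rest := c.ids[c.pos:]
	c.pos += sort.Search(len(rest), func(i int) bool { return rest[i] >= target })
	return c.pos < len(c.ids)
}

// cursorHeap is a min-heap ordered by each cursor's current ID.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].at() < h[j].at() }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// head returns the smallest current ID; the heap must be non-empty.
func (h cursorHeap) head() uint64 { return h[0].at() }

// seekHead advances only the top cursor to the first ID >= target.
// Unlike a conventional Seek, the new head may still be < target, because
// another cursor can surface with a smaller ID; callers are expected to loop.
func (h *cursorHeap) seekHead(target uint64) {
	if (*h)[0].seek(target) {
		heap.Fix(h, 0)
		return
	}
	heap.Pop(h) // the head cursor is exhausted
}

// newCursorHeap builds a heap from non-empty sorted ID lists.
func newCursorHeap(lists ...[]uint64) *cursorHeap {
	h := &cursorHeap{}
	for _, ids := range lists {
		*h = append(*h, &cursor{ids: ids})
	}
	heap.Init(h)
	return h
}

func main() {
	h := newCursorHeap([]uint64{1, 50}, []uint64{2, 60}, []uint64{3, 70})
	const target = 50

	h.seekHead(target)
	fmt.Println("head after one seekHead:", h.head())

	for h.Len() > 0 && h.head() < target {
		h.seekHead(target)
	}
	fmt.Println("head after looping:", h.head())
}
```

After a single call the head can still be below the target, which is why a name other than plain seek() describes the behaviour more accurately.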

@bboreham
Member

BTW there is a tool benchstat which will analyze the differences before and after.
https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

@bboreham
Member

> I created a reproduction benchmark BenchmarkLabelValues_SlowPath

Why not use BenchmarkLabelValuesWithMatchers ?

If you have a good reason to need a new benchmark, please include this in your PR.

@mishraa-G
Author

> > I created a reproduction benchmark BenchmarkLabelValues_SlowPath
>
> Why not use BenchmarkLabelValuesWithMatchers ?
>
> If you have a good reason to need a new benchmark, please include this in your PR.

The existing one covers the high-overlap case (like 90% match) where Next() already flies, so it wouldn't catch this regression. The new one specifically hits the "sparse" case where the matcher is way ahead, which was causing the timeouts in #14551. I figured it's worth keeping to lock in the fix for that scenario.

@bboreham
Member

I have now studied the actual algorithmic change you made; it's a great find. Thanks again.
I can fix up the cosmetic stuff if you don't have time. Let me know.

@mishraa-G
Author

> I have now studied the actual algorithmic change you made; it's a great find. Thanks again. I can fix up the cosmetic stuff if you don't have time. Let me know.

Happy to know you liked the changes. I would like to make the fixes myself this time, although it might take 3-4 hours as I am currently busy with something else, but I will definitely do it.

@mishraa-G mishraa-G force-pushed the optimize-label-api branch 5 times, most recently from 613665b to 27983fa, February 14, 2026 09:23
@mishraa-G
Author

Hey @bboreham, I renamed seek to seekHead so it's less confusing, and I removed the unused next() method entirely (plus updated the tests to use seekHead instead).
I also tidied up the comments and fixed a few lint issues in the benchmark file while I was at it. The benchmark itself is now properly set up for dense candidates (100k series for one label value), and the results are looking solid: ~846 ns/op.

@mishraa-G
Author

mishraa-G commented Feb 16, 2026

> I have now studied the actual algorithmic change you made; it's a great find. Thanks again. I can fix up the cosmetic stuff if you don't have time. Let me know.

Hey @bboreham, just a reminder to check out this PR

@bboreham
Member

> the existing one covers the high-overlap case (like 90% match) where Next() already flies, so it wouldn't catch this regression.

I get a pretty good speedup on the existing benchmark:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/tsdb
cpu: Apple M2
                          │   before    │               after                │
                          │   sec/op    │   sec/op     vs base               │
LabelValuesWithMatchers-8   2.718m ± 4%   1.408m ± 3%  -48.20% (p=0.002 n=6)

But yours is a bigger percentage:

                       │     before     │               after                │
                       │     sec/op     │   sec/op     vs base               │
LabelValues_SlowPath-8   919901.0n ± 2%   836.6n ± 2%  -99.91% (p=0.002 n=6)
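
As a rough cross-check of the "vs base" column, the percentage can be recomputed from the rounded values shown in the two tables. This is only a sketch: `percentChange` is a hypothetical helper, and benchstat itself works from the unrounded samples, so the last digit could differ in general.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
// A negative result means "after" is faster.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values copied from the benchstat tables above, converted to seconds.
	fmt.Printf("LabelValuesWithMatchers: %.2f%%\n", percentChange(2.718e-3, 1.408e-3))
	fmt.Printf("LabelValues_SlowPath:    %.2f%%\n", percentChange(919901.0e-9, 836.6e-9))
	fmt.Printf("SlowPath speedup factor: %.0fx\n", 919901.0/836.6)
}
```

The second line corresponds to a speedup of roughly three orders of magnitude on the sparse benchmark, versus roughly 2x on the existing one.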

Member

@bboreham bboreham left a comment


Great, code change is very clean now.

I didn't study the new benchmark closely.

@bboreham bboreham merged commit b908cc4 into prometheus:main Feb 16, 2026
53 of 54 checks passed
@codesome codesome mentioned this pull request Feb 17, 2026
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 12, 2026
##### [\`v3.10.0\`](https://github.com/prometheus/prometheus/releases/tag/v3.10.0)

Prometheus now offers a distroless Docker image variant alongside the default
busybox image. The distroless variant provides enhanced security with a minimal
base image, uses UID/GID 65532 (nonroot) instead of nobody, and removes the
VOLUME declaration. Both variants are available with `-busybox` and `-distroless`
tag suffixes (e.g., `prom/prometheus:latest-busybox`, `prom/prometheus:latest-distroless`).
The busybox image remains the default with no suffix for backwards compatibility
(e.g., `prom/prometheus:latest` points to the busybox variant).

For users migrating existing **named** volumes from the busybox image to the distroless variant, the ownership can be adjusted with:

```
docker run --rm -v prometheus-data:/prometheus alpine chown -R 65532:65532 /prometheus
```

Then, the container can be started with the old volume with:

```
docker run -v prometheus-data:/prometheus prom/prometheus:latest-distroless
```

Users migrating from bind mounts might need to adjust permissions too, depending on their setup.

- \[CHANGE] Alerting: Add `alertmanager` dimension to following metrics: `prometheus_notifications_dropped_total`, `prometheus_notifications_queue_capacity`, `prometheus_notifications_queue_length`. [#16355](prometheus/prometheus#16355)
- \[CHANGE] UI: Hide expanded alert annotations by default, enabling more information density on the `/alerts` page. [#17611](prometheus/prometheus#17611)
- \[FEATURE] AWS SD: Add MSK Role. [#17600](prometheus/prometheus#17600)
- \[FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series. [#17644](prometheus/prometheus#17644)
- \[FEATURE] Web: Add OpenAPI 3.2 specification for the HTTP API at `/api/v1/openapi.yaml`. [#17825](prometheus/prometheus#17825)
- \[FEATURE] Dockerfile: Add distroless image variant using UID/GID 65532 and no VOLUME declaration. Busybox image remains default. [#17876](prometheus/prometheus#17876)
- \[FEATURE] Web: Add on-demand wall time profiling under `<URL>/debug/pprof/fgprof`. [#18027](prometheus/prometheus#18027)
- \[ENHANCEMENT] PromQL: Add more detail to histogram quantile monotonicity info annotations. [#15578](prometheus/prometheus#15578)
- \[ENHANCEMENT] Alerting: Independent alertmanager sendloops. [#16355](prometheus/prometheus#16355)
- \[ENHANCEMENT] TSDB: Experimental support for early compaction of stale series in the memory with configurable threshold `stale_series_compaction_threshold` in the config file. [#16929](prometheus/prometheus#16929)
- \[ENHANCEMENT] Service Discovery: Service discoveries are now removable from the Prometheus binary through the Go build tag `remove_all_sd` and individual service discoveries can be re-added with the build tags `enable_<sd name>_sd`. Users can build a custom Prometheus with only the necessary SDs for a smaller binary size. [#17736](prometheus/prometheus#17736)
- \[ENHANCEMENT] Promtool: Support promql syntax features `promql-duration-expr` and `promql-extended-range-selectors`. [#17926](prometheus/prometheus#17926)
- \[PERF] PromQL: Avoid unnecessary label extraction in PromQL functions. [#17676](prometheus/prometheus#17676)
- \[PERF] PromQL: Improve performance of regex matchers like `.*-.*-.*`. [#17707](prometheus/prometheus#17707)
- \[PERF] OTLP: Add label caching for OTLP-to-Prometheus conversion to reduce allocations and improve latency. [#17860](prometheus/prometheus#17860)
- \[PERF] API: Compute `/api/v1/targets/relabel_steps` in a single pass instead of re-running relabeling for each prefix. [#17969](prometheus/prometheus#17969)
- \[PERF] tsdb: Optimize LabelValues intersection performance for matchers. [#18069](prometheus/prometheus#18069)
- \[BUGFIX] PromQL: Prevent query strings containing only UTF-8 continuation bytes from crashing Prometheus. [#17735](prometheus/prometheus#17735)
- \[BUGFIX] Web: Fix missing `X-Prometheus-Stopping` header for `/-/ready` endpoint in `NotReady` state. [#17795](prometheus/prometheus#17795)
- \[BUGFIX] PromQL: Fix PromQL `info()` function returning empty results when filtering by a label that exists on both the input metric and `target_info`. [#17817](prometheus/prometheus#17817)
- \[BUGFIX] TSDB: Fix a bug during exemplar buffer grow/shrink that could cause exemplars to be incorrectly discarded. [#17863](prometheus/prometheus#17863)
- \[BUGFIX] UI: Fix broken graph display after page reload, due to broken Y axis min encoding/decoding. [#17869](prometheus/prometheus#17869)
- \[BUGFIX] TSDB: Fix memory leaks in buffer pools by clearing reference fields (Labels, Histogram pointers, metadata strings) before returning buffers to pools. [#17879](prometheus/prometheus#17879)
- \[BUGFIX] PromQL: info function: fix series without identifying labels not being returned. [#17898](prometheus/prometheus#17898)
- \[BUGFIX] OTLP: Filter `__name__` from OTLP attributes to prevent duplicate labels. [#17917](prometheus/prometheus#17917)
- \[BUGFIX] TSDB: Fix division by zero when computing stale series ratio with empty head. [#17952](prometheus/prometheus#17952)
- \[BUGFIX] OTLP: Fix potential silent data loss for sum metrics. [#17954](prometheus/prometheus#17954)
- \[BUGFIX] PromQL: Fix smoothed interpolation across counter resets. [#17988](prometheus/prometheus#17988)
- \[BUGFIX] PromQL: Fix panic with `@` modifier on empty ranges. [#18020](prometheus/prometheus#18020)
- \[BUGFIX] PromQL: Fix `avg_over_time` for a single native histogram. [#18058](prometheus/prometheus#18058)
renovate bot added a commit to sdwilsh/ansible-playbooks that referenced this pull request Mar 13, 2026

Development

Successfully merging this pull request may close these issues.

Label API is much slower than equivalent series API
