tsdb.IndexReader: Add method PostingsForLabelMatching#13620
tsdb.IndexReader: Add method PostingsForLabelMatching#13620bboreham merged 41 commits intoprometheus:mainfrom
Conversation
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
05c2717 to
4b2b2f6
Compare
|
@bwplotka @fpetkovski I hear you might want to review this? |
bwplotka
left a comment
There was a problem hiding this comment.
Nice. Thanks for this! One main question, otherwise LGTM, if prombench will be happy. I assume given you changed querier's postingForMatcher to used new index method, we should see improvement (or at least no regression, right?)
model/labels/regexp.go
Outdated
| return | ||
| } | ||
|
|
||
| func FindSetMatches(pattern string) []string { |
There was a problem hiding this comment.
Missing comment for exported method 🤗
There was a problem hiding this comment.
True, I can write one.
There was a problem hiding this comment.
I wrote a basic doc string, is it good enough?
|
/prombench main |
Co-authored-by: Bartlomiej Plotka <[email protected]> Signed-off-by: Arve Knudsen <[email protected]>
@bwplotka there should only be improvement (for regexp matchers (as this should be the affected code path if I remember correctly, although not 100% sure), no longer having to load all label values before matching; if we see regressions I will be surprised. If you see the benchmark stats I included in the PR description, you can see that memory usage drops for one regexp matcher case for both head and block. In mimir-prometheus, there's also a second regexp matcher case that benefits from reduced memory. The stats also show some speedup, but I find that this changes between every time I bench (whereas memory reduction is constant) so I don't trust it. |
Co-authored-by: Bartlomiej Plotka <[email protected]> Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
|
I pushed a minor change to |
fpetkovski
left a comment
There was a problem hiding this comment.
Looking forward to trying this out.
|
I don't have time to take a deeper look, but generally main looks better than this PR on prombench, unfortunately. Not significantly, but there is some diff: CPU is 6% worse with this PR:
Plus queries are a bit slower on HTTP query_range (also engine avg):
Any idea why? cc @bboreham |
|
Benchmark tests are running for 3 days! If this is intended ignore this message otherwise you can cancel it by commenting: |
@bwplotka not really, but I'm also unfamiliar with prombench. Do you have any suggestions for how I might analyze the prombench results, to see e.g. what kind of queries are slower with my PR? Is it possible that calling |
|
@bwplotka I just merged in some changes from main and pushed BTW. Among the changes coming in from main was a change from compress/gzip to github.com/klauspost/compress/gzip in |
|
/prombench cancel |
|
@bwplotka @fpetkovski @bboreham after running I've tried solving the regression two ways:
The SwissMap solution is the fastest and uses the least memory, you can see my (AMD64) benchmark results in this Gist (one file for each solution). Edit: I see now there's also a CockroachDB version of SwissMap,which I think is newer? NB: I've drafted a PR for using dolthub SwissMap in |
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
|
I pushed a major revision of the PR, after discussing with @bboreham and eventually realizing that label matching doesn't need to be pushed into |
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
…-matcher Signed-off-by: Arve Knudsen <[email protected]>
|
|
||
| // traversePostingOffsets traverses r's posting offsets table, starting at off, and calls cb with every label value and postings offset. | ||
| // If cb returns false (or an error), the traversing is interrupted. | ||
| func (r *Reader) traversePostingOffsets(ctx context.Context, off int, cb func(string, uint64) (bool, error)) error { |
There was a problem hiding this comment.
Should this be named ...V2, since there seem to be alternate implementations for V1 every place it is called?
There was a problem hiding this comment.
My thinking for not giving it the V2 suffix is basically that tsdb/index.Reader uses the "V1" suffix for its legacy v1 postings offset table: postingsV1, while the non-legacy v2 table is just called postings. I figured it's an intentional naming scheme to use a version suffix on the legacy code. WDYT?
No big deal either way :) Just how I thought about it.
Signed-off-by: Arve Knudsen <[email protected]>
|
On feedback from @bboreham - pushed a revision splitting out |
Signed-off-by: Arve Knudsen <[email protected]>
Simple follow-up to #13620. Modify `tsdb.PostingsForMatchers` to use the optimized tsdb.IndexReader.PostingsForLabelMatching method also for inverse matching. Introduce method `PostingsForAllLabelValues`, to avoid changing the existing method. The performance is much improved for a subset of the cases; there are up to ~60% CPU gains and ~12.5% reduction in memory usage. Remove `TestReader_InversePostingsForMatcherHonorsContextCancel` since `inversePostingsForMatcher` only passes `ctx` to `IndexReader` implementations now. Signed-off-by: Arve Knudsen <[email protected]>
Simple follow-up to prometheus#13620. Modify `tsdb.PostingsForMatchers` to use the optimized tsdb.IndexReader.PostingsForLabelMatching method also for inverse matching. Introduce method `PostingsForAllLabelValues`, to avoid changing the existing method. The performance is much improved for a subset of the cases; there are up to ~60% CPU gains and ~12.5% reduction in memory usage. Remove `TestReader_InversePostingsForMatcherHonorsContextCancel` since `inversePostingsForMatcher` only passes `ctx` to `IndexReader` implementations now. Signed-off-by: Arve Knudsen <[email protected]>


Add method
PostingsForLabelMatchingtotsdb.IndexReader, a method for obtaining an iterator over postings for labels with a certain name and values accepted by a provided callback, and use it fromtsdb.PostingsForMatchers. The intention is to optimize regexp matcher paths, especially not having to load all label values before matching on them.In addition, I refactor some
tsdb/index.Readermethods and some tests in the same package. I can drop these refactorings from the PR if preferable :)This PR is brought over from mimir-prometheus, where the changes have been reviewed in depth by @dimitarvdimitrov and @colega and approved. We (Grafana Labs) came to the conclusion that the changes are best upstreamed directly, instead of going via our fork.
Benchmarking shows memory reduction up to ~100%, and speedup of up to ~50%.
PostingsForLabelMatching Tests
I add one test for each of the two
PostingsForLabelMatchingimplementations, as otherwise only theheadIndexReaderimplementation gets tested (verified through test coverage analysis).Benchmarks
Benchmarking stats, optimized vs non-optimized code, below:
PostingsForMatchers benchmark stats
labelValuesWithMatchers benchmark stats