Use LazySeriesIterator with fuzzy metric name queries#442
Conversation
7163bcc to
abe573d
Compare
249b65e to
affeca8
Compare
pkg/chunk/chunk_store.go
Outdated
| return nil, fmt.Errorf("invalid query, through < from (%d < %d)", through, from) | ||
| } | ||
|
|
||
| filters, matchers := util.SplitFiltersAndMatchers(allMatchers) |
There was a problem hiding this comment.
I wonder if this should be pushed into getMetricNameIterators and getFuzzyMetricLazySeriesIterators.
pkg/chunk/chunk_store.go
Outdated
| return c.lookupChunksByMetricName(ctx, from, through, matchers, metricNameMatcher.Value) | ||
| } | ||
|
|
||
| func (c *Store) getFuzzyMetricLazySeriesIterators(ctx context.Context, from, through model.Time, filters []*metric.LabelMatcher, matchers []*metric.LabelMatcher, metricNameMatcher *metric.LabelMatcher) ([]local.SeriesIterator, error) { |
There was a problem hiding this comment.
Probably better names getSeriesIterators?
| if ok && !metricNameMatcher.Match(metricName) { | ||
| skippedMetricNames++ | ||
| // Apply metricNameMatcher filter | ||
| if metricNameMatcher != nil && !metricNameMatcher.Match(metric[metricNameMatcher.Name]) { |
There was a problem hiding this comment.
Is this not redundant with the checks below?
Also, if you push the util.SplitFiltersAndMatchers into getMetricNameIterators, this section will just become:
for _, matcher := range matchers {
if !... {...}
}
And you won't need all three checks.
pkg/chunk/schema_test.go
Outdated
| @@ -236,6 +237,7 @@ func TestSchemaHashKeys(t *testing.T) { | |||
| const ( | |||
| MetricNameRangeValue = iota + 1 | |||
There was a problem hiding this comment.
typically you do
const (
_ iota
foo
bar
)If you want to start at 1.
| "flip": "flop", | ||
| }, | ||
| "KrbXMezYneba+o7wfEdtzOdAWhbfWcDrlVfs1uOCX3M", | ||
| }, |
There was a problem hiding this comment.
Can you (for my sanity) add a test case for the same metric, with the pairs in a different order?
| fpToSampleStream[fp] = mss | ||
| } | ||
| mss.Values = util.MergeSamples(mss.Values, ss.Values) | ||
| mss.Values = util.MergeSampleSets(mss.Values, ss.Values) |
There was a problem hiding this comment.
In such large PRs I generally try and avoid renamings.
There was a problem hiding this comment.
We changed it to MergeSampleSets to keep it consistent with MergeNSampleSets and it's only used in 2 places. I believe this was in a previous PR but we have now agreed to merge all PR's together.
| @@ -0,0 +1,142 @@ | |||
| package util | |||
There was a problem hiding this comment.
Add a comment that this is copied from the prometheus code, and the PR to make it public?
883c061 to
940291e
Compare
|
Is this gtg now? I'll review this weekend if so.
…On Fri, 9 Jun 2017 at 14:52, Aaron Kirkbride ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/distributor/distributor.go
<#442 (comment)>:
> @@ -517,7 +532,7 @@ func (d *Distributor) queryIngesters(ctx context.Context, ingesters []*ring.Inge
}
fpToSampleStream[fp] = mss
}
- mss.Values = util.MergeSamples(mss.Values, ss.Values)
+ mss.Values = util.MergeSampleSets(mss.Values, ss.Values)
We changed it to MergeSampleSets to keep it consistent with
MergeNSampleSets and it's only used in 2 places. I believe this was in a
previous PR but we have now agreed to merge all PR's together.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#442 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAbGhYlkomWhsneYk4Z6FCReP1hRzwcmks5sCU4jgaJpZM4NmCNS>
.
|
pkg/chunk/schema.go
Outdated
| return nil, ErrNoMetricNameNotSupported | ||
| } | ||
|
|
||
| // v8Entries is the same as v7Entries however with a series index instead of a metric name index |
There was a problem hiding this comment.
v8Entries is the same as v7Entries
does not seem consistent with:
type v8Entries struct { v6Entries }
which one is correct? is just a typo in the comment?
There was a problem hiding this comment.
v8Entries embedding v6Entries is correct. The comment says it's the same as v7 because we are replacing the index v7 adds. However this comment is confusing because v8Entries GetWriteEntries is completely different.
I will update the comment - thanks
|
|
||
| func metricSeriesID(m model.Metric) string { | ||
| h := sha256.Sum256([]byte(m.String())) | ||
| return string(encodeBase64Bytes(h[:])) |
There was a problem hiding this comment.
Out of curiosity, is there a special reason we use base64 here vs., for example, hex.EncodeToString which:
- is more natural for such hash functions, and
- could be more convenient to use when debugging? (or e.g. printing https://play.golang.org/p/IeHqPaaNk4)
There was a problem hiding this comment.
No special reason. It's slightly more efficient to store (4 chars per 3 bytes instead of 2 chars per 2 bytes) and is seen more of an UID than a hash. Base64 is human readable and we also encode to base64 for the dynamo api (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.LowLevelAPI.html).
pkg/util/merger.go
Outdated
| func MergeNSamples(sampleSets ...[]model.SamplePair) []model.SamplePair { | ||
| l := len(sampleSets) | ||
| switch l { | ||
| case 0: |
There was a problem hiding this comment.
Unless I've missed it, and even though that's trivial, I don't think there is an unit test covering this case statement.
If I understood well, we'd need this additional test case:
{
sampleSets: [][]model.SamplePair{},
expected: []model.SamplePair{},
},
| } | ||
|
|
||
| // MergeNSampleSets merges and dedupes n sets of already sorted sample pairs. | ||
| func MergeNSampleSets(sampleSets ...[]model.SamplePair) []model.SamplePair { |
There was a problem hiding this comment.
-
Are values within a
[]model.SamplePairguaranteed to be sorted? (the comment above seems to indicate so) -
If so, would it be worth implementing:
- a n-way merge, instead of
- this 2-way merge for which we then enqueue results (and repeat)?
It feels like we would re-allocate quite a few arrays and process
model.SamplePairs more than once for large number ofsampleSets.
There was a problem hiding this comment.
In case this could be useful, here is an example of doing the n-way merge using:
- a "dedup iterator" on
- a "merging iterator" on
- a collection of "iterators".
Lack of generics make this pretty ugly in Go (feedback welcome as I'm a beginner at Go), however:
- a similar approach could be done with pure arrays/slices;
- the "dedup" logic and "merging" logic could be merged for better performance.
The thing about it is that it is nicely composable and the overall complexity is O(n.log(k)),
kbeing the number of iterators merged, andnthe total number of items.
There was a problem hiding this comment.
As discussed offline, we are going to keep with the merge approach for now in this PR because the most common case is only a few sample sets and the code is simpler.
| err = it.createSampleSeriesIterator() | ||
| }) | ||
| if err != nil { | ||
| // TODO: Handle error. |
There was a problem hiding this comment.
Is there a plan to handle this error before this PR gets merged? (I am just asking given the TODO)
There was a problem hiding this comment.
Not in this PR. I would like to discuss this - I'm unsure how to approach this.
There was a problem hiding this comment.
Unsure as well right now... and it looks like it could fail in many ways 🙁
| for _, it := range iterators { | ||
| ss := &model.SampleStream{ | ||
| Metric: it.Metric().Metric, | ||
| Values: it.RangeValues(in), |
There was a problem hiding this comment.
Just to point out: RangeValues could return nil. Does it matter? What do we do with these SampleStreams afterwards?
There was a problem hiding this comment.
It looks like we only pass these to ToQueryResponse but we don't check for nil there, so we could potentially "panic".
There was a problem hiding this comment.
I've had a look into this and it doesn't look like it will panic because we are ranging over a nil slice which does 0 iterations. https://github.com/golang/go/blob/master/doc/go_spec.html#L5011
https://play.golang.org/p/4vvbbHMKUo
|
|
||
| // ValueAtOrBeforeTime implements the SeriesIterator interface. | ||
| func (it SampleStreamIterator) ValueAtOrBeforeTime(ts model.Time) model.SamplePair { | ||
| // TODO: This is a naive inefficient approach - in reality, queries go mostly |
There was a problem hiding this comment.
Is this something to resolve as part of this PR?
There was a problem hiding this comment.
This was moved from pkg/querier/iterator.go. We still use SampleStreamIterator in some cases and the comment is still relevant. We still need improve the iterators to iterate through chunks - this PR focuses on allowing queries which do not fetch samples to execute efficiently.
Yes, it's ready for review if you get some time. Thanks :) |
|
Great! I'll look at it tomorrow afternoon. |
pkg/chunk/chunk.go
Outdated
| } | ||
|
|
||
| // ChunksToMatrix converts a slice of chunks into a model.Matrix. | ||
| func ChunksToMatrix(chunks []Chunk) (model.Matrix, error) { |
There was a problem hiding this comment.
Only used in tests now, could be moved into them.
| // Fetch chunk descriptors (just ID really) from storage | ||
| chunks, err := c.lookupChunksByMatchers(ctx, from, through, matchers) | ||
| func (c *Store) getMetricNameIterators(ctx context.Context, from, through model.Time, allMatchers []*metric.LabelMatcher, metricName model.LabelValue) ([]local.SeriesIterator, error) { | ||
| chunks, err := c.getMetricNameChunks(ctx, from, through, allMatchers, metricName) |
There was a problem hiding this comment.
My preference would be to move the contents of getMetricNameChunks into here, as this function seems overly short. I see that it is used from the tests, so you'd need to do some work there, in which case it might not be worth the effort.
There was a problem hiding this comment.
Yep, once getMetricNameIterators converts the chunks from getMetricNameChunks into iterators, we cannot access the chunks for testing through the iterator interface. We would like access to these to test whether the right chunks are fetched or not.
There was a problem hiding this comment.
We can't test at the interval level as we are concerned about which chunks are fetched to create the iterators. We could expose the chunks, but it's a prometheus interface and it doesn't seem like the right thing to do, exposing something that is not related. This currently seems like the best way to get around testing constraints.
pkg/chunk/iterator.go
Outdated
| func (it *LazySeriesIterator) createSampleSeriesIterator() error { | ||
| metricName, ok := it.metric[model.MetricNameLabel] | ||
| if !ok { | ||
| return fmt.Errorf("series does not have a metric name") |
There was a problem hiding this comment.
Can you push this error into the creation of the LazySeriesIterator?
pkg/chunk/iterator.go
Outdated
| return fmt.Errorf("series does not have a metric name") | ||
| } | ||
|
|
||
| ctx := context.Background() |
There was a problem hiding this comment.
Can you open a ticket upstream to add contexts and error returns to iterators. Will probably have to wait until post v2 though.
20f2e43 to
8b31b36
Compare
|
Rebased - will test before merging. Thanks for the review @tomwilkie . |
8b31b36 to
aa37aa9
Compare
|
Tested locally with We're seeing I will also be adding some more metrics to get information about how many pages we are seeing for each query. |
how? EDIT: I think this refers to the v7 schema which had an index "Userid:day - hash(metric name) -> metric name". See #416 (comment) |
Step towards fixing #416
Create v8Schema with series index (#430)
This index will be used to fetch series from the index. It will replace the metric name index and we will make the switch after the next steps are complete.
Decided to encode the model.Metric as JSON because it already implements the json.Unmarshaler interface.
Used sha256 for identifying the series as the variant of fnv64a which if used for fingerprints is not unique enough for indexing and using sha256 keeps it simple (we don't have to deal with collision logic).
Move iterators inside chunk store using MergeSeriesIterator (#438)
Use LazySeriesIterator with fuzzy metric name queries
There is not opportunity for our iterator to return an error if values are request, so we are just returning no values for now.
Test plan:
count({__name__=~".+"}) by (__name__)ran locally