
Track distribution of row write rates with new HashBucketHistogram. #261

Merged

tomwilkie merged 2 commits into master from hash-bucket-histogram on Feb 2, 2017
Conversation

@tomwilkie (Contributor):

Part of #254

@tomwilkie (Contributor, Author):

Tested locally and seems to work.

@tomwilkie tomwilkie requested a review from jml February 2, 2017 13:45
@jml (Contributor) left a comment:

Modulo one question

entries := 0
for _, bucket := range c.bigBuckets(chunk.From, chunk.Through) {
	hashValue := hashValue(userID, bucket.bucket, metricName)
	rowWrites.Observe(hashValue, uint32(len(chunk.Metric)))
@jml (Contributor):
I want to double check: is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo? If so, won't that make the measurements less useful?

@tomwilkie (Contributor, Author):

> is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo?

That's unknown, as we don't know how DynamoDB distributes across partitions.

> If so, won't that make the measurements less useful?

Less useful maybe, but I still think quite useful - this will give us good information on the distribution of our write load within the hash space, and any massive outliers should show up.

@tomwilkie tomwilkie merged commit f598d31 into master Feb 2, 2017
@tomwilkie tomwilkie deleted the hash-bucket-histogram branch February 2, 2017 17:31
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		Name:      "chunk_store_row_write_total",
@jml (Contributor):
_total is a suffix reserved for counters by convention, not for histograms, which get additional _sum and _count counters created automatically. I'd call this chunk_store_row_writes_distribution or something.

@tomwilkie (Contributor, Author):

Cool, will rename

// Collect implements prometheus.Metric
func (h *hashBucketHistogram) Collect(c chan<- prometheus.Metric) {
	for i := range h.buckets {
		h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
@jml (Contributor):

So what your histogram observes depends totally on your scrape rate, missed scrapes, and whether you scrape from multiple Prometheus servers in parallel. This is obviously funky and not normally recommended Prometheus metrics usage, but I guess you know that?

Also, the xxx_count will be off.

@tomwilkie (Contributor, Author):

Yeah, this is a bit of a hack for now - suggestions to make it more robust are welcome! I'm thinking a goroutine which dumps the buckets into the histogram and resets them every second or something?

@jml (Contributor):

Yes, that would be better!

@tomwilkie (Contributor, Author):

Also, to normalise this a little, I think it would be useful to count all writes (a single counter), and then express each bucket's writes as a proportion of that.

@tomwilkie (Contributor, Author):

Or better yet, to cancel out the effect the number of buckets has on that, multiply by the number of buckets too - so 1 would be perfectly load balanced, more than 1 skewed, etc.


3 participants